11. Toolkit of useful scripts
As you have seen, VECTRI has a non-smart interface, in that it expects all input datasets to be provided on the same grid. To help you set up your input, we have put together a number of simple scripts in the utils sub-directory that perform common data pre- and post-processing tasks.
Warning
Note that all scripts are provided on an “as-is”, non-maintained basis, and many of them may need editing or updating to make them work for your particular purpose, or may break without warning. The VECTRI team is very small and we are unable to provide a help-desk type service to respond to enquiries about this script suite, but we will try to fix breakages when reported as soon as we can.
11.1. Getting population data
script name: download_population_data
Note
In order to run this script you will need to ensure that you have gdal_translate and nco installed. gdal_translate converts the file from tif to netcdf format, and nco is used to change the population field name to something more appropriate than band.
Just run the following script to get population data in your local directory. Please don't run it in your $VECTRI repository:
$VECTRI/utils/download_population_data
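For reference, the conversion that the script performs internally looks roughly like the following (a sketch, assuming the downloaded GeoTIFF is called population.tif; the real filenames depend on the WorldPop dataset you download):

# convert the GeoTIFF to netcdf; gdal_translate names the variable Band1
gdal_translate -of netCDF population.tif population.nc
# rename the generic variable to something more meaningful using nco
ncrename -v Band1,population population.nc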
Warning
This script gets data from the Worldpop ftp servers. Such methods go out of date notoriously quickly as data providers move their datasets around. If you find this script is broken as a result of this, please get in touch, or better still find a solution and email it to us :-)
Check out the WorldPop website for other datasets on the country level that you can download and convert in a similar way. Another useful resource for population data is the GRUMP v4 population of the world dataset.
11.2. Getting ERA5 driver files
script name: vectri_era5_driver.bash
Warning
September 2024: This script is currently broken due to the update to the CDS system, which changed the retrieval mechanism in a non-backwards compatible way and also abandoned CF-compliance for the time dimension (ECMWF, what were you thinking?!) We are currently working on a fix.
11.3. Converting CSV station data to netcdf
script name: vectri_point_input
One of the most common questions that arises with VECTRI is how to drive it using CSV or text-file climate data for a single point. In the early days of VECTRI the wrapper script had an option to read text files directly as an alternative to netcdf, but we quickly abandoned this as it was impossible to maintain for the infinite possible choices of column orders, date formats and so on (indeed, this is the reason netcdf and other self-describing data formats were invented!). While it is no longer possible to read text files directly, we provide a bash shell script to conveniently convert almost any text file to netcdf. The script was fixed and upgraded in v1.11.4 to make it more portable across systems.
Let's say we have a datafile whose first few lines, revealed with head myfile.txt, look like this:
year | mm | dd | rain | tmax | tmin
---|---|---|---|---|---
2011 | 1 | 1 | 0 | 29.4 | 15.5
2011 | 1 | 2 | 10.2 | 31.2 | 15.7
2011 | 1 | 3 | 4.5 | 30.2 | 15.4
and so on. First of all, note that this particular file contains no information about units; we are left to assume that rainfall is in mm/day and temperature in degrees C. Again, this is the advantage of netcdf self-describing files, which carry metadata such as units for each field.
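As an aside, once your data is in netcdf you can attach the missing units information yourself with nco's ncatted tool (a sketch, assuming the converted file is called myfile.nc and uses these variable names):

# add a units attribute to each variable (mode c = create, type c = character)
ncatted -a units,rain,c,c,"mm/day" -a units,tmax,c,c,"degC" -a units,tmin,c,c,"degC" myfile.nc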
Anyway, what do we need our script to do? Essentially, it has to discard N header lines (just one in this case), extract the date, temperature and rainfall information from the appropriate columns, and save the result to a netcdf file. This information is passed to the script through a number of options, introduced in the following table. You can also get this information by typing
vectri_point_input --help
option | definition
---|---
-h, --help | print usage information and exit
--lat | latitude of the point location
--lon | longitude of the point location
--temp | temperature input: a text file name, or a constant value (degrees C)
--rain | rainfall input: a text file name, or a constant value (mm/day)
--pop | population input: a text file name, or a constant density (km⁻²)
--climfile | name of the output netcdf climate file
--datafile | name of the input text data file
--cdate | column containing the full date (alternative to --cyyyy, --cmm and --cdd)
--cyyyy | column containing the year
--cmm | column containing the month
--cdd | column containing the day
--ctmin | column containing the minimum temperature
--ctmax | column containing the maximum temperature
--crain | column containing the rainfall
--cpop | column containing the population
--tmiss | missing-data indicator value for temperature
--rmiss | missing-data indicator value for rainfall
--nday | number of days to generate when creating a constant driver file
--nhead | number of header lines to discard (applies to all input files)
--nheadt | number of header lines in the temperature file (overrides --nhead)
--nheadr | number of header lines in the rainfall file (overrides --nhead)
--nheadp | number of header lines in the population file (overrides --nhead)
Most of the entries are straightforward and self-explanatory, so let’s just demonstrate how we would convert the example file above:
./vectri_point_input --cyyyy=1 --cmm=2 --cdd=3 --crain=4 --ctmax=5 --ctmin=6 --pop=100 --temp=myfile.txt --rain=myfile.txt --nhead=1
Here we introduce the options in the same order as the columns to avoid confusion, but the options can be given in any order you like. Note that we need to specify the input file twice, once for rainfall and once for temperature. This provides flexibility in the case that you have separate input files for the two fields.
FAQ
My rain file and temperature file have a different number of header lines! Note that nhead is a global header option; if you are running with two input files that have different header lengths, you can specify them separately using nheadr and nheadt.
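For example, with separate temperature and rainfall files (file names and header lengths here are hypothetical):

./vectri_point_input --cyyyy=1 --cmm=2 --cdd=3 --crain=4 --ctmax=5 --ctmin=6 --pop=100 --temp=mytemps.txt --rain=myrain.txt --nheadt=2 --nheadr=1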
Can I also have time-evolving population data?
Yes! In the example above we specified a constant value for population (units are km⁻²) rather than a file name. If you have time-evolving population data, you can instead pass a population file name, but note that the data needs to be daily! Thus if you have annual data, you will need to convert it to netcdf separately (see tip below) and then use cdo to interpolate it to daily. In most use cases, however, a single time-invariant population value is used.
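The interpolation step could look something like this (a sketch; the file names are hypothetical):

# interpolate annual population data to daily time steps with cdo
cdo inttime,2011-01-01,00:00:00,1day pop_annual.nc pop_daily.nc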
What if my input file already has the daily mean rather than tmin and tmax? Not a problem, simply point both ctmin and ctmax to the same column. The script simply takes the mean of the ctmin and ctmax columns, so the daily mean value is written to the netcdf file unchanged.
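For instance, if column 5 of myfile.txt held the daily mean temperature (column numbers here are illustrative):

./vectri_point_input --cyyyy=1 --cmm=2 --cdd=3 --crain=4 --ctmax=5 --ctmin=5 --pop=100 --temp=myfile.txt --rain=myfile.txt --nhead=1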
Note that this script cannot yet handle other optional inputs, such as bednet distributions or soil textures. These will be incorporated in a future release.
Tip
Note that the year, month and day column information is assumed to be the same for all input files (temperature, rainfall and population). For the moment, if your files have different column orders, you will need to run the script once per file and use this fudge: when you want to process temperature, set the columns for the temperature file and call the script with rain=0. This produces a climate file with your temperature data inside and zero rainfall. Then repeat this "inverted", that is, specify the columns for the rainfall file and set temp=0. When you subsequently run vectri you will need to pass both input files, vectri --c temp_file --p precip_file, taking care not to mix them up! A sketch of the two passes follows.
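Put together, the two passes might look like this (file names and column numbers are illustrative, and we assume here that --climfile sets the output file name):

# pass 1: temperature columns from the temperature file, rainfall set to a constant zero
./vectri_point_input --cyyyy=1 --cmm=2 --cdd=3 --ctmax=4 --ctmin=5 --temp=mytemps.txt --rain=0 --pop=100 --climfile=temp_file
# pass 2: inverted, rainfall columns from the rainfall file, temperature set to zero
./vectri_point_input --cyyyy=1 --cmm=2 --cdd=3 --crain=4 --rain=myrain.txt --temp=0 --pop=100 --climfile=precip_file
# run vectri with the two resulting files, taking care not to mix them up
vectri --c temp_file --p precip_file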
11.4. Creating a constant climate driver file
script name: vectri_point_input
In the previous section, we saw how the temperature, rainfall and population inputs can each be set to a constant value by simply specifying the value instead of an input text filename. This can be used to generate idealized input data, e.g. with varying rainfall but constant temperature, in order to investigate the climate drivers separately.
One can even create a file with all inputs constant to see how the model EIR responds as a function of temperature, rainfall and population. In this case you need to specify the length of your desired driver file using the nday=X option. Note that the dates in the output file are set to start in the year 2000 but are, of course, completely arbitrary! Here is how to generate a driver file of 1000 days, with a temperature of 27 °C, rain of 5 mm/day and a population density set to 200 km⁻².
vectri_point_input --temp=27 --rain=5 --pop=200 --nday=1000
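If, for example, you wanted to explore the temperature response, you could loop over a range of constant temperatures and generate one driver file per value (a sketch; we again assume that --climfile names the output file):

# hypothetical sweep over constant temperatures, one driver file each
for t in 18 21 24 27 30 33; do
    vectri_point_input --temp=$t --rain=5 --pop=200 --nday=1000 --climfile=clim_t${t}.nc
done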
11.5. Running large ensembles in parallel
script name: burst_submit.py
For advanced users with knowledge of python.
This script is not a working script as-is, but demonstrates how you can run large vectri ensembles efficiently in parallel on a shared-memory architecture. The example shows how various climate models and RCP climate scenarios can be combined into a long vector of simulation experiments. Then, on a shared-memory node with N cpus, the script uses Pool.starmap from python's multiprocessing package to farm out the jobs N at a time; as soon as one job completes, the next job in the vector is submitted. Each simulation is written to its own directory to avoid conflicts between input and output files.
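The same pattern can be mimicked in plain shell with xargs -P, shown here as a rough equivalent rather than the python script itself (the model and scenario names, and the bare vectri invocation, are illustrative):

# build the list of experiments and run them NCPU at a time,
# each in its own directory to avoid input/output clashes
NCPU=8
for model in model_a model_b model_c; do
    for scen in rcp26 rcp45 rcp85; do
        echo ${model}_${scen}
    done
done | xargs -P $NCPU -I {} sh -c 'mkdir -p {} && cd {} && vectri > vectri.log 2>&1'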