11. Toolkit of useful scripts

As you have seen, VECTRI has a rather basic interface: it expects all input datasets to be provided on the same grid. To help you set up your input, we have put together a number of simple scripts in the utils sub-directory that perform common pre- or post-processing tasks.

Warning

Note that all scripts are provided on an “as-is”, non-maintained basis, and many of them may need editing or updating to make them work for your particular purpose, or may break without warning. The VECTRI team is very small and we are unable to provide a help-desk type service to respond to enquiries about this script suite, but we will try to fix breakages when reported as soon as we can.

11.1. Getting population data

script name: download_population_data

Note

In order to run this script you will need to ensure you have gdal_translate and nco installed. gdal_translate converts the file from tif to netcdf format and nco is used to change the population field name to something more appropriate than band.

Just run the following script to download population data into your local directory. Please don’t run it inside your $VECTRI repository.

$VECTRI/utils/download_population_data

Warning

This script gets data from the Worldpop ftp servers. Such methods go out of date notoriously quickly as data providers move their datasets around. If you find this script is broken as a result of this, please get in touch, or better still find a solution and email it to us :-)

Check out the WorldPop website for other country-level datasets that you can download and convert in a similar way. Another useful resource for population data is the Grump v4 population of the world dataset.

11.2. Getting ERA5 driver files

script name: vectri_era5_driver.bash

Warning

September 2024: This script is currently broken due to the update to the CDS system, which changed the retrieval mechanism in a backwards-incompatible way and also abandoned CF-compliance for the time dimension (ECMWF, what were you thinking?!). We are currently working on a fix.

11.3. Converting CSV station data to netcdf

script name: vectri_point_input

One of the most common questions that arises with VECTRI is how to drive it using CSV or text file climate data input for a point. In the early days of VECTRI the wrapper script had an option to directly read text files as an alternative to netcdf, but we quickly abandoned this as it was impossible to maintain for the infinite possible choices of column orders and date formats and so on (indeed, this is the reason netcdf and other self-describing data formats were invented!). While it is no longer possible to read text files directly, we devised a bash shell script to try and conveniently convert any text file to netcdf. This was fixed and upgraded in v1.11.4 in order to make it more portable across systems.

Let’s say we have a data file, data.txt, whose first few lines (revealed with head data.txt) look like this:

example CSV/text file

year  mm  dd  rain  tmax  tmin
2011   1   1     0  29.4  15.5
2011   1   2  10.2  31.2  15.7
2011   1   3   4.5  30.2  15.4

and so on. First of all, note that this particular file has no information about units; we are left to assume that rainfall is in mm/day and temperature in degrees C. Again, this is the advantage of self-describing netcdf files, which carry metadata such as units for each field.

Anyway, what do we need to do with our script? Essentially, we need it to discard N header lines (just one in this case) and then extract the date, temperature and rainfall information from the appropriate columns and save this to a netcdf file. To do this we use a number of options that pass this information to the script. Let’s introduce these now in the following table. You can also get this information by typing

vectri_point_input --help
options

option        definition
-h, --help    print this usage message
--lat         latitude of station (default=10)
--lon         longitude of station (default=0)
--temp        filename of temperature data OR specified constant value
--rain        filename of rainfall data OR specified constant value
--pop         filename of population data OR specified constant value (default 100 per km**2)
--climfile    OUTPUT, filename of vectri climate file (default is vectri_clim.nc)
--datafile    OUTPUT, filename of vectri data file (default is vectri_data.nc)
--cdate       column of date data - in format overrides next 3 options
--cyyyy       column of year in format yyyy
--cmm         column of month in format mm
--cdd         column of day in format dd
--ctmin       column of tmin (or tmean) data
--ctmax       column of tmax (or tmean) data
--crain       column of rain data
--cpop        column of population data
--tmiss       missing value for temperature data
--rmiss       missing value for rainfall data
--nday        number of days in file if constant values are used (start date is set to arbitrary value)
--nhead       number of header lines to ignore in files, global setting (default 0)
--nheadt      number of header lines to ignore in temp file (defaults to nhead if not specified)
--nheadr      number of header lines to ignore in rain file (defaults to nhead if not specified)
--nheadp      number of header lines to ignore in pop file (defaults to nhead if not specified)

Most of the entries are straightforward and self-explanatory, so let’s just demonstrate how we would convert the example file above:

./vectri_point_input --cyyyy=1 --cmm=2 --cdd=3 --crain=4 --ctmax=5 --ctmin=6  --pop=100 --temp=data.txt --rain=data.txt --nhead=1

Here we give the options in the same order as the columns to avoid confusion, but the options can be in any order you like. Note that we need to specify the input file twice, once for rainfall and once for temperature. This provides flexibility in case you have separate input files for the two fields.
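To make the column options concrete, here is a minimal Python sketch of the kind of parsing they describe. It is not the code of vectri_point_input (which is a bash script): the function name parse_station_lines is made up for illustration, and we assume whitespace-separated columns. Column numbers start from 1, as in the command above, and the daily mean temperature is taken as the average of the tmax and tmin columns.

```python
def parse_station_lines(lines, nhead=0, cyyyy=1, cmm=2, cdd=3,
                        crain=4, ctmax=5, ctmin=6):
    """Return a list of (date, rain, tmean) tuples from text lines.

    Hypothetical helper, not part of VECTRI. Column numbers are 1-based.
    """
    records = []
    for lineno, line in enumerate(lines):
        if lineno < nhead or not line.strip():
            continue  # skip the header lines and any blank lines
        cols = line.split()  # assumption: whitespace-separated columns
        year, month, day = (int(cols[c - 1]) for c in (cyyyy, cmm, cdd))
        rain = float(cols[crain - 1])
        # Daily mean temperature as the average of tmax and tmin.
        tmean = 0.5 * (float(cols[ctmax - 1]) + float(cols[ctmin - 1]))
        records.append((f"{year:04d}-{month:02d}-{day:02d}", rain, tmean))
    return records


# The example file from this section, one string per line.
example = [
    "year mm dd rain tmax tmin",
    "2011 1 1 0 29.4 15.5",
    "2011 1 2 10.2 31.2 15.7",
    "2011 1 3 4.5 30.2 15.4",
]
for date, rain, tmean in parse_station_lines(example, nhead=1):
    print(date, rain, round(tmean, 2))
```

Pointing ctmax and ctmin at the same column returns that column unchanged, which is the trick described in the FAQ for files that already contain the daily mean.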

FAQ

My rain file and temperature file have a different number of header lines! Note that nhead is a global header option, but if you are running with two input files, you might need to specify separate header lengths using nheadr and nheadt if each file has different header lengths.

Can I also have time-evolving population data? Yes! In the example above we specify a constant value for population (units are km⁻²) rather than a file name. If you have time-evolving population data, you can run with a population file name, but note that the data needs to be daily! Thus if you have annual data, you will need to separately convert it to netcdf (see tip below) and then use cdo to interpolate it to daily. In most use-cases, a single time-invariant population value will be used.

What if my input file already has the daily mean rather than tmin and tmax? Not a problem, simply point both ctmin and ctmax to this same column. The script simply takes the mean of ctmin and ctmax so this will write the same value to the netcdf file.

Note that this script cannot yet handle other optional inputs, such as bednet distributions or soil textures. These will be incorporated as soon as possible.

Tip

Note that the year, month and day column information is assumed to be the same for all input files (temperature, rainfall and population). For the moment, if you have different orders you may need to run the script 3 times and use this fudge: When you want to process temperature, you set the columns for the temperature file and call with rain=0. This produces a climate file with your temperature data inside and zero rainfall. You then repeat this “inverted”, that is specify columns for rainfall and set temp=0. Beware that when you run vectri you will then need to run with the two input files, vectri --c temp_file --p precip_file and beware not to mix them up!

11.4. Creating a constant climate driver file

script name: vectri_point_input

In the previous section, we saw how the temperature, rainfall and population information could all be set to a constant value by simply specifying this value instead of an input text filename. This can be used for generating idealized input data, e.g. with changing rainfall but constant temperature, to investigate the climate drivers separately.

One can even create a file with all inputs constant to see how the EIR responds in the model as a function of temperature, rainfall and population. In this case you need to specify the length of your desired driver file using the nday=X option. Note that the dates in the output file are set to start in the year 2000 but are obviously completely arbitrary! Here is how to generate a driver file of 1000 days, with a temperature of 27 °C, rain of 5 mm/day and a separate population data file with a density set to 200 km⁻².

vectri_point_input --temp=27 --rain=5 --pop=200 --nday=1000
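For the mixed case mentioned at the start of this section (changing rainfall but constant temperature), you first need a text file with the varying field. The following Python sketch writes an idealized rainfall series with a sinusoidal annual cycle. It is an illustration only and not part of the VECTRI utilities; the function names, the output file name rain_idealized.txt and the shape of the cycle are all our own assumptions.

```python
import datetime
import math


def seasonal_rain_rows(nday, start=datetime.date(2000, 1, 1),
                       mean=5.0, amplitude=5.0):
    """Return 'year mm dd rain' rows with an idealized annual cycle.

    Hypothetical helper: rain is in mm/day and is clipped at zero.
    """
    rows = []
    for i in range(nday):
        day = start + datetime.timedelta(days=i)
        rain = max(0.0, mean + amplitude * math.sin(2 * math.pi * i / 365.0))
        rows.append(f"{day.year} {day.month} {day.day} {rain:.2f}")
    return rows


def write_seasonal_rain(path, nday):
    """Write the rows to a text file with a single header line."""
    with open(path, "w") as f:
        f.write("year mm dd rain\n")
        f.write("\n".join(seasonal_rain_rows(nday)) + "\n")


if __name__ == "__main__":
    write_seasonal_rain("rain_idealized.txt", nday=1000)
```

Going by the options table in the previous section, the resulting file could then be combined with a constant temperature along these lines (we have not tested this exact combination):

vectri_point_input --cyyyy=1 --cmm=2 --cdd=3 --crain=4 --rain=rain_idealized.txt --temp=27 --pop=200 --nhead=1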

11.5. Running large ensembles in parallel

script name: burst_submit.py

For advanced users with knowledge of python.

This script is not a working script, but demonstrates how you can run large vectri ensembles efficiently in parallel on a shared memory architecture. The example shows how various climate models and RCP climate scenarios can be combined into a long vector of simulation experiments. Then on a shared memory node with N cpus, the script uses python’s starmap from the multiprocessing package to farm out N jobs. As soon as one job completes, the next job in the vector is submitted. Each simulation is written to its own directory to avoid conflicts with input and output files.
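The pattern described above can be sketched as follows. This is not burst_submit.py itself, just a minimal illustration of the same idea using multiprocessing.Pool.starmap; the model and scenario labels, the function names and the directory layout are placeholders, and the body of run_one only writes a small marker file where a real script would launch VECTRI.

```python
import itertools
import multiprocessing as mp
import os
import tempfile

# Hypothetical labels; substitute your own climate models and scenarios.
MODELS = ["modelA", "modelB", "modelC"]
SCENARIOS = ["rcp26", "rcp45", "rcp85"]


def build_jobs(models, scenarios, root):
    """Return one (model, scenario, rundir) tuple per experiment."""
    return [(m, s, os.path.join(root, f"{m}_{s}"))
            for m, s in itertools.product(models, scenarios)]


def run_one(model, scenario, rundir):
    """Run a single experiment inside its own directory."""
    os.makedirs(rundir, exist_ok=True)
    # Placeholder for the real work: here you would launch VECTRI for this
    # model/scenario pair (for example via subprocess.run with cwd=rundir).
    with open(os.path.join(rundir, "job.txt"), "w") as f:
        f.write(f"{model} {scenario}\n")
    return rundir


def run_all(jobs, ncpu):
    """Farm the jobs out to ncpu worker processes."""
    with mp.Pool(processes=ncpu) as pool:
        # chunksize=1 hands out one job at a time, so a worker picks up the
        # next pending job as soon as it finishes its current one.
        return pool.starmap(run_one, jobs, chunksize=1)


if __name__ == "__main__":
    root = tempfile.mkdtemp(prefix="vectri_ens_")
    done = run_all(build_jobs(MODELS, SCENARIOS, root), ncpu=4)
    print(f"{len(done)} runs written under {root}")
```

Giving each job its own rundir is what keeps the input and output files of concurrent runs from colliding. In practice you would set ncpu to the number of cpus available on the node.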