This tutorial shows you how to go from vfld / vobs files to a full ensemble verification using harp packages. To make use of the data used in the tutorial you should be on ecgate and a member of the hirald group.

Part 1 - Preparing the data

Convert ensemble forecasts in vfld format to sqlite

Harmonie prepares data for verification in the vfld ascii format. One file is produced for each lead time and member for each forecast cycle. Since it takes time to read in multiple text files, we don’t want to have to do this every time we run a verification. Therefore the first step for harp is to convert these files to sqlite format. SQLite is a portable file based database format that used SQL queries to access and filter the data - this is much faster than reading text files.

For the purposes of this tutorial we will take data from 3 models - model1, model2 and model3. The vfld files are stored under /scratch/ms/no/fa1m/vfld on ecgate. model1 has 5 members (numbered 0-4), and model2 and model3 have 8 members (numbered 0-7). The available forecasts are from 00:00 UTC on the 30th and 31st May 2017. To read the data we need to attach the harpIO package, and then set some metadata.

library(harpIO)
vfld_path       <- "/scratch/ms/no/fa1m/vfld"
first_fcst      <- 2017053000   
last_fcst       <- 2017053100
fcst_freq       <- "1d"
fcst_models     <- c("model1", "model2", "model3")

In the above code, we are setting the path to the vfld files in vfld_path, the first and last forecasts we want to read in in first_fcst and last_fcst using a YYYYMMDDhh format, the frequency of the forecast runs in fcst_freq - here “1d” means every 1 day, and finally the names of the forecast models in fcst_model.

There are still a couple of things we need to set before we can read the data. We need to know which members we want to read for each model and which lead times.

fcst_members    <- list(model1 = seq(0, 4), model2 = seq(0, 7), model3 = seq(0, 7))
fcst_lead_times <- seq(0, 36, 3)

We are using the R function \(seq\) to generate a sequence of numbers. For the ensemble members for each model we generate the sequence 0, 1, 2, 3, 4 for model1 and 0, 1, 2, 3, 4, 5, 6, 7 for model2 and model3. fcst_members must be set as a list with the number of members for each model. The exception is when there is only 1 model - then a single sequence of numbers can be set - e.g. if we only wanted model1 we would set fcst_members <- seq(0, 4). We are setting the lead times in fcst_lead_times as a sequence of numbers from 0 to 36 at every third number, i.e. 0, 3, 6, …., 36.

Now we have set all the metadata, we are ready to read the data in using the read_eps_interpolate function from the harpIO package.

fcst_data <- read_eps_interpolate(
  start_date  = first_fcst,
  end_date    = last_fcst,
  eps_model   = fcst_models,
  parameter   = NULL,
  lead_time   = fcst_lead_times,
  members_in  = fcst_members,
  by          = fcst_freq,
  file_path   = vfld_path,
  return_data = TRUE
)

Of course we needn’t have set the arguments to read_eps_interpolate as variables, but it helps when we want to use the same information later.

By default read_eps_interpolate does not return any data - this is because it can be quite a lot of data to store - so we set return_data = TRUE. We also set parameter = NULL - this means that all parameters in the vfld files are read in.

You may have noticed some warnings about files not found. This is not a problem at this stage, but will be dealt with later. However, we can now take a look at the data using the R function head to show the first 6 rows.

head(fcst_data)
## # A tibble: 6 x 18
##   eps_model sub_model fcdate lead_time member members_out validdate   SID
##   <chr>     <chr>      <dbl>     <dbl> <chr>  <chr>           <dbl> <dbl>
## 1 model1    model1    1.50e9         0 mbr000 mbr000         1.50e9  1001
## 2 model1    model1    1.50e9         0 mbr000 mbr000         1.50e9  1010
## 3 model1    model1    1.50e9         0 mbr000 mbr000         1.50e9  1014
## 4 model1    model1    1.50e9         0 mbr000 mbr000         1.50e9  1015
## 5 model1    model1    1.50e9         0 mbr000 mbr000         1.50e9  1018
## 6 model1    model1    1.50e9         0 mbr000 mbr000         1.50e9  1023
## # … with 10 more variables: lat <dbl>, lon <dbl>, model_elevation <dbl>,
## #   lat.temp <dbl>, lon.temp <dbl>, model_elevation.temp <dbl>,
## #   elev <dbl>, name <chr>, parameter <chr>, forecast <dbl>

There are a few things to notice here. The fcdate column and validdate column have some large looking numbers - the fcdate column is the forecast start time, and the validdate column is the time for which the forecast is valid. The large numbers are becuase harp stores time in unix epoch format - that is the number of seconds since 00:00 UTC 1st January 1970. Also there is a sub_model column - this is used in the case of multi model ensembles. There is also a members_out column - it is possible for you to change the numbers in the call to read_eps_interpolate, but that is a more advanced topic. Finally there are column names ending with .temp - these refer to TEMP stations, or stations that have veritical profile data.

What we actually want to do is convert these data to sqlite format - we do this at the same time as reading the data by setting the argument sqlite_path to the path we wish to write to. So let’s run read_eps_interpolate again, this time setting our output path, and also only doing it for 2m temperature, which we can do by setting parameter = “T2m”. We don’t include an argument for return_data this time - the defulat value is FALSE and we don’t want to return the data to our session.

read_eps_interpolate(
  start_date  = first_fcst,
  end_date    = last_fcst,
  eps_model   = fcst_models,
  parameter   = "T2m",
  lead_time   = fcst_lead_times,
  members_in  = fcst_members,
  by          = fcst_freq,
  file_path   = vfld_path,
  sqlite_path = "/scratch/ms/no/fa1m/harp_output/FCTABLE"
)

When you run the above make sure you set sqlite_path a path you have permission to write to. If the full path doesn’t exist, the directories will auomatically be created. In this example, the lowest level directory is called FCTABLE. This is because the output files of this process are known as FCTABLE files and it is recommended to use this structure.

If you look in the directory you set for sqlite_path you will see directories for model1, model2 and model3. Under these directories are directories for the year (2017) and the month (05), where you will find the data files. You will see that there is one file for each lead time. This structure and filename are the defualt behaviour, but can be changed if desired using the sqlite_tempate argument. You will also see that there files for T2m and T2m_uncorrected. By default 2m tempearture is height corrected based on the elavtion difference between the model orography and the actual orography. See the help for read_eps_interpolate for how to change this behaviour.

Convert observations in vobs format to sqlite

Now we have the forecasts in sqlite format, we have to do the same for the observations. The process here is a little bit simpler, though we have to make sure we have enough observations to verify against. So lets keep the start date, but add 2 days to the end date making it 2 June 2017. We should also make sure we get until the end of 2 June 2017 so we give the hour as well, making the end date 2017060221. Now we use the read_obs_convert function to read the vobs format observations and convert them to sqlite - the function will convert all parameters inside the vobs file andoutput them to a single sqlite file.

read_obs_convert(
  start_date  = first_fcst,
  end_date    = 2017060221,
  obs_path    = "/scratch/ms/dk/nhz/oprint/OBS4",
  sqlite_path = "/scratch/ms/no/fa1m/harp_output/OBSTABLE"
)

We know have FCTABLE files and an OBSTABLE file. Note that like the FCTABLE files, the OBSTABLE filename can be changed using the sqlite_template argument.

In Part 2, we will run the verification…

^Index Part 2 >