An introduction to harp

Andrew Singleton

SUV verifikasjonsmøte 19 March 2021

What is harp?

  • A “framework” for working with meteorological / climatological data in R
  • Comprises several packages:
    • harpIO - for reading / writing data
    • harpPoint - for point verification
    • harpSpatial - for spatial verification
    • harpVis - for visualization
    • (harpMethods) - for underlying classes, methods and common functionality
    • harp - meta package for functionality that includes more than 1 harp pacakge
    • harpData - for example data

Philisophy

  • harp is built with “tidy” data principles in mind
  • No matter where the data come from or what they represent the structure is (nearly) always the same
    • Data are stored as data frames
    • Every column is a “variable”
    • Every row is an “observation”
    • Every cell is a “value”
  • Designed to integrate with the “tidyverse” in R
    • data wrangling with dplyr and tidyr verbs
    • plotting with ggplot2 (where appropriate)
    • functional programming with purrr

Installation

  • The harp code is stored on Github under “harphub”
  • Can be installed in R with the following:
install.packages("remotes")
library(remotes)
install_github("harphub/harp")
  • Installation can take some time as harp has many (too many?) dependencies

Reading forecast data

read_forecast() can be used to read forecasts from any file format.

library(harp)

read_forecast(
  start_date,
  end_date,
  fcst_model,
  (parameter),
  (file_path),
  (file_template),
  (file_format_opts),
  (lead_time),
  (members)
)

File name templates

  • File names for read_forecast() are generated from templates. Dynamic parts of the file name are in { }
file_path  <- "/my/path"
fcst_model <- "MEPS"
start_date <- 2021031900 
end_date   <- 2021031912
by         <- "6h"
lead_time  <- seq(0, 3, 3)
members    <- c(0, 2)
template   <- "{fcst_model}/{YYYY}/{MM}/{DD}/{fcst_model}_fc{YYYY}{MM}{DD}T{HH}Z_mbr{MBR3}+{LDT3}"

File name templates

  • File names for read_forecast() are generated from templates. Dynamic parts of the file name are in { }
file_path  <- "/my/path"
fcst_model <- "MEPS"
start_date <- 2021031900 
end_date   <- 2021031912
by         <- "6h"
lead_time  <- seq(0, 3, 3)
members    <- c(0, 2)
template   <- "{fcst_model}/{YYYY}/{MM}/{DD}/{fcst_model}_fc{YYYY}{MM}{DD}T{HH}Z_mbr{MBR3}+{LDT3}"
## /my/path/MEPS/2021/03/19/MEPS_fc20210319T00Z_mbr000+000
## /my/path/MEPS/2021/03/19/MEPS_fc20210319T00Z_mbr002+000
## /my/path/MEPS/2021/03/19/MEPS_fc20210319T00Z_mbr000+003
## /my/path/MEPS/2021/03/19/MEPS_fc20210319T00Z_mbr002+003
## /my/path/MEPS/2021/03/19/MEPS_fc20210319T06Z_mbr000+000
## /my/path/MEPS/2021/03/19/MEPS_fc20210319T06Z_mbr002+000
## /my/path/MEPS/2021/03/19/MEPS_fc20210319T06Z_mbr000+003
## /my/path/MEPS/2021/03/19/MEPS_fc20210319T06Z_mbr002+003

Built in templates

  • For “known” file name structures there are built in templates
show_file_templates()
## # A tibble: 33 x 2
##    template_name        template                                                
##    <chr>                <chr>                                                   
##  1 arome_arctic_extrac… /lustre/storeB/immutable/archive/projects/metproduction…
##  2 arome_arctic_full    /lustre/storeB/immutable/archive/projects/metproduction…
##  3 arome_arctic_sfx     /lustre/storeB/immutable/archive/projects/metproduction…
##  4 fctable              {file_path}/{fcst_model}/{YYYY}/{MM}/FCTABLE_{parameter…
##  5 fctable_det          {file_path}/{det_model}/{YYYY}/{MM}/FCTABLE_{parameter}…
##  6 fctable_eps          {file_path}/{eps_model}/{YYYY}/{MM}/FCTABLE_{parameter}…
##  7 fctable_eps_all_cyc… {file_path}/{eps_model}/{YYYY}/{MM}/FCTABLE_{parameter}…
##  8 fctable_eps_all_lea… {file_path}/{eps_model}/{YYYY}/{MM}/FCTABLE_{parameter}…
##  9 glameps_grib         {file_path}/{eps_model}/{sub_model}/{YYYY}/{MM}/{DD}/{H…
## 10 harmoneps_grib       {file_path}/{YYYY}/{MM}/{DD}/{HH}/mbr{MBR3}/fc{YYYY}{MM…
## 11 harmoneps_grib_fp    {file_path}/{YYYY}/{MM}/{DD}/{HH}/mbr{MBR3}/fc{YYYY}{MM…
## 12 harmoneps_grib_sfx   {file_path}/{YYYY}/{MM}/{DD}/{HH}/mbr{MBR3}/fc{YYYY}{MM…
## 13 harmonie_grib        {file_path}/{YYYY}/{MM}/{DD}/{HH}/fc{YYYY}{MM}{DD}{HH}+…
## 14 harmonie_grib_fp     {file_path}/{YYYY}/{MM}/{DD}/{HH}/fc{YYYY}{MM}{DD}{HH}+…
## 15 harmonie_grib_sfx    {file_path}/{YYYY}/{MM}/{DD}/{HH}/fc{YYYY}{MM}{DD}{HH}+…
## 16 meps_cntrl_extracted /lustre/storeB/immutable/archive/projects/metproduction…
## 17 meps_cntrl_sfx       /lustre/storeB/immutable/archive/projects/metproduction…
## 18 meps_det             /lustre/storeB/immutable/archive/projects/metproduction…
## 19 meps_extracted       /lustre/storeB/immutable/archive/projects/metproduction…
## 20 meps_full            /lustre/storeB/immutable/archive/projects/metproduction…
## 21 meps_lagged_6h_subs… /lustre/storeB/immutable/archive/projects/metproduction…
## 22 meps_sfx             /lustre/storeB/immutable/archive/projects/metproduction…
## 23 meps_subset          /lustre/storeB/immutable/archive/projects/metproduction…
## 24 obstable             {file_path}/OBSTABLE_{YYYY}.sqlite                      
## 25 vfld                 {file_path}/{fcst_model}/vfld{fcst_model}{YYYY}{MM}{DD}…
## 26 vfld_det             {file_path}/{det_model}/vfld{det_model}{YYYY}{MM}{DD}{H…
## 27 vfld_det_noexp       {file_path}/{det_model}/vfld{YYYY}{MM}{DD}{HH}{LDT2}    
## 28 vfld_eps             {file_path}/{sub_model}/vfld{sub_model}mbr{MBR3}{YYYY}{…
## 29 vfld_eps_noexp       {file_path}/{sub_model}/vfldmbr{MBR3}{YYYY}{MM}{DD}{HH}…
## 30 vfld_multimodel      {file_path}/{sub_model}/vfld{sub_model}mbr{MBR3}{YYYY}{…
## 31 vfld_multimodel_noe… {file_path}/{sub_model}/vfldmbr{MBR3}{YYYY}{MM}{DD}{HH}…
## 32 vfld_noexp           {file_path}/{fcst_model}/vfldmbr{MBR3}{YYYY}{MM}{DD}{HH…
## 33 vobs                 {file_path}/vobs{YYYY}{MM}{DD}{HH}

Built in templates

  • To show a full template, pass the row from the output to show_file_templates().
show_file_templates(21)
## 
## template_name:
##  meps_lagged_6h_subset 
##  
## template:
##  /lustre/storeB/immutable/archive/projects/metproduction/MEPS/{YYYY}/{MM}/{DD}/meps_lagged_6_h_subset_2_5km_{YYYY}{MM}{DD}T{HH}Z.nc

The harp_fcst object

  • The output of read_forecast() is a harp_fcst object - named list of data frames - one for each fcst_model
fcst <- read_forecast(
  2021031200,
  2021031212,
  "MEPS",
  "Pcp",
  lead_time        = seq(12, 24, 12),
  members          = c(0, 10, 20),
  file_template    = "meps_lagged_6h_subset",
  file_format_opts = netcdf_opts("met_norway_eps"),
  show_progress    = TRUE, 
  return_data      = TRUE
)

The harp_fcst object

## ● MEPS
## # A tibble: 6 x 11
##   fcdate              lead_time parameter validdate           level_type level
##   <dttm>                  <dbl> <chr>     <dttm>              <chr>      <dbl>
## 1 2021-03-12 00:00:00        12 Pcp       2021-03-12 12:00:00 unknown        0
## 2 2021-03-12 00:00:00        24 Pcp       2021-03-13 00:00:00 unknown        0
## 3 2021-03-12 06:00:00        12 Pcp       2021-03-12 18:00:00 unknown        0
## 4 2021-03-12 06:00:00        24 Pcp       2021-03-13 06:00:00 unknown        0
## 5 2021-03-12 12:00:00        12 Pcp       2021-03-13 00:00:00 unknown        0
## 6 2021-03-12 12:00:00        24 Pcp       2021-03-13 12:00:00 unknown        0
## # … with 5 more variables: units <chr>, fcst_cycle <chr>,
## #   MEPS_mbr000 <geolist>, MEPS_mbr010 <geolist>, MEPS_mbr020 <geolist>

dplyr verbs on harp_fcst objects

fcst %>% 
  select(fcdate, lead_time, starts_with("MEPS"))
## ● MEPS
## # A tibble: 6 x 5
##   fcdate              lead_time MEPS_mbr000      MEPS_mbr010     MEPS_mbr020    
##   <dttm>                  <dbl> <geolist>        <geolist>       <geolist>      
## 1 2021-03-12 00:00:00        12 <geofield [949 … <geofield [949… <geofield [949…
## 2 2021-03-12 00:00:00        24 <geofield [949 … <geofield [949… <geofield [949…
## 3 2021-03-12 06:00:00        12 <geofield [949 … <geofield [949… <geofield [949…
## 4 2021-03-12 06:00:00        24 <geofield [949 … <geofield [949… <geofield [949…
## 5 2021-03-12 12:00:00        12 <geofield [949 … <geofield [949… <geofield [949…
## 6 2021-03-12 12:00:00        24 <geofield [949 … <geofield [949… <geofield [949…
fcst %>% 
  select(fcdate, lead_time, MEPS_mbr000) %>% 
  filter(lead_time == 12)
## ● MEPS
## # A tibble: 3 x 3
##   fcdate              lead_time MEPS_mbr000             
##   <dttm>                  <dbl> <geolist>               
## 1 2021-03-12 00:00:00        12 <geofield [949 × 1,069]>
## 2 2021-03-12 06:00:00        12 <geofield [949 × 1,069]>
## 3 2021-03-12 12:00:00        12 <geofield [949 × 1,069]>

Plotting a field

plot_field(
  fcst, fcst_model = "MEPS", plot_col = MEPS_mbr000, 
  fcdate = 2021031200, lead_time = 12
)

Plotting a field

plot_field(
  fcst, fcst_model = "MEPS", plot_col = MEPS_mbr000, 
  fcdate = 2021031200, lead_time = 12,
  palette = c("transparent", viridis::viridis(255)),
  breaks = c(0, 0.125, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64)
)

Transformations

  • read_forecast() can transform the data by interpolating to points, to a new grid (and / or projection), or to a vertical cross section (soonish!)
fcst <- read_forecast(
  2021031200,
  2021031212,
  c("MEPS", "AROME_ARCTIC"),
  "T2m",
  lead_time        = seq(0, 12, 3),
  file_template    = list(
    MEPS         = "meps_det", 
    AROME_ARCTIC = "arome_arctic_extracted"
  ),
  file_format_opts = netcdf_opts("met_norway_det"),
  transformation   = "interpolate", 
  show_progress    = TRUE, 
  return_data      = TRUE
)

Interpolated data

fcst %>% 
  select(fcdate, lead_time, SID, ends_with("_det"))
## ● AROME_ARCTIC
## # A tibble: 3,630 x 4
##    fcdate              lead_time   SID AROME_ARCTIC_det
##    <dttm>                  <dbl> <int>            <dbl>
##  1 2021-03-12 00:00:00         0  1001             269.
##  2 2021-03-12 00:00:00         0  1010             273.
##  3 2021-03-12 00:00:00         0  1014             271.
##  4 2021-03-12 00:00:00         0  1015             272.
##  5 2021-03-12 00:00:00         0  1018             268.
##  6 2021-03-12 00:00:00         0  1023             267.
##  7 2021-03-12 00:00:00         0  1025             271.
##  8 2021-03-12 00:00:00         0  1026             270.
##  9 2021-03-12 00:00:00         0  1027             271.
## 10 2021-03-12 00:00:00         0  1028             266.
## # … with 3,620 more rows
## 
## ● MEPS
## # A tibble: 20,895 x 4
##    fcdate              lead_time   SID MEPS_det
##    <dttm>                  <dbl> <int>    <dbl>
##  1 2021-03-12 00:00:00         0  1001     271.
##  2 2021-03-12 00:00:00         0  1010     273.
##  3 2021-03-12 00:00:00         0  1014     271.
##  4 2021-03-12 00:00:00         0  1015     272.
##  5 2021-03-12 00:00:00         0  1018     269.
##  6 2021-03-12 00:00:00         0  1023     267.
##  7 2021-03-12 00:00:00         0  1025     270.
##  8 2021-03-12 00:00:00         0  1026     270.
##  9 2021-03-12 00:00:00         0  1027     270.
## 10 2021-03-12 00:00:00         0  1028     266.
## # … with 20,885 more rows

Saving interpolated data

  • It takes a long time to read and interpolate full model fields, so the interpolated data can be saved to SQLite files
  • Simply add output_file_opts = sqlite_opts(path = <path>)
read_forecast(
  2021031200,
  2021031212,
  c("MEPS", "AROME_ARCTIC"),
  "T2m",
  lead_time        = seq(0, 12, 3),
  file_template    = list(
    MEPS         = "meps_det", 
    AROME_ARCTIC = "arome_arctic_extracted"
  ),
  file_format_opts = netcdf_opts("met_norway_det"),
  transformation   = "interpolate", 
  output_file_opts = sqlite_opts(path = "/path/to/output"),
  show_progress    = TRUE
)

Reading the interpolated forecasts

  • The function read_point_forecast() is used to read interpolated data
fcst <- read_point_forecast(
  2021031200, 
  2021031212, 
  fcst_model = c("MEPS", "AROME_ARCTIC"), 
  fcst_type  = "det", 
  parameter  = "T2m", 
  file_path  = "suv_verifikasjonsmøte_202103"
)

Reading the interpolated forecasts

fcst %>% 
  select(fcdate, leadtime, SID, ends_with("det"))
## ● MEPS
## # A tibble: 20,895 x 4
##    fcdate              leadtime   SID MEPS_det
##    <dttm>                 <int> <int>    <dbl>
##  1 2021-03-12 00:00:00        0  1001     271.
##  2 2021-03-12 00:00:00        0  1010     273.
##  3 2021-03-12 00:00:00        0  1014     271.
##  4 2021-03-12 00:00:00        0  1015     272.
##  5 2021-03-12 00:00:00        0  1018     269.
##  6 2021-03-12 00:00:00        0  1023     267.
##  7 2021-03-12 00:00:00        0  1025     270.
##  8 2021-03-12 00:00:00        0  1026     270.
##  9 2021-03-12 00:00:00        0  1027     270.
## 10 2021-03-12 00:00:00        0  1028     266.
## # … with 20,885 more rows
## 
## ● AROME_ARCTIC
## # A tibble: 3,630 x 4
##    fcdate              leadtime   SID AROME_ARCTIC_det
##    <dttm>                 <int> <int>            <dbl>
##  1 2021-03-12 00:00:00        0  1001             269.
##  2 2021-03-12 00:00:00        0  1010             273.
##  3 2021-03-12 00:00:00        0  1014             271.
##  4 2021-03-12 00:00:00        0  1015             272.
##  5 2021-03-12 00:00:00        0  1018             268.
##  6 2021-03-12 00:00:00        0  1023             267.
##  7 2021-03-12 00:00:00        0  1025             271.
##  8 2021-03-12 00:00:00        0  1026             270.
##  9 2021-03-12 00:00:00        0  1027             271.
## 10 2021-03-12 00:00:00        0  1028             266.
## # … with 3,620 more rows

Observations

  • harp is designed to work with vobs data
    • text files used with “monitor” for operational verification in HIRLAM institutes
    • available on:
      • ecgb at /scratch/ms/dk/nhz/oprint/OBS4
      • lustre for AROME-ARCTIC
      • Extreme security metcoop servers for MEPS (Also in SQLite)
  • Interface with frost coming soon(ish) [based on miIO]
  • Convert vobs to SQLite with read_obs_convert()
  • Read observations from SQLite with read_point_obs()

Point verification workflow

  • Read forecasts
  • Read observations
  • Join
  • Verify
  • Visualize

Point verification workflow

  • Read forecasts : read_point_forecast()
  • Read observations
  • Join
  • Verify
  • Visualize

Point verification workflow

  • Read forecasts : read_point_forecast()
  • Read observations : read_point_obs()
  • Join
  • Verify
  • Visualize

Point verification workflow

  • Read forecasts : read_point_forecast()
  • Read observations : read_point_obs()
  • Join : join_to_forecast()
  • Verify
  • Visualize

Point verification workflow

  • Read forecasts : read_point_forecast()
  • Read observations : read_point_obs()
  • Join : join_to_forecast()
  • Verify : det_verify() / ens_verify()
  • Visualize

Point verification workflow

  • Read forecasts : read_point_forecast()
  • Read observations : read_point_obs()
  • Join : join_to_forecast()
  • Verify : det_verify() / ens_verify()
  • Visualize : plot_point_verif()

Point verification workflow

  • Read forecasts : read_point_forecast()
  • Read observations : read_point_obs()
  • Join : join_to_forecast()
  • Verify : det_verify() / ens_verify()
  • Visualize : plot_point_verif()
  • Save : save_point_verif()
  • Explore : shiny_plot_point_verif()

Demonstration

Resources

This presentation

https://andrew-met.github.io/harp_presentations/suv_verif_202103.html