Dataset: era5

import xarray as xr
import matplotlib.pyplot as plt
import cartopy.crs as ccrs

import src.utils as ut
from src.cds_era5 import request_thread, days_of_month

# Set up the root path of the application
project_path = ut.project_path()

# Load the metadata

meta_filename = [
    f"{ut.project_path(1)}/meta/weather/era5h.json",
    f"{ut.project_path(1)}/meta_ipynb/era5h.html",
]
metadata = ut.load_metadata(meta_filename)

# Get contentUrl from metadata file
ut.info_meta(metadata)

1. Distribution by Climate Data Store (CDS) API

Here we provide an example of how to download a time-chunked dataset for a given set of ERA5 variables. The CDS API offers a great variety of variables and grid-size options, so the user is encouraged to view the query options on the API's homepage, where some additional examples are given.

# Get the _era5_ dataset relevant information
contentUrl, dataset_name, distr_name = ut.get_meta(
    metadata, idx_distribution=0, idx_hasPart=None
)

# Make folders for data download
path = f"{project_path}/data/{dataset_name}/{distr_name}"
ut.makedirs(path)

Before requesting data, you need an API key for access. To obtain the key, create an account on the CDS homepage.

Note

There is an alternative method to set up the API key within a configuration file. Just follow the instructions on its GitHub homepage.
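As a sketch of that alternative, the `cdsapi` client reads its credentials from a `~/.cdsapirc` file. The exact URL and key format are documented on the cdsapi GitHub page, so treat the values below as placeholders:

```
url: https://cds.climate.copernicus.eu/api
key: <your-api-key>
```

With this file in place, `cdsapi.Client()` can be constructed without passing the URL and key explicitly.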

# Insert the CDS API-key
API_KEY = input("Enter your CDS api-key: ")

Below we run a set of threads that make multiple requests to the API server in order to speed up the download. To avoid long queue and processing times on the server side, it is advised to perform many small requests (e.g. one variable and one month at a time) rather than to download one big dataset. This is because data is stored in small chunks on the server, and any big request involves merging operations.

As an example, we get the worldwide ERA5 dataset on single levels for a relatively short time period, covering wind, air temperature, dewpoint temperature and total precipitation.

Note

ERA5 has two types of spatial coverage: worldwide (sea and land) and land only. The first is released frequently (with a delay of about 5 days) but with a grid resolution of 0.25 degrees, while the second is released only every 3 months but with a grid resolution of 0.1 degrees.
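The trade-off above can be summarized in a small lookup table. The first key is the CDS identifier used later in this notebook; the land-only identifier is given here as an assumption:

```python
# Summary of the two ERA5 spatial coverages described in the note above
ERA5_COVERAGE = {
    "reanalysis-era5-single-levels": {  # worldwide: sea and land
        "resolution_deg": 0.25,
        "release_delay": "~5 days",
    },
    "reanalysis-era5-land": {  # land only (assumed CDS identifier)
        "resolution_deg": 0.1,
        "release_delay": "~3 months",
    },
}

print(ERA5_COVERAGE["reanalysis-era5-single-levels"]["resolution_deg"])  # 0.25
```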

Note

Once a data request is submitted to the CDS API, it can take a few seconds or even hours to get the data back, since the request is put in a queue. Check the status of your requests by logging in to the CDS website.

In order to reduce storage needs, we apply a mask to "remove" sea data and compress all the variables. This reduces data storage by about 80%. We only store the masked netCDF files and remove the files downloaded from the API.
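The masking step itself is just elementwise multiplication by a land mask whose sea cells are NaN. Here is a minimal numpy sketch with toy values (the xarray compression encoding, one plausible form of it, is shown as a comment):

```python
import numpy as np

# Toy 3x3 temperature field (K) and a land mask: 1.0 = land, NaN = sea
t2m = np.array([[280.0, 281.0, 282.0],
                [283.0, 284.0, 285.0],
                [286.0, 287.0, 288.0]])
mask_land = np.array([[1.0, np.nan, 1.0],
                      [np.nan, 1.0, np.nan],
                      [1.0, np.nan, 1.0]])

# Multiplying by the mask keeps land values and turns sea cells into NaN
masked = t2m * mask_land

# With xarray, a compressed write would look roughly like:
# ds.to_netcdf(out_file,
#              encoding={v: {"zlib": True, "complevel": 4} for v in ds.data_vars})

print(int(np.isnan(masked).sum()))  # 4 sea cells masked out
```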

# Set up a two-year dataset period
# first_year = 2018
# last_year = 2019
# years = range(first_year, last_year+1)
# months = range(1, 12+1)

# Set up a shorter dataset time period (take only one day)
years = [2023]
months = [9]
end_day = 1
# ERA5 dataset name (uncomment the hourly product to download hourly data)
# name = "reanalysis-era5-single-levels"
name = "reanalysis-era5-single-levels-monthly-means"

# Get datasets related to the following variables
variables = [
    "10m_u_component_of_wind",
    "10m_v_component_of_wind",
    "2m_dewpoint_temperature",
    "2m_temperature",
    "total_precipitation",
]

# Get data for every hour of the day
all_hours = [str(i).zfill(2) + ":00" for i in range(0, 23 + 1)]

# Build the API query list
api_request_list = []
for year in years:
    for month in months:
        if name == "reanalysis-era5-single-levels":
            for d in days_of_month(year, month)[slice(0, end_day)]:
                for var in variables:
                    api_request_list.append(
                        {
                            "product_type": "reanalysis",
                            "variable": var,
                            "date": d,
                            "time": all_hours,
                            "format": "netcdf",
                        }
                    )
        elif name == "reanalysis-era5-single-levels-monthly-means":
            for var in variables:
                api_request_list.append(
                    {
                        "product_type": "monthly_averaged_reanalysis",
                        "variable": var,
                        "year": str(year),
                        "month": str(month).zfill(2),
                        "time": "00:00",
                        "format": "netcdf",
                    }
                )

# The mask must be a DataArray, so select the mask_land variable
mask_land = xr.open_dataset("./src/mask_land.nc").mask_land

# Request data from the CDS API
request_thread(
    api_request_list,
    name,
    url=contentUrl,
    key=API_KEY,
    path=path,
    mask=mask_land,
    max_workers=10,
)

Finally, we plot the masked air temperature dataset. Note that the masking looks unsuccessful at high latitudes, but this is probably because in Arctic regions the distinction between sea and land (i.e. ice) is unclear.

# Lazy-load of all the downloaded netCDF-files
ds = xr.open_mfdataset(f"{path}/masked_2m_temperature_*.nc", mask_and_scale=True)

# Select a timedate to plot
data = ds.t2m.isel(time=0)

# Make a raster figure with a coastline
fig = plt.figure(figsize=(20, 10))
ax = plt.axes(projection=ccrs.Robinson())
ax.coastlines(resolution="50m")

plot = data.plot.imshow(
    cmap=plt.cm.coolwarm, transform=ccrs.PlateCarree(), cbar_kwargs={"shrink": 0.8}
)

plt.savefig(f"{path}/air_temp_example.png", dpi=150, format="png", transparent=True)
Air temperature at 2 m from the ERA5 dataset