pys5p package

Subpackages

Submodules

pys5p.ckd_io module

This module contains the class CKDio.

class pys5p.ckd_io.CKDio(ckd_dir: Path | None = None, ckd_version: int = 1, ckd_file: Path | None = None)[source]

Bases: object

Read Tropomi CKD from the Static CKD product or from dynamic CKD products.

Parameters:
  • ckd_dir (Path, default=Path('/nfs/Tropomi/share/ckd')) – Directory where the CKD files are stored

  • ckd_version (int, default=1) – Version of the CKD

  • ckd_file (Path, optional) – Name of the CKD file. Default is None, in which case the CKD file is searched for in the directory ckd_dir, with ckd_version used in the glob pattern

Notes

Not all CKD are defined or derived for all bands. You can request a CKD for one band or for a channel (bands: ‘12’, ‘34’, ‘56’, ‘78’). Do not mix bands from different channels.

Dynamic CKD products are not used for the Tropomi mission; only for S/W version 1 was a dynamic CKD product defined. This product contained the OCAL CKD and was not updated automatically. For version 2, all CKD are stored in one product, where some CKD have a time axis to correct for in-flight degradation.

Therefore, the logic to find a CKD is implemented as follows:

  1. ckd_dir defines the base directory in which to search for the CKD products (see below).

  2. ckd_file defines the full path to the (static) CKD product; for version 1, any product with dynamic CKD has to be in the same directory.

Version 1:

  • Static CKD are stored in one file: glob('*_AUX_L1_CKD_*')

  • Dynamic CKD are stored in two files:

    • UVN, use glob('*_ICM_CKDUVN_*')

    • SWIR, use glob('*_ICM_CKDSIR_*')

Version 2+:

  • All CKD in one file: glob('*_AUX_L1_CKD_*')

  • Dynamic CKD are empty
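
A minimal usage sketch, assuming the default CKD directory and a version 2 product (adjust both to your installation); the methods used here are documented below:

>>> from pathlib import Path
>>> from pys5p.ckd_io import CKDio
>>> ckd = CKDio(ckd_dir=Path('/nfs/Tropomi/share/ckd'), ckd_version=2)
>>> prnu = ckd.prnu(bands='78')        # PRNU CKD of the SWIR channel
>>> dark = ckd.darkflux(bands='78')    # dark-flux CKD (SWIR only)
>>> ckd.close()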

absirr(qvd: int = 1, bands: str = '78') Dataset[source]

Return absolute irradiance responsivity.

Parameters:
  • qvd (int, default: 1) – Tropomi QVD identifier. Valid values are 1 or 2

  • bands (str, default: '78') – Tropomi bands [1..8] or channels [‘12’, ‘34’, ‘56’, ‘78’]

absrad(bands: str = '78') Dataset[source]

Return absolute radiance responsivity.

Parameters:

bands (str, default: '78') – Tropomi bands [1..8] or channels [‘12’, ‘34’, ‘56’, ‘78’]

close() None[source]

Close all open resources.

creation_time() str[source]

Return datetime when the L1b product was created.

creator_version() str[source]

Return version of Tropomi L01B processor.

darkflux(bands: str = '78') Dataset[source]

Return dark-flux CKD, SWIR only.

Parameters:

bands (str, default: '78') – Tropomi SWIR bands ‘7’, ‘8’ or both ‘78’

dn2v_factors() ndarray[source]

Return digital number to Volt CKD, SWIR only.

Notes

The DN2V factor has no error attached to it.

dpqf(threshold: float | None = None, bands: str = '78') Dataset[source]

Return detector pixel-quality flags (boolean), SWIR only.

Parameters:
  • threshold (float, optional) – Value between [0..1], default is to read the threshold from CKD

  • bands (str, default='78') – Tropomi SWIR bands ‘7’, ‘8’, or both ‘78’

Return type:

numpy ndarray

get_param(ds_name: str, band: str = '7') ndarray | float[source]

Return value(s) of a CKD parameter from the Static CKD product.

Parameters:
  • ds_name (str) – Name of the HDF5 dataset, e.g. ‘pixel_full_well’

  • band (str, default='7') – Band identifier ‘1’, ‘2’, …, ‘8’

Returns:

CKD parameter value

Return type:

numpy.ndarray or scalar

Notes

Datasets of size=1 are returned as a scalar.

Handy function for scalar HDF5 datasets, such as:

  • dc_reference_temp

  • dpqf_threshold

  • pixel_full_well

  • pixel_fw_flag_thresh
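
For example, reading the DPQF threshold for band 7 (a sketch, assuming ckd is a CKDio instance as created above):

>>> thresh = ckd.get_param('dpqf_threshold', band='7')   # returned as a scalar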

memory() Dataset[source]

Return memory CKD, SWIR only.

noise(bands: str = '78') Dataset[source]

Return readout-noise CKD, SWIR only.

Parameters:

bands (str, default: '78') – Tropomi bands [1..8] or channels [‘12’, ‘34’, ‘56’, ‘78’]

offset(bands: str = '78') Dataset[source]

Return offset CKD, SWIR only.

Parameters:

bands (str, default: '78') – Tropomi SWIR bands ‘7’, ‘8’ or both ‘78’

pixel_quality(bands: str = '78') Dataset[source]

Return detector pixel-quality mask (float [0, 1]), SWIR only.

Parameters:

bands (str, default: '78') – Tropomi SWIR bands ‘7’, ‘8’ or both ‘78’

prnu(bands: str = '78') Dataset[source]

Return Pixel Response Non-Uniformity (PRNU).

Parameters:

bands (str, default: '78') – Tropomi bands [1..8] or channels [‘12’, ‘34’, ‘56’, ‘78’]

relirr(qvd: int = 1, bands: str = '78') tuple[dict] | None[source]

Return relative irradiance correction.

Parameters:
  • bands (str, default: '78') – Tropomi bands [1..8] or channels [‘12’, ‘34’, ‘56’, ‘78’]

  • qvd (int) – Tropomi QVD identifier. Valid values are 1 or 2, default: 1

Returns:

CKD for relative irradiance correction as dictionaries with keys:

  • band: Tropomi spectral band ID

  • mapping_cols: coarse irregular mapping of the columns

  • mapping_rows: coarse irregular mapping of the rows

  • cheb_coefs: Chebyshev parameters for elevation and azimuth for pixels on a coarse irregular grid

Return type:

dict

saa() dict[source]

Return definition of the SAA region.

saturation() Dataset[source]

Return pixel-saturation values (pre-offset), SWIR only.

v2c_factors() ndarray[source]

Return Voltage to Charge CKD, SWIR only.

Notes

The V2C factor has no error attached to it.

wavelength(bands: str = '78') Dataset[source]

Return wavelength CKD.

Parameters:

bands (str, default: '78') – Tropomi bands [1..8] or channels [‘12’, ‘34’, ‘56’, ‘78’]

Notes

The wavelength CKD has no error attached to it.

pys5p.error_propagation module

This module contains routines to compute the uncertainty of a quotient or a sum.

pys5p.error_propagation.unc_div(value_a: ndarray, sigma_a: ndarray, value_b: ndarray, sigma_b: ndarray) ndarray[source]

Absolute error for parameter a divided by b.

\[(a / b) * \sqrt{(\sigma_a / a)^2 + (\sigma_b / b)^2}\]

pys5p.error_propagation.unc_sum(sigma_a: ndarray, sigma_b: ndarray) ndarray[source]

Absolute error for the sum of the parameters a and b.

\[\sqrt{\sigma_a^2 + \sigma_b^2}\]
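
A small numerical illustration of both routines, with arbitrarily chosen values; the results follow directly from the formulas above:

>>> import numpy as np
>>> from pys5p.error_propagation import unc_div, unc_sum
>>> value_a, sigma_a = np.array([10.]), np.array([0.5])
>>> value_b, sigma_b = np.array([2.]), np.array([0.1])
>>> unc_div(value_a, sigma_a, value_b, sigma_b)   # uncertainty of a / b
array([0.35355339])
>>> unc_sum(sigma_a, sigma_b)                     # uncertainty of a + b
array([0.50990195])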

pys5p.get_data_dir module

This module contains routine get_data_dir.

pys5p.get_data_dir.get_data_dir()[source]

Obtain directory with test datasets.

Limited to UNIX/Linux/macOS operating systems

This function checks whether the following directories are available:
  • /data/$USER/pys5p-data

  • /Users/$USER/pys5p-data

  • environment variable PYS5P_DATA_DIR

It expects the data to be organized in the subdirectories:
  • CKD which should contain the SWIR dpqf CKD

  • OCM which should contain at least one directory of an on-ground calibration measurement with one or more OCAL LX products.

  • L1B which should contain at least one offline calibration, irradiance and radiance product.

  • ICM which should contain at least one in-flight calibration product.
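
A minimal sketch using the environment variable as an alternative to the fixed locations listed above (the path is a placeholder):

>>> import os
>>> os.environ['PYS5P_DATA_DIR'] = '/path/to/pys5p-data'
>>> from pys5p.get_data_dir import get_data_dir
>>> data_dir = get_data_dir()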

pys5p.icm_io module

pys5p.l1b_io module

pys5p.l1b_patch module

pys5p.lv2_io module

This module contains the class LV2io to access Tropomi level-2 products.

class pys5p.lv2_io.LV2io(lv2_product: Path)[source]

Bases: object

A class to read Tropomi Level-2 (offline) products.

Parameters:

lv2_product (Path) – full path to S5P Tropomi level 2 product

Notes

The Python h5py module can read the operational netCDF4 products without any problems; however, the SRON science products contain incompatible attributes. This should be fixed when more up-to-date netCDF software is used to generate the products. Currently, the Python netCDF4 module is used to read the science products.
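
A minimal usage sketch (the product path is a placeholder):

>>> from pathlib import Path
>>> from pys5p.lv2_io import LV2io
>>> lv2 = LV2io(Path('/path/to/S5P_OFFL_L2_product.nc'))
>>> orbit = lv2.orbit
>>> t_start, t_end = lv2.coverage_time
>>> lv2.close()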

property algorithm_version: str | None

Return version of the level 2 algorithm.

close()[source]

Close the product.

property coverage_time: tuple

Return start and end of the measurement coverage time.

property creation_time: str

Return creation date/time of the level 2 product.

get_attr(attr_name: str, ds_name: str | None = None) ndarray | None[source]

Obtain value of an HDF5 file attribute or dataset attribute.

Parameters:
  • attr_name (str) – name of the attribute

  • ds_name (str, optional) – name of dataset, default is to read the product attributes

get_data_as_xds(name: str, data_sel: tuple[slice | int] | None = None) DataArray[source]

Read a dataset from the PRODUCT/target_product group.

Parameters:
  • name (str) – name of dataset with level 2 data

  • data_sel (numpy slice) – a 3-dimensional numpy slice: time, scan_line, ground_pixel

Return type:

xarray.DataArray

get_dataset(name: str, data_sel: tuple[slice | int] | slice | None = None, fill_as_nan: bool = True) ndarray[source]

Read level 2 dataset from PRODUCT group.

Parameters:
  • name (string) – name of dataset with level 2 data

  • data_sel (numpy slice) – a 3-dimensional numpy slice: time, scan_line, ground_pixel

  • fill_as_nan (boolean) – Replace (float) FillValues with NaNs when True

Return type:

numpy.ndarray
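
A sketch of reading a subset with a numpy slice; the dataset name is only an example (valid for an NO2 product) and not part of this API:

>>> import numpy as np
>>> no2 = lv2.get_dataset('nitrogendioxide_tropospheric_column',
...                       data_sel=np.s_[0, 0:100, :], fill_as_nan=True)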

get_geo_bounds(extent: list[float, float, float, float] | None, data_sel: tuple[slice | int] | None) ndarray | tuple[source]

Return bounds of latitude/longitude as a mesh for plotting.

Parameters:
  • extent (list) – select data to cover a region with geolocation defined by lon_min, lon_max, lat_min, lat_max; the matching numpy slice is returned

  • data_sel (numpy slice) – a 3-dimensional numpy slice: time, scan_line, ground_pixel. Note that data_sel will be overwritten when extent is defined.

Returns:

  • data_sel (numpy slice) – Select slice of data which covers geolocation defined by extent. Only provided if extent is not None.

  • out (dictionary) – With numpy arrays for latitude and longitude

get_geo_data(geo_dsets: str | None = None) dict[source]

Return data of selected datasets from the GEOLOCATIONS group.

Parameters:

geo_dsets (str, optional) –

Name(s) of datasets, comma separated. Default:

  • operational: ‘latitude,longitude’

  • science: ‘latitude_center,longitude_center’

Returns:

dictionary with arrays of selected datasets

Return type:

dict
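
A short sketch using the default dataset selection:

>>> geo = lv2.get_geo_data()   # for operational products: keys 'latitude' and 'longitude'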

get_time() ndarray | None[source]

Return start time of measurement per scan-line.

property orbit: int

Return reference orbit number.

property processor_version: str | None

Return version of the level 2 processor.

property product_version: str

Return version of the level 2 product.

property ref_time: datetime | None

Return reference start time of measurements.

property science_product: bool

Check if product is a science product.

pys5p.ocm_io module

This module contains the class OCMio to access on-ground calibration data.

class pys5p.ocm_io.OCMio(ocm_product: Path)[source]

Bases: object

A class to read Tropomi on-ground calibration products (Lx).

Parameters:

ocm_product (Path) – Full path to on-ground calibration measurement

close()[source]

Close resources.

get_attr(attr_name) ndarray | None[source]

Obtain value of an HDF5 file attribute.

Parameters:

attr_name (string) – name of the attribute

get_coverage_time() tuple[str, str][source]

Return start and end of the measurement coverage time.

get_delta_time() ndarray | None[source]

Return offset from the reference start time of measurement.

get_exposure_time() ndarray | None[source]

Return the exact pixel exposure time of the measurements.

get_gse_stimuli() ndarray | None[source]

Return GSE stimuli parameters.

get_housekeeping_data() ndarray | None[source]

Return housekeeping data of measurements.

get_instrument_settings() ndarray | None[source]

Return instrument settings of measurement.

get_msm_attr(msm_dset: str, attr_name: str) str | None[source]

Return attribute of measurement dataset “msm_dset”.

Parameters:
  • msm_dset (str) – name of measurement dataset

  • attr_name (str) – name of the attribute

Returns:

value of attribute “attr_name”

Return type:

scalar or numpy.ndarray

get_msm_data(msm_dset: str, fill_as_nan: bool = True, frames: list[int, int] | None = None, columns: list[int, int] | None = None) dict[source]

Return data of measurement dataset msm_dset.

Parameters:
  • msm_dset (str) – name of the measurement dataset; if msm_dset is None, the names of the available datasets are shown

  • columns ([i, j]) – slice the data along the fastest axis (columns), from index ‘i’ to ‘j’

  • frames ([i, j]) – slice the data along the slowest axis (time), from index ‘i’ to ‘j’

  • fill_as_nan (boolean) – replace (float) FillValues with NaNs

Returns:

Python dictionary with names of msm_groups as keys

Return type:

dict

get_processor_version() str[source]

Return version of the L01b processor.

get_ref_time() ndarray | None[source]

Return reference start time of measurements.

read_direct_msm(msm_dset: str, dest_sel: tuple[slice | int] | None = None, dest_dtype: type[Any] | None = None, fill_as_nan: bool = False) dict | None[source]

Return data of measurement dataset msm_dset (fast implementation).

Parameters:
  • msm_dset (string) – Name of measurement dataset

  • dest_sel (numpy slice) – Selection must be the output of numpy.s_[<args>].

  • dest_dtype (numpy dtype) – Perform type conversion

  • fill_as_nan (boolean) – Replace (float) FillValues with NaNs when True

Returns:

Python dictionary with names of msm_groups as keys

Return type:

dict

select(ic_id: int | None = None, *, msm_grp: str | None = None) int[source]

Select a measurement as BAND%/ICID_<ic_id>_GROUP_%.

Parameters:
  • ic_id (int) – used as “BAND%/ICID_{}_GROUP_%”.format(ic_id)

  • msm_grp (str) – select the measurement group with name msm_grp

All measurement groups are shown when both ic_id and msm_grp are None.

Returns:

Number of measurements found

Return type:

scalar

Notes

Updated object attributes:
  • bands : available spectral bands
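
A minimal workflow sketch for OCMio; the product path, the ICID value and the dataset name 'signal' are placeholders:

>>> from pathlib import Path
>>> from pys5p.ocm_io import OCMio
>>> ocm = OCMio(Path('/path/to/ocal_measurement'))
>>> n_msm = ocm.select(ic_id=31623)                       # number of measurement groups found
>>> data = ocm.get_msm_data('signal', fill_as_nan=True)   # dict keyed on msm_group names
>>> ocm.close()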

pys5p.rls module

Implementation of the Relative Least-Squares regression (RLS).

The RLS regression is used to find the linear dependence y(x) = c0 + c1 * x that best describes the data before and after correction, using absolute residuals y_i - (c0 + c1 * x_i) divided by the expected signals c1 * x_i in the least-squares sum. Offset c0 has an arbitrary size and should not affect the fit result. Weight factors are determined so as to spread the data points effectively evenly over the whole range of x, making the result less sensitive to the actual spacing between the data points.

pys5p.rls.rls_fit(xdata: ndarray, ydata: ndarray | MaskedArray) tuple[source]

Perform RLS regression finding linear dependence y(x) = c0 + c1 * x.

Parameters:
  • xdata (ndarray, shape (M,)) – X-coordinates of the M sample points (xdata[i], ydata[…, i]) The array values have to be positive and increasing

  • ydata (MaskedArray or ndarray, shape (..., M)) – Y-coordinates of the sample points

Returns:

c0, c1, std_c0, std_c1 – coefficients of the linear dependence and their standard deviations

Return type:

tuple of ndarrays

Notes

Calling a rls-function with MaskedArrays is much slower than with plain ndarrays.

The coefficients are set to NaN when the number of samples is less than 2.

The standard deviations can only be calculated when the number of samples is larger than two; otherwise the standard deviations are set to zero.
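
A small synthetic example; note that xdata must be positive and increasing:

>>> import numpy as np
>>> from pys5p.rls import rls_fit
>>> xdata = np.array([1., 2., 4., 8., 16.])
>>> ydata = 2.0 + 0.5 * xdata                # y = c0 + c1 * x with c0=2, c1=0.5
>>> c0, c1, std_c0, std_c1 = rls_fit(xdata, ydata)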

pys5p.rls.rls_fit0(xdata: ndarray, ydata: ndarray | MaskedArray) tuple[source]

Perform RLS regression finding linear dependence y(x) = c1 * x.

Parameters:
  • xdata (ndarray, shape (M,)) – X-coordinates of the M sample points (xdata[i], ydata[…, i]) The array values have to be positive and increasing

  • ydata (MaskedArray or ndarray, shape (..., M)) – Y-coordinates of the sample points

Returns:

c1, std_c1 – coefficients of the linear dependence and their standard deviations

Return type:

tuple of ndarrays

Notes

The coefficients are set to NaN when the number of samples is less than 2.

The standard deviations can only be calculated when the number of samples is larger than two; otherwise the standard deviations are set to zero.

pys5p.s5p_msm module

This module contains the class S5Pmsm.

Warning

Deprecated, this module is no longer maintained.

class pys5p.s5p_msm.S5Pmsm(dset: Dataset | ndarray, data_sel: tuple[slice | int] | None = None, datapoint: bool = False)[source]

Bases: object

A class to hold an HDF5 dataset and its attributes.

Parameters:
  • dset (h5py.Dataset or ndarray) – h5py dataset from which the data is read; the data is used to initialize the S5Pmsm object

  • data_sel (numpy slice) – a numpy slice generated with, for example, numpy.s_

  • datapoint (bool) – to indicate that the dataset is a compound of type datapoint

Returns:

numpy structure with dataset data and attributes, including data, fillvalue, coordinates, units, …

biweight(data_sel: tuple[slice | int] | None = None, axis: int = 0, keepdims: bool = False) S5Pmsm[source]

Reduce this S5Pmsm data by applying biweight along some dimension.

Parameters:
  • data_sel (numpy slice) – A numpy slice generated with, for example, numpy.s_. Can be used to skip the first and/or last frame

  • axis (int, default=0) – Axis or axes along which the medians are computed.

  • keepdims (bool, default=False) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original arr.

Returns:

S5Pmsm object with its data (value & error) replaced by its biweight medians along one axis. The coordinates are adjusted accordingly.

concatenate(msm: S5Pmsm, axis: int = 0) S5Pmsm[source]

Concatenate two measurement datasets, the current with another.

Parameters:
  • msm (pys5p.S5Pmsm) – an S5Pmsm object

  • axis (int, default=0) – The axis along which the arrays will be joined.

Returns:

The data of the new dataset is concatenated to the existing data along an existing axis. The affected coordinate is also extended.

Note: the arrays must have the same shape, except in the dimension corresponding to axis (the first, by default).

coord_name(axis: int)[source]

Return name of coordinate.

coord_replace(key: str, dims: ndarray)[source]

Change values of a coordinate.

copy()[source]

Return a deep copy of the current object.

fill_as_nan()[source]

Replace fillvalues in data with NaNs.

Works only on datasets with HDF5 datatype ‘float’ or ‘datapoints’

nanmean(data_sel: tuple[slice | int] | None = None, axis: int = 0, keepdims: bool = False) S5Pmsm[source]

Reduce this S5Pmsm data by applying mean along some dimension.

Parameters:
  • data_sel (numpy slice, optional) – A numpy slice generated with, for example, numpy.s_. Can be used to skip the first and/or last frame

  • axis (int, default=0) – Axis or axes along which the means are computed.

  • keepdims (bool, default=False) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original arr.

Returns:

S5Pmsm object with its data (value & error) replaced by its nanmean and standard deviation along one axis. The coordinates are adjusted accordingly.

nanmedian(data_sel: tuple[slice | int] | None = None, axis: int = 0, keepdims: bool = False) S5Pmsm[source]

Reduce this S5Pmsm data by applying median along some dimension.

Parameters:
  • data_sel (numpy slice, optional) – A numpy slice generated with, for example, numpy.s_. Can be used to skip the first and/or last frame

  • axis (int, default=0) – Axis or axes along which the medians are computed.

  • keepdims (bool, default=False) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original arr.

Returns:

S5Pmsm object with its data (value & error) replaced by its nanmedian and standard deviation along one axis. The coordinates are adjusted accordingly.

nanpercentile(vperc: int | list[float], data_sel: tuple[slice | int] | None = None, axis: int = 0, keepdims: bool = False) S5Pmsm[source]

Return percentile(s) of the data in the S5Pmsm.

Parameters:
  • vperc (int or list of float) – the percentile(s) to compute; supply at least one and at most three values (see Returns)

  • data_sel (numpy slice) – A numpy slice generated with, for example, numpy.s_. Can be used to skip the first and/or last frame

  • axis (int, default=0) – Axis or axes along which the medians are computed.

  • keepdims (bool, default=False) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original arr.

Returns:

S5Pmsm object with the original data replaced by the percentiles along one of the axes (see below). The coordinates are adjusted accordingly. You should supply at least one percentile and at most three:

  • vperc is an int or len(vperc) == 1: ‘value’ is replaced by its (nan-)percentile vperc; ‘error’ is unchanged

  • len(vperc) == 2: vperc is sorted; ‘value’ is replaced by its (nan-)median; ‘error’ is replaced by percentile(‘value’, (vperc[0], vperc[1]))

  • len(vperc) == 3: vperc is sorted; ‘value’ is replaced by percentile(‘value’, vperc[1]); ‘error’ is replaced by percentile(‘value’, (vperc[0], vperc[2]))

set_coords(coords_data: list[ndarray], coords_name: list[str] | None = None)[source]

Set coordinates of data.

Parameters:
  • coords_data (list of ndarrays) – list with coordinates data for each dimension

  • coords_name (list of strings) – list with the names of each dimension

set_coverage(coverage: tuple[str, str], force: bool = False)[source]

Set the coverage attribute, as (coverageStart, coverageEnd).

Parameters:
  • coverage (tuple[str, str]) –

  • force (bool, default=False) – overwrite when force is true

Notes

Both elements are expected to be datetime objects.

set_fillvalue()[source]

Set fillvalue to KNMI undefined.

set_long_name(name: str, force: bool = False)[source]

Set the long_name attribute, overwrite when force is true.

set_units(units: str | None, force: bool = False)[source]

Set the units attribute, overwrite when force is true.

sort(axis: int = 0)[source]

Sort data and its coordinate along a given axis.

Parameters:

axis (int, default=0) – axis along which the array will be sorted.

transpose() S5Pmsm[source]

Transpose data and coordinates of an S5Pmsm object.

pys5p.swir_region module

Return the usable area on the SWIR detector.

There are two definitions:

'illuminated':
   Detector area illuminated by external sources, defined as
   a rectangular area where the signal is at least 50% of the
   maximum signal. Coordinates: rows [11:228], columns [16:991].

'level2':
   A smaller area used in official SWIR level 1B (ir)radiance
   products. Coordinates: rows [12:227], columns [20:980].

Notes

Row 257 of the SWIR detector is neglected.

pys5p.swir_region.coords(mode: str = 'illuminated', band: str = '78') tuple[slice, slice][source]

Return slice defining the illuminated region on the SWIR detector.

Parameters:
  • mode ({'illuminated', 'level2'}, optional) – default is ‘illuminated’

  • band (str, optional) – select band 7 or 8, default is both bands

pys5p.swir_region.mask(mode: str = 'illuminated', band: str = '78') ndarray[source]

Return mask of the illuminated region.

Parameters:
  • mode ({'illuminated', 'level2'}, optional) – default is ‘illuminated’

  • band (str, optional) – select band 7 or 8, default is both bands

Notes

Pixels within the illuminated region are set to True.
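
A brief sketch of both routines; frame stands for a full SWIR detector frame (not provided by this module):

>>> from pys5p.swir_region import coords, mask
>>> row_slice, col_slice = coords(mode='level2', band='78')
>>> m = mask(mode='level2', band='78')
>>> region = frame[row_slice, col_slice]   # 2-D sub-array of the level2 region
>>> pixels = frame[m]                      # the same pixels as a 1-D array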

pys5p.swir_texp module

Calculate the Tropomi SWIR exposure time from detector settings.

pys5p.swir_texp.swir_exp_time(int_delay: int, int_hold: int) float[source]

Calculate the correct SWIR exposure time from detector settings.

Parameters:
  • int_delay (int) – parameter int_delay from the instrument_settings

  • int_hold (int) – parameter int_hold from the instrument_settings

Returns:

exact (SWIR) pixel exposure time

Return type:

float
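
A minimal sketch; the int_delay and int_hold values below are illustrative only:

>>> from pys5p.swir_texp import swir_exp_time
>>> texp = swir_exp_time(int_delay=1, int_hold=10)   # exact SWIR pixel exposure time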

pys5p.version module

Provide access to the software version as obtained from git.

pys5p.version.pys5p_version(full=False, githash=False) str[source]

Return the software version as obtained from git.

Examples

Show the software version of the module pys5p:

>>> from pys5p.version import pys5p_version
>>> pys5p_version()
'2.1.5'

Module contents

This is the SRON Python package pys5p.

It contains software to read Sentinel-5p Tropomi ICM, L1B and L2 products.