Overview

The module datahub is the data manager in GeospaceLab, including three class-based core components:

  • DataHub manages a set of datasets docked or added to the datahub.

  • Dataset manages a set of variables loaded from a data source.

  • Variable records the value, error, and various attributes
    (e.g., name, label, unit, depends, ndim, …) of a variable.
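The containment relationship among the three components can be pictured with a minimal sketch. The classes below (SketchVariable, SketchDataset, SketchDataHub) are hypothetical stand-ins written only for illustration. They are not the GeospaceLab classes, and the integer indexing of datasets starting at 0 is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class SketchVariable:
    """Hypothetical stand-in: a value, its error, and a few attributes."""
    name: str
    value: list
    error: list = None
    unit: str = ''


@dataclass
class SketchDataset:
    """Hypothetical stand-in: a named collection of variables."""
    name: str
    variables: dict = field(default_factory=dict)

    def add(self, var):
        self.variables[var.name] = var
        return var


@dataclass
class SketchDataHub:
    """Hypothetical stand-in: a collection of datasets keyed by an integer."""
    datasets: dict = field(default_factory=dict)

    def add(self, dataset):
        index = len(self.datasets)  # assumption: indices start at 0
        self.datasets[index] = dataset
        return index


hub = SketchDataHub()
ds = SketchDataset(name='demo')
ds.add(SketchVariable(name='n_e', value=[1.0e11, 2.0e11], unit='m-3'))
idx = hub.add(ds)

# Walk the hierarchy: datahub -> dataset -> variable -> attribute
print(hub.datasets[idx].variables['n_e'].unit)
```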

Datahub

To create a DataHub object, either call the function create_datahub or instantiate the class DataHub directly. The former provides an option (datahub_class) to create the object from a DataHub subclass.

create_datahub(dt_fr, dt_to, visual='off', datahub_class=None, **kwargs)

Create a datahub object.

Parameters
  • dt_fr (datetime.datetime) – The starting time.

  • dt_to (datetime.datetime) – The stopping time.

  • visual ({'off', 'on'}, default: 'off') – If “on”, a Visual object is aggregated to the Variable object.

  • datahub_class (DataHub or its subclass) – If None, create a datahub object based on the default DataHub class.

  • kwargs – Other optional keyword arguments as inputs to DataHub.

Returns

dh

Return type

DataHub object

Example

>>> import geospacelab.datahub as datahub
>>> import datetime
>>> dt_fr = datetime.datetime.strptime('20210309' + '0000', '%Y%m%d%H%M')
>>> dt_to = datetime.datetime.strptime('20210309' + '2359', '%Y%m%d%H%M')
>>> dh = datahub.create_datahub(dt_fr, dt_to)
Seealso:

DataHub
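The role of the datahub_class option can be sketched as a small factory. Everything below (SketchDataHub, MyDataHub, create_hub, and the campaign attribute) is hypothetical and does not import GeospaceLab. It only assumes the behaviour described above: fall back to the default class when datahub_class is None and forward the remaining keyword arguments.

```python
import datetime


class SketchDataHub:
    """Hypothetical stand-in for a datahub class (not GeospaceLab's DataHub)."""

    def __init__(self, dt_fr=None, dt_to=None, visual='off', **kwargs):
        self.dt_fr = dt_fr
        self.dt_to = dt_to
        self.visual = visual


class MyDataHub(SketchDataHub):
    """Hypothetical subclass that adds one extra attribute."""

    def __init__(self, *args, campaign='', **kwargs):
        super().__init__(*args, **kwargs)
        self.campaign = campaign


def create_hub(dt_fr, dt_to, visual='off', datahub_class=None, **kwargs):
    """Build a hub from datahub_class, falling back to the default class."""
    if datahub_class is None:
        datahub_class = SketchDataHub
    if not issubclass(datahub_class, SketchDataHub):
        raise TypeError('datahub_class must be SketchDataHub or a subclass')
    return datahub_class(dt_fr, dt_to, visual=visual, **kwargs)


# datetime.datetime(...) builds the same time range as the strptime calls
dt_fr = datetime.datetime(2021, 3, 9, 0, 0)
dt_to = datetime.datetime(2021, 3, 9, 23, 59)

dh_default = create_hub(dt_fr, dt_to)
dh_custom = create_hub(dt_fr, dt_to, datahub_class=MyDataHub, campaign='demo')
print(type(dh_default).__name__, type(dh_custom).__name__)
```

Passing a class rather than an instance lets the factory stay unaware of any extra constructor arguments a subclass needs.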

class DataHub(dt_fr=None, dt_to=None, visual='off', **kwargs)

The class DataHub manages a set of datasets from various data sources.

Variables
  • dt_fr (datetime.datetime) – The starting time.

  • dt_to (datetime.datetime) – The ending time.

  • visual (str, {'off', 'on'}) – If “on”, a Visual object will be aggregated to the Variable object.

  • datasets (dict) – A dict that stores the datasets added (add_dataset()) or docked (dock()) to the datahub.

  • variables (dict) – A dict that stores the variables assigned from their aggregated datasets. Typically used for the dashboards or the I/O configuration.

Usage:

  • Create a DataHub object:

Example

Import the datahub module and create a DataHub object

>>> import geospacelab.datahub as datahub
>>> import datetime
>>> dt_fr = datetime.datetime.strptime('20210309' + '0000', '%Y%m%d%H%M')
>>> dt_to = datetime.datetime.strptime('20210309' + '2359', '%Y%m%d%H%M')
>>> dh = datahub.DataHub(dt_fr, dt_to)
Seealso:

create_datahub

  • Dock a built-in dataset:

Example

Dock an EISCAT dataset

>>> database_name = 'madrigal'      # built-in sourced database name
>>> facility_name = 'eiscat'
>>> site = 'UHF'      # facility attributes required, check from the eiscat schedule page
>>> antenna = 'UHF'
>>> modulation = 'ant'
>>> ds_1 = dh.dock(datasource_contents=[database_name, facility_name], site=site, antenna=antenna, modulation=modulation, data_file_type='eiscat-hdf5')
Seealso:

dock()

  • Find the datasource_contents and the required inputs:

Example

List the built-in data sources

>>> dh.list_sourced_datasets()
Seealso:

list_sourced_datasets()

__init__(dt_fr=None, dt_to=None, visual='off', **kwargs)
Parameters
  • dt_fr (datetime.datetime) – The starting time.

  • dt_to (datetime.datetime) – The stopping time.

  • visual ({'off', 'on'}, default: 'off') – If “on”, a Visual object is aggregated to the Variable object.

  • kwargs – other keyword arguments forwarded to the inherited class.

dock(datasource_contents, **kwargs) → geospacelab.datahub.__dataset_base__.DatasetSourced

Dock a built-in or registered dataset.

Parameters
  • datasource_contents (list) – The contents that are required for docking a sourced dataset. To look up the sourced datasets and the associated contents, call list_sourced_datasets().

  • dt_fr (datetime.datetime) – starting time, optional, use datahub.dt_fr if not specified.

  • dt_to (datetime.datetime) – stopping time, optional, use datahub.dt_to if not specified.

  • visual (str) – variable attribute, use datahub.visual if not specified.

Returns

dataset

Return type

Dataset object

Seealso:

add_dataset()

Note:

The difference between the methods dock() and add_dataset() is that dock() adds a built-in data source, while add_dataset() adds a temporary or user-defined dataset, which is not initially included in the package.
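The distinction can be illustrated with a minimal sketch. SketchDataset, SketchDataHub, and the SOURCED_DATASETS registry are hypothetical stand-ins, not GeospaceLab code. The sketch assumes only that built-in sources are looked up by their datasource_contents, while other datasets are created or supplied directly.

```python
class SketchDataset:
    """Hypothetical stand-in for a dataset (not GeospaceLab's DatasetBase)."""

    def __init__(self, name='', kind='', **kwargs):
        self.name = name
        self.kind = kind
        self.options = kwargs


# Hypothetical registry of built-in sources, keyed by datasource_contents.
SOURCED_DATASETS = {
    ('madrigal', 'eiscat'): 'madrigal_eiscat',
}


class SketchDataHub:
    """Hypothetical stand-in for a datahub."""

    def __init__(self):
        self.datasets = {}

    def _store(self, dataset):
        self.datasets[len(self.datasets)] = dataset
        return dataset

    def dock(self, datasource_contents, **kwargs):
        # Built-in source: resolved through the registry shipped with the package.
        key = tuple(datasource_contents)
        if key not in SOURCED_DATASETS:
            raise KeyError(f'not a built-in data source: {key}')
        name = SOURCED_DATASETS[key]
        return self._store(SketchDataset(name=name, kind='sourced', **kwargs))

    def add_dataset(self, *datasets, kind='temporary', **kwargs):
        # Outside the registry: create a temporary dataset or add supplied ones.
        if kind == 'temporary':
            self._store(SketchDataset(kind=kind, **kwargs))
        else:
            for dataset in datasets:
                self._store(dataset)


dh = SketchDataHub()
ds_1 = dh.dock(['madrigal', 'eiscat'], site='UHF')
dh.add_dataset(name='scratch')
print([ds.kind for ds in dh.datasets.values()])
```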

add_dataset(*args, kind='temporary', dataset_class=None, **kwargs) → geospacelab.datahub.__dataset_base__.DatasetBase

Add one or more datasets, which can be a “temporary” or “user-defined” dataset.

Parameters
  • args (list(dataset)) – A list of the datasets.

  • kind ({'temporary', 'user-defined'}, default: 'temporary') – The type of a dataset. If temporary, a new dataset will be created from the DatasetModel.

  • dataset_class (DatasetModel or its subclass) – If None, the default class is DatasetModel. Used when kind='temporary'.

  • kwargs – Other keyword arguments forwarded to dataset_class

Returns

None

Seealso:

dock()

set_current_dataset(dataset=None, dataset_index=None)

Set the current dataset.

Parameters
  • dataset – A Dataset object.

  • dataset_index (int) – The index of the dataset in .datasets.

Return type

None

get_current_dataset(index=False)

Get the current dataset.

Parameters

index (bool) – If True, return the index of the dataset instead of the dataset object.

Returns

If index=False, dataset object, else dataset_index.
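The interplay of set_current_dataset() and get_current_dataset() can be sketched as follows. SketchDataHub is a hypothetical stand-in. How the real class tracks the current dataset internally is not stated here, so the _current_index attribute and the rule that a newly added dataset becomes current are assumptions.

```python
class SketchDataHub:
    """Hypothetical stand-in for a datahub that tracks a current dataset."""

    def __init__(self):
        self.datasets = {}
        self._current_index = None

    def add(self, dataset):
        index = len(self.datasets)
        self.datasets[index] = dataset
        self._current_index = index  # assumption: the newest dataset is current
        return index

    def set_current_dataset(self, dataset=None, dataset_index=None):
        if dataset_index is None:
            # Resolve the index from the dataset object itself.
            matches = [i for i, d in self.datasets.items() if d is dataset]
            if not matches:
                raise ValueError('dataset has not been added to the datahub')
            dataset_index = matches[0]
        elif dataset_index not in self.datasets:
            raise KeyError(dataset_index)
        self._current_index = dataset_index

    def get_current_dataset(self, index=False):
        if index:
            return self._current_index
        return self.datasets[self._current_index]


dh = SketchDataHub()
first, second = object(), object()
dh.add(first)
dh.add(second)

dh.set_current_dataset(dataset=first)
print(dh.get_current_dataset(index=True))
```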

get_variable(var_name, dataset=None, dataset_index=None) → geospacelab.datahub.__variable_base__.VariableBase

Get a variable from a docked or added dataset.

Parameters
  • var_name (str) – the name of the queried variable

  • dataset (DatasetBase object) – the dataset storing the queried variable.

  • dataset_index (int) – the index of the dataset in datahub.datasets. If neither dataset nor dataset_index is specified, the function gets the variable from the current dataset.

Returns

var

Return type

VariableModel object or None

Seealso:

assign_variable()

Note:

Both get_variable() and assign_variable() return a variable object assigned from a dataset. The former only returns the object, while the latter also assigns the variable to DataHub.variables.
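A minimal sketch of this difference, using a hypothetical SketchDataHub and a plain dict as the dataset. Neither is GeospaceLab code, and keying the assigned variables by a running integer is an assumption.

```python
class SketchDataHub:
    """Hypothetical stand-in: one dataset plus a store of assigned variables."""

    def __init__(self, dataset):
        self.dataset = dataset  # a plain dict stands in for a dataset
        self.variables = {}     # variables assigned to the datahub

    def get_variable(self, var_name):
        # Look up only: the datahub's own store is left untouched.
        return self.dataset.get(var_name)

    def assign_variable(self, var_name):
        # Look up and also register the variable on the datahub.
        var = self.get_variable(var_name)
        if var is None:
            raise KeyError(var_name)
        self.variables[len(self.variables)] = var
        return var


n_e = {'name': 'n_e', 'unit': 'm-3'}
dh = SketchDataHub({'n_e': n_e})

got = dh.get_variable('n_e')
count_after_get = len(dh.variables)

assigned = dh.assign_variable('n_e')
count_after_assign = len(dh.variables)

print(count_after_get, count_after_assign)
```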

assign_variable(var_name, dataset=None, dataset_index=None, add_new=False, **kwargs) → geospacelab.datahub.__variable_base__.VariableBase

Assign a variable to DataHub.variables from the docked or added dataset.

Parameters
  • var_name – The name of the variable

  • dataset – The dataset that stores the variable

  • dataset_index – The index of the dataset in the datahub.datasets.

  • add_new – if True, add the variable to the specified dataset and assign to the datahub

  • kwargs – other keywords to configure the attributes of the variable.

Returns

object of VariableModel

Seealso:

get_variable()

static list_sourced_datasets()

List all the built-in data sources in this package.

The list will be printed in the python console in a “tree” view.

list_datasets()

List all the datasets that have been docked or added to the datahub

The list will be printed in the console as a table

list_assigned_variables()

List all the variables that have been assigned to the datahub

The list will be printed in the console as a table

Dataset

All datasets added to a DataHub are instances of DatasetBase or its subclasses. DatasetBase is the base class, providing the essential attributes and methods to manage a data source. See details below:

class DatasetBase(dt_fr: Optional[datetime.datetime] = None, dt_to: Optional[datetime.datetime] = None, name: str = '', kind: str = '', visual: str = 'off', label_fields: list = ('name', 'kind'), **kwargs)

A dataset is a dictionary-like object used for downloading and loading data from a data source. The items in the dataset are the variables loaded from the data files. The parameters listed below are the general attributes used by the dataset class and its subclasses.

Variables
  • name (str) – The name of the dataset.

  • kind (str) – The type of the dataset. ‘sourced’: the data source has been added in the package, ‘temporary’: a dataset added temporarily, ‘user-defined’: a data source defined by the user.

  • dt_fr (datetime.datetime or None) – the starting time of the data records.

  • dt_to (datetime.datetime or None) – the stopping time of the data records.

  • visual (str, {"on", "off"}) – If “on”, append the Visual object to the Variable object.

  • label_fields (list) – A list of strings, indicating the fields used for generating the dataset label.

VariableModel

alias of geospacelab.datahub.__variable_base__.VariableBase

add_variable(var_name: str, configured_variables=None, variable_class=None, **kwargs) → geospacelab.datahub.__variable_base__.VariableBase

Add a variable to the dataset.

Parameters
  • var_name

  • configured_variables

  • variable_class

  • kwargs

Returns

label(fields=None, separator=' | ', lowercase=True, num_to_str=True) → str

Return a label of the dataset.

Parameters
  • fields – The attribute names for the label.

  • separator – A separator between two attributes.

  • lowercase – Show lowercase letters only.

Returns

label
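How such a label might be assembled from attribute names can be sketched as below. The helper make_label is hypothetical and is not the library's implementation. It covers only the fields, separator, and lowercase parameters, and leaves out num_to_str because its behaviour is not described here.

```python
from types import SimpleNamespace


def make_label(obj, fields=('name', 'kind'), separator=' | ', lowercase=True):
    """Hypothetical helper: join the named attributes of obj into one label."""
    parts = [str(getattr(obj, field_name)) for field_name in fields]
    label = separator.join(parts)
    return label.lower() if lowercase else label


# A bare namespace stands in for a dataset with a few attributes.
ds = SimpleNamespace(name='EISCAT', kind='sourced', site='UHF')

print(make_label(ds))
print(make_label(ds, fields=('name', 'site'), separator='_', lowercase=False))
```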

config(logging: bool = True, **kwargs) → None

Configure the attributes of the dataset.

Parameters
  • logging – Show logging if True.

  • kwargs

Returns

class DatasetSourced(dt_fr: Optional[datetime.datetime] = None, dt_to: Optional[datetime.datetime] = None, name: str = '', visual: str = 'off', label_fields: list = ('name', 'kind'), **kwargs)

search_data_files(initial_file_dir=None, search_pattern='*', recursive=None, direct_append=True, allow_multiple_files=False, include_extension=True, **kwargs) → list

Search the data files by matching a pattern against the file names. The search is based on pathlib.Path.glob. In a dataset subclass, a wrapper can be added for custom settings.

Parameters
  • initial_file_dir (str or pathlib.Path, default: DatasetModel.data_root_dir) – The initial file directory for searching.

  • search_pattern (str) – Unix style pathname pattern, see also pathlib.Path.glob.

  • recursive (bool, default: DatasetModel.data_search_recursive) – Search recursively if True.

  • allow_multiple_files – Allow multiple files as a result.

  • direct_append (bool, default: True) – Append the search results directly to Dataset.data_file_paths. If False, only the list of file paths is returned.

Returns

a list of the file paths.
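The effect of the search pattern and the recursive flag can be tried out with the standard library alone. The helper search_files and the file names below are hypothetical. The sketch only demonstrates pathlib.Path.glob and pathlib.Path.rglob, and does not reproduce the method's other options.

```python
import pathlib
import tempfile


def search_files(initial_file_dir, search_pattern='*', recursive=False):
    """Hypothetical helper: glob for files under a directory."""
    root = pathlib.Path(initial_file_dir)
    finder = root.rglob if recursive else root.glob
    return sorted(p for p in finder(search_pattern) if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / 'sub').mkdir()
    (root / 'EISCAT_2021-03-09.hdf5').touch()
    (root / 'sub' / 'EISCAT_2021-03-10.hdf5').touch()
    (root / 'notes.txt').touch()

    # Non-recursive: only the top-level directory is searched.
    flat_names = [p.name for p in search_files(root, '*.hdf5')]
    # Recursive: subdirectories are searched as well.
    deep_names = [p.name for p in search_files(root, '*.hdf5', recursive=True)]

print(flat_names)
print(deep_names)
```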

open_dialog(initial_file_dir: Optional[str] = None, data_file_num: Optional[int] = None, **kwargs)

Open a dialog to select the data files.

check_data_files(load_mode: Optional[str] = None, **kwargs)

Check that the data files exist before loading the data, depending on the loading mode (load_mode). This method still needs improvement, as different datasets may use different variables as epochs. Two approaches are possible: 1. write a wrapper in the new dataset subclass; 2. add a script to recognize the epoch variables.

time_filter_by_range(var_datetime=None, var_datetime_name=None)

Clip the times.

Parameters
  • var_datetime

  • var_datetime_name
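Clipping a time series to the range from dt_fr to dt_to might look like the sketch below. The helper clip_times is hypothetical, and the treatment of both bounds as inclusive is an assumption. The method's actual behaviour and the roles of its two parameters are not documented here.

```python
import datetime


def clip_times(times, values, dt_fr, dt_to):
    """Hypothetical helper: keep samples with dt_fr <= time <= dt_to."""
    keep = [i for i, t in enumerate(times) if dt_fr <= t <= dt_to]
    return [times[i] for i in keep], [values[i] for i in keep]


dt_fr = datetime.datetime(2021, 3, 9, 0, 0)
dt_to = datetime.datetime(2021, 3, 9, 23, 59)

# Six samples, 6 hours apart, starting one hour before the range opens.
start = datetime.datetime(2021, 3, 8, 23, 0)
times = [start + datetime.timedelta(hours=6 * k) for k in range(6)]
values = list(range(6))

clipped_times, clipped_values = clip_times(times, values, dt_fr, dt_to)
print(clipped_values)
```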

VariableModel

alias of geospacelab.datahub.__variable_base__.VariableBase

add_variable(var_name: str, configured_variables=None, variable_class=None, **kwargs) → geospacelab.datahub.__variable_base__.VariableBase

Add a variable to the dataset.

Parameters
  • var_name

  • configured_variables

  • variable_class

  • kwargs

Returns

config(logging: bool = True, **kwargs) → None

Configure the attributes of the dataset.

Parameters
  • logging – Show logging if True.

  • kwargs

Returns

label(fields=None, separator=' | ', lowercase=True, num_to_str=True) → str

Return a label of the dataset.

Parameters
  • fields – The attribute names for the label.

  • separator – A separator between two attributes.

  • lowercase – Show lowercase letters only.

Returns

label

Variable