Datahub#

To create a DataHub object, either call the function create_datahub or instantiate the class DataHub directly. The former provides an option (datahub_class) to create the object from a DataHub subclass.

create_datahub(dt_fr, dt_to, visual='off', datahub_class=None, **kwargs)#

Create a datahub object.

Parameters:
  • dt_fr (datetime.datetime) – The starting time.

  • dt_to (datetime.datetime) – The stopping time.

  • visual ({'off', 'on'}, default: 'off') – If “on”, a Visual object is aggregated to the Variable object.

  • datahub_class (DataHub or its subclass) – If None, create a datahub object based on the default DataHub class.

  • kwargs – Other optional keyword arguments as inputs to DataHub.

Returns:

dh

Return type:

DataHub object

Example:

>>> import geospacelab.datahub as datahub
>>> import datetime
>>> dt_fr = datetime.datetime.strptime('20210309' + '0000', '%Y%m%d%H%M')
>>> dt_to = datetime.datetime.strptime('20210309' + '2359', '%Y%m%d%H%M')
>>> dh = datahub.create_datahub(dt_fr, dt_to)
See also:

DataHub
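The time range in the example above is built by concatenating strings and parsing them with strptime. As a hedged alternative, the same dt_fr and dt_to can be constructed with datetime.datetime.combine. The helper name day_range is hypothetical and is not part of geospacelab.

```python
import datetime


def day_range(day: datetime.date):
    """Return (dt_fr, dt_to) covering one calendar day at minute resolution.

    Hypothetical helper for illustration only; not a geospacelab function.
    """
    dt_fr = datetime.datetime.combine(day, datetime.time(0, 0))
    dt_to = datetime.datetime.combine(day, datetime.time(23, 59))
    return dt_fr, dt_to


dt_fr, dt_to = day_range(datetime.date(2021, 3, 9))

# The strptime-based construction used in the documented example.
same_fr = datetime.datetime.strptime('20210309' + '0000', '%Y%m%d%H%M')
same_to = datetime.datetime.strptime('20210309' + '2359', '%Y%m%d%H%M')
```

Both pairs are plain datetime.datetime objects, so either can be passed as dt_fr and dt_to to create_datahub or DataHub.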

class DataHub(dt_fr=None, dt_to=None, visual='off', **kwargs)#

The class DataHub manages a set of datasets from various data sources.

Variables:
  • dt_fr (datetime.datetime) – The starting time.

  • dt_to (datetime.datetime) – The ending time.

  • visual (str, {'off', 'on'}) – If “on”, a Visual object will be aggregated to the Variable object.

  • datasets (dict) – A dict that stores the datasets added (add_dataset()) or docked (dock()) to the datahub.

  • variables (dict) – A dict that stores the variables assigned from their aggregated datasets. Typically used for the dashboards or the I/O configuration.

Usage:

  • Create a DataHub object:

Example:

Import the datahub module and create a DataHub object

>>> import geospacelab.datahub as datahub
>>> import datetime
>>> dt_fr = datetime.datetime.strptime('20210309' + '0000', '%Y%m%d%H%M')
>>> dt_to = datetime.datetime.strptime('20210309' + '2359', '%Y%m%d%H%M')
>>> dh = datahub.DataHub(dt_fr, dt_to)
See also:

create_datahub

  • Dock a built-in dataset:

Example:

Dock an EISCAT dataset

>>> database_name = 'madrigal'      # built-in sourced database name
>>> facility_name = 'eiscat'
>>> site = 'UHF'      # facility attributes required, check from the eiscat schedule page
>>> antenna = 'UHF'
>>> modulation = 'ant'
>>> ds_1 = dh.dock(datasource_contents=[database_name, facility_name], site=site, antenna=antenna, modulation=modulation, data_file_type='eiscat-hdf5')
See also:

dock()

  • How to know datasource_contents and required inputs?

Example:

List the built-in data sources

>>> dh.list_sourced_datasets()
See also:

list_sourced_datasets()

Attributes:
host_dataset

Methods

add_dataset(*args[, kind, dataset_class])

Add one or more datasets, which can be a "temporary" or "user-defined" dataset.

assign_variable(var_name[, dataset, ...])

Assign a variable to DataHub.variables from the docked or added dataset.

dock(datasource_contents, **kwargs)

Dock a built-in or registered dataset.

get_current_dataset([index])

Get the current dataset.

get_variable(var_name[, dataset, dataset_index])

Get a variable from the docked or added dataset.

list_assigned_variables()

List all the variables that have been assigned to the datahub.

list_datasets()

List all the datasets that have been docked or added to the datahub

list_sourced_datasets()

List all the built-in data sources in this package.

set_current_dataset([dataset, dataset_index])

Set the current dataset.

save_to_cdf

save_to_pickle

__init__(dt_fr=None, dt_to=None, visual='off', **kwargs)#
Parameters:
  • dt_fr (datetime.datetime) – The starting time.

  • dt_to (datetime.datetime) – The stopping time.

  • visual ({'off', 'on'}, default: 'off') – If “on”, a Visual object is aggregated to the Variable object.

  • kwargs – other keyword arguments forwarded to the inherited class.

dock(datasource_contents, **kwargs) DatasetSourced#

Dock a built-in or registered dataset.

Parameters:
  • datasource_contents (list) – the contents that are required for docking a sourced dataset. To look up the sourced datasets and the associated contents, call list_sourced_datasets().

  • dt_fr (datetime.datetime) – starting time, optional, use datahub.dt_fr if not specified.

  • dt_to (datetime.datetime) – stopping time, optional, use datahub.dt_to if not specified.

  • visual (str) – variable attribute, use datahub.visual if not specified.

Returns:

dataset

Return type:

Dataset object

See also:

add_dataset()

Note:

The difference between the methods dock() and add_dataset() is that dock() adds a built-in data source, while add_dataset() adds a temporary or user-defined dataset, which is not initially included in the package.


add_dataset(*args, kind='temporary', dataset_class=None, **kwargs) DatasetBase#

Add one or more datasets, which can be a “temporary” or “user-defined” dataset.

Parameters:
  • args (list(dataset)) – A list of the datasets.

  • kind ({'temporary', 'user-defined'}, default: 'temporary') – The type of a dataset. If temporary, a new dataset will be created from the DatasetModel.

  • dataset_class (DatasetModel or its subclass) – If None, the default class is DatasetModel. Used when kind='temporary'.

  • kwargs – Other keyword arguments forwarded to dataset_class

Returns:

None

See also:

dock()

set_current_dataset(dataset=None, dataset_index=None)#

Set the current dataset.

Parameters:
  • dataset – A Dataset object.

  • dataset_index (int) – The index of the dataset in .datasets.

Return type:

None

get_current_dataset(index=False)#

Get the current dataset.

Parameters:

index (bool) – If True, return the index of the current dataset instead of the dataset object.

Returns:

If index=False, dataset object, else dataset_index.
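The interplay between set_current_dataset() and get_current_dataset() can be illustrated with a small stand-in class. This is only a sketch of the documented behaviour: ToyHub and its add method are hypothetical, it is not the geospacelab implementation, and it assumes that indices start at 0 and that the most recently added dataset becomes the current one.

```python
class ToyHub:
    """Hypothetical stand-in, NOT the geospacelab DataHub implementation."""

    def __init__(self):
        self.datasets = {}           # index -> dataset, mirroring DataHub.datasets
        self._current_index = None

    def add(self, dataset):
        # Assumption: indices start at 0 and the newest dataset becomes current.
        index = len(self.datasets)
        self.datasets[index] = dataset
        self._current_index = index
        return dataset

    def set_current_dataset(self, dataset=None, dataset_index=None):
        if dataset_index is None:
            # Look up the index of the given dataset object.
            dataset_index = next(
                i for i, ds in self.datasets.items() if ds is dataset
            )
        self._current_index = dataset_index

    def get_current_dataset(self, index=False):
        # index=True returns the index, index=False returns the dataset itself.
        if index:
            return self._current_index
        return self.datasets[self._current_index]


hub = ToyHub()
first = hub.add({'name': 'first'})
second = hub.add({'name': 'second'})
hub.set_current_dataset(dataset_index=0)
current = hub.get_current_dataset()
current_index = hub.get_current_dataset(index=True)
```

Here current refers to the first dataset and current_index is its index, which shows how the index flag switches the return value between the object and its position in datasets.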

get_variable(var_name, dataset=None, dataset_index=None) VariableBase#

Get a variable from the docked or added dataset.

Parameters:
  • var_name (str) – the name of the queried variable

  • dataset (DatasetBase object) – the dataset storing the queried variable.

  • dataset_index (int) – the index of the dataset in datahub.datasets. If neither dataset nor dataset_index is specified, the variable is taken from the current dataset.

Returns:

var

Return type:

VariableModel object or None

See also:

assign_variable()

Note:

Both get_variable() and assign_variable() return a variable object assigned from a dataset. The former only returns the object, and the latter also assigns the variable to DataHub.variables.
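The distinction between the two methods can be shown with a short helper that calls both on an existing datahub. This is a minimal sketch: the function name compare_get_and_assign is hypothetical, and it only uses the get_variable(var_name) and assign_variable(var_name) signatures documented here.

```python
def compare_get_and_assign(dh, var_name):
    """Fetch a variable twice: once without and once with registration.

    `dh` is expected to be a DataHub with at least one docked or added
    dataset, and `var_name` must exist in the current dataset.
    Hypothetical helper for illustration only.
    """
    # Returns the variable only; DataHub.variables is not modified.
    var_only = dh.get_variable(var_name)
    # Returns the variable and also records it in DataHub.variables.
    var_assigned = dh.assign_variable(var_name)
    return var_only, var_assigned
```

A call such as compare_get_and_assign(dh, 'n_e') would be typical after docking a dataset, where 'n_e' is only a placeholder; the available variable names depend on the dataset.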

assign_variable(var_name, dataset=None, dataset_index=None, add_new=False, **kwargs) VariableBase#

Assign a variable to DataHub.variables from the docked or added dataset.

Parameters:
  • var_name – The name of the variable

  • dataset – The dataset that stores the variable

  • dataset_index – The index of the dataset in the datahub.datasets.

  • add_new – if True, add the variable to the specified dataset and assign to the datahub

  • kwargs – other keywords to configure the attributes of the variable.

Returns:

object of VariableModel

See also:

get_variable()

static list_sourced_datasets()#

List all the built-in data sources in this package.

The list will be printed in the python console in a “tree” view.

list_datasets()#

List all the datasets that have been docked or added to the datahub

The list will be printed in the console as a table

list_assigned_variables()#

List all the variables that have been assigned to the datahub.

The list will be printed in the console as a table

Dataset#

All the datasets added to DataHub are instances of DatasetBase or its subclasses. DatasetBase is the base class, providing the essential attributes and methods to manage a data source. See details below:

class DatasetBase(dt_fr: datetime = None, dt_to: datetime = None, name: str = '', kind: str = '', visual: str = 'off', label_fields: list = ('name', 'kind'), **kwargs)#

A dataset is a dictionary-like object used for downloading and loading data from a data source. The items in the dataset are the variables loaded from the data files. The parameters listed below are the general attributes used for the dataset class and its subclasses.

Variables:
  • name (str) – The name of the dataset.

  • kind (str) – The type of the dataset. ‘sourced’: the data source has been added in the package, ‘temporary’: a dataset added temporarily, ‘user-defined’: a data source defined by the user.

  • dt_fr (datetime.datetime or None) – the starting time of the data records.

  • dt_to (datetime.datetime or None) – the stopping time of the data records.

  • visual (str, {"on", "off"}) – If “on”, append the Visual object to the Variable object.

  • label_fields (list) – A list of strings, indicating the fields used for generating the dataset label.

Methods

VariableModel

add_variable(var_name[, ...])

Add a variable to the dataset.

config([logging])

Configure the attributes of the dataset.

exist(var_name)

label([fields, separator, lowercase, num_to_str])

Return a label of the data set.

add_attr

attrs_to_dict

clone_variables

get_variable_names

items

keys

list_all_variables

register_method

remove_variable

VariableModel#

alias of VariableBase

add_variable(var_name: str, configured_variables=None, configured_variable_name=None, variable_class=None, **kwargs) VariableBase#

Add a variable to the dataset.

Parameters:
  • var_name

  • configured_variables

  • variable_class

  • kwargs

Returns:

label(fields=None, separator=' | ', lowercase=True, num_to_str=True) str#

Return a label of the data set.

Parameters:
  • fields – The attribute names for the label.

  • separator – A separator between two attributes.

  • lowercase – Show lowercase letters only.

Returns:

label
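How fields, separator, and lowercase could combine into a label can be sketched with a standalone function. This is an illustration under stated assumptions, not the library's implementation: make_label and the attrs dictionary are hypothetical, and the num_to_str option is left out because its behaviour is not described here.

```python
def make_label(attrs, fields, separator=' | ', lowercase=True):
    """Join the selected attribute values into a single label string.

    Hypothetical helper; `attrs` maps attribute names to their values.
    """
    parts = [str(attrs[name]) for name in fields]
    label = separator.join(parts)
    return label.lower() if lowercase else label


attrs = {'name': 'EISCAT', 'kind': 'sourced'}
label = make_label(attrs, fields=['name', 'kind'])
label_raw = make_label(attrs, fields=['name'], lowercase=False)
```

With the default separator, the two selected fields are joined by ' | ' and lowercased, while lowercase=False keeps the original capitalisation.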

config(logging: bool = True, **kwargs) None#

Configure the attributes of the dataset.

Parameters:
  • logging – Show logging if True.

  • kwargs

Returns:

class DatasetSourced(dt_fr: datetime = None, dt_to: datetime = None, name: str = '', visual: str = 'off', label_fields: list = ('name', 'kind'), **kwargs)#
Attributes:
data_root_dir

Methods

VariableModel

add_variable(var_name[, ...])

Add a variable to the dataset.

check_data_files([load_mode])

Check the existence of the data files before loading the data, depending on the loading mode (load_mode).

config([logging])

Configure the attributes of the dataset.

exist(var_name)

label([fields, separator, lowercase, num_to_str])

Return a label of the data set.

open_dialog([initial_file_dir, data_file_num])

Open a dialog to select the data files.

search_data_files([initial_file_dir, ...])

Search the data files by the input pattern in the file name.

time_filter_by_range([var_datetime, ...])

Clip the times.

add_attr

attrs_to_dict

clone_variables

get_time_ind

get_variable_names

items

keys

list_all_variables

register_method

remove_variable

time_filter_by_inds

search_data_files(initial_file_dir=None, search_pattern='*', recursive=None, direct_append=True, allow_multiple_files=False, include_extension=True, **kwargs) list#

Search the data files by the input pattern in the file name. The search method is based on pathlib.Path.glob. In a dataset subclass, a wrapper can be added for custom settings.

Parameters:
  • initial_file_dir (str or pathlib.Path, default: DatasetModel.data_root_dir) – The initial file directory for searching.

  • search_pattern (str) – Unix-style pathname pattern; see also pathlib.Path.glob.

  • recursive (bool, default: DatasetModel.data_search_recursive) – Search recursively if True.

  • allow_multiple_files – Allow multiple files as a result.

  • direct_append (bool, default: True) – Append directly the searched results to Dataset.data_file_paths. If False, the file path list is returned only.

Returns:

a list of the file paths.
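Because the search relies on glob-style patterns, it may help to see how such patterns behave with the standard library alone. The sketch below is not the search_data_files implementation: find_files is a hypothetical helper, the file names are made up, and it assumes recursive searching corresponds to pathlib.Path.rglob.

```python
import pathlib
import tempfile


def find_files(initial_file_dir, search_pattern='*', recursive=False):
    """Return sorted file paths under initial_file_dir matching the pattern.

    Hypothetical helper for illustration only.
    """
    root = pathlib.Path(initial_file_dir)
    # rglob also descends into subdirectories; glob stays in the top directory.
    matcher = root.rglob if recursive else root.glob
    return sorted(p for p in matcher(search_pattern) if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / 'sub').mkdir()
    (root / 'a_20210309.hdf5').touch()
    (root / 'sub' / 'b_20210309.hdf5').touch()
    (root / 'notes.txt').touch()

    flat = [p.name for p in find_files(root, '*20210309*.hdf5')]
    deep = [p.name for p in find_files(root, '*20210309*.hdf5', recursive=True)]
```

The non-recursive call only sees files directly inside the initial directory, whereas the recursive call also picks up the matching file in the subdirectory.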

open_dialog(initial_file_dir: str = None, data_file_num: int = None, **kwargs)#

Open a dialog to select the data files.

check_data_files(load_mode: str = None, **kwargs)#

Check the existence of the data files before loading the data, depending on the loading mode (load_mode). This method still needs to be improved, as different datasets may use different variables as epochs. Two approaches are possible: 1. Write a wrapper in the new dataset subclass. 2. Add a script to recognize the epoch variables.

time_filter_by_range(var_datetime=None, var_datetime_name=None, var_names=None)#

Clip the times.

Parameters:
  • var_datetime

  • var_datetime_name
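Clipping to a time range amounts to keeping only the records whose timestamps fall between dt_fr and dt_to. The sketch below shows that idea on plain Python lists. It is an assumption-laden illustration rather than the method's actual code: clip_to_range is hypothetical, the sample data are made up, and both bounds are treated as inclusive.

```python
import datetime


def clip_to_range(times, dt_fr, dt_to):
    """Return the indices of times that fall within [dt_fr, dt_to].

    Hypothetical helper; inclusive bounds are an assumption.
    """
    return [i for i, t in enumerate(times) if dt_fr <= t <= dt_to]


start = datetime.datetime(2021, 3, 9, 0, 0)
# Seven samples, six hours apart, starting the evening before.
times = [start + datetime.timedelta(hours=6 * k) for k in range(-1, 6)]
values = list(range(len(times)))

dt_fr = datetime.datetime(2021, 3, 9, 0, 0)
dt_to = datetime.datetime(2021, 3, 9, 23, 59)
inds = clip_to_range(times, dt_fr, dt_to)

# Apply the same indices to the time axis and to every dependent variable.
clipped_times = [times[i] for i in inds]
clipped_values = [values[i] for i in inds]
```

Computing the indices once and reusing them for each variable keeps the time axis and the data aligned after clipping.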

VariableModel#

alias of VariableBase

add_variable(var_name: str, configured_variables=None, configured_variable_name=None, variable_class=None, **kwargs) VariableBase#

Add a variable to the dataset.

Parameters:
  • var_name

  • configured_variables

  • variable_class

  • kwargs

Returns:

config(logging: bool = True, **kwargs) None#

Configure the attributes of the dataset.

Parameters:
  • logging – Show logging if True.

  • kwargs

Returns:

label(fields=None, separator=' | ', lowercase=True, num_to_str=True) str#

Return a label of the data set.

Parameters:
  • fields – The attribute names for the label.

  • separator – A separator between two attributes.

  • lowercase – Show lowercase letters only.

Returns:

label

Variable#