User Manual
Data Manager (geospacelab.datahub)
Overview
The module datahub is the data manager in GeospaceLab. It has three class-based core components:
- DataHub: manages a set of datasets docked or added to the datahub.
- Dataset: manages a set of variables loaded from a data source.
- Variable: records the value, error, and various attributes (e.g., name, label, unit, depends, ndim, …) of a variable.
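The relationship between the three components can be pictured with the toy model below. It is a hypothetical sketch built on plain dataclasses, not GeospaceLab's implementation: the names ToyDataHub, ToyDataset and ToyVariable, and keying datasets by an integer index, are illustrative assumptions that only mirror the attributes described in this manual (datasets, variables, value, error, unit, ndim).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ToyVariable:
    """Hypothetical stand-in for a Variable: a value plus its attributes."""
    name: str
    value: Any = None
    error: Any = None
    unit: str = ''
    ndim: Optional[int] = None


@dataclass
class ToyDataset:
    """Hypothetical stand-in for a Dataset: a dict-like container of variables."""
    name: str
    kind: str = 'temporary'
    variables: Dict[str, ToyVariable] = field(default_factory=dict)

    def __getitem__(self, var_name: str) -> ToyVariable:
        # Dictionary-like access to the loaded variables
        return self.variables[var_name]

    def add_variable(self, var_name: str, **attrs: Any) -> ToyVariable:
        var = ToyVariable(name=var_name, **attrs)
        self.variables[var_name] = var
        return var


@dataclass
class ToyDataHub:
    """Hypothetical stand-in for a DataHub: holds datasets and assigned variables."""
    datasets: Dict[int, ToyDataset] = field(default_factory=dict)
    variables: Dict[str, ToyVariable] = field(default_factory=dict)

    def add_dataset(self, dataset: ToyDataset) -> int:
        # Assumption: datasets are keyed by an integer index in insertion order
        index = len(self.datasets)
        self.datasets[index] = dataset
        return index


hub = ToyDataHub()
ds = ToyDataset(name='demo')
ds.add_variable('n_e', value=[1.0e11, 2.0e11], unit='m-3')
idx = hub.add_dataset(ds)
print(idx, hub.datasets[idx]['n_e'].unit)  # 0 m-3
```

In this model the hub only owns datasets, each dataset owns its variables, and hub.variables stays empty until a variable is explicitly assigned.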
DataHub
To create a DataHub object, either call the function create_datahub or instantiate the class DataHub. The former provides an option (datahub_class) to create the object from a DataHub subclass.
- create_datahub(dt_fr, dt_to, visual='off', datahub_class=None, **kwargs)
Create a datahub object.
- Parameters
dt_fr (datetime.datetime) – The starting time.
dt_to (datetime.datetime) – The stopping time.
visual ({'off', 'on'}, default: 'off') – If “on”, a Visual object is aggregated to the Variable object.
datahub_class (DataHub or its subclass) – If None, create a datahub object based on the default DataHub class.
kwargs – Other optional keyword arguments as inputs to DataHub.
- Returns
dh
- Return type
DataHub object
- Example
>>> import geospacelab.datahub as datahub
>>> import datetime
>>> dt_fr = datetime.datetime.strptime('20210309' + '0000', '%Y%m%d%H%M')
>>> dt_to = datetime.datetime.strptime('20210309' + '2359', '%Y%m%d%H%M')
>>> dh = datahub.create_datahub(dt_fr, dt_to)
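The example builds the time range by parsing strings with strptime. As a minimal standard-library sketch, the same two datetime.datetime objects can also be constructed directly; both forms produce the type that dt_fr and dt_to are documented to take.

```python
import datetime

# Parse 'YYYYmmddHHMM' strings, as in the example above
dt_fr = datetime.datetime.strptime('20210309' + '0000', '%Y%m%d%H%M')
dt_to = datetime.datetime.strptime('20210309' + '2359', '%Y%m%d%H%M')

# Equivalent direct construction of the same time range
assert dt_fr == datetime.datetime(2021, 3, 9, 0, 0)
assert dt_to == datetime.datetime(2021, 3, 9, 23, 59)

# The window covers one day minus one minute
print(dt_to - dt_fr)  # 23:59:00
```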
- class DataHub(dt_fr=None, dt_to=None, visual='off', **kwargs)
The class DataHub manages a set of datasets from various data sources.
- Variables
dt_fr (datetime.datetime) – The starting time.
dt_to (datetime.datetime) – The ending time.
visual (str, {'off', 'on'}) – If “on”, a Visual object will be aggregated to the Variable object.
datasets (dict) – A dict that stores the datasets added (add_dataset()) or docked (dock()) to the datahub.
variables (dict) – A dict that stores the variables assigned from their aggregated datasets. Typically used for dashboards or the I/O configuration.
Usage:
Create a DataHub object:
- Example
Import the datahub module and create a DataHub object
>>> import geospacelab.datahub as datahub
>>> import datetime
>>> dt_fr = datetime.datetime.strptime('20210309' + '0000', '%Y%m%d%H%M')
>>> dt_to = datetime.datetime.strptime('20210309' + '2359', '%Y%m%d%H%M')
>>> dh = datahub.DataHub(dt_fr, dt_to)
Dock a built-in dataset:
- Example
Dock an EISCAT dataset
>>> database_name = 'madrigal'  # built-in sourced database name
>>> facility_name = 'eiscat'
>>> site = 'UHF'  # required facility attributes; check them on the EISCAT schedule page
>>> antenna = 'UHF'
>>> modulation = 'ant'
>>> ds_1 = dh.dock(datasource_contents=[database_name, facility_name], site=site, antenna=antenna, modulation=modulation, data_file_type='eiscat-hdf5')
How to find out datasource_contents and the required inputs?
- Example
List the built-in data sources
>>> dh.list_sourced_datasets()
- __init__(dt_fr=None, dt_to=None, visual='off', **kwargs)
- Parameters
dt_fr (datetime.datetime) – The starting time.
dt_to (datetime.datetime) – The stopping time.
visual ({'off', 'on'}, default: 'off') – If “on”, a Visual object is aggregated to the Variable object.
kwargs – Other keyword arguments forwarded to the parent class.
- dock(datasource_contents, **kwargs) → geospacelab.datahub.__dataset_base__.DatasetSourced
Dock a built-in or registered dataset.
- Parameters
datasource_contents (list) – The contents required for docking a sourced dataset. To look up the sourced datasets and their associated contents, call list_sourced_datasets().
dt_fr (datetime.datetime) – Starting time (optional); datahub.dt_fr is used if not specified.
dt_to (datetime.datetime) – Stopping time (optional); datahub.dt_to is used if not specified.
visual (str) – Variable attribute; datahub.visual is used if not specified.
- Returns
dataset
- Return type
Dataset object
- Note:
The difference between the methods dock() and add_dataset() is that dock() adds a built-in data source, while add_dataset() adds a temporary or user-defined dataset, which is not initially included in the package.
- add_dataset(*args, kind='temporary', dataset_class=None, **kwargs) → geospacelab.datahub.__dataset_base__.DatasetBase
Add one or more datasets, which can be a “temporary” or “user-defined” dataset.
- Parameters
args (list(dataset)) – A list of the datasets.
kind ({'temporary', 'user-defined'}, default: 'temporary') – The type of a dataset. If temporary, a new dataset will be created from the DatasetModel.
dataset_class (DatasetModel or its subclass) – If None, the default class is DatasetModel. Used when kind='temporary'.
kwargs – Other keyword arguments forwarded to dataset_class.
- Returns
None
- set_current_dataset(dataset=None, dataset_index=None)
Set the current dataset.
- Parameters
dataset – A Dataset object.
dataset_index (int) – The index of the dataset in .datasets.
- Return type
None
- get_current_dataset(index=False)
Get the current dataset.
- Parameters
index (bool) – If True, return the index of the current dataset instead of the dataset object.
- Returns
If index=False, the dataset object; otherwise, the dataset index.
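The bookkeeping behind set_current_dataset() and get_current_dataset() can be pictured with the sketch below. It is a hypothetical model, not GeospaceLab's code: it assumes that datahub.datasets is a dict keyed by integer index and that the current dataset is tracked by storing its index. The class name CurrentDatasetTracker is illustrative only.

```python
from typing import Any, Dict, Optional


class CurrentDatasetTracker:
    """Hypothetical model of selecting a 'current' dataset by object or by index."""

    def __init__(self) -> None:
        self.datasets: Dict[int, Any] = {}
        self._current_index: Optional[int] = None

    def set_current_dataset(self, dataset: Any = None,
                            dataset_index: Optional[int] = None) -> None:
        if dataset is not None:
            # Find the index of the given object by identity
            matches = [i for i, ds in self.datasets.items() if ds is dataset]
            if not matches:
                raise ValueError('dataset has not been added to the hub')
            dataset_index = matches[0]
        if dataset_index not in self.datasets:
            raise KeyError(f'no dataset with index {dataset_index}')
        self._current_index = dataset_index

    def get_current_dataset(self, index: bool = False) -> Any:
        # Return the index itself, or the dataset object it points to
        if index:
            return self._current_index
        return self.datasets.get(self._current_index)


tracker = CurrentDatasetTracker()
tracker.datasets = {0: 'dataset A', 1: 'dataset B'}
tracker.set_current_dataset(dataset_index=1)
print(tracker.get_current_dataset())            # dataset B
print(tracker.get_current_dataset(index=True))  # 1
```

Storing only the index keeps a single source of truth: the object is always looked up from the datasets dict, so both return forms of get_current_dataset() stay consistent.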
- get_variable(var_name, dataset=None, dataset_index=None) → geospacelab.datahub.__variable_base__.VariableBase
Get a variable from a docked or added dataset.
- Parameters
var_name (str) – The name of the queried variable.
dataset (DatasetBase object) – The dataset storing the queried variable.
dataset_index (int) – The index of the dataset in datahub.datasets. If neither dataset nor dataset_index is specified, the variable is taken from the current dataset.
- Returns
var
- Return type
VariableModel object or None
- Note:
Both get_variable() and assign_variable() return a variable object assigned from a dataset. The former only returns the object, while the latter also assigns the variable to DataHub.variables.
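The difference described in the note can be sketched with a dictionary-based model. This is a hypothetical illustration, not the library's implementation: datasets are plain dicts mapping variable names to values, the class name VariableLookupSketch is invented, and the fallback to the current dataset follows the dataset_index description of get_variable().

```python
from typing import Any, Dict, Optional


class VariableLookupSketch:
    """Hypothetical model contrasting get_variable() and assign_variable()."""

    def __init__(self, datasets: Dict[int, Dict[str, Any]], current_index: int = 0):
        self.datasets = datasets
        self.current_index = current_index
        self.variables: Dict[str, Any] = {}  # plays the role of DataHub.variables

    def get_variable(self, var_name: str, dataset: Optional[Dict[str, Any]] = None,
                     dataset_index: Optional[int] = None) -> Any:
        if dataset is None:
            # Fall back to the current dataset when no index is given
            if dataset_index is None:
                dataset_index = self.current_index
            dataset = self.datasets[dataset_index]
        # Return None if the variable is missing ("object or None")
        return dataset.get(var_name)

    def assign_variable(self, var_name: str, dataset: Optional[Dict[str, Any]] = None,
                        dataset_index: Optional[int] = None) -> Any:
        var = self.get_variable(var_name, dataset=dataset, dataset_index=dataset_index)
        if var is not None:
            # Unlike get_variable(), also register the variable on the hub
            self.variables[var_name] = var
        return var


hub = VariableLookupSketch({0: {'n_e': [1.0e11, 2.0e11]}, 1: {'T_i': [900.0]}})
hub.get_variable('n_e')
print(hub.variables)        # {}
hub.assign_variable('T_i', dataset_index=1)
print(list(hub.variables))  # ['T_i']
```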
- assign_variable(var_name, dataset=None, dataset_index=None, add_new=False, **kwargs) → geospacelab.datahub.__variable_base__.VariableBase
Assign a variable to DataHub.variables from the docked or added dataset.
- Parameters
var_name – The name of the variable
dataset – The dataset that stores the variable
dataset_index – The index of the dataset in the datahub.datasets.
add_new – If True, add the variable to the specified dataset and assign it to the datahub.
kwargs – Other keyword arguments to configure the attributes of the variable.
- Returns
A VariableModel object
- static list_sourced_datasets()
List all the built-in data sources in this package.
The list is printed in the Python console in a “tree” view.
- list_datasets()
List all the datasets that have been docked or added to the datahub.
The list is printed in the console as a table.
- list_assigned_variables()
List all the variables assigned to the datahub from its docked or added datasets.
The list is printed in the console as a table.
Dataset
All the datasets added to DataHub are objects of DatasetBase or its subclasses. DatasetBase is the base class, providing the essential attributes and methods to manage a data source. See details below:
- class DatasetBase(dt_fr: Optional[datetime.datetime] = None, dt_to: Optional[datetime.datetime] = None, name: str = '', kind: str = '', visual: str = 'off', label_fields: list = ('name', 'kind'), **kwargs)
A dataset is a dictionary-like object used for downloading and loading data from a data source. The items in the dataset are the variables loaded from the data files. The parameters listed below are the general attributes used by the dataset class and its subclasses.
- Variables
name (str) – The name of the dataset.
kind (str) – The type of the dataset. ‘sourced’: the data source has been added in the package, ‘temporary’: a dataset added temporarily, ‘user-defined’: a data source defined by the user.
dt_fr (datetime.datetime or None) – The starting time of the data records.
dt_to (datetime.datetime or None) – The ending time of the data records.
visual (str, {"on", "off"}) – If “on”, append the Visual object to the Variable object.
label_fields (list) – A list of strings, indicating the fields used for generating the dataset label.
- VariableModel
alias of geospacelab.datahub.__variable_base__.VariableBase
- add_variable(var_name: str, configured_variables=None, configured_variable_name=None, variable_class=None, **kwargs) → geospacelab.datahub.__variable_base__.VariableBase
Add a variable to the dataset.
- Parameters
var_name –
configured_variables –
variable_class –
kwargs –
- Returns
- label(fields=None, separator=' | ', lowercase=True, num_to_str=True) → str
Return a label of the dataset.
- Parameters
fields – The attribute names for the label.
separator – A separator between two attributes.
lowercase – Show lowercase letters only.
- Returns
label
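The parameters of label() suggest how a label is composed: the values of the attributes named in fields are joined with separator and optionally lowercased. The function below is a minimal sketch under that assumption, not the library's code. The name make_label is hypothetical, the fallback to ('name', 'kind') mirrors the label_fields default in the DatasetBase signature, skipping empty attributes is a guess, and num_to_str is not modeled.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional


def make_label(obj: Any, fields: Optional[Iterable[str]] = None,
               separator: str = ' | ', lowercase: bool = True) -> str:
    """Hypothetical re-creation of a dataset label from attribute values."""
    if fields is None:
        fields = ('name', 'kind')  # mirrors the label_fields default of DatasetBase
    # Collect the non-empty attribute values as strings, in field order
    parts = [str(getattr(obj, f)) for f in fields
             if getattr(obj, f, '') not in ('', None)]
    label = separator.join(parts)
    return label.lower() if lowercase else label


ds = SimpleNamespace(name='EISCAT', kind='sourced', site='UHF')
print(make_label(ds))                                            # eiscat | sourced
print(make_label(ds, fields=['name', 'site'], lowercase=False))  # EISCAT | UHF
```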
- config(logging: bool = True, **kwargs) → None
Configure the attributes of the dataset.
- Parameters
logging – Show logging if True.
kwargs –
- Returns
- class DatasetSourced(dt_fr: Optional[datetime.datetime] = None, dt_to: Optional[datetime.datetime] = None, name: str = '', visual: str = 'off', label_fields: list = ('name', 'kind'), **kwargs)
- search_data_files(initial_file_dir=None, search_pattern='*', recursive=None, direct_append=True, allow_multiple_files=False, include_extension=True, **kwargs) → list
Search for data files whose names match the input pattern. The search is based on pathlib's Path.glob. A dataset subclass can wrap this method to apply custom settings.
- Parameters
initial_file_dir (str or pathlib.Path, default: DatasetModel.data_root_dir) – The initial file directory for searching.
search_pattern (str) – Unix-style pathname pattern; see also pathlib.Path.glob.
recursive (bool, default: DatasetModel.data_search_recursive) – Search recursively if True.
allow_multiple_files – Allow multiple files as a result.
direct_append (bool, default: True) – Append directly the searched results to Dataset.data_file_paths. If False, the file path list is returned only.
- Returns
a list of the file paths.
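Because the search is based on glob patterns, the effect of search_pattern and recursive can be reproduced with pathlib alone. The helper search_files below is a hypothetical sketch, not the method itself: it assumes recursive=True corresponds to Path.rglob and recursive=False to Path.glob, returns only regular files, and does not model direct_append, allow_multiple_files, or include_extension. The file names in the demo are invented.

```python
import pathlib
import tempfile
from typing import List, Union


def search_files(initial_file_dir: Union[str, pathlib.Path],
                 search_pattern: str = '*', recursive: bool = False) -> List[pathlib.Path]:
    """Hypothetical glob-based file search, sorted for reproducible output."""
    root = pathlib.Path(initial_file_dir)
    # rglob() matches the pattern in root and in all of its subdirectories
    matches = root.rglob(search_pattern) if recursive else root.glob(search_pattern)
    return sorted(p for p in matches if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / '2021' / '03').mkdir(parents=True)
    (root / 'EISCAT_20210309.hdf5').touch()
    (root / '2021' / '03' / 'EISCAT_20210310.hdf5').touch()
    (root / 'notes.txt').touch()

    top_names = sorted(p.name for p in search_files(root, '*.hdf5'))
    all_names = sorted(p.name for p in search_files(root, '*.hdf5', recursive=True))

print(top_names)  # ['EISCAT_20210309.hdf5']
print(all_names)  # ['EISCAT_20210309.hdf5', 'EISCAT_20210310.hdf5']
```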
- open_dialog(initial_file_dir: Optional[str] = None, data_file_num: Optional[int] = None, **kwargs)
Open a dialog to select the data files.
- check_data_files(load_mode: Optional[str] = None, **kwargs)
Check the existence of the data files before loading the data, depending on the loading mode (load_mode). This method still needs to be improved, as different datasets may use different variables as epochs. Two approaches are possible:
1. Write a wrapper in the new dataset subclass.
2. Add a script to recognize the epoch variables.
- time_filter_by_range(var_datetime=None, var_datetime_name=None)
Clip the times.
- Parameters
var_datetime –
var_datetime_name –
- Returns
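Judging by its name, time_filter_by_range() clips the loaded records to the dataset's time range. A minimal sketch of that idea builds a boolean mask over the timestamps and applies it to every variable sharing the time axis. This is not the library's code: the helper clip_by_time and the variable name n_e are hypothetical, both ends of [dt_fr, dt_to] are assumed inclusive, and plain lists are used to stay dependency-free.

```python
import datetime
from typing import Dict, List, Tuple


def clip_by_time(times: List[datetime.datetime], variables: Dict[str, list],
                 dt_fr: datetime.datetime, dt_to: datetime.datetime
                 ) -> Tuple[List[datetime.datetime], Dict[str, list]]:
    """Hypothetical time clipping: keep records with dt_fr <= t <= dt_to."""
    keep = [dt_fr <= t <= dt_to for t in times]
    clipped_times = [t for t, k in zip(times, keep) if k]
    # Apply the same mask to every variable sharing the time axis
    clipped_vars = {name: [v for v, k in zip(values, keep) if k]
                    for name, values in variables.items()}
    return clipped_times, clipped_vars


t0 = datetime.datetime(2021, 3, 9, 0, 0)
times = [t0 + datetime.timedelta(hours=h) for h in range(0, 30, 6)]  # 00, 06, 12, 18, 24 h
data = {'n_e': [1.0, 2.0, 3.0, 4.0, 5.0]}
new_times, new_data = clip_by_time(times, data, t0 + datetime.timedelta(hours=3),
                                   datetime.datetime(2021, 3, 9, 23, 59))
print(new_data['n_e'])  # [2.0, 3.0, 4.0]
```

Computing the mask once and reusing it guarantees that every clipped variable stays aligned with the clipped time axis.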
- VariableModel
alias of geospacelab.datahub.__variable_base__.VariableBase
- add_variable(var_name: str, configured_variables=None, configured_variable_name=None, variable_class=None, **kwargs) → geospacelab.datahub.__variable_base__.VariableBase
Add a variable to the dataset.
- Parameters
var_name –
configured_variables –
variable_class –
kwargs –
- Returns
- config(logging: bool = True, **kwargs) → None
Configure the attributes of the dataset.
- Parameters
logging – Show logging if True.
kwargs –
- Returns
- label(fields=None, separator=' | ', lowercase=True, num_to_str=True) → str
Return a label of the dataset.
- Parameters
fields – The attribute names for the label.
separator – A separator between two attributes.
lowercase – Show lowercase letters only.
- Returns
label