Datahub
To create a DataHub object, either call the function create_datahub or
instantiate the class DataHub directly. The former provides an option (datahub_class)
to create the object from a DataHub subclass.
- create_datahub(dt_fr, dt_to, visual='off', datahub_class=None, **kwargs)
Create a datahub object.
- Parameters:
dt_fr (datetime.datetime) – The starting time.
dt_to (datetime.datetime) – The stopping time.
visual ({'off', 'on'}, default: 'off') – If “on”, a Visual object is aggregated to the Variable object.
datahub_class (DataHub or its subclass) – If None, create a datahub object based on the default DataHub class.
kwargs – Other optional keyword arguments as inputs to DataHub.
- Returns:
dh
- Return type:
DataHub object
- Example:
>>> import geospacelab.datahub as datahub
>>> import datetime
>>> dt_fr = datetime.datetime.strptime('20210309' + '0000', '%Y%m%d%H%M')
>>> dt_to = datetime.datetime.strptime('20210309' + '2359', '%Y%m%d%H%M')
>>> dh = datahub.create_datahub(dt_fr, dt_to)
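The time range in the example above covers a single day. A minimal sketch of a helper that builds such a range from a date string, using only the standard library, could look like this. The helper name day_range is hypothetical and not part of geospacelab.

```python
import datetime


def day_range(date_str):
    """Return (dt_fr, dt_to) spanning 00:00 to 23:59 of the day given as 'YYYYMMDD'.

    Hypothetical helper for illustration; not part of geospacelab.
    """
    dt_fr = datetime.datetime.strptime(date_str + '0000', '%Y%m%d%H%M')
    dt_to = datetime.datetime.strptime(date_str + '2359', '%Y%m%d%H%M')
    return dt_fr, dt_to


dt_fr, dt_to = day_range('20210309')
```

The resulting pair can then be passed on as create_datahub(dt_fr, dt_to).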
- class DataHub(dt_fr=None, dt_to=None, visual='off', **kwargs)
The class DataHub manages a set of datasets from various data sources.
- Variables:
dt_fr (datetime.datetime) – The starting time.
dt_to (datetime.datetime) – The ending time.
visual (str, {'off', 'on'}) – If “on”, a Visual object will be aggregated to the Variable object.
datasets (dict) – A dict that stores the datasets added (add_dataset()) or docked (dock()) to the datahub.
variables (dict) – A dict that stores the variables assigned from their aggregated datasets. Typically used for the dashboards or the I/O configuration.
Usage:
Create a DataHub object:
- Example:
Import the datahub module and create a DataHub object
>>> import geospacelab.datahub as datahub
>>> import datetime
>>> dt_fr = datetime.datetime.strptime('20210309' + '0000', '%Y%m%d%H%M')
>>> dt_to = datetime.datetime.strptime('20210309' + '2359', '%Y%m%d%H%M')
>>> dh = datahub.DataHub(dt_fr, dt_to)
Dock a built-in dataset:
- Example:
Dock an EISCAT dataset
>>> database_name = 'madrigal'  # built-in sourced database name
>>> facility_name = 'eiscat'
>>> site = 'UHF'  # facility attributes required, check from the eiscat schedule page
>>> antenna = 'UHF'
>>> modulation = 'ant'
>>> ds_1 = dh.dock(datasource_contents=[database_name, facility_name], site=site, antenna=antenna, modulation=modulation, data_file_type='eiscat-hdf5')
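When the same options are reused, the keyword arguments can be collected in a dict and unpacked into dock(). The following is a minimal sketch, assuming dh is a DataHub object created as shown earlier. The function name dock_eiscat_uhf is hypothetical, and the option values are copied from the example above.

```python
def dock_eiscat_uhf(dh):
    """Dock the EISCAT UHF dataset on an existing datahub and return it.

    Hypothetical wrapper: it only forwards the arguments used in the
    example above to dh.dock().
    """
    datasource_contents = ['madrigal', 'eiscat']
    options = {
        'site': 'UHF',
        'antenna': 'UHF',
        'modulation': 'ant',
        'data_file_type': 'eiscat-hdf5',
    }
    # Unpack the facility-specific options as keyword arguments.
    return dh.dock(datasource_contents=datasource_contents, **options)
```

Keeping the options in one dict makes it easier to see which inputs are specific to the facility.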
How to find the datasource_contents and required inputs?
- Example:
List the built-in data sources
>>> dh.list_sourced_datasets()
- Attributes:
- host_dataset
Methods
add_dataset(*args[, kind, dataset_class]) – Add one or more datasets, which can be a "temporary" or "user-defined" dataset.
assign_variable(var_name[, dataset, ...]) – Assign a variable to DataHub.variables from the docked or added dataset.
dock(datasource_contents, **kwargs) – Dock a built-in or registered dataset.
get_current_dataset([index]) – Get the current dataset.
get_variable(var_name[, dataset, dataset_index]) – Get a variable from the docked or added dataset.
list_assigned_variables() – List all the assigned variables that have been docked or added to the datahub.
list_datasets() – List all the datasets that have been docked or added to the datahub.
list_sourced_datasets() – List all the built-in data sources in this package.
set_current_dataset([dataset, dataset_index]) – Set the current dataset.
save_to_cdf
save_to_pickle
- __init__(dt_fr=None, dt_to=None, visual='off', **kwargs)
- Parameters:
dt_fr (datetime.datetime) – The starting time.
dt_to (datetime.datetime) – The stopping time.
visual ({'off', 'on'}, default: 'off') – If “on”, a Visual object is aggregated to the Variable object.
kwargs – other keyword arguments forwarded to the inherited class.
- dock(datasource_contents, **kwargs) → DatasetSourced
Dock a built-in or registered dataset.
- Parameters:
datasource_contents (list) – the contents required for docking a sourced dataset. To look up the sourced dataset and the associated contents, call list_sourced_datasets().
dt_fr (datetime.datetime) – starting time, optional, use datahub.dt_fr if not specified.
dt_to (datetime.datetime) – stopping time, optional, use datahub.dt_to if not specified.
visual (str) – variable attribute, use datahub.visual if not specified.
- Returns:
dataset
- Return type:
Dataset object
- Note::
The difference between the methods dock() and add_dataset() is that dock() adds a built-in data source, while add_dataset() adds a temporary or user-defined dataset, which is not initially included in the package.
- add_dataset(*args, kind='temporary', dataset_class=None, **kwargs) → DatasetBase
Add one or more datasets, which can be a “temporary” or “user-defined” dataset.
- Parameters:
args (list(dataset)) – A list of the datasets.
kind ({'temporary', 'user-defined'}, default: 'temporary') – The type of a dataset. If temporary, a new dataset will be created from the DatasetModel.
dataset_class (DatasetModel or its subclass) – If None, the default class is DatasetModel. Used when kind='temporary'.
kwargs – Other keyword arguments forwarded to dataset_class.
- Returns:
None
- set_current_dataset(dataset=None, dataset_index=None)
Set the current dataset.
- Parameters:
dataset – A Dataset object.
dataset_index (int) – The index of the dataset in .datasets.
- Return type:
None
- get_current_dataset(index=False)
Get the current dataset.
- Parameters:
index (bool) – If True, return the index of the dataset instead of the dataset object.
- Returns:
If index=False, the dataset object; otherwise the dataset index.
- get_variable(var_name, dataset=None, dataset_index=None) → VariableBase
Get a variable from the docked or added dataset.
- Parameters:
var_name (str) – the name of the queried variable
dataset (DatasetBase object) – the dataset storing the queried variable.
dataset_index (int) – the index of the dataset in datahub.datasets. If neither dataset nor dataset_index is specified, the function gets the variable from the current dataset.
- Returns:
var
- Return type:
VariableModel object or None
- Note::
Both get_variable() and assign_variable() return a variable object assigned from a dataset. The former only returns the object, while the latter also assigns the variable to DataHub.variables.
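The distinction in the note can be sketched with a small helper that chooses between the two calls. The helper fetch_variable and its register flag are hypothetical. Only get_variable() and assign_variable(), with their var_name and dataset arguments, come from the signatures documented on this page.

```python
def fetch_variable(dh, var_name, dataset=None, register=False):
    """Return a variable from a dataset, optionally registering it on the datahub.

    Hypothetical helper: with register=True the variable is also stored in
    DataHub.variables (through assign_variable); otherwise it is only returned.
    """
    if register:
        return dh.assign_variable(var_name, dataset=dataset)
    return dh.get_variable(var_name, dataset=dataset)
```

Registering is typically what a dashboard or I/O configuration needs, while a plain lookup is enough for a one-off inspection.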
- assign_variable(var_name, dataset=None, dataset_index=None, add_new=False, **kwargs) → VariableBase
Assign a variable to DataHub.variables from the docked or added dataset.
- Parameters:
var_name – The name of the variable
dataset – The dataset that stores the variable
dataset_index – The index of the dataset in the datahub.datasets.
add_new – if True, add the variable to the specified dataset and assign to the datahub
kwargs – other keywords to configure the attributes of the variable.
- Returns:
object of VariableModel
- static list_sourced_datasets()
List all the built-in data sources in this package.
The list will be printed in the python console in a “tree” view.
- list_datasets()
List all the datasets that have been docked or added to the datahub
The list will be printed in the console as a table
- list_assigned_variables()
List all the assigned variables that have been docked or added to the datahub
The list will be printed in the console as a table
Dataset
All the datasets added to DataHub are objects of DatasetBase or its subclasses.
DatasetBase is the base class, providing the essential attributes and methods to manage a data source. See details below:
- class DatasetBase(dt_fr: datetime = None, dt_to: datetime = None, name: str = '', kind: str = '', visual: str = 'off', label_fields: list = ('name', 'kind'), **kwargs)
A dataset is a dictionary-like object used for downloading and loading data from a data source. The items in the dataset are the variables loaded from the data files. The parameters listed below are the general attributes used for the dataset class and its subclasses.
- Variables:
name (str) – The name of the dataset.
kind (str) – The type of the dataset. ‘sourced’: the data source has been added in the package, ‘temporary’: a dataset added temporarily, ‘user-defined’: a data source defined by the user.
dt_fr (datetime.datetime or None) – the starting time of the data records.
dt_to (datetime.datetime or None) – the stopping time of the data records.
visual (str, {"on", "off"}) – If “on”, append the Visual object to the Variable object.
label_fields (list) – A list of strings, indicating the fields used for generating the dataset label.
Methods
add_variable(var_name[, ...]) – Add a variable to the dataset.
config([logging]) – Configure the attributes of the dataset.
exist(var_name)
label([fields, separator, lowercase, num_to_str]) – Return a label of the data set.
add_attr
attrs_to_dict
clone_variables
get_variable_names
items
keys
list_all_variables
register_method
remove_variable
- VariableModel
alias of
VariableBase
- add_variable(var_name: str, configured_variables=None, configured_variable_name=None, variable_class=None, **kwargs) → VariableBase
Add a variable to the dataset.
- Parameters:
var_name
configured_variables
variable_class
kwargs
- Returns:
- label(fields=None, separator=' | ', lowercase=True, num_to_str=True) → str
Return a label of the data set.
- Parameters:
fields – The attribute names for the label.
separator – A separator between two attributes.
lowercase – Show lowercase letters only.
- Returns:
label
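To illustrate how fields, separator and lowercase interact, the following sketch builds a label from an object's attributes. It is a stand-in written for illustration, not the library's implementation. The function make_label and the example attribute values are assumptions, and num_to_str is left out because its behavior is not described here.

```python
from types import SimpleNamespace


def make_label(obj, fields=('name', 'kind'), separator=' | ', lowercase=True):
    """Join the string values of the given attributes into one label."""
    parts = [str(getattr(obj, field)) for field in fields]
    label = separator.join(parts)
    return label.lower() if lowercase else label


# The attribute values below are made up for the demonstration.
dataset = SimpleNamespace(name='EISCAT', kind='sourced')
label = make_label(dataset)
name_only = make_label(dataset, fields=('name',), lowercase=False)
```

With the default fields ('name', 'kind'), the two attribute values are joined by the separator and lowercased.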
- config(logging: bool = True, **kwargs) → None
Configure the attributes of the dataset.
- Parameters:
logging – Show logging if True.
kwargs
- Returns:
- class DatasetSourced(dt_fr: datetime = None, dt_to: datetime = None, name: str = '', visual: str = 'off', label_fields: list = ('name', 'kind'), **kwargs)
- Attributes:
- data_root_dir
Methods
add_variable(var_name[, ...]) – Add a variable to the dataset.
check_data_files([load_mode]) – Check the existence of the data files before loading the data, depending on the loading mode (load_mode).
config([logging]) – Configure the attributes of the dataset.
exist(var_name)
label([fields, separator, lowercase, num_to_str]) – Return a label of the data set.
open_dialog([initial_file_dir, data_file_num]) – Open a dialog to select the data files.
search_data_files([initial_file_dir, ...]) – Search the data files by the input pattern in the file name.
time_filter_by_range([var_datetime, ...]) – Clip the times.
add_attr
attrs_to_dict
clone_variables
get_time_ind
get_variable_names
items
keys
list_all_variables
register_method
remove_variable
time_filter_by_inds
- search_data_files(initial_file_dir=None, search_pattern='*', recursive=None, direct_append=True, allow_multiple_files=False, include_extension=True, **kwargs) → list
Search the data files by the input pattern in the file name. The search method is based on pathlib.Path.glob. For a dataset subclass, a wrapper can be added for a custom setting.
- Parameters:
initial_file_dir (str or pathlib.Path, default: DatasetModel.data_root_dir) – The initial file directory for searching.
search_pattern (str) – Unix style pathname pattern, see also pathlib.Path.glob.
recursive (bool, default: DatasetModel.data_search_recursive) – Search recursively if True.
allow_multiple_files – Allow multiple files as a result.
direct_append (bool, default: True) – Append directly the searched results to Dataset.data_file_paths. If False, the file path list is returned only.
- Returns:
a list of the file paths.
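Since the search is described as glob-based, the pattern semantics can be tried out with pathlib directly. This is a minimal sketch, not the library's implementation. The helper find_files is hypothetical and only mirrors the initial_file_dir, search_pattern and recursive parameters using pathlib.Path.glob and pathlib.Path.rglob. The file names are made up.

```python
import pathlib
import tempfile


def find_files(initial_file_dir, search_pattern='*', recursive=False):
    """Return a sorted list of files under initial_file_dir matching the pattern."""
    root = pathlib.Path(initial_file_dir)
    matches = root.rglob(search_pattern) if recursive else root.glob(search_pattern)
    return sorted(p for p in matches if p.is_file())


# Build a throwaway directory tree to try the patterns on.
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / 'sub').mkdir()
    (root / 'a_20210309.hdf5').touch()
    (root / 'sub' / 'b_20210309.hdf5').touch()
    (root / 'notes.txt').touch()

    # Non-recursive: only files directly inside the initial directory.
    top_level = [p.name for p in find_files(root, '*20210309*.hdf5')]
    # Recursive: files in subdirectories are included as well.
    nested = [p.name for p in find_files(root, '*20210309*.hdf5', recursive=True)]
```

A non-recursive search only sees the file in the top directory, while the recursive one also picks up the file in the subdirectory.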
- open_dialog(initial_file_dir: str = None, data_file_num: int = None, **kwargs)
Open a dialog to select the data files.
- check_data_files(load_mode: str = None, **kwargs)
Check the existence of the data files before loading the data, depending on the loading mode (load_mode). This method still needs to be improved, as different datasets may use different variables as epochs. Two approaches are possible: 1. Write a wrapper in the new dataset subclass. 2. Add a script to recognize the epoch variables.
- time_filter_by_range(var_datetime=None, var_datetime_name=None, var_names=None)
Clip the times.
- Parameters:
var_datetime
var_datetime_name
- Returns:
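Clipping to a time range amounts to selecting the records whose timestamps fall between a starting and a stopping time. The following is a minimal sketch in plain Python, not the library's implementation. The helper clip_times, the inclusive bounds, and the sample values are assumptions.

```python
import datetime


def clip_times(times, dt_fr, dt_to):
    """Return the indices of the timestamps within [dt_fr, dt_to] (inclusive)."""
    return [i for i, t in enumerate(times) if dt_fr <= t <= dt_to]


start = datetime.datetime(2021, 3, 9, 0, 0)
# Sample timestamps from 2021-03-08 18:00 to 2021-03-10 06:00 in 6-hour steps.
times = [start + datetime.timedelta(hours=6 * k) for k in range(-1, 6)]

dt_fr = datetime.datetime(2021, 3, 9, 0, 0)
dt_to = datetime.datetime(2021, 3, 9, 23, 59)
inds = clip_times(times, dt_fr, dt_to)
clipped = [times[i] for i in inds]
```

The same indices can then be applied to every variable that shares the time axis.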
- VariableModel
alias of
VariableBase
- add_variable(var_name: str, configured_variables=None, configured_variable_name=None, variable_class=None, **kwargs) → VariableBase
Add a variable to the dataset.
- Parameters:
var_name
configured_variables
variable_class
kwargs
- Returns:
- config(logging: bool = True, **kwargs) → None
Configure the attributes of the dataset.
- Parameters:
logging – Show logging if True.
kwargs
- Returns:
- label(fields=None, separator=' | ', lowercase=True, num_to_str=True) → str
Return a label of the data set.
- Parameters:
fields – The attribute names for the label.
separator – A separator between two attributes.
lowercase – Show lowercase letters only.
- Returns:
label