mlcolvar.io.create_dataset_from_files

class mlcolvar.io.create_dataset_from_files(file_names: list | str, folder: str = None, create_labels: bool = None, load_args: List[dict] = None, filter_args: dict = None, modifier_function=None, return_dataframe: bool = False, verbose: bool = True, **kwargs)

Initialize a dataset from (a list of) files. Suitable for supervised/unsupervised tasks.

Parameters:

file_names (list | str) – Name(s) of the file(s) from which to import the data
folder (str, optional) – Common path for the files to be imported, by default None. If set, filenames become ‘folder/file_name’.
create_labels (bool, optional) – Assign a label to each file, by default True if more than one file is given, otherwise False
load_args (list[dict], optional) – List of dictionaries with the arguments passed to the load_dataframe function for each file (keys: start, stop, stride, and pandas.read_csv options), by default None
filter_args (dict, optional) – Dictionary of arguments passed to df.filter() to select descriptors (keys: items, like, regex), by default None. Note that ‘time’ and ‘*.bias’ columns are always discarded.
return_dataframe (bool, optional) – Return also the imported Pandas dataframe for convenience, by default False
modifier_function (function, optional) – Function to be applied to the input data, by default None.
verbose (bool, optional) – Print info on the datasets, by default True
kwargs (optional) – Additional keyword arguments passed to mlcolvar.io.load_dataframe
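
The defaults described above can be illustrated with a small pure-Python sketch. The helpers `resolve_paths` and `default_create_labels` are hypothetical names introduced only for illustration; they mirror the documented behaviour (filenames become ‘folder/file_name’, labels are created by default only when more than one file is given) and are not the library's actual implementation.

```python
def resolve_paths(file_names, folder=None):
    """Hypothetical helper: mimic the documented 'folder/file_name' joining."""
    # A single string is accepted as well as a list of names.
    if isinstance(file_names, str):
        file_names = [file_names]
    if folder is None:
        return list(file_names)
    return [f"{folder}/{name}" for name in file_names]


def default_create_labels(file_names, create_labels=None):
    """Hypothetical helper: documented default for create_labels."""
    if create_labels is not None:
        return create_labels
    if isinstance(file_names, str):
        file_names = [file_names]
    # Default: True if more than one file is given, otherwise False.
    return len(file_names) > 1


# Hypothetical file names, used only to show the resulting values.
paths = resolve_paths(["state_A.dat", "state_B.dat"], folder="data")
labels = default_create_labels(["state_A.dat", "state_B.dat"])
```

With these inputs, `paths` holds the two names prefixed by `data/`, and `labels` is true because two files were given.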

Returns:

torch.Dataset – Torch labeled dataset of the given data
pandas.DataFrame, optional – Pandas dataframe of the given data, returned only if return_dataframe is True
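
A minimal usage sketch follows. It assumes that, with return_dataframe=True, the dataset and the dataframe come back together as a two-element tuple, which is how the description above reads but is not stated explicitly. The wrapper `load_example`, the file names, the folder, and the `^d_` column prefix in the regex are all hypothetical. The call is skipped when mlcolvar is not installed or the files are missing.

```python
import os


def load_example(file_names, folder=None, return_dataframe=True):
    """Hypothetical wrapper: call create_dataset_from_files when possible."""
    try:
        from mlcolvar.io import create_dataset_from_files
    except ImportError:
        return None  # mlcolvar is not installed

    names = [file_names] if isinstance(file_names, str) else list(file_names)
    paths = [n if folder is None else f"{folder}/{n}" for n in names]
    if not all(os.path.exists(p) for p in paths):
        return None  # nothing to load

    result = create_dataset_from_files(
        file_names=names,
        folder=folder,
        filter_args={"regex": "^d_"},  # assumed descriptor column prefix
        return_dataframe=return_dataframe,
        verbose=False,
    )
    if return_dataframe:
        # Assumption: the dataset and dataframe are returned as a pair.
        dataset, df = result
        return dataset, df
    return result, None


# Hypothetical inputs; returns None unless the files actually exist.
out = load_example(["state_A.dat", "state_B.dat"], folder="no_such_folder")
```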