mlcolvar.io.create_dataset_from_files

mlcolvar.io.create_dataset_from_files(file_names: list | str, folder: str = None, create_labels: bool = None, load_args: List[dict] = None, filter_args: dict = None, modifier_function=None, return_dataframe: bool = False, verbose: bool = True, **kwargs)


Initialize a dataset from one or more files. Suitable for both supervised and unsupervised tasks.

Parameters:
  • file_names (list or str) – Name(s) of the files from which to import the data

  • folder (str, optional) – Common folder containing the files to import, by default None. If set, each file name becomes ‘folder/file_name’.

  • create_labels (bool, optional) – Assign a label to each file, by default True if more than one file is given, otherwise False

  • load_args (list[dict], optional) – List of dictionaries, one per file, with the arguments passed to the load_dataframe function (keys: start, stop, stride and pandas.read_csv options), by default None

  • filter_args (dict, optional) – Dictionary of arguments passed to df.filter() to select descriptors (keys: items, like, regex), by default None. Note that ‘time’ and ‘*.bias’ columns are always discarded.

  • return_dataframe (bool, optional) – Return also the imported Pandas dataframe for convenience, by default False

  • modifier_function (function, optional) – Function to be applied to the input data, by default None.

  • verbose (bool, optional) – Print info on the datasets, by default True

  • kwargs (optional) – Additional keyword arguments passed to mlcolvar.io.load_dataframe
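The per-file parameter handling described above can be pictured with a small, stdlib-only sketch. It is not the mlcolvar implementation: a loaded file is modelled as a plain dict of columns, and the helpers resolve_path, select_rows and select_descriptors are hypothetical names. The sketch mimics the documented behaviour of folder, of the start/stop/stride keys of load_args, and of filter_args (which the real function forwards to pandas DataFrame.filter), including the rule that ‘time’ and ‘*.bias’ columns are always dropped.

```python
import os
import re
from typing import Dict, List, Optional

# Hypothetical in-memory stand-in for one loaded file: column name -> values.
Table = Dict[str, List[float]]


def resolve_path(file_name: str, folder: Optional[str] = None) -> str:
    """Hypothetical helper: prefix the file name with `folder` when it is set."""
    return os.path.join(folder, file_name) if folder is not None else file_name


def select_rows(table: Table, start=None, stop=None, stride=None) -> Table:
    """Hypothetical helper: keep rows[start:stop:stride] of every column."""
    return {name: values[start:stop:stride] for name, values in table.items()}


def select_descriptors(table: Table, items=None, like=None, regex=None) -> Table:
    """Hypothetical helper: select columns like pandas DataFrame.filter does."""
    if sum(arg is not None for arg in (items, like, regex)) > 1:
        raise TypeError("items, like and regex are mutually exclusive")
    names = list(table)
    if items is not None:
        names = [n for n in items if n in table]  # exact names, in the given order
    elif like is not None:
        names = [n for n in names if like in n]  # substring match
    elif regex is not None:
        names = [n for n in names if re.search(regex, n)]  # regex search
    # 'time' and '*.bias' columns are discarded whatever the filter.
    return {n: table[n] for n in names if n != "time" and not n.endswith(".bias")}


colvar = {
    "time": [0.0, 1.0, 2.0, 3.0],
    "d1": [0.1, 0.2, 0.3, 0.4],
    "d2": [1.1, 1.2, 1.3, 1.4],
    "opes.bias": [5.0, 5.0, 5.0, 5.0],
}
sliced = select_rows(colvar, stride=2)  # keeps rows 0 and 2
descriptors = select_descriptors(sliced, regex="^d")
print(resolve_path("COLVAR", folder="data"))  # data/COLVAR on POSIX
print(descriptors)  # {'d1': [0.1, 0.3], 'd2': [1.1, 1.3]}
```

Without any filter the sketch keeps every column except ‘time’ and ‘*.bias’, which matches the documented default of filter_args=None.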

Returns:

  • torch.Dataset – Torch dataset of the given data, labeled when create_labels is True

  • pandas.DataFrame, optional – Pandas dataframe of the given data, returned only if return_dataframe is True
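To illustrate the return values, the following sketch stacks several per-file tables into one sample set. It is a hypothetical stand-in rather than the library code: a plain dict with "data" and "labels" keys replaces the torch dataset, the function name assemble is made up, numbering each file's rows with the file index is an assumption, and all files are assumed to share the same columns. The create_labels default and the extra table returned when return_dataframe is True follow the parameter descriptions.

```python
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical per-file table: column name -> values (all files share columns).
Table = Dict[str, List[float]]


def assemble(
    tables: List[Table],
    create_labels: Optional[bool] = None,
    return_dataframe: bool = False,
) -> Union[dict, Tuple[dict, Table]]:
    """Hypothetical helper: stack per-file tables into one labelled sample set."""
    if create_labels is None:
        # Documented default: label only when more than one file is given.
        create_labels = len(tables) > 1
    columns = list(tables[0])
    merged: Table = {name: [] for name in columns}
    labels: List[int] = []
    for index, table in enumerate(tables):
        n_rows = len(table[columns[0]])
        for name in columns:
            merged[name].extend(table[name])
        # Assumption: every row of file `index` gets the label `index`.
        labels.extend([index] * n_rows)
    # One feature vector per row, in column order.
    data = [list(row) for row in zip(*(merged[name] for name in columns))]
    dataset = {"data": data}  # plain dict standing in for the torch dataset
    if create_labels:
        dataset["labels"] = labels
    if return_dataframe:
        # Tabular copy standing in for the optional pandas dataframe.
        table_out = dict(merged)
        if create_labels:
            table_out["labels"] = labels
        return dataset, table_out
    return dataset


state_a = {"d1": [0.1, 0.2], "d2": [1.0, 1.1]}
state_b = {"d1": [0.9], "d2": [2.0]}
dataset, table = assemble([state_a, state_b], return_dataframe=True)
print(dataset["labels"])  # [0, 0, 1]
print(dataset["data"])  # [[0.1, 1.0], [0.2, 1.1], [0.9, 2.0]]
```

Note how the shape of the result changes with return_dataframe: a single object by default, a pair otherwise, so callers unpack two values only when they ask for the dataframe.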

See also

mlcolvar.io.load_dataframe

Function that is used to load the files
