API Reference

This document describes the API of the collectionbatchtool module. It covers the module-level functions, the TableDataset class, the TreeDataset class, and the TableDataset subclasses.

Module-level functions

apply_specify_context(collection_name, specify_user, quiet=True)

Set up the Specify context.

Parameters:
  • collection_name (str) – Name of an existing Specify collection.
  • specify_user (str) – Username for an existing Specify user.
  • quiet (bool, default True) – If True, no output will be written to standard output.
apply_user_settings(filepath, quiet=True)

Read and apply user settings in a configuration file.

Parameters:
  • filepath (str) – Path to the configuration file.
  • quiet (bool, default True) – If True, no output will be written to standard output.
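
A minimal sketch of the configuration-file route; the file name settings.cfg is a placeholder for an existing configuration file containing your database credentials and Specify context:

    from collectionbatchtool import *

    # Placeholder path to an existing configuration file with the
    # database credentials and the Specify collection/user to use.
    apply_user_settings('settings.cfg', quiet=False)
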
initiate_database(database, host, user, passwd, quiet=True)

Initiate the connection to the MySQL database.

Parameters:
  • database (str) – Name of a MySQL database.
  • host (str) – Database host.
  • user (str) – MySQL user name.
  • passwd (str) – MySQL password.
  • quiet (bool, default True) – If True, no output will be written to standard output.
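
The same setup can be done step by step, first initiating the database connection and then applying the Specify context with apply_specify_context(). All values below are placeholders, not defaults provided by the module:

    from collectionbatchtool import *

    # Placeholder connection details -- replace with your own.
    initiate_database(
        database='specify',
        host='localhost',
        user='myusername',
        passwd='mypassword',
        quiet=False)

    # Placeholder names of an existing collection and an existing Specify user.
    apply_specify_context('My Collection', 'myspecifyuser', quiet=False)
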
query_to_dataframe(database, query)

Return the result of a peewee SelectQuery as a pandas.DataFrame.
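
A hedged sketch of how this might be used together with a dataset's database_query attribute (documented below); fetching the peewee database handle through the model's _meta attribute is a peewee convention, not something this module documents:

    from collectionbatchtool import *

    apply_user_settings('settings.cfg')       # placeholder config path

    agents = AgentDataset()
    db = agents.model._meta.database          # peewee convention; assumed, not documented here
    frame = query_to_dataframe(db, agents.database_query)
    print(frame.head())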

The TableDataset class

class TableDataset(model, key_columns, static_content, where_clause, frame)

Bases: object

Store a dataset corresponding to a database table.

model

peewee.BaseModel

A Specify data model corresponding to a table.

key_columns

dict

Key-fields and SourceID-columns for the model.

static_content

dict

Data to be inserted automatically for the model.

where_clause

peewee.Expression

Condition for getting relevant data from the database.

describe_columns()

Return a pandas.DataFrame describing the columns in the current model.
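
For example, to inspect the columns of the taxon model (assuming the connection has been set up as shown above):

    from collectionbatchtool import *

    apply_user_settings('settings.cfg')       # placeholder config path

    taxa = TaxonDataset()
    print(taxa.describe_columns())            # one row per column in the taxon model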

from_csv(filepath, quiet=True, **kwargs)

Read dataset from a CSV file.

Parameters:
  • filepath (str) – File path or object.
  • quiet (bool, default True) – If True, no output will be written to standard output.
  • **kwargs – Arbitrary keyword arguments available in pandas.read_csv().
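
A sketch of reading a prepared CSV file into an AgentDataset; the file name is a placeholder and the extra keyword just illustrates that pandas.read_csv() options are passed through:

    from collectionbatchtool import *

    apply_user_settings('settings.cfg')       # placeholder config path

    agents = AgentDataset()
    # The 'sep' keyword is forwarded to pandas.read_csv().
    agents.from_csv('agents.csv', quiet=False, sep=';')
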
from_database(quiet=True)

Read table data from the database.

Parameters:
  • quiet (bool, default True) – If True, no output will be written to standard output.
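
Reading existing table data works the same way; for example, to load the current content of the agent table into a dataset:

    from collectionbatchtool import *

    apply_user_settings('settings.cfg')       # placeholder config path

    agents = AgentDataset()
    agents.from_database(quiet=False)         # populates agents.frame from the agent table
    print(len(agents.frame))
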
get_match_count(target_column, match_columns)

Return counts for matches and possible matches.

Parameters:
  • target_column (str) – Column that should have a value if any value in match_columns is not null.
  • match_columns (str or List[str]) – Column or columns used for updating values in target_column.
Returns:
  matches, possible matches
Return type:
  tuple

get_mismatches(target_column, match_columns)

Return a pandas.Series or a pandas.DataFrame with non-matching values.

Parameters:
  • target_column (str) – Column that should have a value if any value in match_columns is not null.
  • match_columns (str or List[str]) – Column or columns used for updating values in target_column.
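
A sketch of using the two inspection methods together before deciding how to match file data against the database; the column names agentid and lastname are assumptions about the agent table, not values prescribed by the API:

    from collectionbatchtool import *

    apply_user_settings('settings.cfg')       # placeholder config path

    agents = AgentDataset()
    agents.from_csv('agents.csv')             # placeholder input file

    # Counts of matches and possible matches for the agentid/lastname pair.
    matches, possible = agents.get_match_count('agentid', 'lastname')
    print(matches, possible)

    # lastname values for which agentid is still missing.
    print(agents.get_mismatches('agentid', 'lastname'))
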
match_database_records(match_columns, quiet=True)

Update primary key values for records that match the database.

Parameters:
  • match_columns (str or List[str]) – Columns to be matched against the database.
  • quiet (bool, default True) – If True, no output will be written to standard output.
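
A sketch of matching file data against records that already exist in the database; lastname is an assumed match column for the agent table:

    from collectionbatchtool import *

    apply_user_settings('settings.cfg')       # placeholder config path

    agents = AgentDataset()
    agents.from_csv('agents.csv')             # placeholder input file

    # Fill in the primary key column for rows whose lastname
    # matches an existing database record.
    agents.match_database_records('lastname', quiet=False)
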
to_csv(filepath, update_sourceid=False, drop_empty_columns=False, quiet=True, encoding='utf-8', float_format='%g', index=False, **kwargs)

Write the dataset to a comma-separated values (CSV) file.

Parameters:
  • filepath (str) – File path or object.
  • update_sourceid (bool, default False) – If True, copy ID-columns to SourceID-columns before writing to the CSV file.
  • drop_empty_columns (bool, default False) – Drop columns that do not contain any data.
  • quiet (bool, default True) – If True, no output will be written to standard output.
  • encoding (str, default 'utf-8') – A string representing the encoding to use in the output file.
  • float_format (str or None, default '%g') – Format string for floating point numbers.
  • index (bool, default False) – Write row names (index).
  • **kwargs – Arbitrary keyword arguments available in pandas.DataFrame.to_csv().
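
Writing a dataset back to disk might look like this; the output path is a placeholder and the keyword arguments simply show some of the defaults that can be overridden:

    from collectionbatchtool import *

    apply_user_settings('settings.cfg')       # placeholder config path

    agents = AgentDataset()
    agents.from_database()
    agents.to_csv(
        'agents_export.csv',                  # placeholder output path
        update_sourceid=True,                 # copy ID-columns to SourceID-columns first
        drop_empty_columns=True,              # leave out columns without any data
        quiet=False)
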
to_database(defaults=None, update_record_metadata=True, chunksize=10000, quiet=True)

Load a dataset into the corresponding table and update the dataset’s primary key column from the database.

Parameters:
  • defaults (dict) – Column names and values to insert instead of nulls.
  • update_record_metadata (bool, default True) – If True, record metadata will be generated during import, otherwise the metadata will be loaded from the dataset.
  • chunksize (int, default 10000) – Size of the chunks being uploaded.
  • quiet (bool, default True) – If True, no output will be written to standard output.
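
A sketch of loading a prepared dataset into its table; the defaults dictionary and the column name it uses are illustrative assumptions only:

    from collectionbatchtool import *

    apply_user_settings('settings.cfg')       # placeholder config path

    agents = AgentDataset()
    agents.from_csv('agents.csv')             # placeholder input file

    # Upload the rows; the primary key column is filled in from the database.
    # The 'agenttype' default is an assumed example, not a required value.
    agents.to_database(defaults={'agenttype': 1}, quiet=False)
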
update_database_records(columns, update_record_metadata=True, chunksize=10000, quiet=True)

Update records in database with matching primary key values.

Parameters:
  • columns (str or List[str]) – Column or columns with new values.
  • update_record_metadata (bool, default True) – If True, record metadata will be generated during import, otherwise the metadata will be updated from the dataset.
  • chunksize (int, default 10000) – Size of the chunks being updated.
  • quiet (bool, default True) – If True, no output will be written to standard output.
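
Updating existing rows follows the same pattern once the primary key column has been populated, for example via match_database_records(); the column names below are assumptions:

    from collectionbatchtool import *

    apply_user_settings('settings.cfg')       # placeholder config path

    agents = AgentDataset()
    agents.from_csv('agents.csv')              # placeholder input file
    agents.match_database_records('lastname')  # assumed match column

    # Overwrite the email column for the matched records.
    agents.update_database_records('email', quiet=False)
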
update_foreign_keys(from_datasets, quiet=False)

Update foreign key values from a related dataset based on SourceID values.

Parameters:
  • from_datasets (TableDataset or List[TableDataset]) – Dataset(s) from which foreign key values will be updated.
  • quiet (bool, default False) – If True, no output will be written to standard output.
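
A sketch of filling in foreign keys in one dataset from another after both have been loaded; it assumes the CSV files carry matching SourceID values:

    from collectionbatchtool import *

    apply_user_settings('settings.cfg')       # placeholder config path

    agents = AgentDataset()
    agents.from_csv('agents.csv')             # placeholder input file
    agents.to_database()                      # agent primary keys are now known

    collectors = CollectorDataset()
    collectors.from_csv('collectors.csv')     # placeholder input file
    # Replace the collectors' agent SourceID references with the primary
    # key values obtained when the agents were uploaded.
    collectors.update_foreign_keys(agents, quiet=False)
    collectors.to_database()
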
update_sourceid(quiet=True)

Copy values from ID-columns to SourceID-columns.

Parameters:
  • quiet (bool, default True) – If True, no output will be written to standard output.
write_mapping_to_csv(filepath, quiet=True, float_format='%g', index=False, **kwargs)

Write the ID-column mapping to a comma-separated values (CSV) file.

Parameters:
  • filepath (str) – File path or object.
  • quiet (bool, default True) – If True, no output will be written to standard output.
  • float_format (str or None, default '%g') – Format string for floating point numbers.
  • index (bool, default False) – Write row names (index).
  • **kwargs – Arbitrary keyword arguments available in pandas.DataFrame.to_csv().
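
Saving the ID-column mapping after an import might look like this; the file names are placeholders:

    from collectionbatchtool import *

    apply_user_settings('settings.cfg')       # placeholder config path

    agents = AgentDataset()
    agents.from_csv('agents.csv')             # placeholder input file
    agents.to_database()
    agents.write_mapping_to_csv('agent_mapping.csv', quiet=False)
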
all_columns

List containing all columns in the dataset.

database_columns

List with available database columns.

database_query

Database query for reading the data from the database.

file_columns

List containing only the columns that can be written to or read from a file.

frame

A pandas.DataFrame to hold the data.

primary_key_column

Name of the primary key column.
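
The attributes can be inspected directly, for example:

    from collectionbatchtool import *

    apply_user_settings('settings.cfg')       # placeholder config path

    agents = AgentDataset()
    print(agents.primary_key_column)          # name of the primary key column
    print(agents.file_columns)                # columns read from or written to files
    print(agents.frame.head())                # the underlying pandas.DataFrame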

The TreeDataset class

class TreeDataset

Bases: object

A dataset corresponding to a tree table in Specify.

update_rankid_column(dataset, quiet=True)

Update RankID based on SourceID-column.

Parameters:
  • dataset (TableDataset) – A treedefitem-dataset from which RankID should be updated.
  • quiet (bool, default True) – If True, no output will be written to standard output.

Notes

This method exists in order to update the redundant RankID-columns in TreeDataset dataframes.
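
A sketch of refreshing RankID in a taxon dataset from its tree definition items; whether update_sourceid() is needed first depends on how the treedefitem dataset was obtained, so treat the preparation steps as assumptions:

    from collectionbatchtool import *

    apply_user_settings('settings.cfg')       # placeholder config path

    ranks = TaxontreedefitemDataset()
    ranks.from_database()                     # existing rank definitions
    ranks.update_sourceid()                   # expose SourceID values for matching (assumed step)

    taxa = TaxonDataset()
    taxa.from_csv('taxa.csv')                 # placeholder input file
    taxa.update_rankid_column(ranks, quiet=False)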

TableDataset subclasses

class AgentDataset

Bases: collectionbatchtool.TableDataset

Dataset corresponding to the agent-table.

class CollectingeventattributeDataset

Bases: collectionbatchtool.TableDataset

Dataset corresponding to the collectingeventattribute-table.

class CollectingeventDataset

Bases: collectionbatchtool.TableDataset

Dataset corresponding to the collectingevent-table.

class CollectionobjectattributeDataset

Bases: collectionbatchtool.TableDataset

Dataset corresponding to the collectionobjectattribute-table.

class CollectionobjectDataset

Bases: collectionbatchtool.TableDataset

Dataset corresponding to the collectionobject-table.

class CollectorDataset

Bases: collectionbatchtool.TableDataset

Dataset corresponding to the collector-table.

class DeterminationDataset

Bases: collectionbatchtool.TableDataset

Dataset corresponding to the determination-table.

class GeographyDataset

Bases: collectionbatchtool.TableDataset, collectionbatchtool.TreeDataset

Dataset corresponding to the geography-table.

class GeographytreedefitemDataset

Bases: collectionbatchtool.TableDataset

Dataset corresponding to the geographytreedefitem-table.

class LocalityDataset

Bases: collectionbatchtool.TableDataset

Dataset corresponding to the locality-table.

class StorageDataset

Bases: collectionbatchtool.TableDataset, collectionbatchtool.TreeDataset

Dataset corresponding to the storage-table.

class StoragetreedefitemDataset

Bases: collectionbatchtool.TableDataset

Dataset corresponding to the storagetreedefitem-table.

class PreparationDataset

Bases: collectionbatchtool.TableDataset

Dataset corresponding to the preparation-table.

class PreptypeDataset

Bases: collectionbatchtool.TableDataset

Dataset corresponding to the preptype-table.

class TaxonDataset

Bases: collectionbatchtool.TableDataset, collectionbatchtool.TreeDataset

Dataset corresponding to the taxon-table.

class TaxontreedefitemDataset

Bases: collectionbatchtool.TableDataset

Dataset corresponding to the taxontreedefitem-table.
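
Taken together, the subclasses support a batch-import workflow in which each table is loaded in dependency order. The sketch below shows the general shape of such a run; the file names and the match column are placeholders, and the order shown is one reasonable choice rather than a requirement of the module:

    from collectionbatchtool import *

    apply_user_settings('settings.cfg')               # placeholder config path

    # 1. Agents first, so that later tables can reference them.
    agents = AgentDataset()
    agents.from_csv('agents.csv')                     # placeholder input file
    agents.match_database_records('lastname')         # assumed match column
    agents.to_database()

    # 2. Collecting events, then collectors that reference agents and events.
    events = CollectingeventDataset()
    events.from_csv('collectingevents.csv')           # placeholder input file
    events.to_database()

    collectors = CollectorDataset()
    collectors.from_csv('collectors.csv')             # placeholder input file
    collectors.update_foreign_keys([agents, events])
    collectors.to_database()

    # 3. Collection objects that reference the collecting events.
    objects = CollectionobjectDataset()
    objects.from_csv('collectionobjects.csv')         # placeholder input file
    objects.update_foreign_keys(events)
    objects.to_database()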