API Reference
This document describes the API of the collectionbatchtool module. The following sections are included:
Module-level functions
- apply_specify_context(collection_name, specify_user, quiet=True)
  Set up the Specify context.
  Parameters:
- apply_user_settings(filepath, quiet=True)
  Read and apply user settings in a configuration file.
  Parameters:
  - filepath (str) – Path to the configuration file.
  - quiet (bool, default True) – If True, no output will be written to standard output.
- initiate_database(database, host, user, passwd, quiet=True)
  Initiate the database.
  Parameters:
- query_to_dataframe(database, query)
  Return result from a peewee SelectQuery as a pandas.DataFrame.
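The setup functions above can be combined as in the following minimal sketch. It assumes that the database connection has to be initiated before the Specify context is applied. That order, the helper name `open_session`, and the idea of passing the module in as the `cbt` argument are assumptions made for illustration and are not taken from this reference. Passing the module in lets the sketch be exercised without a live database.

```python
def open_session(cbt, database, host, user, passwd,
                 collection_name, specify_user, quiet=True):
    """Connect to the database, then set up the Specify context.

    `cbt` is the imported collectionbatchtool module (or a stand-in
    object that provides the same two functions).
    """
    # Documented signature: initiate_database(database, host, user, passwd, quiet=True)
    cbt.initiate_database(database, host, user, passwd, quiet=quiet)
    # Documented signature: apply_specify_context(collection_name, specify_user, quiet=True)
    cbt.apply_specify_context(collection_name, specify_user, quiet=quiet)
```

With the real module this might be called as `open_session(collectionbatchtool, 'my_database', 'localhost', 'db_user', 'db_password', 'My Collection', 'spuser')`, where every value is a placeholder.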
The TableDataset class
- class TableDataset(model, key_columns, static_content, where_clause, frame)
  Bases: object
  Store a dataset corresponding to a database table.
- model (peewee.BaseModel)
  A Specify data model corresponding to a table.
- key_columns (dict)
  Key-fields and SourceID-columns for the model.
- static_content (dict)
  Data to be automatically inserted for the model.
- where_clause (peewee.Expression)
  Condition for getting relevant data from the database.
- describe_columns()
  Return a pandas.DataFrame describing the columns in the current model.
- from_csv(filepath, quiet=True, **kwargs)
  Read dataset from a CSV file.
  Parameters:
  - filepath (str) – File path or object.
  - quiet (bool, default True) – If True, no output will be written to standard output.
  - **kwargs – Arbitrary keyword arguments available in pandas.read_csv().
- from_database(quiet=True)
  Read table data from the database.
  Parameters:
  - quiet (bool, default True) – If True, no output will be written to standard output.
- get_match_count(target_column, match_columns)
  Return counts for matches and possible matches.
  Parameters:
  - target_column (str) – Column that should have a value if any value in match_columns is not null.
  - match_columns (str or List[str]) – Column or columns used for updating values in target_column.
  Returns: matches, possible matches
  Return type:
- get_mismatches(target_column, match_columns)
  Return a pandas.Series or a pandas.DataFrame with non-matching values.
  Parameters:
  - target_column (str) – Column that should have a value if any value in match_columns is not null.
  - match_columns (str or List[str]) – Column or columns used for updating values in target_column.
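The rule stated for target_column can be illustrated with plain Python dictionaries. This is a sketch of the rule only, not the library's pandas-based implementation. The helper name `find_mismatches` and the column names `agentid` and `agent_sourceid` are hypothetical.

```python
def find_mismatches(rows, target_column, match_columns):
    """Return rows where a match column has a value but the target is empty."""
    if isinstance(match_columns, str):
        match_columns = [match_columns]  # accept a single column name
    mismatches = []
    for row in rows:
        has_match_value = any(row.get(col) is not None for col in match_columns)
        if has_match_value and row.get(target_column) is None:
            mismatches.append(row)
    return mismatches


rows = [
    {"agentid": 1, "agent_sourceid": 10},       # target filled in
    {"agentid": None, "agent_sourceid": 11},    # mismatch: source value, no ID
    {"agentid": None, "agent_sourceid": None},  # nothing to match on
]
result = find_mismatches(rows, "agentid", "agent_sourceid")
```

Here `result` contains only the second row, the one with a value in the match column and no value in the target column.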
- match_database_records(match_columns, quiet=True)
  Update primary key values for records that match database.
  Parameters:
  - match_columns (str or List[str]) – Columns to be matched against the database.
  - quiet (bool, default True) – If True, no output will be written to standard output.
- to_csv(filepath, update_sourceid=False, drop_empty_columns=False, quiet=True, encoding='utf-8', float_format='%g', index=False, **kwargs)
  Write the dataset to a comma-separated values (CSV) file.
  Parameters:
  - filepath (str) – File path or object.
  - update_sourceid (bool, default False) – If True, copy ID-columns to SourceID-columns before writing to the CSV file.
  - drop_empty_columns (bool, default False) – Drop columns that do not contain any data.
  - quiet (bool, default True) – If True, no output will be written to standard output.
  - encoding (str, default 'utf-8') – A string representing the encoding to use in the output file.
  - float_format (str or None, default '%g') – Format string for floating point numbers.
  - index (bool, default False) – Write row names (index).
  - **kwargs – Arbitrary keyword arguments available in pandas.DataFrame.to_csv().
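The effect of the default float_format can be previewed with Python's own printf-style formatting. The sketch below assumes the format string is applied to each value in the same way as the `%` operator. The sample values are made up.

```python
values = [2.0, 0.25, 59.123456789, 1234567.0]

# '%g' keeps at most six significant digits and drops trailing zeros
formatted = ["%g" % value for value in values]

# A wider format keeps more digits, e.g. for coordinates
wider = "%.10g" % 59.123456789
```

`formatted` is `['2', '0.25', '59.1235', '1.23457e+06']` and `wider` is `'59.12345679'`. A column that needs more than six significant digits may therefore call for a different float_format value.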
- to_database(defaults=None, update_record_metadata=True, chunksize=10000, quiet=True)
  Load a dataset into the corresponding table and update the dataset’s primary key column from the database.
  Parameters:
  - defaults (dict) – Column name and value to insert instead of nulls.
  - update_record_metadata (bool, default True) – If True, record metadata will be generated during import, otherwise the metadata will be loaded from the dataset.
  - chunksize (int) – Size of chunks being uploaded.
  - quiet (bool, default True) – If True, no output will be written to standard output.
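A possible import sequence built from the methods documented so far is sketched below. The order of the calls and the helper name `import_from_csv` are assumptions. This reference does not state how to_database treats records that already have a primary key value.

```python
def import_from_csv(dataset, filepath, match_columns, defaults=None):
    """Read a CSV file, match existing records, then upload the dataset.

    `dataset` is a TableDataset subclass instance (or a stand-in with
    the same three methods).
    """
    dataset.from_csv(filepath, quiet=True)
    # Update primary key values for records that match the database
    dataset.match_database_records(match_columns, quiet=True)
    # Load the dataset; its primary key column is then updated from the database
    dataset.to_database(defaults=defaults, quiet=True)
    return dataset
```

With the real module this might be called as `import_from_csv(collectionbatchtool.AgentDataset(), 'agent.csv', ['firstname', 'lastname'])`, where the file name and the match columns are placeholders.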
- update_database_records(columns, update_record_metadata=True, chunksize=10000, quiet=True)
  Update records in database with matching primary key values.
  Parameters:
  - columns (str or List[str]) – Column or columns with new values.
  - update_record_metadata (bool, default True) – If True, record metadata will be generated during import, otherwise the metadata will be updated from the dataset.
  - chunksize (int) – Size of chunks being updated; default 10000.
  - quiet (bool, default True) – If True, no output will be written to standard output.
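The role of chunksize can be pictured with a generic chunking helper. This is an illustration of processing records in fixed-size batches, not the library's implementation. The helper name `iter_chunks` is hypothetical.

```python
def iter_chunks(records, chunksize=10000):
    """Yield consecutive slices of `records` with at most `chunksize` items."""
    if chunksize < 1:
        raise ValueError("chunksize must be a positive integer")
    for start in range(0, len(records), chunksize):
        yield records[start:start + chunksize]


# 25000 records are split into two full chunks and one partial chunk
chunk_sizes = [len(chunk) for chunk in iter_chunks(list(range(25000)))]
```

`chunk_sizes` is `[10000, 10000, 5000]`. A smaller chunksize means more round trips with fewer records in each.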
- update_foreign_keys(from_datasets, quiet=False)
  Update foreign key values from a related dataset based on sourceid values.
  Parameters:
  - from_datasets (TableDataset or List[TableDataset]) – Dataset(s) from which foreign key values will be updated.
  - quiet (bool, default False) – If True, no output will be written to standard output.
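The idea of updating foreign keys from sourceid values can be sketched with plain dictionaries. This is a conceptual illustration, not the library's implementation. The helper name `update_foreign_key` and the column names `agentid` and `agent_sourceid` are hypothetical.

```python
def update_foreign_key(child_rows, parent_rows, id_column, sourceid_column):
    """Fill `id_column` in child rows by looking up their SourceID in the parent rows."""
    # Map each parent SourceID to the ID held by the parent dataset
    id_by_sourceid = {
        row[sourceid_column]: row[id_column]
        for row in parent_rows
        if row[sourceid_column] is not None
    }
    for row in child_rows:
        # Unknown or missing SourceIDs leave the foreign key empty
        row[id_column] = id_by_sourceid.get(row[sourceid_column])
    return child_rows


agents = [
    {"agentid": 501, "agent_sourceid": 1},
    {"agentid": 502, "agent_sourceid": 2},
]
collectors = [
    {"agentid": None, "agent_sourceid": 2},
    {"agentid": None, "agent_sourceid": 9},  # no such agent
]
update_foreign_key(collectors, agents, "agentid", "agent_sourceid")
```

After the call, the first collector row has `agentid` 502 and the second keeps an empty `agentid` because no agent carries SourceID 9.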
- update_sourceid(quiet=True)
  Copy values from ID-columns to SourceID-columns.
  Parameters:
  - quiet (bool, default True) – If True, no output will be written to standard output.
- write_mapping_to_csv(filepath, quiet=True, float_format='%g', index=False, **kwargs)
  Write the ID-column mapping to a comma-separated values (CSV) file.
  Parameters:
  - filepath (str) – File path or object.
  - quiet (bool, default True) – If True, no output will be written to standard output.
  - float_format (str or None, default '%g') – Format string for floating point numbers.
  - index (bool, default False) – Write row names (index).
  - **kwargs – Arbitrary keyword arguments available in pandas.DataFrame.to_csv().
- all_columns
  List containing all columns in the dataset.
- database_columns
  List with available database columns.
- database_query
  Database query for reading the data from the database.
- file_columns
  List containing only the columns that can be written to or read from a file.
- frame
  A pandas.DataFrame to hold the data.
- primary_key_column
  Name of the primary key column.
The TreeDataset class
- class TreeDataset
  Bases: object
  A dataset corresponding to a tree table in Specify.
- update_rankid_column(dataset, quiet=True)
  Update RankID based on SourceID-column.
  Parameters:
  - dataset (TableDataset) – A treedefitem-dataset from which RankID should be updated.
  - quiet (bool, default True) – If True, no output will be written to standard output.
  Notes
  This method exists in order to update the redundant RankID-columns in TreeDataset dataframes.
TableDataset subclasses
- class AgentDataset
  Bases: collectionbatchtool.TableDataset
  Dataset corresponding to the agent-table.
- class CollectingeventattributeDataset
  Bases: collectionbatchtool.TableDataset
  Dataset corresponding to the collectingeventattribute-table.
- class CollectingeventDataset
  Bases: collectionbatchtool.TableDataset
  Dataset corresponding to the collectingevent-table.
- class CollectionobjectattributeDataset
  Bases: collectionbatchtool.TableDataset
  Dataset corresponding to the collectionobjectattribute-table.
- class CollectionobjectDataset
  Bases: collectionbatchtool.TableDataset
  Dataset corresponding to the collectionobject-table.
- class CollectorDataset
  Bases: collectionbatchtool.TableDataset
  Dataset corresponding to the collector-table.
- class DeterminationDataset
  Bases: collectionbatchtool.TableDataset
  Dataset corresponding to the determination-table.
- class GeographyDataset
  Bases: collectionbatchtool.TableDataset, collectionbatchtool.TreeDataset
  Dataset corresponding to the geography-table.
- class GeographytreedefitemDataset
  Bases: collectionbatchtool.TableDataset
  Dataset corresponding to the geographytreedefitem-table.
- class LocalityDataset
  Bases: collectionbatchtool.TableDataset
  Dataset corresponding to the locality-table.
- class StorageDataset
  Bases: collectionbatchtool.TableDataset, collectionbatchtool.TreeDataset
  Dataset corresponding to the storage-table.
- class StoragetreedefitemDataset
  Bases: collectionbatchtool.TableDataset
  Dataset corresponding to the storagetreedefitem-table.
- class PreparationDataset
  Bases: collectionbatchtool.TableDataset
  Dataset corresponding to the preparation-table.
- class PreptypeDataset
  Bases: collectionbatchtool.TableDataset
  Dataset corresponding to the preptype-table.
- class TaxonDataset
  Bases: collectionbatchtool.TableDataset, collectionbatchtool.TreeDataset
  Dataset corresponding to the taxon-table.
- class TaxontreedefitemDataset
  Bases: collectionbatchtool.TableDataset
  Dataset corresponding to the taxontreedefitem-table.