autometa.config package

Submodules

autometa.config.databases module

This file contains the Databases class responsible for configuration handling of Autometa Databases.

class autometa.config.databases.Databases(config=<configparser.ConfigParser object>, dryrun=False, nproc=2, update=False)

Bases: object

Database class containing methods to allow downloading/formatting/updating Autometa database dependencies.

Parameters:
  • config (configparser.ConfigParser) – Config containing database dependency information (the default is DEFAULT_CONFIG).
  • dryrun (bool) – Run through database checking without performing downloads/formatting (the default is False).
  • nproc (int) – Number of processors to use to perform database formatting (the signature above shows a default of 2; the docstring states mp.cpu_count()).
  • update (bool) – Overwrite existing databases with more up-to-date database files. (the default is False).
ncbi_dir

</path/to/databases/ncbi>

Type:str
markers_dir

</path/to/databases/markers>

Type:str
SECTIONS

keys are sections respective to database config sections and values are options within the sections.

Type:dict
SECTIONS = {'markers': ['bacteria_single_copy', 'bacteria_single_copy_cutoffs', 'archaea_single_copy', 'archaea_single_copy_cutoffs'], 'ncbi': ['nodes', 'names', 'merged', 'accession2taxid', 'nr']}
compare_checksums(section: str = None) → Dict[str, Dict[KT, VT]]

Get all invalid database files in options from section in config. An md5 checksum comparison is performed between the local file’s md5 and the file’s remote md5 to ensure file integrity before the respective file is considered valid.

Parameters:section (str, optional) – Configure provided section. Choices include ‘markers’ and ‘ncbi’. (default will download/format all database directories)
Returns:{section:{option, option,…}, section:{…}, …}
Return type:dict
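The md5 comparison described for compare_checksums can be sketched with the standard library alone. This is a minimal illustration, not Autometa's implementation: md5_of and checksum_matches are hypothetical helper names, and the remote checksum is assumed to arrive as a line in the common "hash filename" layout.

```python
import hashlib


def md5_of(fpath: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex md5 digest of the file at fpath, read in chunks."""
    digest = hashlib.md5()
    with open(fpath, "rb") as fh:
        # Read fixed-size chunks so large database files are never
        # loaded into memory all at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(fpath: str, remote_line: str) -> bool:
    """Compare a local file's md5 with the hash in a 'hash filename' line.

    Hypothetical helper: the line layout is an assumption.
    """
    parts = remote_line.split()
    if not parts:
        return False  # an empty remote checksum can never match
    return md5_of(fpath) == parts[0]
```

A file whose digest differs from the remote hash would be reported as invalid and become a candidate for re-download.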
configure(section: str = None, no_checksum: bool = False) → configparser.ConfigParser

Configures Autometa’s database dependencies by first checking missing dependencies then comparing checksums to ensure integrity of files.

Download and format databases for all options in each section.

This will only perform the download and formatting if self.dryrun is False. This will update out-of-date databases if self.update is True.

Parameters:
  • section (str, optional) – Configure provided section. Choices include ‘markers’ and ‘ncbi’. (default will download/format all database directories)
  • no_checksum (bool, optional) – Do not perform checksum comparisons (default is False).
Returns:config with updated options in respective databases sections.
Return type:configparser.ConfigParser
Raises:
  • ValueError – Provided section does not match ‘ncbi’ or ‘markers’.
  • ConnectionError – A connection issue occurred when connecting to NCBI or GitHub.
download_markers(options: Iterable[T_co]) → None

Download markers database files and amend user config to reflect this.

Parameters:options (iterable) – iterable containing options in ‘markers’ section to download.
Returns:Will update provided options in self.config.
Return type:NoneType
Raises:ConnectionError – marker file download failed.
download_missing(section: str = None) → None

Download missing Autometa database dependencies from provided section. If no section is provided will check all sections.

Parameters:section (str, optional) – Section to check for missing database files (the default is None). Choices include ‘ncbi’ and ‘markers’.
Returns:Will update provided section in self.config.
Return type:NoneType
Raises:ValueError – Provided section does not match ‘ncbi’ or ‘markers’.
download_ncbi_files(options: Iterable[T_co]) → None

Download NCBI database files.

Parameters:

options (iterable) – iterable containing options in ‘ncbi’ section to download.

Returns:

Will update provided options in self.config.

Return type:

NoneType

Raises:
  • subprocess.CalledProcessError – NCBI file download with rsync failed.
  • ConnectionError – NCBI file checksums do not match after file transfer.
extract_taxdump() → None

Extract autometa required files from ncbi taxdump.tar.gz archive into ncbi databases directory and update user config with extracted paths.

This only extracts nodes.dmp, names.dmp and merged.dmp from taxdump.tar.gz if the files do not already exist. If update was originally supplied as True to the Databases instance, then the previous files will be replaced by the new taxdump files.

After successful extraction of the files, a checksum will be written of the archive for future checking.

Returns:Will update self.config section ncbi with options ‘nodes’, ‘names’, ‘merged’
Return type:NoneType
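The selective extraction described for extract_taxdump can be sketched with the tarfile module. This is an illustration under stated assumptions, not Autometa's code: extract_wanted is a hypothetical name, the three files are assumed to sit at the top level of the archive, and the checksum-writing step is omitted.

```python
import os
import tarfile

# The three files named in the description above.
WANTED = ("nodes.dmp", "names.dmp", "merged.dmp")


def extract_wanted(archive: str, outdir: str, update: bool = False) -> dict:
    """Extract only the WANTED members of archive into outdir.

    Existing files are kept unless update is True.
    Returns {option: path}, e.g. {"nodes": "<outdir>/nodes.dmp", ...}.
    """
    paths = {}
    with tarfile.open(archive, "r:gz") as tar:
        for name in WANTED:
            dest = os.path.join(outdir, name)
            if update or not os.path.exists(dest):
                member = tar.getmember(name)  # raises KeyError if absent
                src = tar.extractfile(member)
                # Copy the member's bytes to a fixed destination rather
                # than trusting any path stored inside the archive.
                with src, open(dest, "wb") as out:
                    out.write(src.read())
            paths[os.path.splitext(name)[0]] = dest
    return paths
```

The returned mapping mirrors the ‘nodes’, ‘names’ and ‘merged’ options that the method is documented to update in the ncbi section.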
fix_invalid_checksums(section: str = None) → None

Download/Update/Format databases where checksums are out-of-date.

Parameters:section (str, optional) – Configure provided section. Choices include ‘markers’ and ‘ncbi’. (default will download/format all database directories)
Returns:Will update provided options in self.config.
Return type:NoneType
Raises:ConnectionError – Failed to connect to section host site.
format_nr() → None

Construct a diamond formatted database (nr.dmnd) from nr option in ncbi section in user config.

NOTE: The checksum ‘nr.dmnd.md5’ will only be generated if nr.dmnd construction is successful. If the provided nr option in ncbi is ‘nr.gz’ the database will be removed after successful database formatting.

Returns:config updated option: ‘nr’ in section: ‘ncbi’.
Return type:NoneType
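For orientation, building a diamond database from the command line uses diamond makedb. The sketch below only assembles and optionally runs that command; makedb_cmd and format_db are hypothetical names, and the checksum writing and nr.gz removal described above are not reproduced.

```python
import subprocess


def makedb_cmd(nr: str, out: str, nproc: int = 2) -> list:
    """Build the argument list for `diamond makedb`."""
    return ["diamond", "makedb", "--in", nr, "--db", out, "--threads", str(nproc)]


def format_db(nr: str, out: str, nproc: int = 2, dryrun: bool = False) -> list:
    """Run `diamond makedb` unless dryrun is True; return the command used.

    check=True makes a failed run raise subprocess.CalledProcessError, so
    follow-up steps (such as writing a checksum) only happen on success.
    """
    cmd = makedb_cmd(nr, out, nproc)
    if not dryrun:
        subprocess.run(cmd, check=True)
    return cmd
```

Calling format_db("nr.gz", "nr", nproc=4, dryrun=True) returns the command without requiring diamond to be installed.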
get_missing(section: str = None) → Dict[str, Dict[KT, VT]]

Get all missing database files in options from sections in config.

Parameters:section (str, optional) – Configure provided section. Choices include ‘markers’ and ‘ncbi’. (default will download/format all database directories)
Returns:{section:{option, option,…}, section:{…}, …}
Return type:dict
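The return shape {section:{option, …}} can be illustrated with a small standalone function. This is a sketch rather than the method itself: find_missing is a hypothetical name, and an option is assumed to count as missing when it is absent from the config or its path does not exist on disk.

```python
import configparser
import os

# Copied from the SECTIONS attribute documented above.
SECTIONS = {
    "markers": [
        "bacteria_single_copy",
        "bacteria_single_copy_cutoffs",
        "archaea_single_copy",
        "archaea_single_copy_cutoffs",
    ],
    "ncbi": ["nodes", "names", "merged", "accession2taxid", "nr"],
}


def find_missing(config: configparser.ConfigParser, section: str = None) -> dict:
    """Return {section: {option, ...}} for options whose files are absent."""
    if section is not None and section not in SECTIONS:
        raise ValueError(f"section must be one of {sorted(SECTIONS)}")
    sections = [section] if section else list(SECTIONS)
    missing = {}
    for sec in sections:
        absent = {
            opt
            for opt in SECTIONS[sec]
            # has_option() is False when the section itself is absent too.
            if not config.has_option(sec, opt)
            or not os.path.exists(config.get(sec, opt))
        }
        if absent:
            missing[sec] = absent
    return missing
```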
get_remote_checksum(section: str, option: str) → str

Get the checksum from provided section respective to option in self.config.

Parameters:
  • section (str) – section to retrieve for checksums section. Choices include: ‘ncbi’ and ‘markers’.
  • option (str) – option in checksums section corresponding to the section checksum file.
Returns:checksum of remote md5 file. e.g. ‘hash filename
Return type:str
Raises:
  • ValueError – ‘section’ must be ‘ncbi’ or ‘markers’
  • ConnectionError – No internet connection available.
  • ConnectionError – Failed to connect to host for provided option.
internet_is_connected(host: str = '8.8.8.8', port: int = 53, timeout: int = 2) → bool
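The source gives only the signature for internet_is_connected. Judging from its defaults (host 8.8.8.8, port 53, a timeout in seconds), one plausible reading is a short TCP connection attempt. The sketch below shows that common pattern under this assumption; is_reachable is a hypothetical name and may not match the actual implementation.

```python
import socket


def is_reachable(host: str = "8.8.8.8", port: int = 53, timeout: int = 2) -> bool:
    """Return True if a TCP connection to (host, port) opens within timeout."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```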
press_hmms() → None

Run hmmpress on the markers HMM database files.

Returns:
Return type:NoneType
satisfied(section: str = None, compare_checksums: bool = False) → bool

Determines whether all database dependencies are satisfied.

Parameters:
  • section (str) – section to retrieve for checksums section. Choices include: ‘ncbi’ and ‘markers’.
  • compare_checksums (bool, optional) – Also check if database information is up-to-date with current hosted databases. (default is False).
Returns:

True if all database dependencies are satisfied, otherwise False.

Return type:

bool

autometa.config.databases.main()

autometa.config.environ module

Configuration handling for Autometa environment.

autometa.config.environ.bedtools()

Get bedtools version.

Returns:version of bedtools
Return type:str
autometa.config.environ.bowtie2()

Get bowtie2 version.

Returns:version of bowtie2
Return type:str
autometa.config.environ.configure(config: configparser.ConfigParser) → Tuple[configparser.ConfigParser, bool]

Checks executable dependencies necessary to run autometa. Will update config with executable dependencies with details: 1. presence/absence of dependency and its location 2. versions

Parameters:config (configparser.ConfigParser) – config to update with executable dependency details.
Returns:(config, satisfied), where config (configparser.ConfigParser) is updated with executables details (1. location of executable, 2. version of executable) and satisfied is a bool.
Return type:2-tuple
autometa.config.environ.diamond()

Get diamond version.

Returns:version of diamond
Return type:str
autometa.config.environ.find_executables()

Retrieves the file paths of the executables that Autometa depends on.

Returns:{executable:</path/to/executable>, …}
Return type:dict
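A lookup with this return shape can be sketched with shutil.which. The list of names is taken from the version functions documented in this module; locate is a hypothetical name, and whether Autometa searches PATH in this way is an assumption.

```python
import shutil

# Names taken from the version functions documented in this module.
EXECUTABLES = (
    "bedtools",
    "bowtie2",
    "diamond",
    "hmmpress",
    "hmmscan",
    "hmmsearch",
    "prodigal",
    "samtools",
)


def locate(names=EXECUTABLES) -> dict:
    """Return {executable: path}; the path is None when not found on PATH."""
    return {name: shutil.which(name) for name in names}
```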
autometa.config.environ.get_versions(program: str = None) → Union[Dict[str, str], str]

Retrieve versions from all required executable dependencies. If program is provided will only return version for program.

See: https://stackoverflow.com/a/834451/12671809

Parameters:

program (str, optional) – the program to retrieve the version, by default None

Returns:

if program is None: dict - {program:version, …} if program: str - version

Return type:

dict or str

Raises:
  • ValueError – program is not a string
  • KeyError – program is not an executable dependency.
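The documented behaviour (a dict for all programs, a str for one, ValueError and KeyError on bad input) fits a name-to-function dispatch table. The sketch below shows that pattern with stand-in getters that return placeholder strings; the real functions query the installed tools, and the names used here are hypothetical.

```python
from typing import Callable, Dict, Union


# Stand-in getters: the real functions query the installed tools.
def _diamond() -> str:
    return "placeholder-diamond-version"


def _prodigal() -> str:
    return "placeholder-prodigal-version"


VERSION_GETTERS: Dict[str, Callable[[], str]] = {
    "diamond": _diamond,
    "prodigal": _prodigal,
}


def versions(program: str = None) -> Union[Dict[str, str], str]:
    """Return all versions as a dict, or one version as a str."""
    if program is None:
        return {name: getter() for name, getter in VERSION_GETTERS.items()}
    if not isinstance(program, str):
        raise ValueError(f"program must be a string, got {type(program).__name__}")
    if program not in VERSION_GETTERS:
        raise KeyError(f"{program} is not an executable dependency")
    return VERSION_GETTERS[program]()
```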
autometa.config.environ.hmmpress()

Get hmmpress version.

Returns:version of hmmpress
Return type:str
autometa.config.environ.hmmscan()

Get hmmscan version.

Returns:version of hmmscan
Return type:str
autometa.config.environ.hmmsearch()

Get hmmsearch version.

Returns:version of hmmsearch
Return type:str
autometa.config.environ.prodigal()

Get prodigal version.

Returns:version of prodigal
Return type:str
autometa.config.environ.samtools()

Get samtools version.

Returns:version of samtools
Return type:str

autometa.config.utilities module

autometa.config.utilities.get_config(fpath: str) → configparser.ConfigParser

Load the config provided at fpath.

Parameters:fpath (str) – </path/to/file.config>
Returns:interpolated config object parsed from fpath.
Return type:configparser.ConfigParser
Raises:FileNotFoundError – Provided fpath does not exist.
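Loading an interpolated config can be sketched with configparser. This sketch assumes ExtendedInterpolation (the ${section:option} syntax), which the source does not confirm; load_config and the section names in the usage note are hypothetical.

```python
import configparser
import os


def load_config(fpath: str) -> configparser.ConfigParser:
    """Parse fpath into a ConfigParser using ${section:option} interpolation."""
    if not os.path.isfile(fpath):
        raise FileNotFoundError(fpath)
    config = configparser.ConfigParser(
        interpolation=configparser.ExtendedInterpolation()
    )
    with open(fpath) as fh:
        config.read_file(fh)
    return config
```

With a file containing home_dir = /opt/autometa under [common] and base = ${common:home_dir}/databases under [databases], load_config(fpath).get("databases", "base") resolves the reference when the value is read.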
autometa.config.utilities.main()
autometa.config.utilities.parse_args(fpath: str = None) → argparse.Namespace

Generate argparse namespace (args) from config file.

Parameters:fpath (str) – </path/to/file.config> (default is DEFAULT_CONFIG in autometa.config)
Returns:namespace typical to parser.parse_args() method from argparse
Return type:argparse.Namespace
Raises:FileNotFoundError – provided fpath does not exist.
autometa.config.utilities.put_config(config: configparser.ConfigParser, out: str) → None

Writes config to out and updates checkpoints checksum.

Parameters:
  • config (configparser.ConfigParser) – configuration containing user provided parameters and files information.
  • out (str) – </path/to/output/file.config>
Returns:

Return type:

NoneType

autometa.config.utilities.set_home_dir() → str

Set the home_dir in autometa’s default configuration (default.config) based on autometa’s current location. If the home_dir variable is already set, then this will be used as the home_dir location.

Returns:</path/to/package/autometa>
Return type:str
autometa.config.utilities.update_config(section: str, option: str, value: str, fpath: str = '</path/to/autometa/config/default.config>') → None

Update option in section of the config file at fpath with value.

Parameters:
  • fpath (str) – </path/to/file.config>
  • section (str) – section header to update within fpath.
  • option (str) – option to update within section.
  • value (str) – value to update option.
Returns:

Return type:

NoneType
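A read-modify-write cycle of this kind can be sketched with configparser. This is an illustration rather than the function's actual body: set_option is a hypothetical name, and creating a missing section is an assumption about the desired behaviour.

```python
import configparser


def set_option(section: str, option: str, value: str, fpath: str) -> None:
    """Set option to value under section in the config file at fpath."""
    config = configparser.ConfigParser()
    config.read(fpath)  # read() silently skips files that do not exist
    # Assumption: a missing section is created instead of raising.
    if section != config.default_section and not config.has_section(section):
        config.add_section(section)
    config.set(section, option, value)
    with open(fpath, "w") as fh:
        config.write(fh)
```

Note that rewriting the file through configparser drops any comments it contained.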

Module contents