Sequence-based species identification for Mycobacteriaceae.
Generation of species identification databases
The database sequences are flagged as type material using the International Nucleotide Sequence Database Collaboration (INSDC) type_material qualifier. Type material ties formal species names to physical specimens (culture collections for prokaryotes, museum or herbarium specimens for eukaryotes), as annotated in the NCBI Taxonomy Database.
See fn fetch_myco_sequences() in build.rs for details on how the sequences were fetched from NCBI at build time.
myco_erm41.fasta is generated at build time but unused; erm41 identification uses
per-subspecies references (erm41_abscessus_ATCC_19977.fasta, erm41_bolletii_CIP_108541.fasta, erm41_massiliense_CCUG_48898.fasta) instead.
Re-exports
pub use batch::SampleSusceptibilityRecord;
Modules
Structs
- Ab1Channels - Parsed channel intensity data from an AB1 chromatogram.
- Erm41ViewState - Chromatogram display parameters for the erm(41) region.
- GappedAlignment - Alignment result from align_to_ref: gapped strings plus reference start position.
- RrlNtmViewState - Chromatogram display parameters for the rrl / NTM macrolide-resistance region.
- SeqData - Top-level result for a processed AB1 read.
- SeqIdHit - Best-hit result from aligning an AB1 read against the reference sequences.
- SusceptibilityCalls - Susceptibility calls derived from AB1 capillary sequencing, keyed by gene target.
Constants
- ACC_GASTRI 🔒
- ACC_KANSASII 🔒
- ACC_MARINUM 🔒
- ACC_ULCERANS 🔒
- DESC_ABSCESSUS
- DESC_BOLLETII
- DESC_MASSILIENSE
- ERM41_ANCHOR_L 🔒
- ERM41_ANCHOR_R 🔒
- ERM41_FWD_END 🔒
- ERM41_FWD_START 🔒
- KANSASII_GASTRI_ACCS 🔒
- MARINUM_ULCERANS_ACCS 🔒
- MIN_PNCA_REF_LEN 🔒
- MIN_RPOB_REF_LEN 🔒
- MIN_RRL_REF_LEN 🔒
- MIN_RRS_REF_LEN 🔒
- MIN_SEQ_ID_IDENTITY
- PDF_COL_HEADERS 🔒
- PDF_COL_X 🔒
- PDF_MARGIN_B 🔒
- PDF_MARGIN_L 🔒
- PDF_MARGIN_T 🔒
- PDF_PAGE_H 🔒
- PDF_PAGE_W 🔒
- PDF_ROW_H 🔒
- PDF_TABLE_W 🔒
- PNCA_FWD_END 🔒
- PNCA_FWD_START 🔒
- REF_ERM41_ABSCESSUS 🔒
- REF_ERM41_BOLLETII 🔒
- REF_ERM41_MASSILENSE 🔒
- REF_MYCO_HSP65 🔒 - hsp65 / groEL2 reference sequences for Mycobacteriaceae type strains, fetched from NCBI at build time.
- REF_MYCO_RPOB 🔒 - rpoB reference sequences for Mycobacteriaceae type strains, fetched from NCBI at build time.
- REF_MYCO_RRL 🔒 - 23S rRNA (rrl) reference sequences for Mycobacteriaceae type strains, fetched from NCBI at build time.
- REF_MYCO_RRS 🔒 - 16S rRNA (rrs) reference sequences for Mycobacteriaceae type strains, fetched from NCBI at build time.
- REF_PNCA 🔒 - pncA CDS plus a 50 bp upstream promoter flank for each M. tuberculosis complex member with a distinct reference sequence, fetched from NCBI at build time (see the pnca module docs). These are concatenated into one multi-FASTA so identify_sequence_pnca() can search all of them via parse_multi_fasta, in the same way identify_sequence_rrl_ntm() searches REF_MYCO_RRL.
- RRL_ANCHOR_L 🔒
- RRL_ANCHOR_R 🔒
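The REF_PNCA entry relies on parse_multi_fasta to split one concatenated multi-FASTA into per-record tuples. Below is a minimal sketch of that kind of parser. It assumes headers of the form ">ACCESSION free-text description", where the accession is the first whitespace-delimited token and the description is the rest. It returns owned (String, String, Vec<u8>) tuples. The crate's real signature and header handling are not shown on this page, so parse_multi_fasta_sketch is a hypothetical name.

```rust
/// Hypothetical stand-in for parse_multi_fasta: split a multi-FASTA string
/// into (accession, description, sequence) tuples.
/// Assumes headers look like `>ACCESSION free-text description`.
fn parse_multi_fasta_sketch(text: &str) -> Vec<(String, String, Vec<u8>)> {
    let mut records: Vec<(String, String, Vec<u8>)> = Vec::new();
    for line in text.lines() {
        if let Some(header) = line.strip_prefix('>') {
            let header = header.trim();
            // First whitespace-delimited token is the accession; the rest is the description.
            let (acc, desc) = match header.split_once(char::is_whitespace) {
                Some((a, d)) => (a.to_string(), d.trim().to_string()),
                None => (header.to_string(), String::new()),
            };
            records.push((acc, desc, Vec::new()));
        } else if let Some(last) = records.last_mut() {
            // Wrapped sequence lines are appended to the current record.
            last.2.extend(line.trim().bytes());
        }
        // Lines before the first header fall through and are ignored.
    }
    records
}
```

Wrapped sequence lines are concatenated, so a reference split over several 60- or 70-column lines comes back as one contiguous byte vector ready for alignment.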
Functions
- align_to_ref - Align query (Sanger read) against reference (gene sequence) using semiglobal alignment (Smith-Waterman-style dynamic programming with free reference end-gaps, so the full query is placed within the reference).
- base_at_ref_pos - Return the query base at a given reference position, or None if the position is outside the aligned region or the query has a deletion ('-') there.
- build_report_pdf - Build a landscape A4 PDF report from AB1 scan records. Records are filtered to gene-identified hits with identity ≥ MIN_SEQ_ID_IDENTITY (the same filter as the CSV output) and to samples no older than report_max_age_days (see TBConfig::report_max_age_days).
- dedup_substring_same_desc 🔒 - Within each description group, remove entries whose sequence (uppercased) is a contiguous substring of a longer entry that shares the same description. Longer entries survive; the shorter entries are redundant for alignment purposes because the aligner would find the same best position inside the longer reference.
- format_pairwise_alignment 🔒
- parse_ab1_quality - Tries edited quality scores (PCON tag 2) first, falling back to raw scores (PCON tag 1). Each byte is a Phred quality score corresponding to the base at the same index in PBAS.
- parse_ab1_sequence - Tries the edited basecalls (PBAS tag 2) first, falling back to raw basecalls (PBAS tag 1).
- parse_fasta_seq 🔒 - Parse a FASTA string, returning just the sequence bytes (ignores the header).
- parse_multi_fasta 🔒 - Parse a multi-FASTA string into (accession, description, sequence) tuples.
- pdf_current_date 🔒
- pdf_days_to_ymd 🔒
- pdf_hline 🔒
- pdf_sus 🔒
- pdf_truncate 🔒
- pdf_write_row 🔒
- reverse_complement
- scan_window 🔒
- trim_start_end - Trim a basecall sequence to the amplicon region defined by a primer pair.
- trim_to_min_quality - Trim leading and trailing low-quality bases using a sliding-window average.
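align_to_ref is described as a semiglobal alignment: the whole query must be placed, but reference end-gaps are free. The sketch below shows that idea with a plain linear-gap dynamic program. Row 0 of the score matrix is left at zero, so the query may start anywhere in the reference. The traceback starts from the best cell of the last row, so trailing reference bases cost nothing. The scores (match +2, mismatch -1, gap -2) and the tuple return type are illustrative assumptions, not the crate's parameters, and semiglobal_align_sketch is a hypothetical name.

```rust
/// Sketch of semiglobal ("fitting") alignment: every query base must be
/// aligned, but reference bases before and after the query are free.
/// Scores are illustrative assumptions. Returns
/// (gapped query, gapped reference, 0-based reference start).
fn semiglobal_align_sketch(query: &[u8], reference: &[u8]) -> (String, String, usize) {
    const MATCH: i32 = 2;
    const MISMATCH: i32 = -1;
    const GAP: i32 = -2;
    let sub = |a: u8, b: u8| -> i32 {
        if a.eq_ignore_ascii_case(&b) { MATCH } else { MISMATCH }
    };
    let (n, m) = (query.len(), reference.len());

    // s[i][j]: best score aligning query[..i] in full against a suffix of reference[..j].
    // Row 0 stays 0, so the query may start anywhere in the reference.
    let mut s = vec![vec![0i32; m + 1]; n + 1];
    for i in 1..=n {
        s[i][0] = s[i - 1][0] + GAP;
        for j in 1..=m {
            let diag = s[i - 1][j - 1] + sub(query[i - 1], reference[j - 1]);
            let up = s[i - 1][j] + GAP; // query base opposite a reference gap
            let left = s[i][j - 1] + GAP; // reference base opposite a query gap
            s[i][j] = diag.max(up).max(left);
        }
    }

    // Trailing reference bases are free: trace back from the best cell of the last row.
    let mut j = (0..=m).max_by_key(|&c| s[n][c]).unwrap_or(0);
    let mut i = n;
    let (mut q_out, mut r_out) = (Vec::new(), Vec::new());
    while i > 0 {
        if j > 0 && s[i][j] == s[i - 1][j - 1] + sub(query[i - 1], reference[j - 1]) {
            q_out.push(query[i - 1]);
            r_out.push(reference[j - 1]);
            i -= 1;
            j -= 1;
        } else if s[i][j] == s[i - 1][j] + GAP {
            q_out.push(query[i - 1]);
            r_out.push(b'-');
            i -= 1;
        } else {
            // Only reachable with j > 0, because column 0 always satisfies the branch above.
            q_out.push(b'-');
            r_out.push(reference[j - 1]);
            j -= 1;
        }
    }
    q_out.reverse();
    r_out.reverse();
    // When i reaches 0, j is the reference index where the alignment begins.
    (
        String::from_utf8_lossy(&q_out).into_owned(),
        String::from_utf8_lossy(&r_out).into_owned(),
        j,
    )
}
```

For example, aligning GTC against AAGTACTT gives GT-C over GTAC starting at reference index 2. One gap (-2) scores better there than any mismatch-only placement. A production aligner would more likely use affine gap penalties (Gotoh's three-matrix formulation); linear gaps keep this sketch short.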
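base_at_ref_pos reads a query base off a gapped alignment by reference coordinate. The sketch below assumes the gapped alignment is two equal-length strings using '-' for gaps, plus a 0-based reference start. The struct GappedAlignmentSketch and its field names are hypothetical. The function walks the aligned columns and advances the reference coordinate only on non-gap reference characters. It returns None for positions outside the aligned span or where the query carries a deletion.

```rust
/// Hypothetical shape of a gapped alignment: equal-length gapped strings
/// ('-' marks a gap) plus the 0-based reference position of the first column.
struct GappedAlignmentSketch {
    query: String,
    reference: String,
    ref_start: usize,
}

/// Return the query base aligned to reference position `pos`, or None if
/// `pos` lies outside the aligned region or the query has '-' there.
fn base_at_ref_pos_sketch(aln: &GappedAlignmentSketch, pos: usize) -> Option<u8> {
    if pos < aln.ref_start {
        return None;
    }
    let mut ref_pos = aln.ref_start;
    for (r, q) in aln.reference.bytes().zip(aln.query.bytes()) {
        if r == b'-' {
            // Query insertion: this column has no reference coordinate.
            continue;
        }
        if ref_pos == pos {
            return if q == b'-' { None } else { Some(q) };
        }
        ref_pos += 1;
    }
    // Ran past the end of the aligned region.
    None
}
```

Columns where the reference shows '-' are query insertions, so they are skipped rather than counted. That keeps reference coordinates stable when the read carries extra bases.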
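The deduplication rule described for dedup_substring_same_desc can be sketched as follows, assuming entries are (accession, description, sequence) tuples. Comparison uses uppercased copies, and only a strictly longer entry with the same description can absorb a shorter one. Surviving entries keep their original order and original bytes. The function name and tuple layout are assumptions, and the substring test is a naive window scan, roughly O(n*m) per pair.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for dedup_substring_same_desc. An entry is dropped
/// when its uppercased sequence occurs contiguously inside a strictly longer
/// entry that has the same description.
fn dedup_substring_same_desc_sketch(
    entries: Vec<(String, String, Vec<u8>)>,
) -> Vec<(String, String, Vec<u8>)> {
    // Uppercased copies are used only for comparison; original bytes are returned unchanged.
    let upper: Vec<Vec<u8>> = entries.iter().map(|e| e.2.to_ascii_uppercase()).collect();

    // Group entry indices by description.
    let mut groups: HashMap<String, Vec<usize>> = HashMap::new();
    for (i, e) in entries.iter().enumerate() {
        groups.entry(e.1.clone()).or_default().push(i);
    }

    let mut keep = vec![true; entries.len()];
    for idxs in groups.values() {
        for &i in idxs {
            // Drop entry i if any strictly longer entry in its group contains it.
            if idxs
                .iter()
                .any(|&j| upper[j].len() > upper[i].len() && contains(&upper[j], &upper[i]))
            {
                keep[i] = false;
            }
        }
    }

    entries
        .into_iter()
        .zip(keep)
        .filter(|(_, k)| *k)
        .map(|(e, _)| e)
        .collect()
}

/// Naive contiguous-substring test on byte slices.
fn contains(haystack: &[u8], needle: &[u8]) -> bool {
    needle.is_empty() || haystack.windows(needle.len()).any(|w| w == needle)
}
```

Under this reading of "longer", two identical sequences of equal length are both kept. This page does not say whether exact duplicates are collapsed elsewhere.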
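parse_ab1_sequence and parse_ab1_quality both prefer tag number 2 and fall back to tag number 1. The PCON quality bytes line up index-for-index with the PBAS basecalls. The sketch below shows only that selection and pairing logic, applied to an ABIF directory that has already been parsed. The AbifTags map, tag_with_fallback and bases_with_quality are hypothetical, and reading the binary ABIF directory itself is out of scope here.

```rust
use std::collections::HashMap;

/// Hypothetical, already-parsed ABIF directory: (tag name, tag number) -> data bytes.
type AbifTags = HashMap<(String, u32), Vec<u8>>;

/// Return the data for tag `name` number 2 if present, otherwise number 1.
fn tag_with_fallback<'a>(tags: &'a AbifTags, name: &str) -> Option<&'a [u8]> {
    tags.get(&(name.to_string(), 2u32))
        .or_else(move || tags.get(&(name.to_string(), 1u32)))
        .map(|v| v.as_slice())
}

/// Pair each PBAS basecall with the PCON Phred score at the same index.
/// Returns None if either tag is missing or the lengths disagree.
fn bases_with_quality(tags: &AbifTags) -> Option<Vec<(u8, u8)>> {
    let bases = tag_with_fallback(tags, "PBAS")?;
    let quals = tag_with_fallback(tags, "PCON")?;
    if bases.len() != quals.len() {
        return None;
    }
    Some(bases.iter().copied().zip(quals.iter().copied()).collect())
}
```

Rejecting a length mismatch, rather than truncating to the shorter array, is a design choice of this sketch. It surfaces files where the basecalls and quality values came from different editing passes.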
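One way to trim a read to a primer-defined amplicon, as trim_start_end is described, is to locate the forward primer and then search downstream for the reverse complement of the reverse primer. This sketch assumes exact, case-insensitive primer matches and keeps both primer sites in the returned slice. The description of trim_start_end does not say whether it tolerates mismatches or excludes the primers. Accordingly, trim_to_amplicon_sketch, find_ci and revcomp (standing in for reverse_complement) are hypothetical helpers.

```rust
/// Reverse complement of an A/C/G/T sequence; any other byte becomes 'N'.
fn revcomp(seq: &[u8]) -> Vec<u8> {
    seq.iter()
        .rev()
        .map(|&b| match b.to_ascii_uppercase() {
            b'A' => b'T',
            b'C' => b'G',
            b'G' => b'C',
            b'T' => b'A',
            _ => b'N',
        })
        .collect()
}

/// Index of the first case-insensitive exact occurrence of `needle` in `hay`.
fn find_ci(hay: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        return None;
    }
    hay.windows(needle.len()).position(|w| w.eq_ignore_ascii_case(needle))
}

/// Hypothetical primer-based trim: keep the read from the start of the
/// forward primer through the end of the reverse primer's binding site
/// (the reverse complement of the reverse primer), searched downstream.
fn trim_to_amplicon_sketch<'a>(read: &'a [u8], fwd: &[u8], rev: &[u8]) -> Option<&'a [u8]> {
    let start = find_ci(read, fwd)?;
    let after_fwd = start + fwd.len();
    let rev_site = revcomp(rev);
    let rel = find_ci(&read[after_fwd..], &rev_site)?;
    let end = after_fwd + rel + rev_site.len();
    Some(&read[start..end])
}
```

Searching for the reverse site only after the end of the forward primer prevents a short reverse site from matching inside the forward primer itself.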
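A sliding-window trim in the spirit of trim_to_min_quality scans fixed-width windows of Phred scores. The leftmost window whose mean reaches the threshold marks the start, and the rightmost passing window marks the end. The window width and threshold below are hypothetical parameters, since the crate's are not listed here. This version keeps whole windows at each edge.

```rust
use std::ops::Range;

/// Sketch of sliding-window quality trimming. Returns the half-open range of
/// bases to keep, or None if no window's mean Phred score reaches `min_mean`.
fn trim_to_min_quality_sketch(quals: &[u8], window: usize, min_mean: f64) -> Option<Range<usize>> {
    if window == 0 || quals.len() < window {
        return None;
    }
    // Mean Phred score of the window starting at index i.
    let mean = |i: usize| {
        quals[i..i + window].iter().map(|&q| q as f64).sum::<f64>() / window as f64
    };
    let last = quals.len() - window; // last valid window start
    // Leftmost passing window marks the start of the kept region.
    let start = (0..=last).find(|&i| mean(i) >= min_mean)?;
    // Rightmost passing window marks the end of the kept region.
    let end = (0..=last).rev().find(|&i| mean(i) >= min_mean)? + window;
    Some(start..end)
}
```

Averaging over a window tolerates an isolated low-quality base in an otherwise good stretch. A finer variant could then trim individual low-scoring bases inside the two edge windows.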