identify_sequence_pnca

Function identify_sequence_pnca 

pub fn identify_sequence_pnca(query: &[u8]) -> Vec<SeqIdHit>

The database is REF_PNCA: the pncA CDS plus a 50 bp upstream promoter flank for each M. tuberculosis complex member with a distinct sequence (H37Rv Rv2043c, bovis AF2122/97, and canettii). The references are fetched from NCBI at build time via fetch_sequences_from_toml() in build.rs. See res/sequences/sequences.toml for the rest, including the bovis BCG Pasteur, africanum, mungi, and orygis references that were tried and dropped as exact duplicates of one of these three.

Like super::rrl::identify_sequence_rrl_ntm, this aligns the query against every reference in the database (forward and reverse-complement) and returns one SeqIdHit per reference, sorted by identity descending, so the caller can compare how well the read matches each member of the complex rather than assuming it’s H37Rv.
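
The "one hit per reference, both strands, sorted by identity" pattern described above can be sketched as follows. This is only an illustration under stated assumptions, not this crate's implementation: `Hit` is a hypothetical stand-in for `SeqIdHit` (whose real fields are not shown here), `identify` and its `refs` parameter are hypothetical, and the ungapped sliding-window identity is a simplification swapped in for whatever aligner the real function uses.

```rust
// Hypothetical stand-in for the crate's SeqIdHit; the real struct's fields may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct Hit {
    pub reference: String,
    pub identity: f64,
    pub reverse: bool, // true when the reverse-complement strand scored higher
}

/// Reverse-complement a DNA sequence; non-ACGT bytes become N.
fn revcomp(seq: &[u8]) -> Vec<u8> {
    seq.iter()
        .rev()
        .map(|b| match b.to_ascii_uppercase() {
            b'A' => b'T',
            b'C' => b'G',
            b'G' => b'C',
            b'T' => b'A',
            _ => b'N',
        })
        .collect()
}

/// Best ungapped identity of `query` slid along `reference`.
/// Assumption: a simplification standing in for a real (gapped) aligner.
fn best_identity(query: &[u8], reference: &[u8]) -> f64 {
    if query.is_empty() || query.len() > reference.len() {
        return 0.0;
    }
    reference
        .windows(query.len())
        .map(|w| {
            let matches = w
                .iter()
                .zip(query)
                .filter(|(a, b)| a.to_ascii_uppercase() == b.to_ascii_uppercase())
                .count();
            matches as f64 / query.len() as f64
        })
        .fold(0.0, f64::max)
}

/// Score the query against every reference on both strands and return
/// one hit per reference, sorted by identity descending.
pub fn identify(query: &[u8], refs: &[(&str, &[u8])]) -> Vec<Hit> {
    let rc = revcomp(query);
    let mut hits: Vec<Hit> = refs
        .iter()
        .map(|(name, seq)| {
            let fwd = best_identity(query, seq);
            let rev = best_identity(&rc, seq);
            Hit {
                reference: (*name).to_string(),
                identity: fwd.max(rev),
                reverse: rev > fwd,
            }
        })
        .collect();
    hits.sort_by(|a, b| b.identity.total_cmp(&a.identity));
    hits
}
```

Because every reference produces a hit, a caller can compare the top two identities and decide whether the read is genuinely closer to one complex member or matches several equally well.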