.center[
.vertical-center[
# Observing bacterial pathogen evolution with long-read sequencing

Nicholas Noll

Neher Lab

Biozentrum, University of Basel
]
]

---

# Sequence variation encodes the spread of pathogens

.center[![:scale 775](/figs/basel/ext/infection_tree_1.png)]

.footnote[Images by Trevor Bedford]

---
count: false

# Sequence variation encodes the spread of pathogens

.center[![:scale 775](/figs/basel/ext/infection_tree_2.png)]

.footnote[Images by Trevor Bedford]

---
count: false

# Sequence variation encodes the spread of pathogens

.center[![:scale 775](/figs/basel/ext/infection_tree_3b.png)]

.footnote[Images by Trevor Bedford]

---
count: false

# Sequence variation encodes the spread of pathogens

.center[![:scale 775](/figs/basel/ext/infection_tree_4b.png)]

.footnote[Images by Trevor Bedford]

---
count: false

# Sequence variation encodes the spread of pathogens

Prerequisites for epidemiological techniques:

* Evolution generates enough variation
  * <sub><sup>Steel bound: an $n$-leaf tree can be inferred from sequences of length $O(\log n)$ if $\mu \sim .25$<sup>1</sup></sup></sub>
  * <sub><sup>RNA viruses mutate at $\sim 10^{-5}$ per site per day, i.e. $\sim 1$ SNP per week</sup></sub>

--

* Sequencing samples enough of the population dynamics

--

* The molecular substrate is static, i.e. alignable
  * <sub><sup>Bacteria mutate at $\sim 10^{-8}$ per site per day. How static is the substrate?</sup></sub>

.footnote[<sup>1</sup>.cite[Daskalakis et al.
2009]]

---

# "Understood" regime: successive mutations on a static sequence

.center[![:scale 700](/figs/basel/ext/align2.png)]

.center[All downstream analyses require a sequence alignment, from which polymorphisms, and thus the degrees of freedom under evolution, are defined.]

---

# Existing models only describe mutations on a static sequence

.left-column[.middle[![:scale 500](/figs/basel/ext/seq_evolve.svg)]]
.right-column[.middle[![:scale 500](/figs/basel/ext/muller_plots.jpg)]]

.footnote[<sup>1</sup>.cite[Desai, M. & Fisher, D. Beneficial mutation-selection balance and the effect of linkage on positive selection.]]

--

Theoretical understanding of:

* the scaling of the average rate of mutation accumulation with $\mu$, $N$, $s$
* coalescent theory: how dynamics are reflected in the statistics of the underlying tree
* how to extract this from data: we can ![:emph](align) sequences and estimate the tree

--

.center[No such null models exist for bacterial evolution.]

---

# Microbial evolution is different

.middle[.center[![:scale 450](/figs/basel/ext/HGT.png)]]

--

.center[Evolution of bacterial AMR doesn't fit the mutational competition paradigm.]

---

# Bacteria evolve by horizontally sharing genes

.center[![:scale 1000](/figs/basel/ext/panX_association.png)]

---
count: false

# Bacteria evolve by horizontally sharing genes

.left-column[
.center[
.middle[
![:scale 325](/figs/basel/carb/kleb_tree.png)
]]]

.middle[
.right-column[
.center[
![:scale 450](/figs/basel/carb/pa_vs_divergence.png)
]
]
]

---

# Resolving HGT with long reads

.left-column[
Reconstruct history by sequencing:

- Illumina reads: high coverage, short reads
  - Too short to bridge repetitive elements
  - Fragmented assemblies
  - Problem!
    Most AMR genes are flanked by repetitive/mobile elements.

.center[
![:scale 350](/figs/basel/ext/bad_assembly_graph.png)
]
]

.footnote[<sup>1</sup>.cite[.url[github.com/rrwick]]]

--

.right-column[
ONT long reads are required to resolve structural diversity.

.center[
![:scale 300](/figs/basel/carb/minIon.jpg)
![:scale 400](/figs/basel/carb/canu_to_spades.png)
]
]

---

# Global carbapenemase outbreak as a case study

.left-column[
* Carbapenems: reserve antibiotics used to treat MDR bacteria
* First observed in the late 1980s
* Phenotypic resistance is conferred by multiple different genes
  - <sub><sup>Growing public health problem</sup></sub>
  - <sub><sup>Globally heterogeneous prevalence</sup></sub>
* A fascinating case study in disentangling spread by horizontal transfer from spread by clonal expansion.
]

.right-column[![:scale 500](/figs/basel/ext/carb_prevalence_eu.png)]

---

# Long-read sequencing of carbapenemase-producing bacteria

.center[![:scale 900](/figs/basel/carb/overview_table.svg)]

--

.third1[
![:scale 350](/figs/basel/carb/contigSizes.svg)
]

.twoThirdsRight[
110 carbapenemase-producing bacterial isolates from Basel over $\sim 7$ years.

* Hybrid assemblies resolve structural and nucleotide polymorphisms.
* Short-read contigs containing AMR genes are on average only 6 genes long
  * <sub><sup>Not enough diversity to reconstruct history</sup></sub>
* Assemblies must be verified even though no reference genomes exist.
]

---

# High-quality genome assemblies

.middle[
.center[
![:scale 1000](/figs/basel/carb/errorCharacterization.png)
]
]

---

# Goal: begin to enumerate structural "mutations"

How do we reconstruct evolutionary history in the horizontal regime from sequencing data?

* Tracking mutations in the relevant genes is not enough
  * <sup><sub>Selection over $20$ years on a $\sim 1$ kb region</sub></sup>
  * <sup><sub>Only a handful of mutations</sub></sup>
* Most AMR genes are transferred via conjugative plasmids.
  * <sup><sub>One-to-one correspondence?
</sub></sup>
  * <sup><sub>Are plasmids well approximated by a static sequence?</sub></sup>
  * <sup><sub>Correlations with ST?</sub></sup>
* Many AMR genes are embedded within transposable elements.

--

.center[The first step must be to decipher the ![:emph](relative rates) of each polymorphism-generating event.]

---

# Genes as a coarse-grained unit

Assume that, on clinical time scales, most bacterial variation occurs in both gene content and gene order (synteny).

--

We must computationally identify orthologous gene clusters in our sample.

--

.third1[
.middle[
.center[
![:scale 360](/figs/basel/ext/aaalign.jpg)

Align all ORF pairs with DIAMOND
]
]
]

--

.third2[
.middle[
.center[
![:scale 250](/figs/basel/ext/mcl.jpeg)

MCL clustering
]
]
]

--

.third3[
.middle[
.center[
![:scale 305](/figs/basel/ext/paralogy.png)

Paralogy splitting
]
]
]

.footnote[.cite[Ding, W. et al. panX: pan-genome analysis and exploration.]]

---

# Syntenic alignment $\approx$ structural diversity

.left-column[
.center[
![:scale 250](/figs/basel/carb/syntenyCartoon.svg)
![:scale 250](/figs/basel/carb/syntenymatrix.svg)
]

.center[
Hierarchically cluster into "structural clades"
]
]

--

.right-column[
.center[
![:scale 300](/figs/basel/carb/kpc_synteny.svg)
]
]

.left-column[
* Syntenic changes resolve evolutionary relationships between plasmids
* Different $bla_{KPC}$ genes are found in the same context
* Plasmids are promiscuously shared across MLST sequence types and species
]

---

# Carbapenemases have varying signatures of HGT

.third1[
![:scale 300](/figs/basel/carb/kpc_synteny.svg)
]

.third2[
.center[
![:scale 250](/figs/basel/carb/ndm_synteny.svg)
]
]

.third3[
![:scale 240](/figs/basel/carb/oxa48_synteny.svg)
]

--

.block[
* $bla_{KPC}$: plasmid-bound; correlated with MLST and clone
* $bla_{NDM}$: high transposition rate;
  genome integration
* $bla_{OXA-48}$: high conjugation rate, low transposition rate
]

---

# Problems with this analysis

* The sample is large enough to give a qualitative sense of the rates, but not large enough to measure them quantitatively.
* Extreme sensitivity to annotation errors
* The syntenic alignment score is not proportional to the number of evolutionary events (e.g. inversions)

--

.center[The next section is very much a work in progress! Thoughts and general grumpiness are welcome.]

---

# Scaling up to a global picture

Extend our dataset:

* Perform the same comparison against ![:emph](all) carbapenemase-carrying plasmids in the NCBI pathogen database.
* Compare against a structural outgroup to estimate transposition

--

.center[$bla_{KPC}$]

.center[![:scale 360](/figs/basel/carb/kpc_global.png)]

.center[Most global structural "clades" are represented in our Basel sample.]

---

# Formalizing structural diversity as a graph

Generalize away from a fixed linear coordinate system to describe polymorphisms:

* Each genome is represented as a closed path through a graph.
* Alignable regions are simply collinear paths.
* This gives a better evolutionary distance measure than the synteny alignment score.
* Structural variability of a particular locus = number of distinct paths through it.

--

.middle[.center[![:scale 1000](/figs/basel/carb/graph.png)]]

---

# Future outlook

Can we start to make theoretical inroads into basic questions about polymorphism at the level of molecular architecture?

* How much variation in synteny should one expect for a quickly adapting molecule?
* Can we understand the statistics of the resulting structural trees?
* How do rearrangement dynamics renormalize the statistics of the underlying gene tree?

--

Complementary requirement: we need ![:emph](scalable) algorithms to handle evolution in this limit.

* Multiple "plasmid" alignment in the presence of structural rearrangements.
* A precise definition of the polymorphic degrees of freedom to track.
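
---

# Sketch: counting paths through a locus

A minimal sketch, under simplifying assumptions, of "structural variability of a locus = number of distinct paths". Each genome is taken to be a circular list of gene-cluster labels with strand ignored, and a "path" is the anchor gene plus its `k` neighbours on each side. The function `locus_paths`, the window size `k` and all gene labels are hypothetical and are not the pipeline used for this work.

```python
from collections import Counter


def locus_paths(genomes, anchor, k=2):
    """Count distinct gene-order paths through `anchor` across genomes.

    Hypothetical helper: each genome is a circular list of gene-cluster
    labels (strand ignored); a path is the anchor plus k neighbours per side.
    """
    paths = Counter()
    for genes in genomes:
        n = len(genes)
        for i, gene in enumerate(genes):
            if gene != anchor:
                continue
            # Walk k steps in each direction, wrapping around the circular genome.
            window = tuple(genes[(i + d) % n] for d in range(-k, k + 1))
            # Reading the same locus from the other strand gives the same path.
            paths[min(window, window[::-1])] += 1
    return paths


# Toy example with made-up gene labels.
plasmids = [
    ["repA", "tnpA", "blaKPC", "tnpR", "traI"],
    ["traI", "tnpR", "blaKPC", "tnpA", "repA"],  # same context, opposite strand
    ["repB", "tnpA", "blaKPC", "tnpR", "istB"],  # different flanks
]
print(len(locus_paths(plasmids, "blaKPC", k=2)))  # 2 distinct paths
```

Two plasmids that differ only in orientation collapse into one path, while a different flanking context counts as a second path. A real graph-based measure would also need strand-aware labels and variable-length paths.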
---

# Acknowledgements

.twoThirdsLeft[![:scale 600](/figs/basel/carb/ackw.png)]

.block[
My collaborators

* <sup><sub>Eric Ulrich</sub></sup>
* <sup><sub>Daniel Wurthrich</sub></sup>
* <sup><sub>Vladimira Hinic</sub></sup>
* <sup><sub>Adrian Egli</sub></sup>
* <sup><sub>Richard Neher</sub></sup>

You all, for listening
]
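
---

# Sketch: single-linkage "structural clades"

A minimal sketch of the "hierarchically cluster into structural clades" step, under stated assumptions. In place of the syntenic alignment score, it uses a hypothetical adjacency distance: 1 minus the Jaccard similarity of the sets of neighbouring gene-cluster pairs on each circular plasmid. It then cuts a single-linkage tree at a fixed `cutoff`, which is equivalent to taking connected components of all pairs within that distance. The function names, the distance and the cutoff are illustrative only.

```python
def adjacencies(genes):
    """Unordered neighbouring gene-cluster pairs of a circular plasmid."""
    n = len(genes)
    return {frozenset((genes[i], genes[(i + 1) % n])) for i in range(n)}


def adjacency_distance(a, b):
    """Hypothetical stand-in distance: 1 - Jaccard similarity of adjacency sets."""
    adj_a, adj_b = adjacencies(a), adjacencies(b)
    union = adj_a | adj_b
    if not union:  # two empty plasmids are treated as identical
        return 0.0
    return 1.0 - len(adj_a & adj_b) / len(union)


def structural_clades(plasmids, cutoff):
    """Single-linkage clusters at `cutoff`: connected components of close pairs."""
    names = sorted(plasmids)
    parent = {name: name for name in names}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, u in enumerate(names):
        for v in names[i + 1:]:
            if adjacency_distance(plasmids[u], plasmids[v]) <= cutoff:
                parent[find(u)] = find(v)

    clades = {}
    for name in names:
        clades.setdefault(find(name), []).append(name)
    return sorted(clades.values())


# Toy plasmids with made-up gene labels.
plasmids = {
    "p1": ["repA", "tnpA", "blaKPC", "tnpR", "traI", "traJ"],
    "p2": ["repA", "tnpA", "blaKPC", "tnpR", "traI", "traK"],
    "p3": ["repB", "blaNDM", "ble", "trpF", "dsbD"],
}
print(structural_clades(plasmids, cutoff=0.5))  # [['p1', 'p2'], ['p3']]
```

Single linkage is only one choice of hierarchical clustering. Like the syntenic alignment score, this adjacency distance is not proportional to the number of rearrangement events.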