# How to understand evolution quantitatively?

* What parameterizes the dynamics + environment of evolutionary models?

--

* What are the underlying degrees of freedom?
  * <sub><sup>SNPs in an alignment</sup></sub>
  * <sub><sup>Polymorphic gene content</sup></sub>
  * <sub><sup>Structural variation</sup></sub>

--

* What can we reasonably predict?
  * <sub><sup>Speed of divergence (in some distance metric)</sup></sub>
  * <sub><sup>Statistics of genealogical trees</sup></sub>
  * <sub><sup>1-body mutational statistics, i.e. SFS. Higher order, i.e. LD?</sup></sub>

--

* How do we test our predictions?
  * <sub><sup>Experimental evolution</sup></sub>
  * <sub><sup>Inference from in-vivo data</sup></sub>

---

# Model evolution as successive mutations on static sequence

.left-column[.middle[]]

.right-column[.middle[]]

We understand:
* dependence of the average rate of mutation accumulation on $\mu, N, s$
* coalescent theory: how dynamics are reflected in the statistics of the underlying tree
* how to extract from data: can align sequences and estimate tree

.footnote[<sup>1</sup>.cite[Beneficial Mutation-Selection Balance and the Effect of Linkage on Positive Selection. Michael Desai, Daniel Fisher]]

---

# Model evolution as successive mutations on static sequence

.center[]

Critical requirement of analysis!
* There is a well-defined sequence alignment from which to define polymorphisms and thus the degrees of freedom under evolution.

---

# Interpretable in both lab evolution and in clinic

.left-column[
.center[
LTEE
]
]

.right-column[
.center[
Clinical ecosystem
]
]

.footnote[<sup>1</sup>.cite[Dynamics of molecular evolution over 60,000 generations. Ben Good, Rich Lenski, Michael Desai et al.]]

--

.left-column[
* high-throughput longitudinal data
* known environment
* run replicates under "equivalent" conditions.
* artificial fossil record
]

.right-column[
* large population sizes $N\approx10^{8}-10^{12}$
* strong selection pressure (antibiotics) with time frame
* high-throughput data collection
* applicable to microbes like HIV, MRSA
]

---

# Microbial evolution is different

.middle[.center[]]

--

.center[Evolution of bacterial AMR doesn't fit the mutational competition paradigm]

---

# Bacteria evolve by horizontally sharing genes

.center[]

---

# Empirically determine relevant polymorphic "atom"

How do we reconstruct evolutionary history in the horizontal regime from sequencing data?

* Tracking mutations on relevant genes is not enough
  * <sup><sub>Selection over $20$ years. 1 kb region.</sub></sup>
  * <sup><sub>Handful of mutations.</sub></sup>
* Most AMR genes are transferred via conjugative plasmids.
  * <sup><sub>One-to-one correspondence?</sub></sup>
  * <sup><sub>Are plasmids well approximated by a static sequence?</sub></sup>
  * <sup><sub>Correlations to ST?</sub></sup>
* Many AMR genes are embedded within transposable elements.

--

.center[First step must be deciphering the  of each polymorphic generating event.]

---

# Global carbapenemase outbreak as case study

.left-column[
* Reserve antibiotics used to treat MDR bacteria.
* First observed in the late 1980s
* Phenotypic resistance is conferred by multiple different genes
  - <sub><sup>Growing public health problem.</sup></sub>
  - <sub><sup>Globally heterogeneous prevalence</sup></sub>
* Fascinating case study in deconvolving spread mediated by horizontal transfer and clonal expansion.
]

.right-column[]

--

To address structural polymorphism we need full genomes!

---

# Resolving HGT with long reads

.left-column[
Reconstruct history by sequencing
- Illumina reads: high coverage, short reads.
  - Too short to bridge repetitive elements
  - Fragmented assemblies
  - Problem!
Most AMR genes are flanked by repetitive/mobile elements

.center[  ]
]

.footnote[<sup>1</sup>.cite[.url[github.com/rrwick]]]

--

.right-column[
ONT long reads required to resolve structural diversity

.center[   ]
]

---

# Long-read sequencing of carbapenemase-producing bacteria

.center[]

--

.third1[  ]

.twoThirdsRight[
110 carbapenemase-producing bacteria in Basel over $\sim$ 7 years.
* Hybrid assemblies resolve structural and nucleotide polymorphism.
* Short-read contigs containing AMR genes average 6 genes long
  * <sub><sup>Not enough diversity to reconstruct history</sup></sub>
* Use synteny/rearrangements as epidemiological clock
]

---

# Using rearrangements as a "molecular clock"

.left-column[
.center[   ]
.center[
Hierarchically cluster into "structural clades"
]
]

--

.right-column[
.center[  ]
]

.left-column[
* Syntenic changes resolve evolutionary relationships between plasmids
* Different $bla_{KPC}$ genes are found in the same context
* Plasmids promiscuously shared across MLST and species
]

---

# Synteny alignments of carbapenemase-containing loci

.third1[  ]

.third2[
.center[  ]
]

.third3[  ]

--

.block[
* $bla_{KPC}$: plasmid-bound, correlated w/ MLST and clone
* $bla_{NDM}$: high transposition rate, genome integration
* $bla_{OXA-48}$: high conjugation rate, low transposition rate
]

---

# Scaling up to a global picture

Perform the same comparison against carbapenemase-carrying plasmids contained in the NCBI pathogen database.

.third1[
.center[
$bla_{KPC}$
]
]

.third2[
.center[
$bla_{NDM}$
]
]

.third3[
.center[
$bla_{OXA-48}$
]
]

--

.center[Most global structural "clades" are represented in our local sample from the Basel clinic.]

---

# Global structural phylogeny estimates rates

.left-column[
Current computation: use rearrangements as molecular clock
* Embed our carbapenemase
plasmid global sample within all NCBI plasmids
  * <sub><sup>Look for equivalent molecule sans bla gene</sup></sub>
  * <sub><sup>Frequency of "mono-resistant gene" clades estimates transposition rate.</sup></sub>
  * <sub><sup>Frequency of MLST/species transfer on this tree estimates conjugation rate</sup></sub>
* Critical bit: need a sensible distance metric to estimate time!
  * <sub><sup>Needs to count individual rearrangements + gene gain/loss accurately.</sup></sub>
  * <sub><sup>Check w/ known isolation times</sup></sub>
]

.right-column[
.center[  ]
.center[  ]
]

---

# Theoretical outlook

Can we start to make theoretical inroads into basic questions regarding polymorphism at the molecular architecture level?

* How much variation in synteny should one expect given a quickly adapting molecule?
* Can we understand the statistics of the resultant structural trees?
* How do rearrangement dynamics renormalize the statistics of the underlying gene tree?

--

Complementary requirement: we need algorithms to deal with evolution in this limit.

* Multiple "plasmid" alignment in the face of structural rearrangements.
* Need a precise definition of a polymorphic degree of freedom to track.

---

# Acknowledgements

.twoThirdsLeft[]

.block[
My collaborators
* <sup><sub>Eric Ulrich</sub></sup>
* <sup><sub>Daniel Wurthrich</sub></sup>
* <sup><sub>Vladimira Hinic</sub></sup>
* <sup><sub>Adrian Egli</sub></sup>
* <sup><sub>Richard Neher</sub></sup>

You all for listening
]
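
---

# Backup: sketching a rearrangement + gain/loss distance

A minimal sketch of one candidate for the "sensible distance metric", not the method used in this work. It swaps in a standard breakpoint count on shared genes plus a count of gained/lost genes. It assumes circular molecules in which each gene is annotated once with a strand; the function names and gene labels are illustrative only.

```python
def gene_ends(gene):
    """Return (left_end, right_end) of a signed gene as read along the molecule."""
    name, strand = gene
    tail, head = (name, "t"), (name, "h")
    return (tail, head) if strand > 0 else (head, tail)


def adjacencies(order, shared):
    """Adjacencies between consecutive shared genes on a circular molecule."""
    kept = [g for g in order if g[0] in shared]
    adj = set()
    for i, gene in enumerate(kept):
        nxt = kept[(i + 1) % len(kept)]  # wrap around: the molecule is circular
        adj.add(frozenset((gene_ends(gene)[1], gene_ends(nxt)[0])))
    return adj


def structural_distance(a, b):
    """Return (breakpoints, gain_loss) between two signed circular gene orders."""
    names_a = {name for name, _ in a}
    names_b = {name for name, _ in b}
    shared = names_a & names_b
    # Adjacencies of `a` (restricted to shared genes) that are broken in `b`.
    breakpoints = len(adjacencies(a, shared) - adjacencies(b, shared))
    # Genes present in exactly one of the two molecules.
    gain_loss = len(names_a ^ names_b)
    return breakpoints, gain_loss


# Hypothetical gene orders, written as (label, strand).
reference = [("repA", 1), ("tnpA", 1), ("blaKPC", 1), ("traI", 1)]
flipped = [("traI", -1), ("blaKPC", -1), ("tnpA", -1), ("repA", -1)]  # same molecule, other strand
inverted = [("repA", 1), ("blaKPC", -1), ("tnpA", -1), ("traI", 1)]   # one inverted segment
swapped = [("repA", 1), ("tnpA", 1), ("aadA", 1), ("traI", 1)]        # blaKPC lost, aadA gained

print(structural_distance(reference, flipped))   # (0, 0)
print(structural_distance(reference, inverted))  # (2, 0)
print(structural_distance(reference, swapped))   # (0, 2)
```

The breakpoint count does not depend on where a circular plasmid is linearized or on which strand is reported. Duplicated genes (for example multiple IS copies) would need extra handling that this sketch omits.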
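
---

# Backup: sketching the SFS from an alignment

A minimal sketch of the "1-body" mutational statistic mentioned on the first slide: the folded site frequency spectrum (SFS) computed from a sequence alignment. It assumes a gap-free alignment of equal-length sequences and skips sites with more than two alleles; the function name and toy sequences are made up for illustration.

```python
from collections import Counter


def folded_sfs(alignment):
    """sfs[k] = number of biallelic columns whose minor allele occurs in k sequences."""
    n = len(alignment)
    sfs = [0] * (n // 2 + 1)
    for column in zip(*alignment):
        counts = Counter(column)
        if len(counts) != 2:
            continue  # skip monomorphic and multi-allelic sites
        sfs[min(counts.values())] += 1
    return sfs


# Toy alignment: 4 sequences, 4 sites.
toy_alignment = ["ACGT", "ACGA", "ATGA", "ATCA"]
print(folded_sfs(toy_alignment))  # [0, 2, 1]
```

Folding by minor-allele count avoids having to know the ancestral state; an unfolded spectrum would need an outgroup or a rooted tree.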