# How to understand evolution quantitatively?

* What parameterizes the dynamics + environment of evolutionary models?

--

* What are the underlying ![:emph](degrees of freedom)?
  * <sub><sup>SNPs in an alignment</sup></sub>
  * <sub><sup>Polymorphic gene content</sup></sub>
  * <sub><sup>Structural variation</sup></sub>

--

* What can we reasonably ![:emph](predict)?
  * <sub><sup>Speed of divergence (in some distance metric)</sup></sub>
  * <sub><sup>Statistics of genealogical trees</sup></sub>
  * <sub><sup>1-body mutational statistics (i.e. SFS). Higher order (i.e. LD)?</sup></sub>

--

* How do we ![:emph](test) our predictions?
  * <sub><sup>Experimental evolution</sup></sub>
  * <sub><sup>Inference from in-vivo data</sup></sub>

---

# Model evolution as successive mutations on a static sequence

.left-column[.middle[![:scale 500](/figs/basel/ext/seq_evolve.svg)]]
.right-column[.middle[![:scale 500](/figs/basel/ext/muller_plots.jpg)]]

We have an understanding of

* dependence of the average rate of mutation accumulation on $\mu, N, s$
* coalescent theory: how dynamics are reflected in statistics of the underlying tree
* how to extract from data: can ![:emph](align) sequences and estimate the tree

.footnote[<sup>1</sup>.cite[Beneficial Mutation-Selection Balance and the Effect of Linkage on Positive Selection. Michael Desai, Daniel Fisher]]

---

# Model evolution as successive mutations on a static sequence

.center[![:scale 900](/figs/basel/ext/align.png)]

Critical requirement of the analysis!

* There is a well-defined sequence alignment from which to define polymorphisms and thus the degrees of freedom under evolution.

---

# Interpretable in both lab evolution and the clinic

.left-column[
.center[
LTEE
![:scale 550](/figs/basel/ext/LTEE.png)
]
]
.right-column[
.center[
Clinical ecosystem
![:scale 350](/figs/basel/ext/flu_tree.png)
]
]

.footnote[<sup>1</sup>.cite[Dynamics of molecular evolution over 60,000 generations.
Ben Good, Rich Lenski, Michael Desai et al.]]

--

.left-column[
* high-throughput longitudinal data
* known environment
* run replicates under "equivalent" conditions
* artificial fossil record
]
.right-column[
* large population sizes $N\approx10^{8}-10^{12}$
* strong selection pressure (antibiotics) over a defined time frame
* high-throughput data collection
* applicable to microbes like HIV, MRSA
]

---

# Microbial evolution is different

.middle[.center[![:scale 450](/figs/basel/ext/HGT.png)]]

--

.center[Evolution of bacterial AMR doesn't fit the mutational competition paradigm]

---

# Bacteria evolve by horizontally sharing genes

.center[![:scale 1000](/figs/basel/ext/panX_association.png)]

---

# Empirically determine relevant polymorphic "atom"

How do we reconstruct evolutionary history in the horizontal regime from sequencing data?

* Tracking mutations on relevant genes is not enough
  * <sup><sub>Selection over $20$ years. 1 kb region.</sub></sup>
  * <sup><sub>Handful of mutations.</sub></sup>
* Most AMR genes are transferred via conjugative plasmids.
  * <sup><sub>One-to-one correspondence?</sub></sup>
  * <sup><sub>Are plasmids well approximated by a static sequence?</sub></sup>
  * <sup><sub>Correlations to ST?</sub></sup>
* Many AMR genes are embedded within transposable elements.

--

.center[First step must be deciphering the ![:emph](relative rates) of each polymorphism-generating event.]

---

# Global carbapenemase outbreak as a case study

.left-column[
* Reserve antibiotics used to treat MDR bacteria.
* First observed in the late 1980s
* Phenotypic resistance is conferred by multiple different genes
  - <sub><sup>Growing public health problem.</sup></sub>
  - <sub><sup>Globally heterogeneous prevalence</sup></sub>
* Fascinating case study into deconvolving spread mediated by horizontal transfer and clonal expansion.
]
.right-column[![:scale 400](/figs/basel/ext/carb_prevalence_eu.png)]

--

To address structural polymorphism we need full genomes!
---

# Resolving HGT with long reads

.left-column[
Reconstruct history by sequencing

- Illumina reads: high coverage, short reads.
  - Too short to bridge repetitive elements
  - Fragmented assemblies
- Problem: most AMR genes are flanked by repetitive/mobile elements

.center[
![:scale 350](/figs/basel/ext/bad_assembly_graph.png)
]
]

.footnote[<sup>1</sup>.cite[.url[github.com/rrwick]]]

--

.right-column[
ONT long reads are required to resolve structural diversity

.center[
![:scale 300](/figs/basel/carb/minIon.jpg)
![:scale 400](/figs/basel/carb/canu_to_spades.png)
]
]

---

# Long-read sequencing of carbapenemase-producing bacteria

.center[![:scale 900](/figs/basel/carb/overview_table.svg)]

--

.third1[
![:scale 350](/figs/basel/carb/contigSizes.svg)
]
.twoThirdsRight[
110 carbapenemase-producing bacteria in Basel over $\sim$ 7 years.

* Hybrid assemblies resolve structural and nucleotide polymorphism.
* Short-read contigs containing AMR genes average 6 genes in length
  * <sub><sup>Not enough diversity to reconstruct history</sup></sub>
* Use synteny/rearrangements as an epidemiological clock
]

---

# Using rearrangements as a "molecular clock"

.left-column[
.center[
![:scale 250](/figs/basel/carb/syntenyCartoon.svg)
![:scale 250](/figs/basel/carb/syntenymatrix.svg)
]
.center[
Hierarchically cluster into "structural clades"
]
]

--

.right-column[
.center[
![:scale 300](/figs/basel/carb/kpc_synteny.svg)
]
]
.left-column[
* Syntenic changes resolve evolutionary relationships between plasmids
* Different $bla_{KPC}$ genes are found in the same context
* Plasmids are promiscuously shared across MLSTs and species
]

---

# Synteny alignments of carbapenemase-containing loci

.third1[
![:scale 300](/figs/basel/carb/kpc_synteny.svg)
]
.third2[
.center[
![:scale 250](/figs/basel/carb/ndm_synteny.svg)
]
]
.third3[
![:scale 240](/figs/basel/carb/oxa48_synteny.svg)
]

--

.block[
* $bla_{KPC}$: plasmid-bound; correlated with MLST and clone
* $bla_{NDM}$: high transposition rate.
genome integration
* $bla_{OXA-48}$: high conjugation rate, low transposition rate
]

---

# Scaling up to a global picture

Perform the same comparison against ![:emph](all) carbapenemase-carrying plasmids contained in the NCBI pathogen database.

.third1[
.center[
$bla_{KPC}$
![:scale 320](/figs/basel/carb/kpc_global.png)
]
]
.third2[
.center[
$bla_{NDM}$
![:scale 320](/figs/basel/carb/ndm_global.png)
]
]
.third3[
.center[
$bla_{OXA-48}$
![:scale 320](/figs/basel/carb/oxa48_global.png)
]
]

--

.center[Most global structural "clades" are represented in our local sample from the Basel clinic.]

---

# Global structural phylogeny estimates rates

.left-column[
Current computation: use rearrangements as a molecular clock

* Embed our global sample of carbapenemase plasmids within all NCBI plasmids
  * <sub><sup>Look for equivalent molecule sans bla gene</sup></sub>
  * <sub><sup>Frequency of "mono-resistant gene" clades estimates transposition rate.</sup></sub>
  * <sub><sup>Frequency of MLST/species transfer on this tree estimates conjugation rate</sup></sub>
* Critical bit: need a sensible distance metric to estimate time!
  * <sub><sup>Needs to count individual rearrangements + gene gain/loss accurately.</sup></sub>
  * <sub><sup>Check with known isolation times</sup></sub>
]
.right-column[
.center[
![:scale 500](/figs/basel/carb/plasmidTree_global.svg)
]
.center[
![:scale 300](/figs/basel/carb/plasmidTree_zoom.svg)
]
]

---

# Theoretical outlook

Can we start to make theoretical inroads into basic questions regarding polymorphism at the molecular architecture level?

* How much variation in synteny should one expect given a quickly adapting molecule?
* Can we understand the statistics of the resultant structural trees?
* How do rearrangement dynamics renormalize the statistics of the underlying gene tree?

--

Complementary requirement: we need ![:emph](scalable) algorithms to deal with evolution in this limit.

* Multiple "plasmid" alignment in the face of structural rearrangements.
* Need a precise definition of a polymorphic degree of freedom to track.

---

# Acknowledgements

.twoThirdsLeft[![:scale 600](/figs/basel/carb/ackw.png)]
.block[
My collaborators

* <sup><sub>Eric Ulrich</sub></sup>
* <sup><sub>Daniel Wurthrich</sub></sup>
* <sup><sub>Vladimira Hinic</sub></sup>
* <sup><sub>Adrian Egli</sub></sup>
* <sup><sub>Richard Neher</sub></sup>

You all for listening
]
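
---

# Backup: sketching the structural-clade clustering

A minimal sketch of the clustering step from the rearrangements-as-molecular-clock slide, assuming each plasmid is reduced to a circular list of gene-family labels. The adjacency-based distance, the threshold, and every name below (`adjacencies`, `synteny_distance`, `structural_clades`, the toy gene orders) are illustrative assumptions, not the metric behind the figures. A threshold cut of single-linkage clustering stands in here for the full hierarchical clustering.

```python
from itertools import combinations


def adjacencies(genes):
    """Unordered neighbouring gene pairs of a circular gene order."""
    n = len(genes)
    return {frozenset((genes[i], genes[(i + 1) % n])) for i in range(n)}


def synteny_distance(a, b):
    """1 - Jaccard similarity of two adjacency sets (0 means same order)."""
    adj_a, adj_b = adjacencies(a), adjacencies(b)
    union = adj_a | adj_b
    if not union:
        return 0.0
    return 1.0 - len(adj_a & adj_b) / len(union)


def structural_clades(plasmids, threshold):
    """Single-linkage clusters: link plasmids closer than `threshold`."""
    parent = {name: name for name in plasmids}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for p, q in combinations(plasmids, 2):
        if synteny_distance(plasmids[p], plasmids[q]) < threshold:
            parent[find(p)] = find(q)

    clades = {}
    for name in plasmids:
        clades.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clades.values())


# Toy, made-up gene orders (hypothetical labels, not from the data set).
toy = {
    "pA": ["rep", "bla", "tnpA", "traA", "traB"],
    "pB": ["rep", "bla", "tnpA", "traB", "traA"],  # two genes swapped
    "pC": ["rep", "x1", "x2", "x3", "bla"],        # different backbone
}

print(structural_clades(toy, threshold=0.7))
```

With this toy input, `pA` and `pB` share three of seven adjacencies and fall into one clade, while `pC` shares only the `rep`/`bla` adjacency and stays separate. A distance that counts individual rearrangements and gene gain/loss events, as the deck calls for, would replace `synteny_distance` without changing the clustering step.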