presentation

# How to understand evolution quantitatively?

* What parameterizes the dynamics + environment of evolutionary models?

--
* What are the underlying ![:emph](degrees of freedom)?
 * SNPs in an alignment 
 * Polymorphic gene content 
 * Structural variation

* What can we reasonably ![:emph](predict)?
 * Speed of divergence (in some distance metric) 
 * Statistics of genealogical trees
 * One-body mutational statistics, i.e. the SFS. Higher order, i.e. LD?

* How do we ![:emph](test) our predictions?
 * Experimental evolution
 * Inference from in-vivo data
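
As a rough illustration of the one-body and two-body statistics named above, the sketch below computes a folded SFS and the pairwise LD statistic $r^2$ from a toy alignment. The function names and the toy sequences are illustrative assumptions, not taken from any real dataset or pipeline.

```python
from collections import Counter

def site_frequency_spectrum(alignment):
    """Folded SFS: entry k-1 counts biallelic columns whose minor allele is carried by k sequences."""
    n = len(alignment)
    length = len(alignment[0])
    assert all(len(seq) == length for seq in alignment), "sequences must be aligned"
    sfs = Counter()
    for column in zip(*alignment):
        counts = Counter(column)
        if len(counts) != 2:  # skip monomorphic and multi-allelic columns
            continue
        sfs[min(counts.values())] += 1
    return [sfs[k] for k in range(1, n // 2 + 1)]

def r_squared(col_a, col_b):
    """LD between two biallelic columns: r^2 = D^2 / (pA (1 - pA) pB (1 - pB))."""
    ref_a, ref_b = col_a[0], col_b[0]  # arbitrary reference alleles
    n = len(col_a)
    p_a = sum(x == ref_a for x in col_a) / n
    p_b = sum(y == ref_b for y in col_b) / n
    p_ab = sum(x == ref_a and y == ref_b for x, y in zip(col_a, col_b)) / n
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

# Hypothetical toy alignment: 4 sequences, 6 sites.
toy_alignment = [
    "ACGTAC",
    "ACGTAT",
    "ATGTAT",
    "ATGAAT",
]
folded_sfs = site_frequency_spectrum(toy_alignment)
```

Both statistics presuppose a well-defined alignment: every column must correspond to the same homologous site in all sequences.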

---

# Model evolution as successive mutations on static sequence

.left-column[.middle[![:scale 500](/figs/basel/ext/seq_evolve.svg)]]
.right-column[.middle[![:scale 500](/figs/basel/ext/muller_plots.jpg)]]

&nbsp;

We have an understanding of
* dependence of the average rate of mutation accumulation on $\mu, N, s$
* coalescent theory: how dynamics are reflected in statistics of underlying tree
* how to extract from data: can ![:emph](align) sequences and estimate tree
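
A minimal sketch of the simplest such dependence, assuming the successive-mutations limit in which beneficial mutations are rare enough to fix one at a time: the substitution rate is the mutational supply $N\mu$ times the fixation probability of a single mutant. The fixation probability used here is Kimura's diffusion approximation for a haploid Wright-Fisher population; the function names are illustrative. The large-population regime treated in the cited paper, where linked mutations compete, is not captured by this formula.

```python
import math

def fixation_probability(N, s):
    """Kimura's diffusion approximation for one new mutant in a haploid
    Wright-Fisher population of size N with selection coefficient s."""
    if s == 0:
        return 1.0 / N  # neutral limit
    # (1 - exp(-2s)) / (1 - exp(-2Ns)), written with expm1 for numerical stability
    return math.expm1(-2.0 * s) / math.expm1(-2.0 * N * s)

def substitution_rate(mu, N, s):
    """Rate of mutation accumulation per generation, assuming mutations fix one at a time."""
    return N * mu * fixation_probability(N, s)
```

For $Ns \gg 1$ and $s \ll 1$ the fixation probability approaches $2s$, so the rate scales as $2 N \mu s$; for $s = 0$ it reduces to the neutral rate $\mu$.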

.footnote[1.cite[Beneficial Mutation-Selection Balance and the Effect of Linkage on Positive Selection. Michael Desai, Daniel Fisher]]

---

# Model evolution as successive mutations on static sequence

.center[![:scale 900](/figs/basel/ext/align.png)]

&nbsp;
&nbsp;
&nbsp;

Critical requirement of analysis!
* There is a well-defined sequence alignment from which to define polymorphisms and thus the degrees of freedom under evolution.

---

# Interpretable in both lab evolution and in the clinic

.left-column[
.center[
LTEE
![:scale 550](/figs/basel/ext/LTEE.png)
]
]
.right-column[
.center[
Clinical ecosystem
![:scale 350](/figs/basel/ext/flu_tree.png)
]
]

.footnote[1.cite[Dynamics of molecular evolution over 60,000 generations. Ben Good, Rich Lenski, Michael Desai et al.]]

.left-column[
&nbsp;

* high throughput longitudinal data
* known environment 
* run replicates under "equivalent" conditions.
* artificial fossil record
]

.right-column[
* large population sizes $N\approx10^{8}-10^{12}$
* strong selection pressure (antibiotics) with a known time frame
* high-throughput data collection
* applicable to pathogens like HIV, MRSA
]

---

# Microbial evolution is different

&nbsp;
&nbsp;

.middle[.center[![:scale 450](/figs/basel/ext/HGT.png)]]

.center[Evolution of bacterial AMR does not fit the mutational-competition paradigm]

---

# Bacteria evolve by horizontally sharing genes

.center[![:scale 1000](/figs/basel/ext/panX_association.png)]

---

# Empirically determine relevant polymorphic "atom"

How do we reconstruct evolutionary history in the horizontal regime from sequencing data?
* Tracking mutations on relevant genes is not enough
 * Selection over $20$ years. 1 kb region.
 * Handful of mutations.
* Most AMR genes are transferred via conjugative plasmids.
 * One-to-one correspondence? 
 * Are plasmids well approximated by static sequence? 
 * Correlations to ST? 
* Many AMR genes are embedded within transposable elements.

&nbsp;
.center[First step must be deciphering the ![:emph](relative rates) of each polymorphism-generating event.]

---

# Global carbapenemase outbreak as case study

.left-column[
* Carbapenems are reserve antibiotics used to treat MDR bacteria.
* Resistance first observed in the late 1980s
* Phenotypic resistance is conferred by multiple different genes
 - Growing public health problem.
 - Globally heterogeneous prevalence
* Fascinating case study in deconvolving spread mediated by horizontal transfer and clonal expansion.
]
.right-column[![:scale 400](/figs/basel/ext/carb_prevalence_eu.png)]

--
To address structural polymorphism we need full genomes!

---

# Resolving HGT with long reads

.left-column[
Reconstruct history by sequencing
- Illumina reads: high coverage, short reads.
- Too short to bridge repetitive elements
- Fragmented assemblies
- Problem: most AMR genes are flanked by repetitive/mobile elements
.center[
![:scale 350](/figs/basel/ext/bad_assembly_graph.png)
]
]

.footnote[1.cite[.url[github.com/rrwick]]]

--
.right-column[
ONT long reads required to resolve structural diversity
.center[
![:scale 300](/figs/basel/carb/minIon.jpg)
![:scale 400](/figs/basel/carb/canu_to_spades.png)
]
]

---

# Long-read sequencing of carbapenemase-producing bacteria
.center[![:scale 900](/figs/basel/carb/overview_table.svg)]

.third1[
&nbsp;
![:scale 350](/figs/basel/carb/contigSizes.svg)
]

.twoThirdsRight[
110 carbapenemase-producing bacteria in Basel over $\sim$ 7 years.
* Hybrid assemblies resolve structural and nucleotide polymorphism.
* Short-read contigs containing AMR genes are on average 6 genes long
 * Not enough diversity to reconstruct history 
* Use synteny/rearrangements as epidemiological clock
]

---

# Using rearrangements as a "molecular clock"
.left-column[
.center[
![:scale 250](/figs/basel/carb/syntenyCartoon.svg)
![:scale 250](/figs/basel/carb/syntenymatrix.svg)
]
.center[
Hierarchically cluster into "structural clades"
]
]

--
.right-column[
.center[
![:scale 300](/figs/basel/carb/kpc_synteny.svg)
]
]

.left-column[
* Syntenic changes resolve evolutionary relationships between plasmids
* Different $bla_{KPC}$ genes are found in the same context
* Plasmids are promiscuously shared across MLST and species
]
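
One way such a synteny comparison could be sketched, under stated assumptions: each plasmid is reduced to a circular, signed gene order, its gene adjacencies are collected in a form that is invariant to rotation and strand, and the distance between two plasmids is the Jaccard distance between adjacency sets. Clusters are then read off by single linkage at a fixed threshold, which stands in here for cutting a hierarchical clustering. The distance, the threshold, and all plasmid and gene names below are illustrative assumptions, not the measure used to produce the figures.

```python
def adjacencies(gene_order):
    """Adjacency set of a circular signed gene order.
    gene_order is a list of (gene, strand) with strand +1 or -1."""
    adj = set()
    n = len(gene_order)
    for i in range(n):
        (g1, s1), (g2, s2) = gene_order[i], gene_order[(i + 1) % n]
        right_end = (g1, "h" if s1 > 0 else "t")  # end of g1 facing g2
        left_end = (g2, "t" if s2 > 0 else "h")   # end of g2 facing g1
        adj.add(frozenset([right_end, left_end]))
    return adj

def synteny_distance(order_a, order_b):
    """Jaccard distance between adjacency sets: 0 for identical synteny, 1 for nothing shared."""
    a, b = adjacencies(order_a), adjacencies(order_b)
    return 1.0 - len(a & b) / len(a | b)

def single_linkage_clusters(plasmids, threshold):
    """Group plasmids connected by chains of pairwise distances <= threshold."""
    names = sorted(plasmids)
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if synteny_distance(plasmids[a], plasmids[b]) <= threshold:
                parent[find(a)] = find(b)
    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())

# Hypothetical plasmids: pB is pA reverse-complemented and rotated,
# pC carries one inversion, pD shares only the bla gene.
plasmids = {
    "pA": [("bla", 1), ("tnpA", 1), ("repA", 1), ("traX", 1)],
    "pB": [("tnpA", -1), ("bla", -1), ("traX", -1), ("repA", -1)],
    "pC": [("bla", 1), ("tnpA", -1), ("repA", 1), ("traX", 1)],
    "pD": [("bla", 1), ("intI", 1), ("aadA", 1)],
}
structural_clades = single_linkage_clusters(plasmids, threshold=0.7)
```

Because adjacencies are stored as unordered pairs of gene ends, the same circular molecule gives distance zero regardless of where the assembly was linearized or which strand was reported.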

---

# Synteny alignments of carbapenemase-containing loci

.third1[
![:scale 300](/figs/basel/carb/kpc_synteny.svg)
]

.third2[
.center[
![:scale 250](/figs/basel/carb/ndm_synteny.svg)
]
]

.third3[
![:scale 240](/figs/basel/carb/oxa48_synteny.svg)
]

.block[
* $bla_{KPC}$: plasmid-bound; correlated with MLST and clone
* $bla_{NDM}$: high transposition rate; genome integration
* $bla_{OXA-48}$: high conjugation rate, low transposition rate
]

---

# Scaling up to a global picture
Perform the same comparison against ![:emph](all) carbapenemase-carrying plasmids contained in the NCBI pathogen database.

.third1[
.center[
$bla_{KPC}$
![:scale 320](/figs/basel/carb/kpc_global.png)
]
]

.third2[
.center[
$bla_{NDM}$
![:scale 320](/figs/basel/carb/ndm_global.png)
]
]

.third3[
.center[
$bla_{OXA-48}$
![:scale 320](/figs/basel/carb/oxa48_global.png)
]
]

.center[Most global structural "clades" are represented in our local sample from the Basel clinic.]

---

# Global structural phylogeny estimates rates

.left-column[
Current computation: Use rearrangements as molecular clock
* Embed our global sample of carbapenemase plasmids within all NCBI plasmids
 * Look for equivalent molecule sans bla gene
 * Frequency of "mono-resistant gene" clades estimates transposition rate.
 * Frequency of MLST/species transfer on this tree estimates conjugation rate
* Critical bit: Need a sensible distance metric to estimate time!
 * Needs to count individual rearrangements + gene gain/loss accurately.
 * Check with known isolation times
]

.right-column[
.center[
![:scale 500](/figs/basel/carb/plasmidTree_global.svg)
]
.center[
![:scale 300](/figs/basel/carb/plasmidTree_zoom.svg)
]
]
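
To make "frequency of MLST/species transfer on this tree" concrete, one simple option is to count the minimum number of host changes along the structural tree by Fitch parsimony. The sketch below is an assumption about how such a count could be made, not the computation behind the figures; the tree, plasmid names, and host labels are hypothetical, and the tree is taken to be rooted and strictly binary.

```python
def fitch_changes(tree, leaf_state):
    """Fitch parsimony on a rooted binary tree given as nested 2-tuples of leaf names.
    Returns (candidate states at this node, minimum number of state changes below it)."""
    if isinstance(tree, str):  # leaf
        return {leaf_state[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_changes(left, leaf_state)
    right_set, right_changes = fitch_changes(right, leaf_state)
    shared = left_set & right_set
    if shared:
        return shared, left_changes + right_changes
    # No common state: at least one change on the branches joining the children.
    return left_set | right_set, left_changes + right_changes + 1

# Hypothetical structural tree of five plasmids and the host each was isolated from.
tree = ((("p1", "p2"), "p3"), ("p4", "p5"))
host = {
    "p1": "K. pneumoniae",
    "p2": "K. pneumoniae",
    "p3": "E. coli",
    "p4": "E. coli",
    "p5": "E. coli",
}
_, host_changes = fitch_changes(tree, host)
```

Dividing such a count by the total tree length, measured in whatever distance metric serves as the clock, would turn it into a rate estimate; parsimony gives a lower bound on the number of transfers.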

---

# Theoretical outlook

&nbsp;
Can we start to make theoretical inroads into basic questions regarding polymorphism at the molecular architecture level?
* How much variation in synteny should one expect given a quickly adapting molecule?
* Can we understand the statistics of the resultant structural trees? 
* How do rearrangement dynamics renormalize the statistics of the underlying gene tree?

&nbsp;

Complementary requirement: we need ![:emph](scalable) algorithms to deal with evolution in this limit.
* Multiple "plasmid" alignment in the face of structural rearrangements.
* Need a precise definition of a polymorphic degree of freedom to track.

---

# Acknowledgements
&nbsp;
&nbsp;

.twoThirdsLeft[![:scale 600](/figs/basel/carb/ackw.png)]

.block[
My collaborators
 * Eric Ulrich
 * Daniel Wüthrich
 * Vladimira Hinic
 * Adrian Egli
 * Richard Neher

You all for listening 
]