Prediction of maladaptation from ecological and genomic data using genomic offsets

Master’s Thesis in Bioinformatics

Curro Campuzano Jiménez

Bioinformatics Research Center at Aarhus University

June 24, 2024

Genomic offsets

A set of statistical tools that predict the maladaptation of populations to rapid environmental change based on genotype \(\times\) environment association models

Figure 1

Outline

  1. Overview of the Gain and collaborators (2023) model
  2. Methodological analysis with general simulations
    • Comparison of different methods under different scenarios

    • Identifying putatively adaptive loci

    • Measuring uncertainty

  3. Case study: Mediterranean thyme

An explicit model of genomic offsets by Gain and collaborators

Figure 2

Gaussian stabilizing selection

\[ w(z| \mathbf{x}^*) = \exp\left(\frac{-\left(z - z_{\text{opt}}(\mathbf x ^*) \right)^2}{2V_S}\right) \]
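The fitness function above can be evaluated directly. Below is a minimal Python sketch, assuming NumPy and scalar or array phenotypes. The function name `gaussian_fitness` and the parameter values are illustrative only and are not taken from the thesis code.

```python
import numpy as np


def gaussian_fitness(z, z_opt, v_s):
    """Fitness w(z | x*) under Gaussian stabilizing selection.

    z     : phenotype value(s) of the individual(s)
    z_opt : optimal phenotype z_opt(x*) in the environment x*
    v_s   : V_S, width of the fitness function (larger = weaker selection)
    """
    z = np.asarray(z, dtype=float)
    return np.exp(-((z - z_opt) ** 2) / (2.0 * v_s))


# Fitness is 1 at the optimum and decays with the squared distance to it,
# so -log(w) is simply (z - z_opt)^2 / (2 V_S).
w = gaussian_fitness([0.0, 1.0, 2.0], z_opt=0.0, v_s=1.0)
print(w, -np.log(w))
```

Working on the log scale turns fitness into a squared distance to the optimum. This is what lets a squared genomic offset relate to fitness loss.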

Fit a genotype \(\times\) environment association model

Figure 3

We have to assume that all individuals are at their adaptive optimum and that we can measure the QTLs!

With a bit of algebraic rearrangement …

\[ G^2(\mathbf{x}, \mathbf{x}^*) = \frac{1}{L}\sum_{l=1}^{L}\left(\hat y_l(\mathbf x) - \hat y_l(\mathbf x^*)\right)^2 \]

Under Gaussian stabilizing selection, we find a relationship between the genomic offset and shifted fitness:

\[ \mathbb E[-\log w(\mathbf{x}, \mathbf{x}^*)] \approx \frac{a^2 G^2(\mathbf{x}, \mathbf{x}^*)}{2V_S} \]
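The geometric genomic offset can be computed from any fitted linear genotype \(\times\) environment model. Below is a minimal Python sketch. It reads \(G^2\) as the average, over the \(L\) loci, of the squared difference between predicted allele frequencies in the current and shifted environments. It also assumes predictions of the form \(\hat y_l(\mathbf x) = \mu_l + \mathbf b_l^\top \mathbf x\). The function names and the random "fitted" effects are hypothetical.

```python
import numpy as np


def predicted_frequencies(intercepts, effects, env):
    """Linear GEA prediction y_hat(x) = intercept + B x for every locus.

    intercepts : (L,) per-locus intercepts
    effects    : (L, p) per-locus environmental effects (matrix B)
    env        : (p,) environmental vector x
    """
    return intercepts + effects @ env


def geometric_offset(intercepts, effects, env_current, env_shifted):
    """G^2: mean over the L loci of the squared difference between
    predicted allele frequencies in the current and shifted environments."""
    diff = (predicted_frequencies(intercepts, effects, env_current)
            - predicted_frequencies(intercepts, effects, env_shifted))
    return np.mean(diff ** 2)


# Hypothetical fitted model: 50 loci, 3 environmental variables.
rng = np.random.default_rng(1)
n_loci, n_env = 50, 3
intercepts = rng.uniform(0.2, 0.8, size=n_loci)
effects = rng.normal(0.0, 0.05, size=(n_loci, n_env))
x_now = np.zeros(n_env)
x_new = np.array([1.0, -0.5, 0.0])

g2 = geometric_offset(intercepts, effects, x_now, x_new)
# Intercepts cancel, so G^2 equals the quadratic form (x - x*)^T (B^T B / L) (x - x*).
delta = x_now - x_new
print(g2, delta @ (effects.T @ effects / n_loci) @ delta)
```

Because the model is linear, the offset depends only on the environmental shift and the effect matrix, not on the intercepts.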

Genomic offsets in a nutshell

  1. Sample locally optimal individuals and measure their genotypes and current environment
  2. Identify a set of putatively adaptive loci using hypothesis testing
  3. Fit a genotype \(\times\) environment association statistical model
  4. Calculate genomic offset between current and shifted environment
  5. Rank individuals / populations
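The five steps can be strung together in a toy Python sketch. This is a hypothetical illustration under strong simplifications and is not the thesis's Julia implementation:

- a single environmental variable;
- allele frequencies simulated as linear functions of the environment plus noise;
- a per-locus regression test with a normal approximation to the t distribution and an arbitrary 0.01 cutoff;
- a linear GEA model.

```python
import math

import numpy as np

rng = np.random.default_rng(42)

# Step 1 (hypothetical data): 100 populations, assumed locally optimal, along one
# environmental gradient; only the first 10 of 200 loci track the environment.
n_pop, n_loci, n_causal = 100, 200, 10
env_now = np.linspace(-1.0, 1.0, n_pop)
env_future = env_now + rng.uniform(0.0, 1.0, n_pop)  # uneven environmental shift
true_slopes = np.zeros(n_loci)
true_slopes[:n_causal] = rng.choice([-0.2, 0.2], n_causal)
freqs = 0.5 + np.outer(env_now, true_slopes) + rng.normal(0.0, 0.03, (n_pop, n_loci))


def locus_p_values(env, freqs):
    """Step 2: two-sided p-value for the slope of freq ~ env at each locus,
    using a normal approximation to the t distribution (reasonable for large n)."""
    n = env.size
    xc = env - env.mean()
    yc = freqs - freqs.mean(axis=0)
    slope = xc @ yc / (xc @ xc)
    resid = yc - np.outer(xc, slope)
    se = np.sqrt((resid ** 2).sum(axis=0) / (n - 2) / (xc @ xc))
    return np.array([math.erfc(abs(t) / math.sqrt(2.0)) for t in slope / se])


candidates = np.flatnonzero(locus_p_values(env_now, freqs) < 0.01)

# Step 3: fit a linear GEA model (intercept + slope) on the candidate loci only.
design = np.column_stack([np.ones(n_pop), env_now])
coef, *_ = np.linalg.lstsq(design, freqs[:, candidates], rcond=None)
slopes = coef[1]

# Step 4: geometric genomic offset; intercepts cancel in y_hat(x) - y_hat(x*).
diff = np.outer(env_now - env_future, slopes)  # (n_pop, n_candidates)
offset = np.mean(diff ** 2, axis=1)

# Step 5: rank populations from highest to lowest predicted maladaptation.
ranking = np.argsort(-offset)
print("candidate loci:", candidates.size, "| five most at-risk populations:", ranking[:5])
```

With a single environmental variable, the offset is proportional to the squared environmental shift. The ranking therefore simply orders populations by how far their environment moves. With several variables, the fitted effects also weight the direction of the shift.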

Results

Simulated data using SLiM

Figure 4

Data

  • Genotype matrix
  • Current environmental matrix
  • Shifted environmental matrix
  • Shifted fitness

Figure 5

Different local and non-local adaptation scenarios

No differences between methods when they use the same set of candidate loci

[Plot: adjusted R squared of each genomic offset method (Geometric, Gradient forest, RDA, RONA) under three scenarios: locally optimal, locally adapted, non-local adaptation]

Figure 6
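The slides report performance as adjusted R squared but do not spell out the regression. Below is a minimal Python sketch of one plausible version, offered as an assumption: regress \(-\log\) of shifted fitness on the genomic offset, motivated by the approximation relating \(\mathbb E[-\log w]\) to \(G^2\). The function name and data are hypothetical.

```python
import numpy as np


def adjusted_r_squared(offset, shifted_fitness):
    """Adjusted R^2 of the simple linear regression -log(w) ~ offset.

    offset          : genomic offset per individual or population
    shifted_fitness : fitness w in the shifted environment (must be > 0)
    """
    y = -np.log(np.asarray(shifted_fitness, dtype=float))
    x = np.asarray(offset, dtype=float)
    n, k = y.size, 1  # k = number of predictors (the offset only)
    design = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical data: fitness decays with the offset, with and without noise.
rng = np.random.default_rng(3)
offset = np.linspace(0.0, 2.0, 50)
exact = adjusted_r_squared(offset, np.exp(-0.5 * offset))
noisy = adjusted_r_squared(offset, np.exp(-0.5 * offset + rng.normal(0.0, 0.3, 50)))
print(exact, noisy)
```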

How to identify the putatively adaptive loci?

  • Hypothesis testing approach based on a genotype \(\times\) environment association model

  • Other options?

Minimize false negatives!

[Plot: adjusted R squared for each candidate set (Causal, Empirical, All SNPs, Incomplete (5+5 QTLs), Missing secondary trait (5+0 QTLs), Missing primary trait (0+5 QTLs)) under weak and strong asymmetry; annotations mark increasing false positives and increasing false negatives]

Figure 7: Weak and strong asymmetry refer to the difference in relative importance of the two adaptive phenotypes.
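To make "minimize false negatives" concrete, the sketch below uses the Benjamini-Hochberg false discovery rate procedure as a stand-in. The slides do not say which multiple-testing correction was used, so treat this as a swapped-in standard example. Raising the target rate `q` accepts more false positives in exchange for fewer missed QTLs. The p-values are hypothetical.

```python
import numpy as np


def benjamini_hochberg(p_values, q):
    """Indices of loci declared significant at false discovery rate q
    (Benjamini-Hochberg step-up procedure)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m
    passed = np.nonzero(p[order] <= thresholds)[0]
    if passed.size == 0:
        return np.array([], dtype=int)
    # Step-up: keep every locus ranked at or below the largest passing rank.
    return np.sort(order[: passed[-1] + 1])


# Hypothetical p-values; suppose loci 0, 1 and 2 are the true QTLs.
p_vals = [0.001, 0.015, 0.025, 0.2, 0.9]
print(benjamini_hochberg(p_vals, q=0.01))  # strict: misses two QTLs
print(benjamini_hochberg(p_vals, q=0.10))  # permissive: recovers all three
```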

Hypothesis testing prevents spurious inference when the phenotype has no genetic basis

Figure 8

Lind and collaborators found that randomly selected loci were as good as putatively adaptive loci.

What about bootstrapping to measure uncertainty?

Figure 9


Figure 10
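One way to bootstrap ranked genomic offsets is sketched below in Python. The slides do not state what is resampled, so this sketch assumes the candidate loci are resampled with replacement. It also uses several environmental variables, because with a single variable resampling loci only rescales every offset and the ranks cannot change. For each population, the sketch reports a 95% percentile interval of its rank, plus the Spearman correlation of each replicate with the full-data ranking. All names and data are hypothetical.

```python
import numpy as np


def geometric_offsets(effects, delta_env):
    """G^2 per population: mean over loci of (B_l . (x - x*))^2.

    effects   : (L, p) fitted GEA effects, one row per locus
    delta_env : (n, p) environmental shift x - x* for each population
    """
    return np.mean((delta_env @ effects.T) ** 2, axis=1)


def ranks(values):
    """Rank 0 = largest offset (most at risk); assumes no ties."""
    order = np.argsort(-np.asarray(values))
    r = np.empty(len(order), dtype=int)
    r[order] = np.arange(len(order))
    return r


def spearman(a, b):
    """Spearman correlation as the Pearson correlation of ranks (no ties)."""
    return np.corrcoef(ranks(a), ranks(b))[0, 1]


def bootstrap_offsets(effects, delta_env, n_boot=100, seed=0):
    """Resample loci with replacement; returns an (n_boot, n) array of offsets."""
    rng = np.random.default_rng(seed)
    n_loci = effects.shape[0]
    reps = np.empty((n_boot, delta_env.shape[0]))
    for i in range(n_boot):
        idx = rng.integers(0, n_loci, size=n_loci)
        reps[i] = geometric_offsets(effects[idx], delta_env)
    return reps


# Hypothetical fitted effects (30 loci, 3 variables) and shifts for 15 populations.
rng = np.random.default_rng(7)
effects = rng.normal(0.0, 0.1, size=(30, 3))
delta_env = rng.normal(0.0, 1.0, size=(15, 3))

full = geometric_offsets(effects, delta_env)
reps = bootstrap_offsets(effects, delta_env)
rank_reps = np.array([ranks(r) for r in reps])
low, high = np.percentile(rank_reps, [2.5, 97.5], axis=0)  # 95% interval of each rank
rho = np.array([spearman(full, r) for r in reps])
top = np.argmax(full)
print("most at-risk population:", top, "rank interval:", low[top], high[top])
print("median Spearman correlation with the full-data ranking:", np.median(rho))
```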

Representative case: Mediterranean thyme

  • Freezing-tolerant ecotype (non-phenolic monoterpenes)
  • Drought-tolerant ecotype (phenolic monoterpenes)
  • Decrease in frequency of freezing events


Modest predictive power for shifted fitness

[Plot: adjusted R squared versus sampling timepoint (pseudo-generations since climate change started) for causal and empirical geometric genomic offsets; panels: fast (2.5 pseudo-generations), intermediate (10 pseudo-generations) and slow (50 pseudo-generations) environmental change]

Figure 11

Future work

  1. Explicitly model two or more latent phenotypic traits
  2. Explore a larger space of hyperparameters in the simulations
  3. Consider different approaches for identifying putatively adaptive loci
  4. Further study the bootstrapping approach
  5. Extend thyme simulations
  6. Analyze real data!

Take home message

  • Genomic offsets are an effective approach if you have external evidence of populations being locally optimal

  • Identifying putatively adaptive loci with hypothesis testing improves performance, but do not be too strict with the threshold

  • Measuring the uncertainty of your estimates using bootstrapped ranked genomic offsets is a promising strategy

  • Simulate system-specific data to check whether the method could work in theory or whether it will fail (non-continuous phenotypes, migration …)

Thanks!

Extra slides

Alternative genomic offsets

Figure 12

Conceptual issues with genomic offsets

Locally optimal versus locally adapted

  • Assumptions have been stated only vaguely in previous work
  • I argue that the methods assume that sampled individuals are at their adaptive optimum
  • The results of Gain and collaborators hold if the latent phenotype has constant variance

Conceptual issues in the genomic offset framework

Figure 13

Distribution of genomic offsets

[Plot: distribution of the geometric genomic offset by scenario (non-local adaptation; local-foreign; home-away & local-foreign); panels: (A) causal, (B) empirical]

Figure 14

Different local and non-local adaptation scenarios

  • Locally adapted (local-foreign criterion)
  • Locally optimal (home-away and local-foreign criteria)
  • Non-local adaptation

Selecting environmental variables

  • Model building and variable selection?
  • Prior knowledge?

Genomic offsets are robust to uncorrelated environmental variables

[Plot: adjusted R squared versus number of added uncorrelated environmental factors, for 20, 50 and 100 QTLs]

Figure 15

Be cautious and look for potential confounding variables!

Figure 16

Spearman’s correlation of bootstrapped genomic offsets

[Plot: Spearman correlation with 95% confidence interval across random seeds for RDA, RONA, Geometric and Gradient Forest genomic offsets, causal and empirical]

Figure 17

Figure 18

A reasonably comprehensive Julia package

Figure 19

Runtimes

[Plot: runtime (seconds) of 100 bootstrap iterations for Geometric, Gradient Forest, RDA and RONA]

Figure 20

Numerical issues

[Plot: adjusted R squared versus number of random alleles added to the genotype matrix, for K = 1, 2, 3 latent factors; panels: λ = 1e-5 and λ = 1e2]

Figure 21

Near-zero predictive power for average future fitness

[Plot: adjusted R squared versus sampling timepoint (pseudo-generations since climate change started) for causal and empirical geometric genomic offsets; panels: fast (2.5 pseudo-generations), intermediate (10 pseudo-generations) and slow (50 pseudo-generations) environmental change]
Figure 22