Prediction of maladaptation from ecological and genomic data using genomic offsets

Master’s Thesis in Bioinformatics

Curro Campuzano Jiménez

Bioinformatics Research Center at Aarhus University

June 24, 2024

Genomic offsets

A set of statistical tools that predict the maladaptation of populations to rapid environmental change based on genotype \(\times\) environment association models

Figure 1

Outline

Overview of the model by Gain and collaborators (2023)
Methodological analysis with general simulations
- Comparison of different methods under different scenarios
- Identifying putatively adaptive loci
- Measuring uncertainty
Case study: Mediterranean thyme

An explicit model of genomic offsets by Gain and collaborators

Figure 2

Gaussian stabilizing selection

\[ w(z| \mathbf{x}^*) = \exp\left(\frac{-\left(z - z_{\text{opt}}(\mathbf x ^*) \right)^2}{2V_S}\right) \]
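The fitness function above can be evaluated directly. This is a minimal plain-Python sketch in which the function name `gaussian_fitness` and its arguments are illustrative. `z_opt` stands for \(z_{\text{opt}}(\mathbf x^*)\) and `v_s` for \(V_S\), so a larger \(V_S\) means weaker selection.

```python
import math


def gaussian_fitness(z: float, z_opt: float, v_s: float) -> float:
    """Fitness of phenotype z under Gaussian stabilizing selection.

    z_opt is the optimum z_opt(x*) in environment x*, v_s is V_S
    (the width of the fitness function; larger means weaker selection).
    """
    if v_s <= 0:
        raise ValueError("V_S must be positive")
    return math.exp(-((z - z_opt) ** 2) / (2.0 * v_s))


# An individual at its optimum has fitness 1; fitness decays with distance.
print(gaussian_fitness(0.0, 0.0, 1.0))  # 1.0
print(gaussian_fitness(1.0, 0.0, 1.0))  # exp(-0.5), about 0.6065
```

Taking \(-\log\) of this value gives back the quadratic term \((z - z_{\text{opt}})^2 / 2V_S\), which is the quantity the genomic offset is later related to.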

Fit a genotype \(\times\) environment association model

Figure 3

We have to assume that all individuals are at their adaptive optimum and that we can measure the QTLs!

After some mathematical rearranging …

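\[ \mathcal G^2(\mathbf{x}, \mathbf{x}^*) = \frac{1}{L}\sum_{l=1}^{L} \left(\hat y_l(\mathbf x) - \hat y_l(\mathbf x^*)\right)^2 \]

where \(\hat y_l(\mathbf x)\) is the genotype (or allele frequency) at locus \(l\) predicted by the genotype \(\times\) environment association model in environment \(\mathbf x\), and \(L\) is the number of loci.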
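Given predictions from a fitted model, the offset for one individual or population is a short computation. This is a minimal plain-Python sketch. It assumes \(\mathcal G^2\) is the average over the \(L\) loci of the squared differences between predicted allele frequencies (or genotypes) in the current and shifted environments. The function name `geometric_offset` is illustrative.

```python
def geometric_offset(y_current, y_shifted):
    """Squared genomic offset G^2 between a current and a shifted environment.

    y_current[l] and y_shifted[l] are the predictions hat y_l(x) and
    hat y_l(x*) at locus l. Assumes G^2 is the mean over loci of the
    squared per-locus differences.
    """
    if len(y_current) != len(y_shifted) or not y_current:
        raise ValueError("need the same, non-zero number of loci")
    L = len(y_current)
    return sum((a - b) ** 2 for a, b in zip(y_current, y_shifted)) / L


# Three loci; only the first and last prediction change with the environment.
print(geometric_offset([0.2, 0.5, 0.9], [0.3, 0.5, 0.6]))  # about 0.0333
```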

Under Gaussian stabilizing selection, we find a relationship between the genomic offset and fitness in the shifted environment:

\[ \mathbb E\left[-\log w(\mathbf{x}, \mathbf{x}^*)\right] \approx \frac{a^2\,\mathcal G^2(\mathbf{x}, \mathbf{x}^*)}{2V_S} \]
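The first step of this rearrangement can be written out explicitly. It uses only the fitness function above and the assumption that a sampled individual is at its optimum in its current environment, \(z = z_{\text{opt}}(\mathbf x)\):

\[ -\log w(\mathbf x, \mathbf x^*) = -\log w\left(z_{\text{opt}}(\mathbf x) \mid \mathbf x^*\right) = \frac{\left(z_{\text{opt}}(\mathbf x) - z_{\text{opt}}(\mathbf x^*)\right)^2}{2V_S} \]

The remaining step, from this expression to \(\mathcal G^2\), relates \(z_{\text{opt}}\) to the genotypes predicted at the causal loci by the genotype \(\times\) environment association model, following Gain and collaborators.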

Genomic offsets in a nutshell

Sample locally optimal individuals and measure their genotypes and current environment
Identify a set of putatively adaptive loci using hypothesis testing
Fit a genotype \(\times\) environment association statistical model
Calculate genomic offset between current and shifted environment
Rank individuals / populations
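The five steps above can be sketched end to end in plain Python. This is a toy illustration under simplifying assumptions, not the thesis's implementation:

- There is a single environmental variable.
- The genotype \(\times\) environment model is one ordinary least-squares regression of allele frequency on environment per locus.
- All loci are kept, so the candidate-selection step is skipped.
- \(\mathcal G^2\) is the mean squared difference in predicted allele frequency across loci.

All function names are hypothetical.

```python
def fit_gea(env, freqs):
    """Fit hat y_l(x) = b0_l + b1_l * x by least squares, one locus at a time.

    env[p] is the environment of population p; freqs[p][l] is the allele
    frequency at locus l in population p. Returns a list of (b0_l, b1_l).
    """
    n = len(env)
    if n != len(freqs) or n < 2:
        raise ValueError("need one row of frequencies per population (>= 2)")
    mean_x = sum(env) / n
    sxx = sum((x - mean_x) ** 2 for x in env)
    if sxx == 0:
        raise ValueError("the environment must vary across populations")
    coefs = []
    for l in range(len(freqs[0])):
        ys = [row[l] for row in freqs]
        mean_y = sum(ys) / n
        b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(env, ys)) / sxx
        coefs.append((mean_y - b1 * mean_x, b1))
    return coefs


def predict(coefs, x):
    """Predicted allele frequency at every locus for environment x."""
    return [b0 + b1 * x for b0, b1 in coefs]


def offsets(coefs, env_now, env_future):
    """G^2 per population: mean over loci of squared predicted differences."""
    L = len(coefs)
    g2 = []
    for x, x_star in zip(env_now, env_future):
        now, fut = predict(coefs, x), predict(coefs, x_star)
        g2.append(sum((a - b) ** 2 for a, b in zip(now, fut)) / L)
    return g2


def rank_populations(g2):
    """Population indices ordered from highest to lowest offset."""
    return sorted(range(len(g2)), key=lambda p: g2[p], reverse=True)


# Toy data: locus 0 tracks the environment, locus 1 does not.
env = [0.0, 1.0, 2.0, 3.0]
freqs = [[0.1, 0.5], [0.3, 0.5], [0.5, 0.5], [0.7, 0.5]]
future = [0.5, 1.0, 3.0, 3.2]
coefs = fit_gea(env, freqs)
print(rank_populations(offsets(coefs, env, future)))  # [2, 0, 3, 1]
```

Population 2 ranks first because it faces the largest environmental shift along the axis that locus 0 responds to. Locus 1 has a zero slope and contributes nothing to any offset.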

Results

Simulated data using SLiM

Figure 4

Data

Figure 5

Different local and non-local adaptation scenarios

No differences between methods when using the same set of candidate loci

Figure 6

How to identify the putatively adaptive loci?

Hypothesis testing approach based on a genotype \(\times\) environment association model
Other options?

Minimize false negatives!

Figure 7: Weak and strong asymmetry refer to the difference in relative importance of the two adaptive phenotypes.
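The plain-Python example below illustrates why a lenient threshold helps avoid false negatives. It tests each locus for association with one environmental variable, then compares a Bonferroni cut-off with Benjamini-Hochberg false discovery rate control. The per-locus test is a stand-in chosen for illustration, not the genotype \(\times\) environment association test used in the thesis: a Pearson correlation with a Fisher z-transform and a normal approximation. The thresholds `q` and `alpha` and all function names are arbitrary choices.

```python
import math
from statistics import NormalDist


def correlation_pvalue(env, ys):
    """Two-sided p-value for H0: no correlation between env and ys.

    Uses the Fisher z-transform with a normal approximation (n >= 4).
    """
    n = len(env)
    if n < 4:
        raise ValueError("need at least 4 populations")
    mx, my = sum(env) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(env, ys))
    sxx = sum((x - mx) ** 2 for x in env)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 1.0  # constant environment or monomorphic locus: no evidence
    r = sxy / math.sqrt(sxx * syy)
    r = max(min(r, 1 - 1e-12), -1 + 1e-12)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def bonferroni(pvalues, alpha=0.05):
    """Loci significant after Bonferroni correction (strict)."""
    m = len(pvalues)
    return [i for i, p in enumerate(pvalues) if p <= alpha / m]


def benjamini_hochberg(pvalues, q=0.1):
    """Loci significant at false discovery rate q (more lenient)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k = rank  # largest rank that passes the step-up criterion
    return sorted(order[:k])


pvals = [0.001, 0.008, 0.03, 0.04, 0.5]
print(bonferroni(pvals))          # [0, 1]
print(benjamini_hochberg(pvals))  # [0, 1, 2, 3]
```

With the same p-values, false discovery rate control keeps two more candidate loci than Bonferroni. This is the direction the slide recommends when false negatives are the larger concern.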

Hypothesis-testing prevents spurious inference when phenotype has no genetic basis

Figure 8

Lind and collaborators found that randomly selected loci performed as well as putatively adaptive loci.

What about bootstrapping to measure uncertainty?

Figure 9

What about bootstrapping to measure uncertainty?

Figure 10
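The plain-Python sketch below shows one way to attach uncertainty to a ranking. It makes several assumptions that the slides do not specify:

- Loci are the unit being resampled.
- The inputs are precomputed predicted allele frequencies in the current and shifted environments.
- \(\mathcal G^2\) is recomputed as the mean squared difference over each resample of loci.
- Each population's bootstrapped ranks are summarized with a percentile interval.

Function names, `n_boot`, and the 90% level are illustrative choices.

```python
import random


def offsets_from_predictions(now, future, loci):
    """G^2 per population using only the given locus indices (repeats allowed).

    now[p][l] and future[p][l] are predicted allele frequencies for population p.
    """
    return [sum((now[p][l] - future[p][l]) ** 2 for l in loci) / len(loci)
            for p in range(len(now))]


def ranks_descending(values):
    """Rank 1 = largest offset; ties are broken by population index."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks


def bootstrap_ranks(now, future, n_boot=1000, seed=1):
    """Resample loci with replacement and record every population's rank."""
    rng = random.Random(seed)
    n_loci = len(now[0])
    draws = [[] for _ in now]
    for _ in range(n_boot):
        loci = [rng.randrange(n_loci) for _ in range(n_loci)]
        g2 = offsets_from_predictions(now, future, loci)
        for p, r in enumerate(ranks_descending(g2)):
            draws[p].append(r)
    return draws


def rank_interval(draws, level=0.9):
    """Percentile interval of one population's bootstrapped ranks."""
    s = sorted(draws)
    lo = s[int((1 - level) / 2 * (len(s) - 1))]
    hi = s[int((1 + level) / 2 * (len(s) - 1))]
    return lo, hi
```

A population whose interval is narrow and near rank 1 is consistently predicted to be among the most maladapted. Wide intervals flag rankings that depend on which loci happen to be included.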

Representative case: Mediterranean thyme

Modest predictive power of shifted fitness

Figure 11

Future work

Explicitly model two or more latent phenotypic traits
Explore a larger space of hyperparameters in the simulations
Consider different approaches for identifying putatively adaptive loci
Further study the bootstrapping approach
Extend thyme simulations
Analyze real data!

Take home message

Genomic offsets are an effective approach if you have external evidence that populations are locally optimal
Identifying putatively adaptive loci with hypothesis testing improves performance, but do not be too strict
Measuring the uncertainty of your estimates using bootstrapped ranked genomic offsets is a promising strategy
Simulate system-specific data to show whether the method could work in theory or whether it will fail (non-continuous phenotypes, migration …)

Thanks!

Extra slides

Alternative genomic offsets

Figure 12

Conceptual issues with genomic offsets

Locally optimal versus locally adapted

Assumptions have been stated vaguely in previous work
I argue that methods assume sampled individuals are at their adaptive optimum
The results of Gain and collaborators hold if the latent phenotype has constant variance

Conceptual issues in the genomic offset framework

Figure 13

Distribution of genomic offsets

Figure 14

Different local and non-local adaptation scenarios

Selecting environmental variables

Model building and variable selection?
Prior knowledge?

Genomic offsets are robust to uncorrelated environmental variables

Figure 15

Be cautious and look for potentially confounding variables!

Figure 16

Spearman’s correlation of bootstrapped genomic offsets

Figure 17

Figure 18
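Spearman's correlation between two vectors of genomic offsets can be computed as the Pearson correlation of their ranks. One example is offsets computed for the same populations from two bootstrap replicates. The minimal plain-Python sketch below gives tied values their average rank. Outside a sketch, `scipy.stats.spearmanr` would be the usual library choice. The function names here are illustrative.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long vectors (length >= 2)")
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    if va == 0 or vb == 0:
        raise ValueError("correlation undefined for a constant vector")
    return cov / math.sqrt(va * vb)
```

Because only ranks enter the calculation, a replicate that rescales all offsets but preserves their order gives \(\rho = 1\). This matches the use of ranked genomic offsets to compare bootstrap replicates.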

A reasonably comprehensive Julia package

Figure 19

Runtimes

Figure 20

Numerical issues

Figure 21

Near-zero predictive power of average future fitness

Figure 22
