GenomicOffsets.jl Documentation

Genomic offsets (GO) statistics measure current individuals/populations' maladaptation in the event of a drastic climate change. These metrics are obtained from Genotype environment associations (GEA).

This Julia package provides efficient implementations of the four most popular metrics: RONA, RDA GO, Gradient Forest GO and Geometric GO. In addition, we have implemented an F-statistical test based on the Linear Mixed Latent Factor model (LFMM) to perform the identification of GEA candidates through hypothesis testing and a bootstrap approach that resamples loci, performs the GEA candidates identification and computes the desired genomic offset metric.

We have documented several "recipes" in both Julia and R in the README file of the repository (so please take a look there and use this documentation as a reference for all available functions).

TL;TR

You can compute different genomic offsets by fitting different models and computing the genomic offset.

# Julia
using GenomicOffsets
Y, X, Xpred = godataset
rona = fit(RONA, Y, X)
genomic_offset(rona, X, Xpred)
100×1 Matrix{Float64}:
 0.04930412355509537
 0.09685110436545971
 0.031955998579532574
 0.057903936496619686
 0.06747793104523755
 0.07838283024137747
 0.034071076534702927
 0.05671776002159728
 0.11209514598766128
 0.050635672102255044
 ⋮
 0.05847056037198923
 0.044459426014584816
 0.1000681402606062
 0.076366019095105
 0.0779261159151334
 0.09861395620054535
 0.06261235895467696
 0.05449853921861244
 0.07504407351624044

You can fit LFMM models to do hypothesis testing and identify GEA candidates.

# Julia
lfmm = RidgeLFMM(Y, X, 1)
pvalues = LFMM_Ftest(lfmm, Y, X; genomic_control=true)
# Bonferroni correction at 0.05
candidates = findall(pvalues .< 0.05 / size(Y, 2))
90-element Vector{Int64}:
   52
   86
  105
  176
  211
  233
  284
  323
  412
  419
    ⋮
 4107
 4319
 4383
 4688
 4831
 4844
 4907
 4973
 5124

Yo can compute bootstrapping confidence intervals for different genomic offset metrics using an approach we propose that sample loci and identifies GEA candidates at each iteration:

using Random, StatsBase
nboots = 100
offsets = bootstrap_with_candidates(GeometricGO, Y, X, Xpred, nboots;candidates_threshold=0.05)
@assert size(offsets) == (size(Y, 1), nboots)
confint95 = [quantile(ind, [0.025, 0.975]) for ind in eachrow(offsets)]
100-element Vector{Vector{Float64}}:
 [0.02303280401566295, 0.03151558990536644]
 [0.1047729314923714, 0.15591623894136308]
 [0.011123446136442448, 0.015183760680373653]
 [0.0386068546151271, 0.05458975227698637]
 [0.04292727312732361, 0.06134203909699572]
 [0.0928716599812431, 0.13115510306701367]
 [0.014603083635330173, 0.020300992040981226]
 [0.03722872857179291, 0.04973468787500883]
 [0.08909070888028804, 0.1308675409351474]
 [0.03431001066987424, 0.05007488604751634]
 ⋮
 [0.021821315064466877, 0.029341761345400032]
 [0.029222879302603963, 0.04309757900855938]
 [0.1397961718143955, 0.2056429635597111]
 [0.04246632188729315, 0.06550399094959608]
 [0.048245602602202856, 0.07065170052663027]
 [0.05532866396795552, 0.07894162577477457]
 [0.04981200290106696, 0.07009733123191236]
 [0.02786753998320747, 0.04076474730946965]
 [0.06894502320101961, 0.10013528792399119]

References

GenomicOffsets.LFMMType
LFMM{T<:Real}

A struct to store the results of the Linear Fixed Effects Mixed Model (LFMM) as described by Caye et al. (2019).

\[Y = X B^T + W = X B^T + U V^T\]

source
GenomicOffsets.LFMM_FtestMethod
LFMM_Ftest(model::LFMM, Y::AbstractMatrix{T1}, X::AbstractMatrix{T2}; genomic_control::Bool=true, center=true) where {T1<:Real,T2<:Real}

Compute F-Statistic (F-test) for the Linear Fixed Effects Mixed Model (LFMM). TODO: Add reference & description.

Arguments

  • model: A LFMM{T<:Real} data structure (obtained from RidgeLFMM).
  • Y: A centered genotype matrix of size NxL.
  • X: A centered environmental matrix of size NxP.
  • genomic_control: If true, apply genomic control to the F-statistic.
  • center: If true, both the genotype matrix and the environmental matrix are centered. If false, both matrices are assumed to be centered.

Returns

  • A vector of p-values for loci in the genotype matrix.
source
GenomicOffsets.LFMM_FtestMethod
LFMM_Ftest(model::LFMM, Y::AbstractMatrix{T1}, X::AbstractMatrix{T2}; genomic_control::Bool=true, center=true) where {T1<:Real,T2<:Real}

Compute F-Statistic (F-test) for the Linear Fixed Effects Mixed Model (LFMM). TODO: Add reference & description.

Arguments

  • model: A LFMM{T<:Real} data structure (obtained from RidgeLFMM).
  • Y: A centered genotype matrix of size NxL.
  • X: A centered environmental matrix of size NxP.
  • genomic_control: If true, apply genomic control to the F-statistic.
  • center: If true, both the genotype matrix and the environmental matrix are centered. If false, both matrices are assumed to be centered.

Returns

  • A vector of p-values for loci in the genotype matrix.
source
GenomicOffsets.RidgeLFMMMethod
RidgeLFMM(Y::Matrix{T}, X::Matrix{T}, K::Int, λ::Float64) where T<:Real

Ridge solutions for the Linear Fixed Effects Mixed Model (LFMM) as described by Caye et al. (2019).

\[Y = X B^T + W = X B^T + U V^T\]

Arguments

  • Y: A centered genotype matrix of size NxL.
  • X: Environmental matrix of size NxP.
  • K: Number of latent factors.
  • λ: Regularization parameter (by 1e-5)
  • center: If true, both the genotype matrix and the environmental matrix are centered. If false, both matrices are assumed to be centered.

Returns

  • A LFMM{T<:Real} data structure with the latent factors U, Vt, and the effect sizes Bt.
source
GenomicOffsets.TracyWidomMethod
TracyWidom(eigenvalues::AbstractVector{T}; sort_eigenvalues::Bool=true) where T<:Real

Compute the Tracy-Widom statistics and p-values for a given set of eigenvalues. TODO: add reference.

Arguments

  • eigenvalues::AbstractVector{T}: A vector of eigenvalues.
  • sort_eigenvalues::Bool=true: If true, sort the eigenvalues in descending order. If false, the eigenvalues are assumed to be sorted in descending order.

Returns

  • A tuple with the Tracy-Widom statistics and p-values. For a given threshold, it is advised not to keep all eigenvalues below the threshold, but up to the first one that is above the threshold (not included).
source
GenomicOffsets.bootstrap_with_candidatesMethod
bootstrap_with_candidates(::Type{GeometricGO}, rng::Random.AbstractRNG,Y::AbstractMatrix{T1}, X::AbstractMatrix{T2},Xpred::AbstractMatrix{T2}, nboot::Int=500; candidates_threshold::Real=0.05, genomic_control::Bool=true, tw_threshold::Real=0.001) where {T1<:Real,T2<:Real}

Compute the Geometric genomic offset using a bootstrap approach. For every, bootstrap iteration, the model is fitted using a random subset of the columns of Y. The Tracy-Widom test is used to estimate the number of latent factors. The F-test is used to select putatively adaptative loci. The genomic offset is computed for the selected loci.

Arguments

  • rng::Random.AbstractRNG: Random number generator. If not provided, the global RNG is used.
  • Y::AbstractMatrix{T1}: NxL matrix with individuals genotypes or allele frequencies.
  • X::AbstractMatrix{T2}: NxP matrix with environmental data.
  • Xpred::AbstractMatrix{T2}: NxP matrix with the predicted / altered environmental matrix.
  • nboot::Int=500: Number of bootstrap iterations.
  • Optional arguments:
    • candidates_threshold::Real=0.05: Threshold for the F-test. Better to be permisive.
    • genomic_control::Bool=true: Apply genomic control to the F-test.
    • tw_threshold::Real=0.001: Threshold for the Tracy-Widom test. Better to be conservative (as the number of latent factors is often overconservative).
    • λ::Real=1e-5: Regularization parameter for the LFMM.

Returns

  • A matrix of size NxNboot with the genomic offset values. If no candidate loci are found, the row is filled with zeros.
source
GenomicOffsets.bootstrap_with_candidatesMethod
bootstrap_with_candidates(::Type{GradientForestGO}, rng::Random.AbstractRNG,Y::AbstractMatrix{T1}, X::AbstractMatrix{T2}, Xpred::AbstractMatrix{T2}, nboot::Int=500; ntrees::Int=100, candidates_threshold::Real=0.05, genomic_control::Bool=true, tw_threshold::Real=0.001) where {T1<:Real,T2<:Real}

Compute the Gradient Forest genomic offset using a bootstrap approach. For every, bootstrap iteration, the model is fitted using a random subset of the columns of Y. The Tracy-Widom test is used to estimate the number of latent factors. The F-test is used to select putatively adaptative loci. The genomic offset is computed for the selected loci.

Arguments

  • rng::Random.AbstractRNG: Random number generator. If not provided, the global RNG is used.
  • Y::AbstractMatrix{T1}: NxL matrix with individuals genotypes or allele frequencies.
  • X::AbstractMatrix{T2}: NxP matrix with environmental data.
  • Xpred::AbstractMatrix{T2}: NxP matrix with the predicted / altered environmental matrix.
  • nboot::Int=500: Number of bootstrap iterations.
  • Optional arguments:
    • ntrees::Int=100: Number of trees each forest (one per allele) will have.
    • candidates_threshold::Real=0.05: Threshold for the F-test. Better to be permisive.
    • genomic_control::Bool=true: Apply genomic control to the F-test.
    • tw_threshold::Real=0.001: Threshold for the Tracy-Widom test. Better to be conservative (as the number of latent factors is often overconservative).
    • λ::Real=1e-5: Regularization parameter for the LFMM.

Returns

  • A matrix of size NxNboot with the genomic offset values. If no candidate loci are found, the row is filled with zeros.
source
GenomicOffsets.bootstrap_with_candidatesMethod
bootstrap_with_candidates(::Type{RDAGO}, rng::Random.AbstractRNG, Y::AbstractMatrix{T1}, X::AbstractMatrix{T2}, Xpred::AbstractMatrix{T2}, nboot::Int=500; candidates_threshold::Real=0.05, genomic_control::Bool=true, tw_threshold::Real=0.001) where {T1<:Real, T2<:Real}

Compute the RDA genomic offset using a bootstrap approach. For every, bootstrap iteration, the model is fitted using a random subset of the columns of Y. The Tracy-Widom test is used to estimate the number of latent factors. The F-test is used to select putatively adaptative loci. The genomic offset is computed for the selected loci.

Arguments

  • rng::Random.AbstractRNG: Random number generator. If not provided, the global RNG is used.
  • Y::AbstractMatrix{T1}: NxL matrix with individuals genotypes or allele frequencies.
  • X::AbstractMatrix{T2}: NxP matrix with environmental data.
  • Xpred::AbstractMatrix{T2}: NxP matrix with the predicted / altered environmental matrix.
  • nboot::Int=500: Number of bootstrap iterations.
  • Optional arguments:
    • candidates_threshold::Real=0.05: Threshold for the F-test. Better to be permisive.
    • genomic_control::Bool=true: Apply genomic control to the F-test.
    • tw_threshold::Real=0.001: Threshold for the Tracy-Widom test. Better to be conservative (as the number of latent factors is often overconservative).
    • pratio::Float64=0.99: The ratio of variances preserved in the principal subspace.
    • weighted::Bool=false: If true, the euclidean distance in the projected space is weighted based on the eigenvalue of each axis.
    • λ::Real=1e-5: Regularization parameter for the LFMM.

Returns

  • A matrix of size NxNboot with the genomic offset values. If no candidate loci are found, the row is filled with zeros.
source
GenomicOffsets.bootstrap_with_candidatesMethod
bootstrap_with_candidates(::Type{RONA}, rng::Random.AbstractRNG, Y::AbstractMatrix{T1}, X::AbstractMatrix{T2}, Xpred::AbstractMatrix{T2}, nboot::Int=500; candidates_threshold::Real=0.05, genomic_control::Bool=true, tw_threshold::Real=0.001) where {T1<:Real, T2<:Real}

Compute the genomic offset for the RONA model using a bootstrap approach. For every, bootstrap iteration, the model is fitted using a random subset of the columns of Y. The Tracy-Widom test is used to estimate the number of latent factors. The F-test is used to select putatively adaptative loci. The genomic offset is computed for the selected loci.

Arguments

  • rng::Random.AbstractRNG: Random number generator. If not provided, the global RNG is used.
  • Y::AbstractMatrix{T1}: NxL matrix with individuals genotypes or allele frequencies.
  • X::AbstractMatrix{T2}: NxP matrix with environmental data.
  • Xpred::AbstractMatrix{T2}: NxP matrix with the predicted / altered environmental matrix.
  • nboot::Int=500: Number of bootstrap iterations.
  • Optional arguments:
    • candidates_threshold::Real=0.05: Threshold for the F-test. Better to be permisive.
    • genomic_control::Bool=true: Apply genomic control to the F-test.
    • tw_threshold::Real=0.001: Threshold for the Tracy-Widom test. Better to be conservative (as the number of latent factors is often overconservative).
    • λ::Real=1e-5: Regularization parameter for the LFMM.

Returns

  • A matrix of size NxNboot with the genomic offset values. If no candidate loci are found, the row is filled with zeros.
source
GenomicOffsets.fitMethod
fit(::Type{GeometricGO}, Y::AbstractMatrix{T1}, X::AbstractMatrix{T2}) where {T1<:Real,T2<:Real}

Fit the Geometric genomic offset model (that is, fit a LFMM) using the data Y and X.

Arguments

  • Y::AbstractMatrix{T1}: NxL matrix with individuals genotypes or allele frequencies.
  • X::AbstractMatrix{T}: NxP matrix with environmental data.
  • K: Number of latent factors to use when fitting th e LFMM model. It can be determined using the elbow rule of thumb. If nothing is provided, the Tracy-Widom statistic to determine the number of latent factors (which might be overconservative).
  • Optional arguments:
    • center::Bool=true: Center both the genotype and environmental data. Predicted environmental data is centered using the same mean as X. If false, data is assumed to be centered.
    • scale::Bool=true: Scale the environmental data. Predicted environmental data is scaled using the same standard deviation as X. If false, data is assumed to be scaled.
    • tw_threshold::Real=1e-3: Threshold to determine the number of latent factors if using the Tracy-Widom statistic. By default, it uses 1e-3.
    • λ::Real=1e-5: Regularization parameter for the LFMM.

Returns

  • A GeometricGO model.
source
GenomicOffsets.fitMethod
fit(::Type{GradientForestGO}, Y::AbstractMatrix{T1}, X::AbstractMatrix{T2}; ntrees::Int=100) where {T1<:Real,T2<:Real}

Fit the Gradient forest genomic offset model (that is, fit a Gradient Forest) using the data Y and X.

Arguments

  • Y::AbstractMatrix{T1}: NxL matrix with individuals genotypes or allele frequencies.
  • X::AbstractMatrix{T}: NxP matrix with environmental data.
  • ntrees: Number of trees each forest (one per allele) will have.

Returns

  • A GradientForestGO model.
source
GenomicOffsets.fitMethod
fit(::Type{RDAGO}, Y::AbstractMatrix{T1}, X::AbstractMatrix{T2}) where {T1<:Real,T2<:Real}

Fit the RDA model (that is, apply Redundancy Analysis) using the data Y and X.

Arguments

  • Y::AbstractMatrix{T1}: NxL matrix with individuals genotypes or allele frequencies.
  • X::AbstractMatrix{T}: NxP matrix with environmental data.
  • Optional arguments:
    • K::Int=size(Y,2): Number of main canonical axes to retain. The final number of axes is determined either by K or pratio.
    • pratio::Float64=0.99: The ratio of variances preserved in the principal subspace.

Returns

  • A RDAGO model.
source
GenomicOffsets.fitMethod
fit(::Type{RONA}, Y::AbstractMatrix{T1}, X::AbstractMatrix{T2}) where {T1<:Real,T2<:Real}

Fit the RONA model (that is, solve the least squares problem) using the data Y and X.

Arguments

  • Y::AbstractMatrix{T1}: NxL matrix with individuals genotypes or allele frequencies.
  • X::AbstractMatrix{T}: NxP matrix with environmental data.

Returns

  • A RONA model.
source
GenomicOffsets.genomic_offsetMethod
genomic_offset(model::GeometricGO, X::AbstractMatrix{T}, Xpred::AbstractMatrix{T}) where T<:Real

Compute the Geometric genomic offset.

Arguments

  • model::GeometricGO: A GeometricGO model.
  • X::AbstractMatrix{T}: NxP matrix with environmental data. Data will be scaled and centered if the model was fitted with center and scale set to true.
  • Xpred::Matrix{T}: NxP matrix with the predicted / altered environmental matrix. Data will be scaled and centered if the model was fitted with center and scale set to true.
  • candidates::AbstractVector{Integer}: Indices of the candidate loci.

Returns

  • A Vector{Float64} of length N with the Geometric genomic offset values.
source
GenomicOffsets.genomic_offsetMethod
genomic_offset(model::GeometricGO, X::AbstractMatrix{T}, Xpred::AbstractMatrix{T}) where T<:Real

Compute the Geometric genomic offset.

Arguments

  • model::GeometricGO: A GeometricGO model.
  • X::AbstractMatrix{T}: NxP matrix with environmental data. Data will be scaled and centered if the model was fitted with center and scale set to true.
  • Xpred::Matrix{T}: NxP matrix with the predicted / altered environmental matrix. Data will be scaled and centered if the model was fitted with center and scale set to true.

Returns

  • A Vector{Float64} of length N with the Geometric genomic offset values.
source
GenomicOffsets.genomic_offsetMethod
genomic_offset(model::GradientForestGO, X::AbstractMatrix{T}, Xpred::AbstractMatrix{T}) where T<:Real

Compute the Gradient Forest genomic offset. Attention! Our implementation does not perform extrapolation, which means that you may obtain unpredictable results if you try so.

Arguments

  • model::GradientForestGO: A GeometricGO model.
  • X::AbstractMatrix{T}: NxP matrix with environmental data. Data will be scaled and centered if the model was fitted with center and scale set to true.
  • Xpred::Matrix{T}: NxP matrix with the predicted / altered environmental matrix. Data will be scaled and centered if the model was fitted with center and scale set to true.

Returns

  • A Vector{Float64} of length N with the Gradient forest genomic offset values.
source
GenomicOffsets.genomic_offsetMethod
genomic_offset(model::RDAGO, X::AbstractMatrix{T}, Xpred::AbstractMatrix{T}) where T<:Real

Compute the RDA genomic offset.

Arguments

  • model::RDAGO: A RDAGO model.
  • X::AbstractMatrix{T}: NxP matrix with environmental data.
  • Xpred::Matrix{T}: NxP matrix with the predicted / altered environmental matrix.
  • weighted::Bool=false: If true, the euclidean distance in the projected space is weighted based on the eigenvalue of each axis.

Returns

  • A Vector{Float64} of length N with the RDAGO genomic offset values.
source
GenomicOffsets.genomic_offsetMethod
genomic_offset(model::RONA, X::AbstractMatrix{T}, Xpred::AbstractMatrix{T}) where T<:Real

Compute the genomic offset for the RONA model.

Arguments

  • model::RONA: A RONA model.
  • X::AbstractMatrix{T}: NxP matrix with environmental data.
  • Xpred::Matrix{T}: NxP matrix with the predicted / altered environmental matrix.

Returns

  • A Vector{Float64} of length N with the RONA genomic offset values.
source