GenomicOffsets.jl Documentation
Genomic offsets (GO) statistics measure current individuals/populations' maladaptation in the event of a drastic climate change. These metrics are obtained from Genotype environment associations (GEA).
This Julia package provides efficient implementations of the four most popular metrics: RONA, RDA GO, Gradient Forest GO and Geometric GO. In addition, we have implemented an F-statistical test based on the Linear Mixed Latent Factor model (LFMM) to perform the identification of GEA candidates through hypothesis testing and a bootstrap approach that resamples loci, performs the GEA candidates identification and computes the desired genomic offset metric.
We have documented several "recipes" in both Julia and R in the README file of the repository (so please take a look there and use this documentation as a reference for all available functions).
TL;TR
You can compute different genomic offsets by fitting different models and computing the genomic offset.
# Julia
using GenomicOffsets
Y, X, Xpred = godataset
rona = fit(RONA, Y, X)
genomic_offset(rona, X, Xpred)
100×1 Matrix{Float64}:
0.04930412355509537
0.09685110436545971
0.031955998579532574
0.057903936496619686
0.06747793104523755
0.07838283024137747
0.034071076534702927
0.05671776002159728
0.11209514598766128
0.050635672102255044
⋮
0.05847056037198923
0.044459426014584816
0.1000681402606062
0.076366019095105
0.0779261159151334
0.09861395620054535
0.06261235895467696
0.05449853921861244
0.07504407351624044
You can fit LFMM models to do hypothesis testing and identify GEA candidates.
# Julia
lfmm = RidgeLFMM(Y, X, 1)
pvalues = LFMM_Ftest(lfmm, Y, X; genomic_control=true)
# Bonferroni correction at 0.05
candidates = findall(pvalues .< 0.05 / size(Y, 2))
90-element Vector{Int64}:
52
86
105
176
211
233
284
323
412
419
⋮
4107
4319
4383
4688
4831
4844
4907
4973
5124
Yo can compute bootstrapping confidence intervals for different genomic offset metrics using an approach we propose that sample loci and identifies GEA candidates at each iteration:
using Random, StatsBase
nboots = 100
offsets = bootstrap_with_candidates(GeometricGO, Y, X, Xpred, nboots;candidates_threshold=0.05)
@assert size(offsets) == (size(Y, 1), nboots)
confint95 = [quantile(ind, [0.025, 0.975]) for ind in eachrow(offsets)]
100-element Vector{Vector{Float64}}:
[0.02303280401566295, 0.03151558990536644]
[0.1047729314923714, 0.15591623894136308]
[0.011123446136442448, 0.015183760680373653]
[0.0386068546151271, 0.05458975227698637]
[0.04292727312732361, 0.06134203909699572]
[0.0928716599812431, 0.13115510306701367]
[0.014603083635330173, 0.020300992040981226]
[0.03722872857179291, 0.04973468787500883]
[0.08909070888028804, 0.1308675409351474]
[0.03431001066987424, 0.05007488604751634]
⋮
[0.021821315064466877, 0.029341761345400032]
[0.029222879302603963, 0.04309757900855938]
[0.1397961718143955, 0.2056429635597111]
[0.04246632188729315, 0.06550399094959608]
[0.048245602602202856, 0.07065170052663027]
[0.05532866396795552, 0.07894162577477457]
[0.04981200290106696, 0.07009733123191236]
[0.02786753998320747, 0.04076474730946965]
[0.06894502320101961, 0.10013528792399119]
References
GenomicOffsets.GeometricGO
— TypeGeometric genomic offset. TODO: Add reference
GenomicOffsets.GradientForestGO
— TypeGradient genomic offset. TODO: Add reference
GenomicOffsets.LFMM
— TypeLFMM{T<:Real}
A struct to store the results of the Linear Fixed Effects Mixed Model (LFMM) as described by Caye et al. (2019).
\[Y = X B^T + W = X B^T + U V^T\]
GenomicOffsets.RDAGO
— TypeRedundancy Analysis (RDA) genomic offset. TODO: Add reference
GenomicOffsets.RONA
— TypeRisk of non-adaptnesss (RONA) genomic offset. First proposed by Rellstab et al. (2016).
GenomicOffsets.LFMM_Ftest
— MethodLFMM_Ftest(model::LFMM, Y::AbstractMatrix{T1}, X::AbstractMatrix{T2}; genomic_control::Bool=true, center=true) where {T1<:Real,T2<:Real}
Compute F-Statistic (F-test) for the Linear Fixed Effects Mixed Model (LFMM). TODO: Add reference & description.
Arguments
model
: ALFMM{T<:Real}
data structure (obtained fromRidgeLFMM
).Y
: A centered genotype matrix of size NxL.X
: A centered environmental matrix of size NxP.genomic_control
: If true, apply genomic control to the F-statistic.center
: If true, both the genotype matrix and the environmental matrix are centered. If false, both matrices are assumed to be centered.
Returns
- A vector of p-values for loci in the genotype matrix.
GenomicOffsets.LFMM_Ftest
— MethodLFMM_Ftest(model::LFMM, Y::AbstractMatrix{T1}, X::AbstractMatrix{T2}; genomic_control::Bool=true, center=true) where {T1<:Real,T2<:Real}
Compute F-Statistic (F-test) for the Linear Fixed Effects Mixed Model (LFMM). TODO: Add reference & description.
Arguments
model
: ALFMM{T<:Real}
data structure (obtained fromRidgeLFMM
).Y
: A centered genotype matrix of size NxL.X
: A centered environmental matrix of size NxP.genomic_control
: If true, apply genomic control to the F-statistic.center
: If true, both the genotype matrix and the environmental matrix are centered. If false, both matrices are assumed to be centered.
Returns
- A vector of p-values for loci in the genotype matrix.
GenomicOffsets.RidgeLFMM
— MethodRidgeLFMM(Y::Matrix{T}, X::Matrix{T}, K::Int, λ::Float64) where T<:Real
Ridge solutions for the Linear Fixed Effects Mixed Model (LFMM) as described by Caye et al. (2019).
\[Y = X B^T + W = X B^T + U V^T\]
Arguments
Y
: A centered genotype matrix of size NxL.X
: Environmental matrix of size NxP.K
: Number of latent factors.λ
: Regularization parameter (by 1e-5)center
: If true, both the genotype matrix and the environmental matrix are centered. If false, both matrices are assumed to be centered.
Returns
- A
LFMM{T<:Real}
data structure with the latent factorsU
,Vt
, and the effect sizesBt
.
GenomicOffsets.TracyWidom
— MethodTracyWidom(eigenvalues::AbstractVector{T}; sort_eigenvalues::Bool=true) where T<:Real
Compute the Tracy-Widom statistics and p-values for a given set of eigenvalues. TODO: add reference.
Arguments
eigenvalues::AbstractVector{T}
: A vector of eigenvalues.sort_eigenvalues::Bool=true
: If true, sort the eigenvalues in descending order. If false, the eigenvalues are assumed to be sorted in descending order.
Returns
- A tuple with the Tracy-Widom statistics and p-values. For a given threshold, it is advised not to keep all eigenvalues below the threshold, but up to the first one that is above the threshold (not included).
GenomicOffsets.bootstrap_with_candidates
— Methodbootstrap_with_candidates(::Type{GeometricGO}, rng::Random.AbstractRNG,Y::AbstractMatrix{T1}, X::AbstractMatrix{T2},Xpred::AbstractMatrix{T2}, nboot::Int=500; candidates_threshold::Real=0.05, genomic_control::Bool=true, tw_threshold::Real=0.001) where {T1<:Real,T2<:Real}
Compute the Geometric genomic offset using a bootstrap approach. For every, bootstrap iteration, the model is fitted using a random subset of the columns of Y
. The Tracy-Widom test is used to estimate the number of latent factors. The F-test is used to select putatively adaptative loci. The genomic offset is computed for the selected loci.
Arguments
rng::Random.AbstractRNG
: Random number generator. If not provided, the global RNG is used.Y::AbstractMatrix{T1}
: NxL matrix with individuals genotypes or allele frequencies.X::AbstractMatrix{T2}
: NxP matrix with environmental data.Xpred::AbstractMatrix{T2}
: NxP matrix with the predicted / altered environmental matrix.nboot::Int=500
: Number of bootstrap iterations.- Optional arguments:
candidates_threshold::Real=0.05
: Threshold for the F-test. Better to be permisive.genomic_control::Bool=true
: Apply genomic control to the F-test.tw_threshold::Real=0.001
: Threshold for the Tracy-Widom test. Better to be conservative (as the number of latent factors is often overconservative).- λ::Real=1e-5: Regularization parameter for the LFMM.
Returns
- A matrix of size NxNboot with the genomic offset values. If no candidate loci are found, the row is filled with zeros.
GenomicOffsets.bootstrap_with_candidates
— Methodbootstrap_with_candidates(::Type{GradientForestGO}, rng::Random.AbstractRNG,Y::AbstractMatrix{T1}, X::AbstractMatrix{T2}, Xpred::AbstractMatrix{T2}, nboot::Int=500; ntrees::Int=100, candidates_threshold::Real=0.05, genomic_control::Bool=true, tw_threshold::Real=0.001) where {T1<:Real,T2<:Real}
Compute the Gradient Forest genomic offset using a bootstrap approach. For every, bootstrap iteration, the model is fitted using a random subset of the columns of Y
. The Tracy-Widom test is used to estimate the number of latent factors. The F-test is used to select putatively adaptative loci. The genomic offset is computed for the selected loci.
Arguments
rng::Random.AbstractRNG
: Random number generator. If not provided, the global RNG is used.Y::AbstractMatrix{T1}
: NxL matrix with individuals genotypes or allele frequencies.X::AbstractMatrix{T2}
: NxP matrix with environmental data.Xpred::AbstractMatrix{T2}
: NxP matrix with the predicted / altered environmental matrix.nboot::Int=500
: Number of bootstrap iterations.- Optional arguments:
ntrees::Int=100
: Number of trees each forest (one per allele) will have.candidates_threshold::Real=0.05
: Threshold for the F-test. Better to be permisive.genomic_control::Bool=true
: Apply genomic control to the F-test.tw_threshold::Real=0.001
: Threshold for the Tracy-Widom test. Better to be conservative (as the number of latent factors is often overconservative).- λ::Real=1e-5: Regularization parameter for the LFMM.
Returns
- A matrix of size NxNboot with the genomic offset values. If no candidate loci are found, the row is filled with zeros.
GenomicOffsets.bootstrap_with_candidates
— Methodbootstrap_with_candidates(::Type{RDAGO}, rng::Random.AbstractRNG, Y::AbstractMatrix{T1}, X::AbstractMatrix{T2}, Xpred::AbstractMatrix{T2}, nboot::Int=500; candidates_threshold::Real=0.05, genomic_control::Bool=true, tw_threshold::Real=0.001) where {T1<:Real, T2<:Real}
Compute the RDA genomic offset using a bootstrap approach. For every, bootstrap iteration, the model is fitted using a random subset of the columns of Y
. The Tracy-Widom test is used to estimate the number of latent factors. The F-test is used to select putatively adaptative loci. The genomic offset is computed for the selected loci.
Arguments
rng::Random.AbstractRNG
: Random number generator. If not provided, the global RNG is used.Y::AbstractMatrix{T1}
: NxL matrix with individuals genotypes or allele frequencies.X::AbstractMatrix{T2}
: NxP matrix with environmental data.Xpred::AbstractMatrix{T2}
: NxP matrix with the predicted / altered environmental matrix.nboot::Int=500
: Number of bootstrap iterations.- Optional arguments:
candidates_threshold::Real=0.05
: Threshold for the F-test. Better to be permisive.genomic_control::Bool=true
: Apply genomic control to the F-test.tw_threshold::Real=0.001
: Threshold for the Tracy-Widom test. Better to be conservative (as the number of latent factors is often overconservative).pratio::Float64=0.99
: The ratio of variances preserved in the principal subspace.weighted::Bool=false
: If true, the euclidean distance in the projected space is weighted based on the eigenvalue of each axis.- λ::Real=1e-5: Regularization parameter for the LFMM.
Returns
- A matrix of size NxNboot with the genomic offset values. If no candidate loci are found, the row is filled with zeros.
GenomicOffsets.bootstrap_with_candidates
— Methodbootstrap_with_candidates(::Type{RONA}, rng::Random.AbstractRNG, Y::AbstractMatrix{T1}, X::AbstractMatrix{T2}, Xpred::AbstractMatrix{T2}, nboot::Int=500; candidates_threshold::Real=0.05, genomic_control::Bool=true, tw_threshold::Real=0.001) where {T1<:Real, T2<:Real}
Compute the genomic offset for the RONA model using a bootstrap approach. For every, bootstrap iteration, the model is fitted using a random subset of the columns of Y
. The Tracy-Widom test is used to estimate the number of latent factors. The F-test is used to select putatively adaptative loci. The genomic offset is computed for the selected loci.
Arguments
rng::Random.AbstractRNG
: Random number generator. If not provided, the global RNG is used.Y::AbstractMatrix{T1}
: NxL matrix with individuals genotypes or allele frequencies.X::AbstractMatrix{T2}
: NxP matrix with environmental data.Xpred::AbstractMatrix{T2}
: NxP matrix with the predicted / altered environmental matrix.nboot::Int=500
: Number of bootstrap iterations.- Optional arguments:
candidates_threshold::Real=0.05
: Threshold for the F-test. Better to be permisive.genomic_control::Bool=true
: Apply genomic control to the F-test.tw_threshold::Real=0.001
: Threshold for the Tracy-Widom test. Better to be conservative (as the number of latent factors is often overconservative).- λ::Real=1e-5: Regularization parameter for the LFMM.
Returns
- A matrix of size NxNboot with the genomic offset values. If no candidate loci are found, the row is filled with zeros.
GenomicOffsets.fit
— Methodfit(::Type{GeometricGO}, Y::AbstractMatrix{T1}, X::AbstractMatrix{T2}) where {T1<:Real,T2<:Real}
Fit the Geometric genomic offset model (that is, fit a LFMM) using the data Y
and X
.
Arguments
Y::AbstractMatrix{T1}
: NxL matrix with individuals genotypes or allele frequencies.X::AbstractMatrix{T}
: NxP matrix with environmental data.K
: Number of latent factors to use when fitting th e LFMM model. It can be determined using the elbow rule of thumb. If nothing is provided, the Tracy-Widom statistic to determine the number of latent factors (which might be overconservative).- Optional arguments:
center::Bool=true
: Center both the genotype and environmental data. Predicted environmental data is centered using the same mean as X. If false, data is assumed to be centered.scale::Bool=true
: Scale the environmental data. Predicted environmental data is scaled using the same standard deviation as X. If false, data is assumed to be scaled.- tw_threshold::Real=1e-3: Threshold to determine the number of latent factors if using the Tracy-Widom statistic. By default, it uses 1e-3.
- λ::Real=1e-5: Regularization parameter for the LFMM.
Returns
- A GeometricGO model.
GenomicOffsets.fit
— Methodfit(::Type{GradientForestGO}, Y::AbstractMatrix{T1}, X::AbstractMatrix{T2}; ntrees::Int=100) where {T1<:Real,T2<:Real}
Fit the Gradient forest genomic offset model (that is, fit a Gradient Forest) using the data Y
and X
.
Arguments
Y::AbstractMatrix{T1}
: NxL matrix with individuals genotypes or allele frequencies.X::AbstractMatrix{T}
: NxP matrix with environmental data.ntrees
: Number of trees each forest (one per allele) will have.
Returns
- A GradientForestGO model.
GenomicOffsets.fit
— Methodfit(::Type{RDAGO}, Y::AbstractMatrix{T1}, X::AbstractMatrix{T2}) where {T1<:Real,T2<:Real}
Fit the RDA model (that is, apply Redundancy Analysis) using the data Y
and X
.
Arguments
Y::AbstractMatrix{T1}
: NxL matrix with individuals genotypes or allele frequencies.X::AbstractMatrix{T}
: NxP matrix with environmental data.- Optional arguments:
K::Int=size(Y,2)
: Number of main canonical axes to retain. The final number of axes is determined either by K or pratio.pratio::Float64=0.99
: The ratio of variances preserved in the principal subspace.
Returns
- A RDAGO model.
GenomicOffsets.fit
— Methodfit(::Type{RONA}, Y::AbstractMatrix{T1}, X::AbstractMatrix{T2}) where {T1<:Real,T2<:Real}
Fit the RONA model (that is, solve the least squares problem) using the data Y
and X
.
Arguments
Y::AbstractMatrix{T1}
: NxL matrix with individuals genotypes or allele frequencies.X::AbstractMatrix{T}
: NxP matrix with environmental data.
Returns
- A RONA model.
GenomicOffsets.genomic_offset
— Methodgenomic_offset(model::GeometricGO, X::AbstractMatrix{T}, Xpred::AbstractMatrix{T}) where T<:Real
Compute the Geometric genomic offset.
Arguments
model::GeometricGO
: A GeometricGO model.X::AbstractMatrix{T}
: NxP matrix with environmental data. Data will be scaled and centered if the model was fitted with center and scale set to true.Xpred::Matrix{T}
: NxP matrix with the predicted / altered environmental matrix. Data will be scaled and centered if the model was fitted with center and scale set to true.candidates::AbstractVector{Integer}
: Indices of the candidate loci.
Returns
- A Vector{Float64} of length N with the Geometric genomic offset values.
GenomicOffsets.genomic_offset
— Methodgenomic_offset(model::GeometricGO, X::AbstractMatrix{T}, Xpred::AbstractMatrix{T}) where T<:Real
Compute the Geometric genomic offset.
Arguments
model::GeometricGO
: A GeometricGO model.X::AbstractMatrix{T}
: NxP matrix with environmental data. Data will be scaled and centered if the model was fitted with center and scale set to true.Xpred::Matrix{T}
: NxP matrix with the predicted / altered environmental matrix. Data will be scaled and centered if the model was fitted with center and scale set to true.
Returns
- A Vector{Float64} of length N with the Geometric genomic offset values.
GenomicOffsets.genomic_offset
— Methodgenomic_offset(model::GradientForestGO, X::AbstractMatrix{T}, Xpred::AbstractMatrix{T}) where T<:Real
Compute the Gradient Forest genomic offset. Attention! Our implementation does not perform extrapolation, which means that you may obtain unpredictable results if you try so.
Arguments
model::GradientForestGO
: A GeometricGO model.X::AbstractMatrix{T}
: NxP matrix with environmental data. Data will be scaled and centered if the model was fitted with center and scale set to true.Xpred::Matrix{T}
: NxP matrix with the predicted / altered environmental matrix. Data will be scaled and centered if the model was fitted with center and scale set to true.
Returns
- A Vector{Float64} of length N with the Gradient forest genomic offset values.
GenomicOffsets.genomic_offset
— Methodgenomic_offset(model::RDAGO, X::AbstractMatrix{T}, Xpred::AbstractMatrix{T}) where T<:Real
Compute the RDA genomic offset.
Arguments
model::RDAGO
: A RDAGO model.X::AbstractMatrix{T}
: NxP matrix with environmental data.Xpred::Matrix{T}
: NxP matrix with the predicted / altered environmental matrix.weighted::Bool=false
: If true, the euclidean distance in the projected space is weighted based on the eigenvalue of each axis.
Returns
- A Vector{Float64} of length N with the RDAGO genomic offset values.
GenomicOffsets.genomic_offset
— Methodgenomic_offset(model::RONA, X::AbstractMatrix{T}, Xpred::AbstractMatrix{T}) where T<:Real
Compute the genomic offset for the RONA model.
Arguments
model::RONA
: A RONA model.X::AbstractMatrix{T}
: NxP matrix with environmental data.Xpred::Matrix{T}
: NxP matrix with the predicted / altered environmental matrix.
Returns
- A Vector{Float64} of length N with the RONA genomic offset values.