2023-06-24
Figure 1: Spatial distribution of Thymus vulgaris phenolic (black) and non-phenolic (yellow) chemotypes. Obtained from Thompson (2020)
We have a de novo assembly from HiFi reads, but it is highly fragmented
\[ Y = \begin{cases} 0, & \textrm{with probability } p \\ \textrm{Poisson}(\lambda), & \textrm{with probability } 1-p \end{cases} \]
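As a rough illustration of the mixture above, the following Python sketch evaluates its probability mass function and draws samples from it. The names `zip_pmf` and `zip_sample` are hypothetical and chosen only for this example; they are not taken from the project's code, and the sampler uses Knuth's simple multiplication method, which is only practical for small \(\lambda\).

```python
import math
import random


def zip_pmf(k, p, lam):
    """P(Y = k) when Y is 0 with probability p and Poisson(lam) otherwise."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    zero_part = p if k == 0 else 0.0  # structural zeros only contribute at k = 0
    return zero_part + (1.0 - p) * poisson


def zip_sample(p, lam, rng):
    """Draw one value from the zero-inflated Poisson mixture."""
    if rng.random() < p:
        return 0  # structural zero
    # Knuth's method: multiply uniforms until the product drops below exp(-lam).
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


if __name__ == "__main__":
    rng = random.Random(0)  # illustrative seed and parameter values
    draws = [zip_sample(0.3, 2.0, rng) for _ in range(10_000)]
    print("observed fraction of zeros:", draws.count(0) / len(draws))
    print("expected fraction of zeros:", zip_pmf(0, 0.3, 2.0))
```

The probability of a zero is \(p + (1-p)e^{-\lambda}\), because zeros can come from either component of the mixture.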
Figure 3: Synteny-like plot of CM044164.1 (only four T. vulgaris contigs)
Figure 4: Comparison of genome assembly size
From 1.87 Mb (n=133) to 48.92 Mb (n=8)
Figure 5: For historical context, N50 statistics of the published plant genomes. Obtained from Sun et al. (2022)
BUSCO looks for the presence of nearly-universal orthologous genes
Completeness score: 96.4% (using the Eudicots dataset)
Disclaimer
The proportion of complete and duplicated genes is high (31.4%)
Recent genome duplication event?
Chimeric assembly of haplotypes?
Genome assembly quality evaluation: repetitive sequences
Validation with experimental data: transcriptome
Improve assembly: alternative reference-based scaffold algorithms
Phylogenetic analysis: thyme ecology
has to be done
Figure 6: Evolution of the number of published chromosome-level genome assemblies. Data were retrieved from NCBI on 6/15/2023.
Obtained from Wikipedia
HiFiasm assembled \(\sim 1.7\cdot 10^6\) long reads into \(\sim 1900\) contigs
\(N50 = 1.87\) Mb
Important
Highly fragmented, but within the expectation for highly repetitive plant genomes
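For readers unfamiliar with the statistic, N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal Python sketch follows; the function name `n50` and the toy lengths are hypothetical. The second value it returns is L50, the number of sequences in that set, which is presumably what the "n" reported next to N50 values in these slides refers to.

```python
def n50(lengths):
    """Return (N50, L50) for a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            # `length` is N50; `count` is L50.
            return length, count


if __name__ == "__main__":
    # Toy lengths; the total is 30, so half is 15, reached at the second contig.
    print(n50([10, 8, 6, 4, 2]))  # (8, 2)
```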
Figure 8: Covered areas of T. quinquecostatus in the whole-genome alignment
RagTag infers the gap size between two adjacent sequences, \(\textrm{seq1}\) and \(\textrm{seq2}\), according to the following equation:
\[ \textrm{gapsize} = \left(\textrm{aln2}_{\textrm{rs}} - \textrm{aln2}_{\textrm{qs}}\right) - \left(\textrm{aln1}_{\textrm{re}} - \textrm{aln1}_{\textrm{qe}} + \textrm{len}(\textrm{seq1})\right) \]
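The formula can be sketched in Python as below. This is only an illustration of the equation as written here, not RagTag's actual code: the function name `inferred_gap_size` is hypothetical, and the reading of the subscripts (rs/re as alignment start/end on the reference, qs/qe as alignment start/end on the query sequence) is an assumption. Under that reading, each bracket projects a sequence boundary onto the reference, and the gap is the distance between the projected end of seq1 and the projected start of seq2.

```python
def inferred_gap_size(aln1_re, aln1_qe, len_seq1, aln2_rs, aln2_qs):
    """Gap between seq1 and seq2, following the equation above.

    Assumed meaning of the arguments (not confirmed by the slides):
      aln1_re, aln1_qe: end of seq1's alignment on the reference / on seq1
      len_seq1:         total length of seq1
      aln2_rs, aln2_qs: start of seq2's alignment on the reference / on seq2
    """
    seq2_projected_start = aln2_rs - aln2_qs
    seq1_projected_end = aln1_re - aln1_qe + len_seq1
    return seq2_projected_start - seq1_projected_end


if __name__ == "__main__":
    # Toy coordinates: seq1 (1000 bp) aligns up to its last base at reference
    # position 5000; seq2 aligns from its first base at reference position 5300.
    print(inferred_gap_size(aln1_re=5000, aln1_qe=1000, len_seq1=1000,
                            aln2_rs=5300, aln2_qs=0))  # 300
```

A negative result would mean that the projected sequences overlap on the reference; how such cases are handled is not covered by the equation.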
\[ Y = \begin{cases} 0, & \textrm{with probability } p \\ \textrm{Poisson}(\lambda), & \textrm{with probability } 1-p \end{cases} \]
Figure 9: Sampled posterior distribution of parameters for pseudo-chromosome CM044164.1 using MCMC
Two hybridization conditions to enrich certain sequences: 15% and 5% of nucleotide divergence (Bataillon et al. 2022)
Project in Bioinformatics | MSc. in Bioinformatics at Aarhus University