2023-06-24
We have a de novo assembly from HiFi, but it is highly fragmented
\[ Y = \begin{cases} 0, & \textrm{with probability } p \\ \textrm{Poisson}(\lambda), & \textrm{with probability } 1-p \end{cases} \]
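A minimal sketch of the zero-inflated Poisson model above, using only the Python standard library. The function name `zip_pmf` and the example values of \(p\) and \(\lambda\) are illustrative assumptions, not values from this project.

```python
import math

def zip_pmf(k: int, p: float, lam: float) -> float:
    """P(Y = k) for a zero-inflated Poisson with structural-zero probability p."""
    if k < 0:
        return 0.0
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # A zero is either structural (probability p) or drawn from the Poisson part.
        return p + (1 - p) * poisson
    return (1 - p) * poisson

# Hypothetical parameters, chosen only for illustration.
p, lam = 0.3, 2.0
print(zip_pmf(0, p, lam))
```

Note that the probability of a zero is \(p + (1-p)e^{-\lambda}\), which is larger than under a plain Poisson with the same \(\lambda\).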
N50 improved from 1.87 Mb (n=133) to 48.92 Mb (n=8)
BUSCO looks for the presence of nearly-universal orthologous genes
Completeness score: 96.4% (using Eudicots dataset)
Disclaimer
The proportion of complete and duplicated genes is high (31.4%)
Recent genome duplication event?
Chimeric assembly of haplotypes?
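BUSCO reports complete genes as the sum of single-copy and duplicated genes, so the two percentages above already fix the rest of the breakdown. The sketch below works through that arithmetic; the function name `busco_breakdown` is a hypothetical helper, not part of BUSCO.

```python
def busco_breakdown(complete_pct: float, duplicated_pct: float) -> tuple[float, float]:
    """Derive single-copy and not-complete percentages from BUSCO totals.

    Assumes BUSCO's usual categories: Complete = Single-copy + Duplicated,
    and Complete + Fragmented + Missing = 100%.
    """
    single_copy = complete_pct - duplicated_pct
    fragmented_or_missing = 100.0 - complete_pct
    return round(single_copy, 1), round(fragmented_or_missing, 1)

# Values taken from the BUSCO result above.
single_copy, fragmented_or_missing = busco_breakdown(96.4, 31.4)
print(single_copy, fragmented_or_missing)
```

With these numbers, roughly a third of the complete genes are present in more than one copy, which is what motivates the two questions above.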
Genome assembly quality evaluation: repetitive sequences
Validation with experimental data: transcriptome
Improve assembly: alternative reference-based scaffold algorithms
Phylogenetic analysis: thyme ecology
The steps above still have to be done
HiFiasm assembled \(\sim 1.7\cdot 10^6\) long reads into \(\sim 1{,}900\) contigs
\(N50 = 1.87\) Mb
Important
Highly fragmented, but within the expectation for highly repetitive plant genomes
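For reference, N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. A minimal sketch of that calculation follows; the function name `n50_l50` and the toy contig lengths are assumptions for illustration only.

```python
def n50_l50(lengths: list[int]) -> tuple[int, int]:
    """Return (N50, L50) for a list of contig lengths.

    N50: length of the contig at which the running sum of lengths,
    sorted from longest to shortest, first reaches half the assembly size.
    L50: number of contigs needed to reach that point.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise AssertionError("unreachable: the running sum always reaches half the total")

# Toy example, not real assembly data.
print(n50_l50([10, 8, 6, 4, 2]))
```

In practice the contig lengths would be read from the assembly FASTA or its index file.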
RagTag infers the gap size between two adjacent sequences, \(\textrm{seq1}\) and \(\textrm{seq2}\), according to the following equation:
\[ \textrm{gapsize} = \left(\textrm{aln2}_\textrm{rs} - \textrm{aln2}_\textrm{qs}\right) - \left(\textrm{aln1}_\textrm{re} - \textrm{aln1}_\textrm{qe} + \textrm{len}(\textrm{seq1})\right) \]
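The gap size equation above can be sketched in a few lines of Python. This is an illustration of the formula only, not RagTag's own code. It assumes the subscripts rs, re, qs and qe stand for reference start, reference end, query start and query end of each alignment. The `Alignment` class and the coordinates are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Alignment:
    rs: int  # reference start (assumed meaning of the subscript)
    re: int  # reference end
    qs: int  # query start
    qe: int  # query end

def gap_size(aln1: Alignment, aln2: Alignment, len_seq1: int) -> int:
    """Gap between seq1 and seq2 after projecting both onto the reference."""
    # Where seq2 would start on the reference if its unaligned prefix is included.
    seq2_start = aln2.rs - aln2.qs
    # Where seq1 would end on the reference if its unaligned suffix is included.
    seq1_end = aln1.re - aln1.qe + len_seq1
    return seq2_start - seq1_end

# Hypothetical coordinates, chosen only to make the arithmetic easy to follow.
aln1 = Alignment(rs=100, re=1000, qs=0, qe=900)
aln2 = Alignment(rs=1600, re=2500, qs=50, qe=950)
print(gap_size(aln1, aln2, len_seq1=1000))
```

Here seq1 is projected to end at reference position 1100 and seq2 to start at 1550, so the inferred gap is 450 bp. A negative result would indicate that the two projected sequences overlap.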
Two hybridization conditions to enrich certain sequences: 15% and 5% nucleotide divergence (Bataillon et al. 2022)
Project in Bioinformatics | MSc. in Bioinformatics at Aarhus University