In Depth: Mixed Basecalling and Simple Mixed Basecalling
(This page mirrors the essence of https://www.nucleics.com/in-depth-mixed-basecalling-and-simple-mixed-basecalling/)
April 12, 2016 By Daniel
One of the major internal changes within Auto PeakTrace 6 is an improvement to
how mixed basecalling (detection of polymorphic sites) is performed. PeakTrace 5
used a similar algorithm to the KB Basecaller. Both called a base as mixed if
the secondary signal beneath the primary peak of the processed data exceeded the
mixed peak threshold. For example, if the mixed peak threshold is set to 30%,
then any peak with a secondary signal greater than 30% of the primary peak
height is called as a mixed base. Mixed basecalling is switched on by using a
mixed peak threshold greater than 0.
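
The threshold rule described above can be sketched in a few lines of Python.
This is only an illustration of the rule as stated in the text, not the
PeakTrace or KB implementation; the function name is_mixed_simple and its
parameters are hypothetical, and the peak heights are made-up values.

```python
def is_mixed_simple(primary_height, secondary_height, threshold_pct):
    """Apply the simple rule: mixed if secondary > threshold_pct % of primary.

    Hypothetical helper for illustration; heights are in arbitrary units
    taken from the same position in the processed data.
    """
    if threshold_pct <= 0:
        # A threshold of 0 means mixed basecalling is switched off.
        return False
    if primary_height <= 0:
        # No usable primary peak, so nothing to compare against.
        return False
    return secondary_height > primary_height * threshold_pct / 100.0


# Made-up secondary heights under a primary peak of height 1000.
calls = [is_mixed_simple(1000, s, 30) for s in (250, 300, 350)]
print(calls)  # [False, False, True]
```

Note that a secondary signal of exactly 30% is not called as mixed in this
sketch, because the text describes the rule as "greater than" the threshold.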
Figure 1. Polymorphic region basecalled with KB with a mixed peak threshold of 30%.
While this approach closely replicates how KB works, it suffers from two major
issues. First, the primary and secondary peaks do not always align perfectly. As
a result, a secondary peak that exceeds the threshold may not be called as mixed
if its location is offset from the primary peak. Second, the relative processed
peak heights do not always correspond to the relative signal strength seen in
the raw data channel. This is an issue for both KB and PeakTrace 5, but it is a
larger issue with PeakTrace 5.
To overcome both of these problems, the algorithm used for mixed basecalling in
PeakTrace 6 was improved to ensure that the secondary and primary peaks are
aligned before the relative signal height comparisons are made. The new
algorithm also uses the relative signal heights from the raw data channel to
make the peak height comparisons.
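
To make the idea of aligning peaks before comparing heights concrete, here is a
minimal Python sketch. It is an assumption about how such a comparison could
work, not the PeakTrace 6 algorithm, whose details are not given here. The
function is_mixed_aligned, the window parameter, and the sample traces are all
hypothetical. The sketch looks for the tallest secondary signal within a small
window around the primary peak in the raw data, then applies the threshold.

```python
def is_mixed_aligned(primary, secondary, peak_index, threshold_pct, window=2):
    """Compare the primary peak against the nearby secondary maximum.

    primary, secondary: lists of raw-channel intensities (hypothetical data).
    peak_index: index of the primary peak within those lists.
    window: how many samples either side to search for an offset
            secondary peak; window=0 reproduces the same-position rule.
    """
    if threshold_pct <= 0:
        # Mixed basecalling is off when the threshold is 0.
        return False
    primary_height = primary[peak_index]
    if primary_height <= 0:
        return False
    lo = max(0, peak_index - window)
    hi = min(len(secondary), peak_index + window + 1)
    # "Align" by taking the secondary maximum near the primary peak.
    secondary_height = max(secondary[lo:hi])
    return secondary_height > primary_height * threshold_pct / 100.0


# Made-up traces: the secondary peak sits two samples after the primary peak.
primary = [0, 200, 800, 1000, 800, 200, 0]
secondary = [0, 50, 100, 200, 350, 400, 150]

print(is_mixed_aligned(primary, secondary, 3, 30, window=0))  # False
print(is_mixed_aligned(primary, secondary, 3, 30, window=2))  # True
```

With window=0 the secondary signal directly beneath the primary peak is only
20% of the primary height, so the base is missed. Searching two samples either
side finds the offset secondary peak at 40%, and the base is called as mixed.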
Figure 2. Polymorphic region basecalled using PeakTrace 5 and a 30% threshold.
Figure 3. Polymorphic region basecalled using PeakTrace 6 and a 30% threshold.
The effect of these changes can be seen in Figures 2 and 3. The mixed bases at
positions 196 and 197 were not detected as mixed basecalls by PeakTrace 5 due to
the peak offsets and secondary peak height suppression in the processed data
channel. These problems do not occur with the new mixed basecall algorithm
introduced in PeakTrace 6, and both positions are now called as mixed.
This problem of secondary peak height suppression is very obvious when the no
peak resolution PeakTrace output is used (Figure 4). This setting outputs the
raw data smoothed, baselined, and mobility shifted. The peaks, while not as well
resolved in the late regions of the trace, show the raw data relative peak
heights.
Figure 4. Polymorphic region basecalled using no peak resolution and a 30% threshold.
If desired, the old PeakTrace 5 mixed basecalling module can be used by checking
simple mixed basecalling in the Additional Options window and setting an
appropriate mixed peak threshold. This will make PeakTrace 6 perform the same
mixed basecalling as PeakTrace 5.
Potential Pitfalls with Mixed Basecalling
The major factor to be aware of when using the new PeakTrace 6 mixed basecall
setting is that the best mixed peak threshold to use will likely have changed.
If you had previously been using a mixed peak threshold value of 50%, then you
may wish to reduce the value to 30%. It is hard to give a precise mixed peak
threshold level to use for mixed basecalling, but setting the mixed peak
threshold high (over 50%) will result in fewer false positives at the expense of
more false negatives, and vice versa. A good starting point for the mixed peak
threshold is 30%, but you may wish to change this threshold on the basis of
your needs.
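
The trade-off can be seen by counting how many positions would be called as
mixed at different thresholds. The following Python sketch uses made-up
secondary-to-primary height ratios (in percent) and a hypothetical helper,
count_mixed_calls; it is an illustration only and does not reflect real trace
data or PeakTrace output.

```python
def count_mixed_calls(ratios_pct, threshold_pct):
    """Count positions whose secondary/primary ratio exceeds the threshold.

    ratios_pct: secondary peak height as a percentage of the primary
                peak height at each position (hypothetical values).
    """
    if threshold_pct <= 0:
        # Mixed basecalling is off when the threshold is 0.
        return 0
    return sum(1 for r in ratios_pct if r > threshold_pct)


# Made-up ratios for six positions in a trace.
ratios_pct = [12, 28, 35, 48, 55, 70]

for threshold_pct in (30, 50):
    print(threshold_pct, count_mixed_calls(ratios_pct, threshold_pct))
# 30 4
# 50 2
```

Raising the threshold from 30% to 50% halves the number of mixed calls in this
example. Whether the dropped calls were false positives or true mixed bases
depends on the data, which is why the threshold is best tuned to your needs.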