In Depth: Mixed Basecalling and Simple Mixed Basecalling

(This page mirrors the essence of https://www.nucleics.com/in-depth-mixed-basecalling-and-simple-mixed-basecalling/)

April 12, 2016 By Daniel

One of the major internal changes in Auto PeakTrace 6 is an improvement to how mixed basecalling (the detection of polymorphic sites) is performed. PeakTrace 5 used an algorithm similar to that of the KB Basecaller: both called a base as mixed if the secondary signal beneath the primary peak in the processed data exceeded the mixed peak threshold. For example, with the mixed peak threshold set to 30%, any peak whose secondary signal is greater than 30% of the primary peak height is called as a mixed base. Mixed basecalling is switched on by setting the mixed peak threshold to a value greater than 0.
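
The threshold rule described above can be sketched in a few lines. This is a minimal illustration, not PeakTrace or KB code: the function name is_mixed and its arguments are hypothetical, and the heights are assumed to be already measured for the base in question.

```python
def is_mixed(primary_height: float, secondary_height: float, threshold_pct: float) -> bool:
    """Return True if a base should be called mixed under a simple threshold rule.

    The base is mixed when the secondary signal exceeds threshold_pct percent
    of the primary peak height. A threshold of 0 switches mixed calling off.
    """
    if threshold_pct <= 0 or primary_height <= 0:
        return False  # mixed basecalling disabled, or no usable primary peak
    return secondary_height > (threshold_pct / 100.0) * primary_height


# Usage with a 30% threshold on a primary peak of height 1000:
print(is_mixed(1000, 350, 30))  # True: 350 > 300
print(is_mixed(1000, 250, 30))  # False: 250 < 300
print(is_mixed(1000, 900, 0))   # False: threshold 0 means mixed calling is off
```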

Figure 1. Polymorphic region basecalled with KB with a mixed peak threshold of 30%

While this approach closely replicates how KB works, it suffers from two major issues. First, the primary and secondary peaks do not always align perfectly. If the secondary peak is offset from the primary, a secondary peak that exceeds the threshold may not be called as mixed. Second, the relative peak heights in the processed data do not always correspond to the relative signal strengths in the raw data. This affects both KB and PeakTrace 5, but the effect is larger with PeakTrace 5.
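
The first issue, peak offset, can be shown with a toy trace. The sketch below assumes idealised Gaussian peak shapes and invented heights and positions (none of this is PeakTrace data): it compares the secondary signal read directly beneath the primary apex with the height the offset secondary peak actually reaches.

```python
import math


def gaussian(x: float, center: float, height: float, width: float = 3.0) -> float:
    """Idealised peak shape used only to build a toy trace."""
    return height * math.exp(-((x - center) ** 2) / (2 * width ** 2))


# Toy data: primary peak at scan 100, secondary peak half as tall but offset to scan 104.
scans = range(80, 121)
primary = {x: gaussian(x, 100, 1000) for x in scans}
secondary = {x: gaussian(x, 104, 500) for x in scans}

primary_pos = max(primary, key=primary.get)  # scan of the primary peak apex

# Naive check: read the secondary channel directly beneath the primary apex.
ratio_at_primary = secondary[primary_pos] / primary[primary_pos]
# What the secondary peak actually reaches at its own apex.
ratio_at_secondary_apex = max(secondary.values()) / primary[primary_pos]

print(f"ratio beneath primary apex: {ratio_at_primary:.2f}")
print(f"ratio at secondary apex:    {ratio_at_secondary_apex:.2f}")
```

With these toy values the naive ratio is roughly 0.21, so a 30% threshold misses the site even though the secondary peak reaches half the primary height.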

To overcome both problems, the mixed basecalling algorithm in PeakTrace 6 was improved to ensure that the secondary and primary peaks are aligned before their relative heights are compared. The new algorithm also uses the relative signal heights from the raw data channel when making these comparisons.
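
The general idea of aligning the peaks and comparing raw heights can be sketched as follows. This is an assumption-laden stand-in, not the PeakTrace 6 algorithm: "alignment" is approximated by taking the maximum of the raw secondary channel within a hypothetical window of scans around the primary apex, and the function names and toy channel values are invented for illustration.

```python
from typing import Sequence


def aligned_secondary_ratio(primary_raw: Sequence[float], secondary_raw: Sequence[float],
                            peak_pos: int, window: int = 5) -> float:
    """Secondary/primary height ratio using raw-channel heights.

    Instead of reading the secondary channel only at peak_pos, take its maximum
    within +/- window scans, so a slightly offset secondary peak is still measured
    at its apex (a simple stand-in for aligning the two peaks).
    """
    primary_height = primary_raw[peak_pos]
    if primary_height <= 0:
        return 0.0
    lo = max(0, peak_pos - window)
    hi = min(len(secondary_raw), peak_pos + window + 1)
    return max(secondary_raw[lo:hi]) / primary_height


def is_mixed_aligned(primary_raw: Sequence[float], secondary_raw: Sequence[float],
                     peak_pos: int, threshold_pct: float, window: int = 5) -> bool:
    """Apply the mixed peak threshold to the window-aligned raw height ratio."""
    if threshold_pct <= 0:
        return False  # mixed basecalling switched off
    ratio = aligned_secondary_ratio(primary_raw, secondary_raw, peak_pos, window)
    return ratio > threshold_pct / 100.0


# Toy raw channels: the secondary peak (300) sits one scan after the primary apex (900).
primary_raw = [0, 100, 400, 900, 400, 100, 0, 0, 0]
secondary_raw = [0, 0, 50, 150, 300, 150, 50, 0, 0]

print(is_mixed_aligned(primary_raw, secondary_raw, 3, 30, window=0))  # False: 150/900 ≈ 0.17
print(is_mixed_aligned(primary_raw, secondary_raw, 3, 30, window=2))  # True: 300/900 ≈ 0.33
```

In a real trace the window would need to stay narrower than the spacing between adjacent bases, otherwise the neighbouring base's peak in the secondary channel could be mistaken for a mixed signal.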

Figure 2. Polymorphic region basecalled using PeakTrace 5 and a 30% threshold.

Figure 3. Polymorphic region basecalled using PeakTrace 6 and a 30% threshold.

The effect of these changes can be seen in Figures 2 and 3. PeakTrace 5 did not call the mixed bases at positions 196 and 197 because of the peak offsets and the suppression of secondary peak heights in the processed data channel. These problems do not occur with the new mixed basecalling algorithm in PeakTrace 6, and both positions are now called as mixed.

This suppression of secondary peak heights is very obvious when the no peak resolution PeakTrace output is used (Figure 4). This setting outputs the raw data after smoothing, baselining and mobility shifting. Although the peaks are less well resolved in the later regions of the trace, they show the relative peak heights of the raw data.

Figure 4. Polymorphic region basecalled using no peak resolution and a 30% threshold.

If desired, the old PeakTrace 5 mixed basecalling module can still be used by checking simple mixed basecalling in the Additional Options window and setting an appropriate mixed peak threshold. PeakTrace 6 will then perform the same mixed basecalling as PeakTrace 5.

Potential Pitfalls with Mixed Basecalling

The main thing to be aware of when using the new PeakTrace 6 mixed basecalling is that the best mixed peak threshold will likely have changed. If you previously used a mixed peak threshold of 50%, you may wish to reduce it to 30%. It is hard to recommend a precise threshold, but a high threshold (over 50%) will give fewer false positives at the expense of more false negatives, and a low threshold will do the reverse. A good starting point is 30%, which you can then adjust to suit your needs.
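
The trade-off between false positives and false negatives can be made concrete by sweeping the threshold over a set of sites whose true status is known. The sites below are invented numbers chosen only to illustrate the effect, not measurements from PeakTrace, and error_counts is a hypothetical helper.

```python
# Hypothetical candidate sites: (secondary/primary ratio, truly mixed?).
# These numbers are invented purely to illustrate the trade-off.
SITES = [
    (0.12, False), (0.18, False), (0.22, False), (0.28, True),
    (0.35, False), (0.41, True), (0.55, True), (0.62, True),
]


def error_counts(threshold_pct: float) -> tuple[int, int]:
    """Return (false positives, false negatives) for a given mixed peak threshold."""
    cutoff = threshold_pct / 100.0
    false_pos = sum(1 for ratio, mixed in SITES if ratio > cutoff and not mixed)
    false_neg = sum(1 for ratio, mixed in SITES if ratio <= cutoff and mixed)
    return false_pos, false_neg


for pct in (20, 30, 40, 50):
    fp, fn = error_counts(pct)
    print(f"threshold {pct}%: {fp} false positives, {fn} false negatives")
```

Raising the threshold from 20% to 50% on this toy set removes the false positives but misses more of the truly mixed sites; with real data, a set of sites of known genotype would serve the same purpose when choosing a threshold.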