ProbCons


In bioinformatics and proteomics, ProbCons is open-source software for probabilistic consistency-based multiple alignment of amino acid sequences. It is considered one of the most accurate protein multiple sequence alignment programs, having repeatedly demonstrated a statistically significant advantage in accuracy over similar tools, including Clustal and MAFFT.[1][2]

Algorithm


The following describes the basic outline of the ProbCons algorithm.[3]

Step 1: Reliability of an alignment edge


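For every pair of sequences x and y, compute the probability that letters x_i and y_j are paired in a*, an alignment generated by the model:

$$
P(x_i \sim y_j \mid x, y) \;\overset{\mathrm{def}}{=}\; \Pr[x_i \sim y_j \text{ in some } a \mid x, y] \;=\; \sum_{\text{alignment } a \text{ with } x_i \sim y_j} \Pr[a \mid x, y] \;=\; \sum_{\text{alignment } a} \mathbf{1}\{x_i \sim y_j \in a\}\, \Pr[a \mid x, y]
$$

(Here 𝟏{x_i ∼ y_j ∈ a} equals 1 if x_i and y_j are paired in the alignment a, and 0 otherwise.)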
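The sum over alignments can be illustrated on a toy example. The sketch below assumes the distribution over alignments is given explicitly as a short list, which is only feasible for tiny inputs; in practice such sums are computed by dynamic programming rather than by enumeration. The function name `pair_posteriors` and the toy probabilities are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict

def pair_posteriors(alignments):
    """Sum Pr[a | x, y] over all alignments a that contain each pair (i, j).

    `alignments` is a list of (pairs, probability) tuples, where `pairs`
    is a set of 0-based (i, j) index pairs meaning x_i is aligned to y_j.
    """
    post = defaultdict(float)
    for pairs, prob in alignments:
        for pair in pairs:
            # The indicator 1{x_i ~ y_j in a} is 1 exactly for these pairs.
            post[pair] += prob
    return dict(post)

# Hypothetical distribution over three alignments of two length-2 sequences.
toy_alignments = [
    ({(0, 0), (1, 1)}, 0.6),
    ({(0, 0)}, 0.3),
    ({(1, 0)}, 0.1),
]
posteriors = pair_posteriors(toy_alignments)
```

In this toy case the pair (0, 0) appears in the first two alignments, so its posterior is the sum of their probabilities.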

Step 2: Maximum expected accuracy


The accuracy of an alignment a* with respect to another alignment a is defined as the number of common aligned pairs divided by the length of the shorter sequence.
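As a minimal sketch of this definition, alignments can be represented as sets of index pairs (an assumption made here for illustration, along with the function name `accuracy`):

```python
def accuracy(a_star, a, len_x, len_y):
    """Fraction of aligned pairs shared by a_star and a.

    Both alignments are sets of 0-based (i, j) pairs; the count of common
    pairs is divided by the length of the shorter sequence.
    """
    return len(a_star & a) / min(len_x, len_y)

# Two alignments of a length-3 sequence against a length-2 sequence
# that agree on one pair out of a possible two.
example_acc = accuracy({(0, 0), (1, 1)}, {(0, 0), (2, 1)}, 3, 2)
```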

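Calculate the expected accuracy of a candidate alignment a* with respect to the distribution over alignments a:

$$
E_{\Pr[a \mid x, y]}\big(\mathrm{acc}(a^*, a)\big) \;=\; \sum_{a} \Pr[a \mid x, y]\, \mathrm{acc}(a^*, a) \;=\; \frac{1}{\min(|x|, |y|)} \sum_{a} \sum_{x_i \sim y_j \in a^*} \mathbf{1}\{x_i \sim y_j \in a\}\, \Pr[a \mid x, y] \;=\; \frac{1}{\min(|x|, |y|)} \sum_{x_i \sim y_j \in a^*} P(x_i \sim y_j \mid x, y)
$$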

This yields a maximum expected accuracy (MEA) alignment:

$$
E(x, y) \;=\; \arg\max_{a^*} \; E_{\Pr[a \mid x, y]}\big(\mathrm{acc}(a^*, a)\big)
$$
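Because the expected accuracy is a sum of posterior probabilities over the pairs in a*, the maximization can be carried out with a Needleman-Wunsch-style dynamic program without gap penalties. The following is a sketch under that assumption, not the ProbCons implementation; the name `mea_alignment` and the example matrix are hypothetical.

```python
def mea_alignment(post):
    """Return (expected_accuracy, pairs) maximizing the summed posteriors.

    post[i][j] is the posterior probability that x_i is aligned to y_j.
    """
    n, m = len(post), len(post[0])
    # best[i][j]: best total posterior using the first i letters of x
    # and the first j letters of y.
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + post[i - 1][j - 1],  # align x_i with y_j
                best[i - 1][j],                           # leave x_i unaligned
                best[i][j - 1],                           # leave y_j unaligned
            )
    # Trace back, preferring gaps on ties so zero-probability pairs are skipped.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if best[i][j] == best[i - 1][j]:
            i -= 1
        elif best[i][j] == best[i][j - 1]:
            j -= 1
        else:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
    pairs.reverse()
    return best[n][m] / min(n, m), pairs

# Hypothetical posterior matrix for |x| = 3 and |y| = 2.
example_post = [
    [0.9, 0.05],
    [0.0, 0.6],
    [0.1, 0.3],
]
example_score, example_pairs = mea_alignment(example_post)
```

The returned score is the total posterior of the chosen pairs divided by the length of the shorter sequence, matching the accuracy definition above.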

Step 3: Probabilistic Consistency Transformation


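The posterior probabilities for all pairs of sequences x, y from the set of all sequences 𝒮 are now re-estimated using all intermediate sequences z, giving updated probabilities P′:

$$
P'(x_i \sim y_j \mid x, y) \;=\; \frac{1}{|\mathcal{S}|} \sum_{z \in \mathcal{S}} \; \sum_{1 \le k \le |z|} P(x_i \sim z_k \mid x, z)\, P(z_k \sim y_j \mid z, y)
$$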

This step can be iterated.
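In matrix form, the inner sum over k is a matrix product of the x-z and z-y posterior matrices. The sketch below assumes sequences are identified by integer indices, that matrices are stored only for x < y, and that the posterior matrix of a sequence with itself is the identity (so z = x and z = y contribute the original matrix). All function names are hypothetical.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def consistency_transform(post, lengths):
    """Apply one round of the transformation to every pair (x, y), x < y.

    post[(x, y)] is a |x|-by-|y| matrix of pairwise posteriors;
    lengths[s] is the length of sequence s.
    """
    n = len(lengths)

    def get(x, y):
        if x == y:
            return identity(lengths[x])  # assumption: self-alignment is exact
        return post[(x, y)] if x < y else transpose(post[(y, x)])

    new_post = {}
    for x in range(n):
        for y in range(x + 1, n):
            total = [[0.0] * lengths[y] for _ in range(lengths[x])]
            for z in range(n):
                prod = matmul(get(x, z), get(z, y))  # sum over positions k of z
                for i in range(lengths[x]):
                    for j in range(lengths[y]):
                        total[i][j] += prod[i][j]
            new_post[(x, y)] = [[v / n for v in row] for row in total]
    return new_post

# Hypothetical posteriors for three sequences of lengths 2, 2 and 1.
example_lengths = [2, 2, 1]
example_posteriors = {
    (0, 1): [[0.8, 0.1], [0.1, 0.7]],
    (0, 2): [[0.5], [0.2]],
    (1, 2): [[0.6], [0.1]],
}
transformed = consistency_transform(example_posteriors, example_lengths)
```

Iterating the step amounts to feeding `transformed` back into `consistency_transform`.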

Step 4: Computation of guide tree


Construct a guide tree by hierarchical clustering, using the MEA score E(x,y) as the sequence similarity score. The similarity between clusters is defined as a weighted average of the pairwise sequence similarities.
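A minimal sketch of such a clustering is shown below. It assumes the weights are the cluster sizes (a UPGMA-style average), which is one possible reading of "weighted average" and not necessarily the weighting ProbCons uses; the function name `build_guide_tree` and the similarity values are hypothetical.

```python
def build_guide_tree(names, sim):
    """Greedy agglomerative clustering on a symmetric similarity matrix.

    Returns a nested tuple such as ("C", ("A", "B")).
    """
    clusters = {i: (names[i], 1) for i in range(len(names))}  # id -> (tree, size)
    score = {(i, j): sim[i][j]
             for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        a, b = max(score, key=score.get)  # most similar pair of clusters
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        merged_scores = {}
        for c in clusters:
            s_ac = score[(min(a, c), max(a, c))]
            s_bc = score[(min(b, c), max(b, c))]
            # Assumption: size-weighted average of the two merged clusters.
            merged_scores[(c, next_id)] = (
                (size_a * s_ac + size_b * s_bc) / (size_a + size_b)
            )
        score = {k: v for k, v in score.items() if a not in k and b not in k}
        score.update(merged_scores)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree

# Hypothetical MEA-based similarities between three sequences.
example_sim = [
    [1.0, 0.9, 0.2],
    [0.9, 1.0, 0.4],
    [0.2, 0.4, 1.0],
]
example_tree = build_guide_tree(["A", "B", "C"], example_sim)
```

Here A and B are merged first because they have the highest similarity, and C joins the resulting cluster last.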

Step 5: Compute MSA


Finally, compute the multiple sequence alignment (MSA) using progressive alignment or iterative alignment.


References

  1. ^ [citation unavailable]
  2. ^ [citation unavailable]
  3. ^ Lecture "Bioinformatics II" at University of Freiburg