Fastcov - Fast Multiple Covariance Detector v1.03

Usage

Name:
  fastcov V1.03 -- Fast Multiple Covariance Detector
  http://yanlilab.github.io/fastcov

Authors:
  Yan Li   <liyan.com@gmail.com>
  Wei Shen <shenwei356@gmail.com>

Usage:
  fastcov [options] inputfile

Available Options:
  -p FLOAT                  minimum pairing purity of two sites [0.7]
  -r FLOAT                  minimum matching ratio of to the pattern [0.45]
  -n INT                    minimum residue number at each site [5]
  -c FLOAT                  minimum proportion of any sequence identical to the
                            consensus [0.33]
  -o STRING                 prefix of output files [inputfile]
  -j INT                    CPU number [CPU number of your computer]
  -h, --help                show this help message

Copyright:
  Copyright © 2015-2016, All Rights Reserved
  This software is free to distribute for academic research.
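
The options above can be combined in a single call. The following is a minimal Python sketch, not part of fastcov, that assembles a `fastcov [options] inputfile` command line from the documented flags. The function name `build_fastcov_command`, its keyword names, and the values used (0.8, 10, `ABCD_strict`, 4) are illustrative assumptions. Options left unset are omitted so fastcov applies its own defaults.

```python
import shlex

def build_fastcov_command(inputfile, purity=None, ratio=None, min_residues=None,
                          consensus=None, prefix=None, cpus=None, exe="fastcov"):
    """Assemble 'fastcov [options] inputfile' as an argument list.

    Hypothetical helper: only the flags shown in the help text are used.
    Options left as None are skipped, so fastcov falls back to its defaults.
    """
    cmd = [exe]
    for flag, value in (("-p", purity), ("-r", ratio), ("-n", min_residues),
                        ("-c", consensus), ("-o", prefix), ("-j", cpus)):
        if value is not None:
            cmd += [flag, str(value)]
    cmd.append(inputfile)  # the input file comes last, after all options
    return cmd

# Example values are arbitrary, chosen only to illustrate the flags.
cmd = build_fastcov_command("ABCD_RT_M.aligned.fas", purity=0.8,
                            min_residues=10, prefix="ABCD_strict", cpus=4)
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Giving each run its own `-o` prefix keeps the output files of different parameter settings from overwriting one another.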

Positional arguments

Options

Main algorithm parameters

Sequences filter criteria

Output

Performance

Examples

Using examples/ABCD_RT_M.aligned.fas as an example.

Quick run:

fastcov ABCD_RT_M.aligned.fas

Terminal output:

Input: ABCD_RT_M.aligned.fas

Step 1/5: Reading sequences


Done

Step 2/5: Searching candidate sites

Done

Step 3/5: Searching independent pairs
21115 / 21115 [===================================================================================] 100.00 % 28s

Covariant site pairs saved to file: ABCD_RT_M.aligned.fas.pairs

Done

Step 4/5: Searching covariant patterns
52 / 52 [===========================================================================================] 100.00 % 0

Covariant patterns saved to file: ABCD_RT_M.aligned.fas.patterns

Done

Step 5/5: Clustering by covariant patterns
Covariant patterns assigned to sequences: ABCD_RT_M.aligned.fas.seq2patterns
Sequences clustered by covariant patterns: ABCD_RT_M.aligned.fas.clusters

Step 3 is the most time-consuming stage, so it displays a progress bar.

Output files:

ABCD_RT_M.aligned.fas.pairs.txt            # covariant pair information, table file, can be imported into MS Excel
ABCD_RT_M.aligned.fas.patterns.txt         # covariant patterns, table file, can be imported into MS Excel
ABCD_RT_M.aligned.fas.clusters.txt         # sequences clustered by covariant patterns
ABCD_RT_M.aligned.fas.seq2patterns.txt     # covariant patterns of every sequence, table file, can be imported into MS Excel

Note: Windows users should view the result files in a modern text editor. Notepad is not recommended; Notepad++ is a better choice.
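
To confirm that a run produced all four result files, a small check can look them up by prefix. This is a hedged Python sketch, not part of fastcov. The helper name `find_outputs` is an assumption, and because the terminal output shows names without `.txt` while the list of output files shows names with `.txt`, the sketch accepts either form.

```python
from pathlib import Path

# The four result kinds named in the output file list.
OUTPUT_KINDS = ("pairs", "patterns", "clusters", "seq2patterns")

def find_outputs(prefix):
    """Map each output kind to an existing file path, or None if not found.

    Hypothetical helper: tries '<prefix>.<kind>.txt' first, then '<prefix>.<kind>'.
    """
    found = {}
    for kind in OUTPUT_KINDS:
        candidates = [Path(f"{prefix}.{kind}.txt"), Path(f"{prefix}.{kind}")]
        found[kind] = next((p for p in candidates if p.is_file()), None)
    return found

# The default prefix is the input file name.
for kind, path in find_outputs("ABCD_RT_M.aligned.fas").items():
    print(f"{kind:13s} {path if path else 'missing'}")
```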

More examples: fastcov-examples.tar.gz

Errors and Solutions

  1. No input file given. Provide fastcov with aligned amino acid sequences in FASTA format.

    $ fastcov
    [Error] no input file (aligned amino acids sequences in FASTA format) given.
    type "fastcov -h" for help
    
  2. Input file is not aligned.

    [Error] sequence length not equal: 343 (AB014392_Pol-C) != 344.
    input file should be aligned amino acids sequences in FASTA format
    
  3. Illegal characters in sequence. The FASTA parsing module of fastcov checks sequences strictly, so verify the input sequences against the IUPAC codes (http://www.bioinformatics.org/sms2/iupac.html). This error can also occur when the expected sequence type (PROTEIN) does not match the actual sequence type in the FASTA file (for example, DNA).

    Input: test.fa
    
    Step 1/5: Reading sequences
    error when reading AB014367_Pol-C: invalid Protein sequence: AB014367_Pol-C
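
Errors 2 and 3 above can be caught before running fastcov. The following Python sketch is an independent pre-check, not fastcov's own parser. It reports sequences whose length differs from the first one and sequences containing characters outside an assumed protein alphabet. The set `ALLOWED` and the helper names `read_fasta` and `check_alignment` are assumptions; the characters fastcov actually accepts may differ.

```python
# Assumed alphabet: 20 standard amino acids, ambiguity/rare codes (B, Z, J, X, U, O),
# plus '-' for gaps and '*' for stops. fastcov's real accepted set may differ.
ALLOWED = set("ACDEFGHIKLMNPQRSTVWY" "BZJXUO" "-*")

def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the name.
            name, chunks = (line[1:].split() or [""])[0], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def check_alignment(records):
    """Return a list of problems: unequal lengths or characters outside ALLOWED."""
    problems = []
    expected = None  # length of the first sequence, used as the reference
    for name, seq in records:
        if expected is None:
            expected = len(seq)
        elif len(seq) != expected:
            problems.append(f"{name}: length {len(seq)} != {expected}")
        bad = sorted(set(seq.upper()) - ALLOWED)
        if bad:
            problems.append(f"{name}: unexpected characters {''.join(bad)}")
    return problems

# Made-up sample: seq2 is one residue short, seq3 contains a digit.
sample = """>seq1
MKV-LA
>seq2
MKVLA
>seq3
MKV1LA
""".splitlines()

for problem in check_alignment(read_fasta(sample)):
    print(problem)
```

For a real file, pass an open file handle to `read_fasta` instead of the sample lines. An empty result means every sequence has the same length and uses only the assumed alphabet.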
    

FAQ

Please don't hesitate to email us.

Q: The result files look like a mess when I open them!

A: Microsoft Windows users may be opening the result files with Notepad, which ships with the operating system. Please use a modern text editor such as Notepad++ instead.

Authors

Yan Li <liyan.com@gmail.com>, Wei Shen <shenwei356@gmail.com>

Citation

Wei Shen, Yan Li*. A novel algorithm for detecting multiple covariance and clustering of biological sequences. Sci. Rep. 6, 30425; doi:10.1038/srep30425 (2016).

Copyright © 2015-2016, All Rights Reserved.

This software is free to distribute for academic research.