[Cdd-commits] r1063 - projects/med/trunk/debian-med/tasks
CDD Subversion Commit
noreply at alioth.debian.org
Mon Sep 8 08:47:25 UTC 2008
Author: tille
Date: Mon Sep 8 08:47:25 2008
New Revision: 1063
Modified:
projects/med/trunk/debian-med/tasks/bio
Log:
Further BioLinux packages.
Modified: projects/med/trunk/debian-med/tasks/bio
==============================================================================
--- projects/med/trunk/debian-med/tasks/bio (original)
+++ projects/med/trunk/debian-med/tasks/bio Mon Sep 8 08:47:25 2008
@@ -2052,3 +2052,59 @@
.
Remark: The link to the source archive on the web pages is not valid
any more - it might be a problem to obtain the source.
+
+Depends: cap3
+Homepage: http://genome.cs.mtu.edu/cap/cap3.html
+License: free for governmental agency or a non-profit educational institution
+Responsible: BioLinux - Bela Tiwari <btiwari at ceh.ac.uk>
+Pkg-URL: http://nebc.nox.ac.uk/bio-linux/dists/unstable/bio-linux/binary-i386/
+Pkg-Description: DNA Sequence Assembly Program
+ CAP3 contains the following improvements to the CAP sequence assembly
+ program.
+ 1. Use of forward-reverse constraints to correct assembly errors and
+ link contigs.
+ 2. Use of base quality values in alignment of sequence reads.
+ 3. Automatic clipping of 5' and 3' poor regions of reads.
+ 4. Generation of assembly results in ace file format for Consed.
+ 5. CAP3 can be used in GAP4 of the Staden package.
+ These improvements allow CAP3 to take longer sequences with higher
+ error rates and produce more accurate consensus sequences.
+ .
+ Remark: Obtaining the source requires filling in a registration form,
+ so official distribution in Debian is probably impossible. The
+ package included in the BioLinux distribution
+ http://envgen.nox.ac.uk/biolinux.html contains only the binaries
+ cap3 and formcon, dated Aug 29, 2002. This package exists purely for
+ convenience to Bio-Linux users so that the files are placed in
+ locations consistent with the Bio-Linux setup.
+
+Depends: cd-hit
+Homepage: http://www.bioinformatics.org/cd-hit/
+License: to be clarified
+Responsible: BioLinux - Bela Tiwari <btiwari at ceh.ac.uk>
+Pkg-URL: http://nebc.nox.ac.uk/bio-linux/dists/unstable/bio-linux/binary-i386/
+Pkg-Description: suite of programs designed to quickly group sequences.
+ CD-HIT stands for Cluster Database at High Identity with
+ Tolerance. The program (cd-hit) takes a fasta format sequence
+ database as input and produces a set of 'non-redundant' (nr)
+ representative sequences as output. In addition cd-hit outputs a
+ cluster file, documenting the sequence 'groupies' for each nr
+ sequence representative. The idea is to reduce the overall size of
+ the database without removing any sequence information by only
+ removing 'redundant' (or highly similar) sequences. This is why the
+ resulting database is called non-redundant (nr). Essentially, cd-hit
+ produces a set of closely related protein families from a given fasta
+ sequence database.
+ .
+ CD-HIT uses a 'longest sequence first' list removal algorithm to
+ remove sequences above a certain identity threshold. Additionally the
+ algorithm implements a very fast heuristic to find high identity
+ segments between sequences, and so can avoid many costly full
+ alignments.
+ .
+ With recent developments, the cd-hit package offers new programs for
+ DNA sequence clustering and for comparing two databases. It also has
+ many new options for clustering control.
+ .
+ This package is included in the BioLinux distribution
+ http://envgen.nox.ac.uk/biolinux.html
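
The 'longest sequence first' list removal described in the cd-hit entry above can be sketched roughly as follows. This is only an illustration, not cd-hit's actual code: the function names (identity, greedy_cluster), the 0.9 threshold and the difflib-based identity measure are assumptions made for this example, and the fast heuristic that lets cd-hit skip full alignments is left out.

```python
from difflib import SequenceMatcher


def identity(rep, seq):
    """Toy identity score (an assumption for this sketch, not cd-hit's
    measure): matched characters divided by the shorter length."""
    sm = SequenceMatcher(None, rep, seq, autojunk=False)
    matched = sum(block.size for block in sm.get_matching_blocks())
    return matched / min(len(rep), len(seq))


def greedy_cluster(seqs, threshold=0.9):
    """Group sequences around representatives, longest sequence first.

    seqs maps a sequence name to its sequence string. The result maps
    each representative name to the names in its cluster, with the
    representative itself listed first.
    """
    # Longest first; ties are broken by name so the result is deterministic.
    order = sorted(seqs, key=lambda name: (-len(seqs[name]), name))
    clusters = {}
    for name in order:
        for rep in clusters:
            if identity(seqs[rep], seqs[name]) >= threshold:
                # Redundant with an existing representative: join its cluster.
                clusters[rep].append(name)
                break
        else:
            # Not similar enough to any representative: start a new cluster.
            clusters[name] = [name]
    return clusters


if __name__ == "__main__":
    example = {
        "a": "MKVLAAGIVGLLLAQ",
        "b": "MKVLAAGIVGLLLA",
        "c": "PPPPPWWWWWYYYY",
    }
    for rep, members in greedy_cluster(example).items():
        print(rep, members)
```

The keys of the returned mapping play the role of the 'non-redundant' representative set, and the values correspond to the cluster file listing the members grouped under each representative.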