[Cdd-commits] r1063 - projects/med/trunk/debian-med/tasks

Mon Sep 8 08:47:25 UTC 2008

Author: tille
Date: Mon Sep  8 08:47:25 2008
New Revision: 1063

Modified:
   projects/med/trunk/debian-med/tasks/bio
Log:
Further BioLinux packages.


Modified: projects/med/trunk/debian-med/tasks/bio
==============================================================================

--- projects/med/trunk/debian-med/tasks/bio	(original)
+++ projects/med/trunk/debian-med/tasks/bio	Mon Sep  8 08:47:25 2008
@@ -2052,3 +2052,59 @@
  .
  Remark: The link to the source archive on the web pages is not valid
  any more - it might be a problem to obtain the source.
+
+Depends: cap3
+Homepage: http://genome.cs.mtu.edu/cap/cap3.html
+License: free for governmental agency or a non-profit educational institution
+Responsible: BioLinux - Bela Tiwari <btiwari at ceh.ac.uk>
+Pkg-URL: http://nebc.nox.ac.uk/bio-linux/dists/unstable/bio-linux/binary-i386/
+Pkg-Description: DNA Sequence Assembly Program
+ CAP3 contains the following improvements to the CAP sequence assembly
+ program. 
+  1. Use of forward-reverse constraints to correct assembly errors and
+     link contigs. 
+  2. Use of base quality values in alignment of sequence reads.
+  3. Automatic clipping of 5' and 3' poor regions of reads.
+  4. Generation of assembly results in ace file format for Consed.
+  5. CAP3 can be used in GAP4 of the Staden package.
+ These improvements allow CAP3 to handle longer sequences with higher
+ error rates and produce more accurate consensus sequences.
+ .
+ Remark: Obtaining the source requires filling in a registration form
+ so official distribution in Debian is probably impossible.  The
+ package included in the BioLinux distribution
+ http://envgen.nox.ac.uk/biolinux.html contains only the binaries
+ cap3 and formcon, dated Aug 29, 2002.  This package exists purely for
+ convenience to Bio-Linux users so that the files are placed in
+ locations consistent with the Bio-Linux setup.
+
+Depends: cd-hit
+Homepage: http://www.bioinformatics.org/cd-hit/
+License: to be clarified
+Responsible: BioLinux - Bela Tiwari <btiwari at ceh.ac.uk>
+Pkg-URL: http://nebc.nox.ac.uk/bio-linux/dists/unstable/bio-linux/binary-i386/
+Pkg-Description: suite of programs designed to quickly group sequences
+ CD-HIT stands for Cluster Database at High Identity with
+ Tolerance. The program (cd-hit) takes a fasta format sequence
+ database as input and produces a set of 'non-redundant' (nr)
+ representative sequences as output. In addition cd-hit outputs a
+ cluster file, documenting the sequence 'groupies' for each nr
+ sequence representative. The idea is to reduce the overall size of
+ the database without removing any sequence information by only
+ removing 'redundant' (or highly similar) sequences. This is why the
+ resulting database is called non-redundant (nr). Essentially, cd-hit
+ produces a set of closely related protein families from a given fasta
+ sequence database.
+ .
+ CD-HIT uses a 'longest sequence first' list removal algorithm to
+ remove sequences above a certain identity threshold. Additionally the
+ algorithm implements a very fast heuristic to find high identity
+ segments between sequences, and so can avoid many costly full
+ alignments.
+ .
+ With recent developments, the cd-hit package offers new programs for DNA
+ sequence clustering and comparing two databases. It also has lots of
+ new options for clustering control.
+ .
+ This package is included in the BioLinux distribution
+ http://envgen.nox.ac.uk/biolinux.html