r21528 - in /branches/upstream/perlindex: ./ current/ current/ChangeLog current/MANIFEST current/META.yml current/Makefile.PL current/README current/lib/ current/lib/Text/ current/lib/Text/English.pm current/perlindex.PL current/t/ current/t/basic.t

gregoa at users.alioth.debian.org gregoa at users.alioth.debian.org
Sun Jun 15 15:45:02 UTC 2008


Author: gregoa
Date: Sun Jun 15 15:45:02 2008
New Revision: 21528

URL: http://svn.debian.org/wsvn/?sc=1&rev=21528
Log:
[svn-inject] Installing original source of perlindex

Added:
    branches/upstream/perlindex/
    branches/upstream/perlindex/current/
    branches/upstream/perlindex/current/ChangeLog
    branches/upstream/perlindex/current/MANIFEST
    branches/upstream/perlindex/current/META.yml
    branches/upstream/perlindex/current/Makefile.PL
    branches/upstream/perlindex/current/README
    branches/upstream/perlindex/current/lib/
    branches/upstream/perlindex/current/lib/Text/
    branches/upstream/perlindex/current/lib/Text/English.pm
    branches/upstream/perlindex/current/perlindex.PL
    branches/upstream/perlindex/current/t/
    branches/upstream/perlindex/current/t/basic.t

Added: branches/upstream/perlindex/current/ChangeLog
URL: http://svn.debian.org/wsvn/branches/upstream/perlindex/current/ChangeLog?rev=21528&op=file
==============================================================================
--- branches/upstream/perlindex/current/ChangeLog (added)
+++ branches/upstream/perlindex/current/ChangeLog Sun Jun 15 15:45:02 2008
@@ -1,0 +1,112 @@
+2006-07-02  Ulrich Pfeifer  <upf at de.uu.net>
+
+	* perlindex.PL (index): moving the check for Pod::Text in the
+	index function.  The result went out of scope before (thanks
+	Florian for the bug report).  Fixing the loop variable in the main
+	loop (thanks Florian for the bug report).  Adding support for
+	IO::Scalar.
+	
+2006-03-19  Ulrich Pfeifer  <upf at de.uu.net>
+
+	* Fixed a bug reported by Florian Ragwitz:  Absolute filenames
+	were access incorrectly from hit list.
+
+2005-09-18  Ulrich Pfeifer  <pfeifer at wait.de>
+
+	* Integrated a patch from Marek Rouchal to use Pod::Text for
+	parsing if available.  I did rework the patch a little to remove
+	the need for IO::Scalar which does no seem to be part of debian
+	sarge.  Now underscore is also a valid letter.
+
+2005-04-10  Ulrich Pfeifer  <upf at wait.de>
+
+	* Re-Added Text::English as it does not seem to be available
+	separately.  I'd rather not package a "foreign" separately.
+
+2005-04-03  Ulrich Pfeifer  <pfeifer at wait.de>
+
+	* Removed Text::English from Distribution
+	* Determine pager and search path at run time (Patch from Marek, Ticket #4506)
+
+2004-05-02  Florian Ragwitz
+
+	* please parameterize the $IDIR in the make process
+	* clarified license
+	* clarified authorship of Text::English
+	
+2003-06-19  Slaven Rezic 
+
+	* t/basic.t: made test more safe and portable
+
+2003-06-18  Ulrich Pfeifer  <pfeifer at wait.de>
+
+	Added some regression tests before the CPAN testers beat me up.
+
+2003-06-18  Slaven Rezic 
+
+        [cpan #2820] Fix indexed directories
+
+        In some perl installations, installsitelib is not part of the
+	privlib directory. That is, privlib is something like
+	/usr/local/lib/perl5/5.8.0 and installsitelib something like
+	/usr/local/lib/perl5/site_perl/5.8.0.  This causes the perlindex
+	indexer not to dive into installsitelib.
+
+
+Sun Mar 10 13:05:46 MET 1996   "Chuck D. Phillips (NON-HP Employee)" <cdp at hpescdp.fc.hp.com>
+
+By default, the program perlindex isn't deleted when you do a "make
+clean" the result is that the Config.pm constants don't get updated
+next time you do a "make all".  To fix this, you can add the following
+line to the WriteMakefile() parameters in Makefile.PL:
+
+	      'clean' =>	{ 'FILES' => 'perlindex' },
+
+
+Nit: Term::ReadKey is more reliable for cbreak than using $d_bsd.  On
+HPUX, setting BSD_Style to either 0 or 1 doesn't quite work right.
+I've hacked my own copy of perlindex.PL to prefer Term::ReadKey if
+available.  Otherwise, it defaults to old behavior execept that it
+resolves during "make all" instead of at run time.  (I also insert
+"col" between the nroff and the pager to avoid some garbage on HPUX.)
+I've included the diffs at the bottom.
+
+patch7 Description:
+
+	Fixed test for compressed int patch.
+
+Fri Mar  8 20:26:27 MET 1996   Ulrich Pfeifer <pfeifer at ls6.informatik.uni-dortmund.de>
+
+patch6 Description:
+
+	man3direxp will not be indexed any more. Even if inside of
+	privlibexp. Fixed bug with -nomenu. Did eat characters ;-)
+	cbreak for non bsd systems (hopefully).
+
+Wed Feb 28 13:45:28 MET 1996   Ulrich Pfeifer <pfeifer at ls6.informatik.uni-dortmund.de>
+
+patch5 Description:
+
+	Nroff, man1direxp, privlibexp, prefix and pager are now
+	determined at extraction time. Nroff and pager were hardcoded
+	before.
+
+Fri Feb 23 11:22:12 MET 1996   Ulrich Pfeifer <pfeifer at ls6.informatik.uni-dortmund.de>
+
+. Description:
+
+	Fixed version computation.
+	Fixed menu numbering. First hit could not be selected before.
+        Added chmod 0755 to perlindex.PL
+Thu Feb 22 19:17:54 MET 1996   Ulrich Pfeifer <pfeifer at ls6.informatik.uni-dortmund.de>
+
+. Description:
+
+	New version numbering.
+	Added cbreak mode.
+
+Thu Feb 22 16:43:57 MET 1996   Ulrich Pfeifer <pfeifer at ls6.informatik.uni-dortmund.de>
+
+. Description:
+
+	

Added: branches/upstream/perlindex/current/MANIFEST
URL: http://svn.debian.org/wsvn/branches/upstream/perlindex/current/MANIFEST?rev=21528&op=file
==============================================================================
--- branches/upstream/perlindex/current/MANIFEST (added)
+++ branches/upstream/perlindex/current/MANIFEST Sun Jun 15 15:45:02 2008
@@ -1,0 +1,8 @@
+ChangeLog
+MANIFEST
+README
+Makefile.PL
+perlindex.PL
+t/basic.t
+lib/Text/English.pm
+META.yml                                 Module meta-data (added by MakeMaker)

Added: branches/upstream/perlindex/current/META.yml
URL: http://svn.debian.org/wsvn/branches/upstream/perlindex/current/META.yml?rev=21528&op=file
==============================================================================
--- branches/upstream/perlindex/current/META.yml (added)
+++ branches/upstream/perlindex/current/META.yml Sun Jun 15 15:45:02 2008
@@ -1,0 +1,10 @@
+# http://module-build.sourceforge.net/META-spec.html
+#XXXXXXX This is a prototype!!!  It will change in the future!!! XXXXX#
+name:         perlindex
+version:      1.502
+version_from: 
+installdirs:  site
+requires:
+
+distribution_type: module
+generated_by: ExtUtils::MakeMaker version 6.17

Added: branches/upstream/perlindex/current/Makefile.PL
URL: http://svn.debian.org/wsvn/branches/upstream/perlindex/current/Makefile.PL?rev=21528&op=file
==============================================================================
--- branches/upstream/perlindex/current/Makefile.PL (added)
+++ branches/upstream/perlindex/current/Makefile.PL Sun Jun 15 15:45:02 2008
@@ -1,0 +1,31 @@
+#!/usr/bin/perl
+#                              -*- Mode: Perl -*- 
+# Author          : Ulrich Pfeifer
+# Created On      : Tue May 27 17:27:28 1997
+# Last Modified On: Sun Jul  2 10:32:23 2006
+# Language        : CPerl
+# Update Count    : 30
+#
+# (C) Copyright 1997-2005, Ulrich Pfeifer, all rights reserved.  This
+# file is free software; you can redistribute it and/or modify it
+# under the same terms as Perl itself.
+
+$VERSION = "1.502";
+
+use ExtUtils::MakeMaker;
+# See lib/ExtUtils/MakeMaker.pm for details of how to influence
+# the contents of the Makefile that is written.
+
+WriteMakefile(
+    'NAME'         => 'perlindex',
+    'VERSION'      => $VERSION,
+    'LIBS'         => [''],   # e.g., '-lm' 
+    'DEFINE'       => '',     # e.g., '-DHAVE_SOMETHING' 
+    'INC'          => '',     # e.g., '-I/usr/include/other' 
+    'dist'         => { SUFFIX => "gz", COMPRESS => "gzip -f"},
+# we do bundle that module in the distribution
+#   'PREREQ_PM'    => { 'Text::English' => 0 },
+    'EXE_FILES'    => [ 'perlindex' ],
+    'clean'        => { 'FILES' => 'perlindex' },
+);
+

Added: branches/upstream/perlindex/current/README
URL: http://svn.debian.org/wsvn/branches/upstream/perlindex/current/README?rev=21528&op=file
==============================================================================
--- branches/upstream/perlindex/current/README (added)
+++ branches/upstream/perlindex/current/README Sun Jun 15 15:45:02 2008
@@ -1,0 +1,22 @@
+This is the perlindex distribution 
+
+Perlindex is a program to index and search the perl documentation.
+
+The newest version is always available from CPAN/authors/id/ULPFR.
+
+Perlindex should work with any Perl >= 5.002 at least. With Perl >=
+5.003_07 it will use compress integers which results in smaller
+indices.
+
+To  install do:
+
+        perl Makefile.PL
+        make install
+        perlindex -index
+
+You may give additional files to index on the command line. I'd
+recommend to specify the perl utils (perldoc, h2xs, ...):
+
+        perlindex -index /usr/local/ls6/perl-5.002/bin/*
+
+Ulrich Pfeifer

Added: branches/upstream/perlindex/current/lib/Text/English.pm
URL: http://svn.debian.org/wsvn/branches/upstream/perlindex/current/lib/Text/English.pm?rev=21528&op=file
==============================================================================
--- branches/upstream/perlindex/current/lib/Text/English.pm (added)
+++ branches/upstream/perlindex/current/lib/Text/English.pm Sun Jun 15 15:45:02 2008
@@ -1,0 +1,146 @@
+#!/usr/bin/perl
+#                              -*- Mode: Perl -*- 
+# Author          : Ian Phillipps
+# Last Modified On: Sun May  2 15:35:33 2004
+# Language        : CPerl
+
+package Text::English;
+
+$VERSION = $VERSION = '0.01';
+
+sub stem {
+    my @parms = @_;
+    foreach( @parms ) {
+	$_ = lc $_;
+
+	# Step 0 - remove punctuation
+	s/'s$//; s/^[^a-z]+//; s/[^a-z]+$//;
+	next unless /^[a-z]+$/;
+
+	# step1a_rules
+	if( /[^s]s$/ ) { s/sses$/ss/ || s/ies$/i/ || s/s$// }
+       
+	# step1b_rules. The business with rule==106 is embedded in the
+	# boolean expressions here.
+	(/[aeiouy][^aeiouy].*eed$/ && s/eed$/ee/ ) || 
+	    ( s/([aeiou].*)ed$/$1/ || s/([aeiouy].*)ing$/$1/ ) &&
+	    ( # step1b1_rules
+		s/at$/ate/	|| s/bl$/ble/	|| s/iz$/ize/	|| s/bb$/b/	||
+		s/dd$/d/	|| s/ff$/f/	|| s/gg$/g/	|| s/mm$/m/	||
+		s/nn$/n/	|| s/pp$/p/	|| s/rr$/r/	|| s/tt$/t/	||
+		s/ww$/w/	|| s/xx$/x/	||
+		# This is wordsize==1 && CVC...addanE...
+		s/^[^aeiouy]+[aeiouy][^aeiouy]$/$&e/
+	    )
+#DEBUG	    && warn "step1b1: $_\n"
+	    ;
+	# step1c_rules
+#DEBUG	warn "step1c: $_\n" if
+	s/([aeiouy].*)y$/$1i/;
+
+	# step2_rules
+
+	if (	s/ational$/ate/	|| s/tional$/tion/	|| s/enci$/ence/	||
+		s/anci$/ance/	|| s/izer$/ize/		|| s/iser$/ise/		||
+		s/abli$/able/	|| s/alli$/al/		|| s/entli$/ent/	||
+		s/eli$/e/	|| s/ousli$/ous/	|| s/ization$/ize/	||
+		s/isation$/ise/	|| s/ation$/ate/	|| s/ator$/ate/		||
+		s/alism$/al/	|| s/iveness$/ive/	|| s/fulnes$/ful/	||
+		s/ousness$/ous/	|| s/aliti$/al/		|| s/iviti$/ive/	||
+		s/biliti$/ble/
+	    ) {
+	    my ($l,$m) = ($`,$&);
+#DEBUG	    warn "step 2: l=$l m=$m\n";
+	    $_ = $l.$m unless $l =~ /[^aeiou][aeiouy]/;
+	}
+	# step3_rules
+	if (	s/icate$/ic/	|| s/ative$//	|| s/alize$/al/	||
+		s/iciti$/ic/	|| s/ical$/ic/	|| s/ful$//	||
+		s/ness$//
+	    ) {
+	    my ($l,$m) = ($`,$&);
+#DEBUG	    warn "step 3: l=$l m=$m\n";
+	    $_ = $l.$m unless $l =~ /[^aeiou][aeiouy]/;
+	}
+
+	# step4_rules
+	if (	s/al$//		|| s/ance$//	|| s/ence$//	|| s/er$//	||
+		s/ic$//		|| s/able$//	|| s/ible$//	|| s/ant$//	||
+		s/ement$//	|| s/ment$//	|| s/ent$//	|| s/sion$/s/	||
+		s/tion$/t/	|| s/ou$//	|| s/ism$//	|| s/ate$//	||
+		s/iti$//	|| s/ous$//	|| s/ive$//	|| s/ize$//	||
+		s/ise$//
+	    ) {
+	    my ($l,$m) = ($`,$&);
+	# Look for two consonant/vowel transitions
+	# NB simplified...
+#DEBUG	    warn "step 4: l=$l m=$m\n";
+	    $_ = $l.$m unless $l =~ /[^aeiou][aeiouy].*[^aeiou][aeiouy]/;
+	}
+
+	# step5a_rules
+#DEBUG	warn("step 5a: $_\n") &&
+	s/e$// if ( /[^aeiou][aeiouy].*[^aeiou][aeiouy].*e$/ ||
+		( /[aeiou][^aeiouy].*e/ && ! /[^aeiou][aeiouy][^aeiouwxy]e$/) );
+
+	# step5b_rules
+#DEBUG	warn("step 5b: $_\n") &&
+	s/ll$/l/ if /[^aeiou][aeiouy].*[^aeiou][aeiouy].*ll$/;
+
+	# Cosmetic step 
+	s/(.)i$/$1y/;
+    }
+    @parms;
+}
+
+1;
+
+__END__
+
+=head1 NAME
+
+Text::English - Porter's stemming algorithm
+
+=head1 SYNOPSIS
+
+    use Text::English;
+    @stems = Text::English::stem( @words );
+
+=head1 DESCRIPTION
+
+This routine applies the Porter Stemming Algorithm to its parameters,
+returning the stemmed words.
+It is derived from the C program "stemmer.c"
+as found in freewais and elsewhere, which contains these notes:
+
+   Purpose:    Implementation of the Porter stemming algorithm documented 
+               in: Porter, M.F., "An Algorithm For Suffix Stripping," 
+               Program 14 (3), July 1980, pp. 130-137.
+   Provenance: Written by B. Frakes and C. Cox, 1986.
+
+I have re-interpreted areas that use Frakes and Cox's "WordSize"
+function. My version may misbehave on short words starting with "y",
+but I can't think of any examples.
+
+The step numbers correspond to Frakes and Cox, and are probably in
+Porter's article (which I've not seen).
+Porter's algorithm still has rough spots (e.g current/currency, -ings words),
+which I've not attempted to cure, although I have added
+support for the British -ise suffix.
+
+=head1 NOTES
+
+This is version 0.1. I would welcome feedback, especially improvements
+to the punctuation-stripping step.
+
+=head1 AUTHOR
+
+Ian Phillipps <ian at unipalm.pipex.com>
+
+=head1 COPYRIGHT
+
+Copyright Public IP Exchange Ltd (PIPEX).
+Available for use under the same terms as perl.
+
+=cut
+

Added: branches/upstream/perlindex/current/perlindex.PL
URL: http://svn.debian.org/wsvn/branches/upstream/perlindex/current/perlindex.PL?rev=21528&op=file
==============================================================================
--- branches/upstream/perlindex/current/perlindex.PL (added)
+++ branches/upstream/perlindex/current/perlindex.PL Sun Jun 15 15:45:02 2008
@@ -1,0 +1,1049 @@
+#!/usr/local/bin/perl
+
+use Config;
+use File::Basename qw(&basename &dirname);
+
+# List explicitly here the variables you want Configure to
+# generate.  Metaconfig only looks for shell variables, so you
+# have to mention them as if they were shell variables, not
+# %Config entries.  Thus you write
+#  $startperl
+# to ensure Configure will look for $Config{startperl}.
+
+# This forces PL files to create target in same directory as PL file.
+# This is so that make depend always knows where to find PL derivatives.
+chdir(dirname($0));
+($file = basename($0)) =~ s/\.PL$//;
+$file =~ s/\.pl$//
+	if ($Config{'osname'} eq 'VMS' or
+	    $Config{'osname'} eq 'OS2');  # "case-forgiving"
+open OUT,">$file" or die "Can't create $file: $!";
+
+print "Extracting $file (with variable substitutions)\n";
+chmod 0775, $file;
+
+# In this section, perl variables will be expanded during extraction.
+# You can use $Config{...} to use Configure variables.
+
+print OUT <<"!GROK!THIS!";
+$Config{'startperl'}
+    eval 'exec perl -S \$0 "\$@"'
+	if 0;
+!GROK!THIS!
+print OUT <<'!NO!SUBS!';
+#                              -*- Mode: Perl -*- 
+# Author          : Ulrich Pfeifer
+# Created On      : Mon Jan 22 13:00:41 1996
+# Last Modified On: Sun Jul  2 10:39:22 2006
+# Language        : Perl
+# Update Count    : 349
+# Status          : Unknown, Use with caution!
+# 
+# (C) Copyright 1996-2005, Ulrich Pfeifer, all rights reserved.
+# This file is free software; you can redistribute it and/or modify it
+# under the same terms as Perl itself.
+# 
+# 
+# %SEEN is used to store the absolute pathes to files which have been
+#       indexed. Probably this could be replaced by %FN.
+# 
+# %FN   $FN{'last'}    greatest documentid
+#       $FN{$did}      a pair of $mtf and $filename where $mtf is the
+#                      number of occurances of the most frequent word in
+#                      the document with number $did.
+# 
+# %IDF  $IDF{'*all*'}  number of documents (essentially the same as 
+#                      $FN{'last'})
+#       $IDF{$word}    number of documents containing $word
+# 
+# %IF   $IF{$word}     list of pairs ($docid,$tf) where $docid is
+#                      the number of a document containing $word $tf
+# 
+use Fcntl;
+use less 'time';
+use Getopt::Long;
+use File::Basename;
+use Text::English;
+use Config;
+
+# NDBM_File as LAST resort
+BEGIN { @AnyDBM_File::ISA = qw(DB_File GDBM_File SDBM_File ODBM_File NDBM_File) }
+use AnyDBM_File;
+
+!NO!SUBS!
+
+eval "\$x = pack 'w', 1;";
+if ($x eq "\001") {
+    print OUT "\$p          = 'w'; # compressed int patch available\n";
+} else {
+    print OUT "\$p          = 'S'; # change to 'I' for large collections\n";  
+}
+
+$index_dir = $Config{'man1direxp'};
+$index_dir =~ s:/[^/]*$::;
+$index_dir = $ENV{'INDEXDIR'} if $ENV{'INDEXDIR'};
+
+print OUT <<"EOC";
+\$nroff      = \'$Config{'nroff'}\' || \'nroff\';
+\$man1direxp = \'$Config{'man1direxp'}\';
+\$man3direxp = \'$Config{'man3direxp'}\';
+\$IDIR       = \'$index_dir\';
+\$prefix     = \'$Config{'prefix'}\';
+EOC
+;
+
+# Use Term::ReadKey for character-at-a-time input if available.
+# Else use "stty cbreak" if BSD, "stty icanon" if non-BSD.
+if (eval 'require Term::ReadKey') {
+  print OUT "use Term::ReadKey;\n";
+  $BSD_STYLE= -1;
+} elsif ($Config{'d_bsd'}) {
+  $BSD_STYLE= 1;
+} else {
+  $BSD_STYLE= 0;
+}
+
+# # Let's look for a stemmer
+# eval "use Text::English;";
+# unless ($@) {
+#     $stemmer = \&Text::English::stem;
+# } else {
+#     eval "require 'Stem.pl';";
+#     unless ($@) {
+#         $stemmer = \&stem;
+#     }
+# }
+
+# In the following, perl variables are not expanded during extraction.
+
+print OUT <<'!NO!SUBS!';
+
+$pager= $Config{'pager'};
+if ($Config{'osname'} eq 'hpux') {
+  $pager= "col | $pager";
+}
+if (exists $ENV{PAGER}) {
+  $pager=$ENV{PAGER}
+}
+$stemmer = \&Text::English::stem;
+$debug       = 0;
+$opt_index   = '';                # make perl -w happy
+$opt_menu    = 1;
+$opt_maxhits = 15;
+$opt_cbreak  = 1;
+&GetOptions(
+            'index',
+            'cbreak!',
+            'maxhits=i',
+            'menu!',
+            'verbose',
+            'dict:i',
+            'idir=s',
+            ) || die "Usage: $0 [-index] [words ...]\n";
+
+if (defined $opt_idir) {
+    $IDIR = $opt_idir;          # avoid to many changes below.
+}
+
+if (defined $opt_dict) {
+    $opt_dict ||= 100;
+}
+
+if ($opt_index) {
+
+    # check whether we can use Pod::Text to extract POD
+    $PodText     = 0;
+    eval {
+      require Pod::Text;
+      print "Using Pod::Text\n";
+      $PodText = 1;
+    };
+
+
+    $IoScalar    = 0;
+    eval {
+      require IO::Scalar;
+      print "Using IO::Scalar\n";
+      $IoScalar = 1;
+    };
+
+    &initstop;
+
+    tie (%IF,   AnyDBM_File, "$IDIR/index_if",   O_CREAT|O_RDWR, 0644)
+        or die "Could not tie $IDIR/index_if: $!\n";
+    tie (%IDF,  AnyDBM_File, "$IDIR/index_idf",  O_CREAT|O_RDWR, 0644)
+        or die "Could not tie $IDIR/index_idf: $!\n";
+    tie (%SEEN, AnyDBM_File, "$IDIR/index_seen", O_CREAT|O_RDWR, 0644)
+        or die "Could not tie $IDIR/index_seen: $!\n";
+    tie (%FN,   AnyDBM_File, "$IDIR/index_fn",   O_CREAT|O_RDWR, 0644)
+        or die "Could not tie $IDIR/index_fn: $!\n";
+
+    require "find.pl";
+
+    unless (@ARGV) {            # breaks compatibility :-(
+        my %seen;
+        my @perllib = grep(length && -d && !$seen{$_}++,
+            @Config{qw(installprivlib installarchlib installsitelib installvendorlib installscript installsitearch)});
+
+        for $dir (@perllib) {
+            print "Scanning $dir ... \n";
+            if (-l $dir) { # debian symlinks installarchlib but we do not want to follow links in general
+               my $target = readlink $dir;
+               $dir =~ s:[^/]+$:$target:;
+            }
+            &find($dir);
+        }
+    }
+    for $name (@ARGV) {
+        my $fns = $name;
+        $fns =~ s:\Q$prefix/::;
+        next if $SEEN{$fns};
+        next unless -f $name;
+        if ($name !~ /(~|,v)$/) {
+            $did = $FN{'last'}++;
+            $SEEN{$fns} = &index($name, $fns, $did); 
+        }
+    }
+    untie %IF;
+    untie %IDF;
+    untie %FN;
+    untie %SEEN;
+} elsif ($opt_dict) {
+    tie (%IDF,  AnyDBM_File, "$IDIR/index_idf",  O_RDONLY, 0644)
+        or die "Could not tie $IDIR/index_idf: $!\n".
+            "Did you run '$0 -index'?\n";
+    while (($key,$val) = each %IDF) {
+        printf "%-20s %d\n", $key, $val if $val >= $opt_dict;
+    }
+    untie %IDF;
+} else {
+    tie (%IF,   AnyDBM_File, "$IDIR/index_if",   O_RDONLY, 0644)
+        or die "Could not tie $IDIR/index_if: $!\n".
+            "Did you run '$0 -index'?\n";
+    tie (%IDF,  AnyDBM_File, "$IDIR/index_idf",   O_RDONLY, 0644)
+        or die "Could not tie $IDIR/index_idf: $!\n";
+    tie (%FN,   AnyDBM_File, "$IDIR/index_fn",   O_RDONLY, 0644)
+        or die "Could not tie $IDIR/index_fn: $!\n";
+    &search(@ARGV);
+    untie %IF;
+    untie %IDF;
+    untie %FN;
+    untie %SEEN;
+}
+
+sub wanted {
+    my $fns = $name;
+
+    if ($name eq $man3direxp) {
+        $prune = 1;
+    }
+    $fns =~ s:\Q$prefix/::;
+    return if $SEEN{$fns};
+    return unless -f $_;
+    if ($name =~ /man|bin|\.(pod|pm)$/) {
+        if (!/(~|,v)$/) {
+            $did = $FN{'last'}++;
+            $SEEN{$fns} = &index($name, $fns, $did); 
+        }
+    }
+}
+
+sub index {
+    my $fn  = shift;
+    my $fns = shift;
+    my $did = shift;
+    my %tf;
+    my $maxtf = 0;
+    my $pod = ($fns =~ /\.pod|man/);
+
+    if($PodText and -T $fn) {
+      my $result;
+      my $parser = Pod::Text->new(sentence => 0, width => 78);
+      if ($IoScalar) {
+        my $tmpfile = new IO::Scalar;
+        open(IN,"<$fn");
+        $parser->parse_from_filehandle(*IN, $tmpfile);
+        warn "===> $fn, $tmpfile" if $debug;
+        #for my $line (split /\n/, $tmpfile) {
+        while ($tmpfile =~ s/^(.*\n)//) {
+          warn "=> $1\n" if $debug;
+          for $word (&normalize($1)) {
+              next if $stop{$word};
+              $tf{$word}++;
+          }
+        }
+        close IN;
+      } else {
+        my $tmpfile = "$IDIR/tmppod.txt";
+        $parser->parse_from_file($fn, $tmpfile);
+        open (IN, "<$tmpfile") || warn "Could not open $fn: $!\n", return (0);
+        while ($line = <IN>) {
+          warn "=> $line\n" if $debug;
+          for $word (&normalize($line)) {
+              next if $stop{$word};
+              $tf{$word}++;
+          }
+        }
+        close IN;
+        unlink $tmpfile;
+      }
+    } else {
+      # no Pod::Text found
+      open(IN, "<$fn") || warn "Could not open $fn: $!\n", return (0);
+      while ($line = <IN>) {
+          warn "=> $line\n" if $debug;
+          if ($line =~ /^=head/) {
+              $pod = 1;
+          } elsif ($line =~ /^=cut/){
+              $pod = 0;
+          } else {
+              next unless $pod;
+          }
+          for $word (&normalize($line)) {
+              next if $stop{$word};
+              $tf{$word}++;
+          }
+      }
+      close(IN);
+    }
+    for $tf (values %tf) {
+        $maxtf = $tf if $tf > $maxtf;
+    }
+    for $word (keys %tf) {
+        $IDF{$word}++;
+        $IF{$word} = '' unless defined $IF{$word}; # perl -w
+        $IF{$word} .= pack($p.$p, $did, $tf{$word});
+    }
+    $FN{$did} = pack($p, $maxtf).$fns; 
+    print STDERR "$fns\n";
+    1;
+}
+
+sub normalize {
+    my $line = join ' ', @_;
+    my @result;
+
+    $line =~ tr/A-Z/a-z/;
+    $line =~ tr/a-z0-9_/ /cs;
+    for $word (split / /, $line ) {
+        $word =~ s/^\d+//;
+        next unless length($word) > 2;
+        if ($stemmer) {
+            push @result, &$stemmer($word);
+        } else {
+            push @result, $word;
+        }
+    }
+    @result;
+}
+
+sub search {
+    my %score;
+    my $maxhits = $opt_maxhits;
+    my (@unknown, @stop);
+
+    &initstop if $opt_verbose;
+    for $word (normalize(@_)) {
+        unless ($IF{$word}) {
+            if ($stop{$word}) {
+                push @stop, $word;
+            } else {
+                push @unknown, $word;
+            }
+            next;
+        }
+        my %post = unpack($p.'*',$IF{$word});
+        my $idf = log($FN{'last'}/$IDF{$word});
+        for $did (keys %post) {
+            my ($maxtf) = unpack($p, $FN{$did});
+            $score{$did} = 0 unless defined $score{$did}; # perl -w 
+            $score{$did} += $post{$did} / $maxtf * $idf;
+        }
+    }
+    if ($opt_verbose) {
+        print "Unkown:  @unknown\n" if @unknown;
+        print "Ingnore: @stop\n" if @stop;
+    }
+    if ($opt_menu) {
+        my @menu;
+        my $answer = '';
+        my $no = 0;
+        my @s = ('1' .. '9', 'a' .. 'z');
+        my %s;
+        
+        for $did (sort {$score{$b} <=> $score{$a}} keys %score) {
+            my ($mtf, $path) = unpack($p.'a*', $FN{$did});
+            my $s = $s[$no];
+            push @menu, sprintf "%s %6.3f %s\n", $s, $score{$did}, $path;
+            $s{$s} = ++$no;
+            last unless --$maxhits;
+        }
+        &cbreak('on') if $opt_cbreak;
+        while (1) {
+            print @menu;
+            print "\nEnter Number or 'q'> ";
+            if ($opt_cbreak) {
+                read(TTYIN,$answer,1);
+                print "\n";
+            } else {
+                $answer = <STDIN>;
+            }
+            last if $answer =~ /^q/i;
+            $answer = ($s{substr($answer,0,1)})-1;
+            if ($answer >= 0 and $answer <= $#menu) {
+                my $selection = $menu[$answer];
+                if ($selection =~ m:/man:) {
+                    my ($page, $sect) = 
+                        ($selection =~ m:([^/]*)\.(.{1,3})$:);
+                    print STDERR "Running man $sect $page\n";
+                    system 'man', $sect, $page;
+                } else {
+                    my ($path) = ($selection =~ m:(\S+)$:);
+                    $path = $prefix.'/'.$path unless $path =~ m:^/:;
+                    print STDERR "Running pod2man $path\n";
+                    system "pod2man --official $path | $nroff -man | $pager";
+                }
+            } else {
+                my $path = $prefix."/bin/perlindex";
+                system "pod2man --official $path | $nroff -man | $pager";
+            }
+        }
+        &cbreak('off') if $opt_cbreak;
+    } else {
+        for $did (sort {$score{$b} <=> $score{$a}} keys %score) {
+            printf("%6.3f %s\n", $score{$did}, 
+                   (unpack($p.'a*', $FN{$did}))[1]);
+            last unless --$maxhits;
+        }
+    }
+}
+
+sub cbreak {
+    my $mode = shift;
+    if ($mode eq 'on') {
+        open(TTYIN, "</dev/tty") || die "can't read /dev/tty: $!";
+        open(TTYOUT, ">/dev/tty") || die "can't write /dev/tty: $!";
+        select(TTYOUT);
+        $| = 1;
+        select(STDOUT);
+        $SIG{'QUIT'} = $SIG{'INT'} = 'cbreak';
+!NO!SUBS!
+;
+
+if ($BSD_STYLE == -1) {
+    print OUT "\tReadMode 3; # Set cbreak mode\n";
+} elsif ($BSD_STYLE) {
+    print OUT "\tsystem \"stty cbreak </dev/tty >/dev/tty 2>&1\";\n";
+    #print OUT "\tsystem \"stty cbreak echo </dev/tty >/dev/tty 2>&1\";\n";
+} else {
+    print OUT "\tsystem \"stty\", '-icanon', 'eol', \"\\001\";\n";
+}
+print OUT "    } else {\n";
+if ($BSD_STYLE == -1) {
+    print OUT "\tReadMode 0; # Restore non-cbreak mode\n";
+} elsif ($BSD_STYLE) {
+    print OUT "\tsystem \"stty -cbreak </dev/tty >/dev/tty 2>&1\";\n";
+    #print OUT "\tsystem \"stty -cbreak echo </dev/tty >/dev/tty 2>&1\";\n";
+    } else {
+    print OUT "\tsystem \"stty\", 'icanon', 'eol', '^\@'; # ascii null\n";
+        }
+
+print OUT <<'!NO!SUBS!';
+    }
+}
+
+
+$stopinited = 0;                # perl -w
+sub initstop {
+    return if $stopinited++;
+    while (<DATA>) {
+        next if /^\#/;
+        for (normalize($_)) {
+          $stop{$_}++;
+        }
+    }
+}
+
+=head1 NAME
+
+perlindex - index and query perl manual pages
+
+=head1 SYNOPSIS
+
+    perlindex -index
+
+    perlindex tell me where the flowers are
+
+=head1 DESCRIPTION
+
+"C<perlindex -index>" generates an AnyDBM_File index which can be
+searched with free text queries "C<perlindex> I<a verbose query>".
+
+Each word of the query is searched in the index and a score is
+generated for each document containing it. Scores for all words are
+added and the documents with the highest score are printed.  All words
+are stemed with Porters algorithm (see L<Text::English>) before
+indexing and searching happens.
+
+The score is computed as:
+
+    $score{$document} += $tf{$word,$document}/$maxtf{$document}
+                         * log ($N/$n{$word});
+
+where
+
+=over 10
+
+=item C<$N>
+
+is the number of documents in the index,
+
+=item C<$n{$word}>
+
+is the number of documents containing the I<word>,
+
+=item C<$tf{$word,$document}>
+
+is the number of occurances of I<word> in the I<document>, and
+
+=item C<$maxtf{$document}>
+
+is the maximum freqency of any word in I<document>.
+
+=back
+
+=head1 OPTIONS
+
+All options may be abreviated.
+
+=over 10
+
+=item B<-maxhits> maxhits
+
+Maximum numer of hits to display. Default is 15.
+
+=item B<-menu>
+
+=item B<-nomenu>
+
+Use the matches as menu for calling C<man>. Default is B<-menu>.q
+
+=item B<-cbreak>
+
+=item B<-nocbreak>
+
+Switch to cbreak in menu mode or dont. B<-cbreak> is the default.
+
+=item B<-verbose>
+
+Generates additional information which query words have been not found
+in the database and which words of the query are stopwords.
+
+=back
+
+=head1 EXAMPLE
+
+    perlindex foo bar
+
+    1  3.735 lib/pod/perlbot.pod
+    2  2.640 lib/pod/perlsec.pod
+    3  2.153 lib/pod/perldata.pod
+    4  1.920 lib/Symbol.pm
+    5  1.802 lib/pod/perlsub.pod
+    6  1.586 lib/Getopt/Long.pm
+    7  1.190 lib/File/Path.pm
+    8  1.042 lib/pod/perlop.pod
+    9  0.857 lib/pod/perlre.pod
+    a  0.830 lib/Shell.pm
+    b  0.691 lib/strict.pm
+    c  0.691 lib/Carp.pm
+    d  0.680 lib/pod/perlpod.pod
+    e  0.680 lib/File/Find.pm
+    f  0.626 lib/pod/perlsyn.pod
+    Enter Number or 'q'>
+
+Hitting the keys C<1> to C<f> will display the corresponding manual
+page. Hitting C<q> quits. All other keys display this manual page.
+
+=head1 FILES
+
+The index will be generated in your man directory. Strictly speaking in 
+C<$Config{man1direxp}/..>
+
+    The following files will be generated:
+
+    index_fn           # docid -> (max frequency, filename)
+    index_idf          # term  -> number of documents containing term
+    index_if           # term  -> (docid, frequency)*
+    index_seen         # fn    -> indexed?
+    
+
+=head1 AUTHOR
+
+Ulrich Pfeifer E<lt>F<pfeifer at ls6.informatik.uni-dortmund.de>E<gt>
+
+=cut
+
+__END__
+# freeWAIS-sf stopwords
+a
+about
+above
+according
+across
+actually
+adj
+after
+afterwards
+again
+against
+all
+almost
+alone
+along
+already
+also
+although
+always
+among
+amongst
+an
+and
+another
+any
+anyhow
+anyone
+anything
+anywhere
+are
+aren't
+around
+as
+at
+b
+be
+became
+because
+become
+becomes
+becoming
+been
+before
+beforehand
+begin
+beginning
+behind
+being
+below
+beside
+besides
+between
+beyond
+billion
+both
+but
+by
+c
+can
+can't
+cannot
+caption
+co
+co.
+could
+couldn't
+d
+did
+didn't
+do
+does
+doesn't
+don't
+down
+during
+e
+each
+eg
+eight
+eighty
+either
+else
+elsewhere
+end
+ending
+enough
+etc
+even
+ever
+every
+everyone
+everything
+everywhere
+except
+f
+few
+fifty
+first
+five
+vfor
+former
+formerly
+forty
+found "
+four
+from
+further
+g
+h
+had
+has
+hasn't
+have
+haven't
+he
+he'd
+he'll
+he's
+hence
+her
+here
+here's
+hereafter
+hereby
+herein
+hereupon
+hers
+herself
+him
+himself
+his
+how
+however
+hundred
+i
+i'd
+i'll
+i'm
+i've
+ie
+if
+in
+inc.
+indeed
+instead
+into
+is
+isn't
+it
+it's
+its
+itself
+j
+k
+l
+last
+later
+latter
+latterly
+least
+less
+let
+let's
+like
+likely
+ltd
+m
+made
+make
+makes
+many
+maybe
+me
+meantime
+meanwhile
+might
+million
+miss
+more
+moreover
+most
+mostly
+mr
+mrs
+much
+must
+my
+myself
+n
+namely
+neither
+never
+nevertheless
+next
+nine
+ninety
+no
+nobody
+none
+nonetheless
+noone
+nor
+not
+nothing
+now
+nowhere
+o
+of
+off
+often
+on
+once
+one
+one's
+only
+onto
+or
+other
+others
+otherwise
+our
+ours
+ourselves
+out
+over
+overall
+own
+p
+per
+perhaps
+q
+r
+rather
+recent
+recently
+s
+same
+seem
+seemed
+seeming
+seems
+seven
+seventy
+several
+she
+she'd
+she'll
+she's
+should
+shouldn't
+since
+six
+sixty
+so
+some
+somehow
+someone
+something
+sometime
+sometimes
+somewhere
+still
+stop
+such
+t
+taking
+ten
+than
+that
+that'll
+that's
+that've
+the
+their
+them
+themselves
+then
+thence
+there
+there'd
+there'll
+there're
+there's
+there've
+thereafter
+thereby
+therefore
+therein
+thereupon
+these
+they
+they'd
+they'll
+they're
+they've
+thirty
+this
+those
+though
+thousand
+three
+through
+throughout
+thru
+thus
+to
+together
+too
+toward
+towards
+trillion
+twenty
+two
+u
+under
+unless
+unlike
+unlikely
+until
+up
+upon
+us
+used
+using
+v
+very
+via
+w
+was
+wasn't
+we
+we'd
+we'll
+we're
+we've
+well
+were
+weren't
+what
+what'll
+what's
+what've
+whatever
+when
+whence
+whenever
+where
+where's
+whereafter
+whereas
+whereby
+wherein
+whereupon
+wherever
+whether
+which
+while
+whither
+who
+who'd
+who'll
+who's
+whoever
+whole
+whom
+whomever
+whose
+why
+will
+with
+within
+without
+won't
+would
+wouldn't
+x
+y
+yes
+yet
+you
+you'd
+you'll
+you're
+you've
+your
+yours
+yourself
+yourselves
+z
+# occuring in more than 100 files
+acc
+accent
+accents
+and
+are
+bell
+can
+character
+corrections
+crt
+daisy
+dash
+date
+defined
+definitions
+description
+devices
+diablo
+dummy
+factors
+following
+font
+for
+from
+fudge
+give
+have
+header
+holds
+log
+logo
+low
+lpr
+mark
+name
+nroff
+out
+output
+perl
+pitch
+put
+rcsfile
+reference
+resolution
+revision
+see
+set
+simple
+smi
+some
+string
+synopsis
+system
+that
+the
+this
+translation
+troff
+typewriter
+ucb
+unbreakable
+use
+used
+user
+vroff
+wheel
+will
+with
+you
+
+!NO!SUBS!

Added: branches/upstream/perlindex/current/t/basic.t
URL: http://svn.debian.org/wsvn/branches/upstream/perlindex/current/t/basic.t?rev=21528&op=file
==============================================================================
--- branches/upstream/perlindex/current/t/basic.t (added)
+++ branches/upstream/perlindex/current/t/basic.t Sun Jun 15 15:45:02 2008
@@ -1,0 +1,52 @@
+#!/usr/bin/perl -w
+#                              -*- Mode: Perl -*- 
+# $Basename$
+# $Revision: 1.3 $
+# Author          : Ulrich Pfeifer
+# Created On      : Wed Jun 18 19:44:37 2003
+# Last Modified By: Ulrich Pfeifer
+# Last Modified On: Thu Jun 19 11:25:36 2003
+# Language        : CPerl
+# 
+# (C) Copyright 2003, UUNET Deutschland GmbH, Germany
+# 
+use strict;
+use Test;
+BEGIN {
+    if (!eval {
+	require File::Temp;
+	require File::Spec;
+	1;
+    }) {
+	print "1..0 # SKIP: File::Temp and/or File::Spec not available, skipping tests\n";
+	exit(0);
+    }
+    File::Temp->import(qw(tempdir));
+}
+BEGIN { plan tests => 2, todo => [] }
+
+sub run {
+  my ($cmd, $test) = @_;
+
+  local $/;
+  open(SUB, "$^X $cmd < " . File::Spec->devnull . " 2>&1 |") or die $!;
+  my $result = <SUB>;
+  close SUB or return;
+
+  return &$test($result);
+}
+
+my $tmp = tempdir(CLEANUP => 1);
+
+ok(
+   run(
+       "-Mblib ./perlindex -idir $tmp --index README MANIFEST perlindex.PL",
+       sub { print "[[$_[0]]]\n"; $_[0] =~ /MANIFEST/ }
+      )
+  );
+ok(
+   run(
+       "-Mblib ./perlindex -idir $tmp --nomenu index",
+       sub { print "[[$_[0]]]\n"; $_[0] =~ /perlindex.PL/ }
+      )
+  );




More information about the Pkg-perl-cvs-commits mailing list