r35997 - in /packages/wordnet/trunk/debian: changelog control goldendict-wordnet.install goldendict-wordnet_abrv.dsl rules wn-for-goldendict.rb

tille at users.alioth.debian.org
Wed Nov 18 07:24:12 UTC 2009


Author: tille
Date: Wed Nov 18 07:24:12 2009
New Revision: 35997

URL: http://svn.debian.org/wsvn/debian-science/?sc=1&rev=35997
Log:
Provide package with dictionary preformatted for goldendict

Added:
    packages/wordnet/trunk/debian/goldendict-wordnet.install
    packages/wordnet/trunk/debian/goldendict-wordnet_abrv.dsl
    packages/wordnet/trunk/debian/wn-for-goldendict.rb
Modified:
    packages/wordnet/trunk/debian/changelog
    packages/wordnet/trunk/debian/control
    packages/wordnet/trunk/debian/rules

Modified: packages/wordnet/trunk/debian/changelog
URL: http://svn.debian.org/wsvn/debian-science/packages/wordnet/trunk/debian/changelog?rev=35997&op=diff
==============================================================================
--- packages/wordnet/trunk/debian/changelog (original)
+++ packages/wordnet/trunk/debian/changelog Wed Nov 18 07:24:12 2009
@@ -1,3 +1,13 @@
+wordnet (1:3.0-19) unstable; urgency=low
+
+  * Added goldendict-wordnet package: it is generated from the
+    WordNet database by a script written specially for goldendict
+    and other GUI dictionaries, closes: #555707.
+  * Added myself to the uploaders list; thanks to Andreas Tille
+    for the permission.
+
+ -- Dmitry E. Oboukhov <unera at debian.org>  Thu, 12 Nov 2009 21:55:25 +0300
+
 wordnet (1:3.0-18) unstable; urgency=low
 
   * debian/patches/20_adj.all_fix.patch

Modified: packages/wordnet/trunk/debian/control
URL: http://svn.debian.org/wsvn/debian-science/packages/wordnet/trunk/debian/control?rev=35997&op=diff
==============================================================================
--- packages/wordnet/trunk/debian/control (original)
+++ packages/wordnet/trunk/debian/control Wed Nov 18 07:24:12 2009
@@ -2,11 +2,12 @@
 Section: text
 Build-Depends: cdbs (>= 0.4.23-1.1), autotools-dev, debhelper (>= 7), quilt,
  tk8.5-dev, tcl8.5-dev, libxaw7-dev, flex, dictzip, python, groff, gs-common,
- autoconf, automake, libtool, bison, man-db, libxss-dev, libxft-dev
+ autoconf, automake, libtool, bison, man-db, libxss-dev, libxft-dev, ruby
 Priority: optional
 Maintainer: Debian Science Team <debian-science-maintainers at lists.alioth.debian.org>
 DM-Upload-Allowed: yes
-Uploaders: Andreas Tille <tille at debian.org>
+Uploaders: Andreas Tille <tille at debian.org>,
+ Dmitry E. Oboukhov <unera at debian.org>
 Standards-Version: 3.8.3
 Vcs-Browser: http://svn.debian.org/wsvn/debian-science/packages/wordnet/trunk/?rev=0&sc=0
 Vcs-Svn: svn://svn.debian.org/svn/debian-science/packages/wordnet/trunk/
@@ -149,3 +150,22 @@
  .
  This package will be of limited use without the server found in the
  dictd package.
+
+Package: goldendict-wordnet
+Conflicts: wordnet-goldendict
+Architecture: all
+Depends: ${misc:Depends}
+Recommends: goldendict
+Description: electronic lexical database of the English language for goldendict
+ WordNet(C) is an on-line lexical reference system whose design is
+ inspired by current psycholinguistic theories of human lexical
+ memory. English nouns, verbs, adjectives and adverbs are organized
+ into synonym sets, each representing one underlying lexical
+ concept. Different relations link the synonym sets.
+ .
+ WordNet was developed by the Cognitive Science Laboratory
+ (http://www.cogsci.princeton.edu/) at Princeton University under the
+ direction of Professor George A. Miller (Principal Investigator).
+ .
+ This package contains the WordNet database converted to DSL format for
+ dictionary programs such as goldendict.

Added: packages/wordnet/trunk/debian/goldendict-wordnet.install
URL: http://svn.debian.org/wsvn/debian-science/packages/wordnet/trunk/debian/goldendict-wordnet.install?rev=35997&op=file
==============================================================================
--- packages/wordnet/trunk/debian/goldendict-wordnet.install (added)
+++ packages/wordnet/trunk/debian/goldendict-wordnet.install Wed Nov 18 07:24:12 2009
@@ -1,0 +1,2 @@
+goldendict-wordnet.dsl.dz /usr/share/goldendict-wordnet/
+goldendict-wordnet_abrv.dsl /usr/share/goldendict-wordnet/

Added: packages/wordnet/trunk/debian/goldendict-wordnet_abrv.dsl
URL: http://svn.debian.org/wsvn/debian-science/packages/wordnet/trunk/debian/goldendict-wordnet_abrv.dsl?rev=35997&op=file
==============================================================================
--- packages/wordnet/trunk/debian/goldendict-wordnet_abrv.dsl (added)
+++ packages/wordnet/trunk/debian/goldendict-wordnet_abrv.dsl Wed Nov 18 07:24:12 2009
@@ -1,0 +1,64 @@
+#NAME "Abbreviations for WordNet 3.0 (En-En)"
+#INDEX_LANGUAGE "English"
+#CONTENTS_LANGUAGE "English"
+
+Freq.
+	Frequency count. The number of times each semantically tagged sense occurs in the Semantic Concordance files.
+Syn
+	Synonyms - words with the same meaning
+Ant
+	Antonyms - words with the opposite meaning
+Pertains to noun
+	Only for relational adjectives. For example, "medical" pertains to "medicine" and "musical" pertains to "music".
+Derived from adjective
+	Only for adverbs.
+Similar to
+	Similar to ...
+See Also
+	See Also ...
+Derivationally related forms
+	For example, a derivationally related form of "meter" is "metrical".
+Usage Domain
+	Usage Domains for this entry
+Topics
+	Topic Domains for this entry
+Regions
+	Region Domains for this entry
+Members of this Usage Domain
+	Members of this Usage Domain
+Members of this Topic
+	Members of this Topic
+Members of this Region
+	Members of this Region
+Hypernyms
+	The generic term used to designate a whole class of specific instances. Y is a hypernym of X if X is a (kind of) Y. E.g., "tree" is a hypernym of "oak".
+Instance Hypernyms
+	E.g., the instance hypernym of "Mississippi River" is "river".
+Hyponyms
+	The specific term used to designate a member of a class. X is a hyponym of Y if X is a (kind of) Y. E.g., "oak" is a hyponym of "tree".
+Instance Hyponyms
+	Instance hyponyms represent specific instances of something. E.g., "Amazon River" is an instance hyponym of "river". 
+Member Holonyms
+	X is a member holonym of Y if Y is a member of X. E.g., "forest" is a member holonym of "tree".
+Substance Holonyms
+	X is a substance holonym of Y if Y is a substance of X. E.g., "air" is a substance holonym of "oxygen".
+Part Holonyms
+	X is a part holonym of Y if Y is a part of X. E.g., "bird" is a part holonym of "wing".
+Member Meronyms
+	X is a member meronym of Y if X is a member of Y. E.g., "tree" is a member meronym of "forest".
+Substance Meronyms
+	X is a substance meronym of Y if X is a substance of Y. E.g., "oxygen" is a substance meronym of "air".
+Part Meronyms
+	X is a part meronym of Y if X is a part of Y. E.g., "wing" is a part meronym of "bird".
+Attributes
+	Attribute is a noun for which adjectives express values. The noun "weight" is an attribute, for which the adjectives "light" and "heavy" express values.
+Verb Group
+	Verb Group
+Entailment
+	A verb X entails Y if X  cannot be done unless Y is, or has been, done. E.g., "snore" entails "sleep".
+Cause
+	A verb X causes Y if X denotes the causation of the state or activity referred to by Y. E.g., "scare" causes "fear".
+Participle of verb
+	Participle of verb
+Verb Frames
+	Generic sentence frames illustrating the types of simple sentences in which the verb can be used.
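
The abbreviation file above uses the plain DSL layout: `#`-prefixed header directives, a headword at column zero, and a tab-indented definition line beneath it. A minimal Ruby reader for this layout can be sketched as follows (the `parse_abrv` helper is illustrative, not part of the package, and assumes one definition line per headword as in the file above):

```ruby
# Minimal sketch: parse a DSL abbreviation file into a headword => definition hash.
# Assumes UTF-8 input; headwords sit at column 0, definitions are tab-indented.
def parse_abrv(text)
  entries = {}
  current = nil
  text.each_line do |line|
    next if line =~ /^#/ || line.strip.empty?  # skip #NAME etc. and blank lines
    if line =~ /^\t(.*)/                       # tab-indented: definition body
      entries[current] = $1.strip if current
    else                                       # column 0: a new headword
      current = line.strip
    end
  end
  entries
end
```

Note that the file shipped in the package is additionally re-encoded to UTF-16LE by the rules target below; this sketch reads the UTF-8 source form.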

Modified: packages/wordnet/trunk/debian/rules
URL: http://svn.debian.org/wsvn/debian-science/packages/wordnet/trunk/debian/rules?rev=35997&op=diff
==============================================================================
--- packages/wordnet/trunk/debian/rules (original)
+++ packages/wordnet/trunk/debian/rules Wed Nov 18 07:24:12 2009
@@ -28,4 +28,17 @@
 	rm -rf src/grind/grind-wnparse.[ch] src/grind/grind-wnlex.c
 	# Make sure that really all Makefiles in doc are deleted
 	rm -f `find doc -name Makefile`
+	rm -f goldendict-wordnet.dsl goldendict-wordnet.dsl.dz
+	rm -f goldendict-wordnet_abrv.dsl
 
+build/goldendict-wordnet:: goldendict-wordnet.dsl.dz goldendict-wordnet_abrv.dsl
+
+goldendict-wordnet_abrv.dsl: debian/goldendict-wordnet_abrv.dsl
+	echo -ne '\xff\xfe' > $@
+	iconv -t utf-16le $< >> $@
+
+goldendict-wordnet.dsl.dz: goldendict-wordnet.dsl
+	dictzip -k $<
+
+goldendict-wordnet.dsl:
+	ruby debian/wn-for-goldendict.rb > $@
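
The three new targets chain together: the Ruby script emits the UTF-8 dictionary, dictzip compresses it, and the abbreviation file gets a UTF-16LE byte-order mark followed by the iconv-transcoded body. The BOM-plus-transcode step can be sketched in Ruby (the `utf16le_with_bom` helper and file names are illustrative):

```ruby
# Sketch of the goldendict-wordnet_abrv.dsl rule: prepend a UTF-16LE BOM and
# transcode the UTF-8 source, mirroring `echo -ne '\xff\xfe'` plus iconv.
def utf16le_with_bom(src, dst)
  data = File.read(src, mode: 'r:UTF-8')
  File.open(dst, 'wb') do |f|
    f.write("\xFF\xFE".b)               # UTF-16LE byte-order mark
    f.write(data.encode('UTF-16LE'))    # transcoded body
  end
end
```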

Added: packages/wordnet/trunk/debian/wn-for-goldendict.rb
URL: http://svn.debian.org/wsvn/debian-science/packages/wordnet/trunk/debian/wn-for-goldendict.rb?rev=35997&op=file
==============================================================================
--- packages/wordnet/trunk/debian/wn-for-goldendict.rb (added)
+++ packages/wordnet/trunk/debian/wn-for-goldendict.rb Wed Nov 18 07:24:12 2009
@@ -1,0 +1,704 @@
+#!/usr/bin/env ruby
+
+# A script to convert WordNet 3.0 dictionary from original
+# format (http://wordnet.princeton.edu/wordnet/download/)
+# to DSL format, suitable for Lingvo and GoldenDict.
+#
+# This script is released into public domain with no
+# conditions. Use it as you see appropriate.
+
+# If $short is set to true below, only a small part of the dictionary
+# is generated, for testing purposes.
+
+# This script was adapted to build the Debian package from the existing
+# Debian source package (some paths were changed).
+
+$short = false
+
+$CARDS = {}
+$CARDS_COUNT = 0
+
+
+# INPUT FILES
+$data_file_noun     = 'dict/dbfiles/data.noun'
+$data_file_verb     = 'dict/dbfiles/data.verb'
+$data_file_adj      = 'dict/dbfiles/data.adj'
+$data_file_adv      = 'dict/dbfiles/data.adv'
+$data_file_sentidx  = 'dict/sentidx.vrb'
+$data_file_sent     = 'dict/sents.vrb'
+$data_file_cntlist  = 'dict/dbfiles/cntlist'
+$index_file_noun    = 'dict/dbfiles/index.noun'
+$index_file_verb    = 'dict/dbfiles/index.verb'
+$index_file_adj     = 'dict/dbfiles/index.adj'
+$index_file_adv     = 'dict/dbfiles/index.adv'
+
+# print UTF-8 BOM first
+print "\xEF\xBB\xBF"
+
+# Dictionary Header
+DIC_NAME = "WordNet 3.0 (En-En)"
+ABBR_DIC_NAME = "Abbreviations for #{DIC_NAME}"
+puts "\#NAME \"#{DIC_NAME}\""
+puts %q{#INDEX_LANGUAGE "English"
+#CONTENTS_LANGUAGE "English"}
+
+$noun_data = File.open($data_file_noun, 'rb')
+$verb_data = File.open($data_file_verb, 'rb')
+$adj_data = File.open($data_file_adj, 'rb')
+$adv_data = File.open($data_file_adv, 'rb')
+
+$LEMMA_IDX = {}
+
+$VERB_IDX = {}
+File.open($data_file_sentidx, 'rb') { |sentidx|
+  sentidx.each_line { |line|
+    d = line.split()
+    if (d.size != 2)
+      $stderr.puts "WARNING: sentidx.vrb format error: #{d.inspect}"
+    end
+    $VERB_IDX[d[0]] = d[1]
+  }
+}
+
+$VERB_PTRNS = {}
+File.open($data_file_sent, 'rb') { |f|
+  f.each_line { |line|
+    d = line.strip.split(/\s+/, 2)
+    if (d.size != 2)
+      $stderr.puts "WARNING: sents.vrb format error: #{d.inspect}"
+    end
+    $VERB_PTRNS[d[0]] = d[1]
+  }
+}
+
+$SENSE_COUNTS = {}
+File.open($data_file_cntlist, 'rb') { |f|
+  f.each_line { |line|
+    d = line.strip.split(/\s+/)
+    if (d.size != 3)
+      $stderr.puts "WARNING: cntlist format error: #{d.inspect}"
+    end
+    sense = d[1].gsub(/\((p|a|ip)\)/, '')
+    $SENSE_COUNTS[sense] = d[0].to_i
+  }
+}
+
+$POS = {'n'=> 'noun', 'v' => 'verb', 'a' => 'adjective', 's' => 'adjective', 'r' => 'adverb'}
+$POS_NUM = {'n'=> '1', 'v' => '2', 'a' => '3', 's' => '5', 'r' => '4'}
+$ROME = ['I', 'II', 'III', 'IV']
+
+$frames = [ nil,
+  "Something ----s",
+  "Somebody ----s",
+  "It is ----ing",
+  "Something is ----ing PP",
+  "Something ----s something Adjective/Noun",
+  "Something ----s Adjective/Noun",
+  "Somebody ----s Adjective",
+  "Somebody ----s something",
+  "Somebody ----s somebody",
+  "Something ----s somebody",
+  "Something ----s something",
+  "Something ----s to somebody",
+  "Somebody ----s on something",
+  "Somebody ----s somebody something",
+  "Somebody ----s something to somebody",
+  "Somebody ----s something from somebody",
+  "Somebody ----s somebody with something",
+  "Somebody ----s somebody of something",
+  "Somebody ----s something on somebody",
+  "Somebody ----s somebody PP",
+  "Somebody ----s something PP",
+  "Somebody ----s PP",
+  'Somebody\'s (body part) ----s',
+  "Somebody ----s somebody to INFINITIVE",
+  "Somebody ----s somebody INFINITIVE",
+  "Somebody ----s that CLAUSE",
+  "Somebody ----s to somebody",
+  "Somebody ----s to INFINITIVE",
+  "Somebody ----s whether INFINITIVE",
+  "Somebody ----s somebody into V-ing something",
+  "Somebody ----s something with something",
+  "Somebody ----s INFINITIVE",
+  "Somebody ----s VERB-ing",
+  "It ----s that CLAUSE",
+  "Something ----s INFINITIVE"
+]
+
+def progress(count)
+  if count == 'done'
+    $stderr.puts("\n")
+  elsif count =~ /\D/
+    $stderr.puts(" " + count)
+  elsif (count % 10000 == 0)
+    $stderr.print "."
+  end
+end
+
+def get_data(offset, pos)
+  data_file = nil
+  case pos
+  when :n, 'n'
+    data_file = $noun_data
+  when :v, 'v'
+    data_file = $verb_data
+  when :a, 'a'
+    data_file = $adj_data
+  when :r, 'r'
+    data_file = $adv_data
+  else
+    $stderr.puts "WARN #7: get_data for unknown pos: #{pos}"
+    exit
+  end
+  data_file.seek(offset.to_i)
+  DataEntry.new(data_file.gets)
+end
+
+class Card
+  attr_reader :headword, :senses
+  def initialize(headword)
+    @headword = headword
+    @all_senses = []
+    adjectives = []
+    @senses = {'n'=>[], 'v' =>[], 'a' => adjectives, 's' => adjectives, 'r' => []}
+  end
+  def << (sense)
+    unless @all_senses.include?(sense)
+      @all_senses << sense
+      @senses[sense.pos] << sense
+    end
+  end
+  def <=> (card)
+    @headword.downcase <=> card.headword.downcase
+  end
+  def print_out
+    puts @headword
+    poses = 0
+    ['n', 'v', 'a', 'r'].each { |pos|
+      poses += 1 unless @senses[pos].empty?
+    }
+    pos_count = 0
+    ['n', 'v', 'a', 'r'].each { |pos|
+      pos_senses = @senses[pos]
+      if (pos_senses.size > 0)
+        if (poses > 1)
+          puts "\t[m0][b]#{$ROME[pos_count]}[/b][/m]"
+          pos_count += 1
+        end
+        puts "\t[m1][p]#{$POS[pos]}[/p][/m]"
+        sense_count = 1
+        pos_senses_total = pos_senses.size
+        pos_senses.sort {|x, y|
+          next 0 if $short
+
+          val1 = x.sense_key(@headword)
+          val2 = y.sense_key(@headword)
+          count1 = $SENSE_COUNTS[val1] || 0
+          count2 = $SENSE_COUNTS[val2] || 0
+
+          if (count1 + count2 > 0)
+            comp = count2 <=> count1 # reverse comparison here!
+            if comp != 0
+              next comp
+            end
+          end
+
+          idxEntry = x.idx
+          if (idxEntry.nil?)
+            $stderr.puts "No idxEntry for headword: #{@headword}"
+            exit
+          end
+          val1 = idxEntry.offsets.index(x.offset)
+          val2 = idxEntry.offsets.index(y.offset)          
+          if (val1.nil? || val2.nil?)
+            idxEntry = y.idx
+            if (idxEntry.nil?)
+              $stderr.puts "No idxEntry for headword: #{@headword}"
+              exit
+            end
+            val1 = idxEntry.offsets.index(x.offset)
+            val2 = idxEntry.offsets.index(y.offset)
+          end
+
+          if (val1.nil? || val2.nil?) # can't compare for some reasons...
+            0
+          else
+            idxEntry.offsets.index(x.offset) <=> idxEntry.offsets.index(y.offset)
+          end
+        }.each { |sense|
+          if (pos_senses_total > 1)
+            print "\t[m2][b]#{sense_count}.[/b] "
+            sense_count += 1
+          else
+            print "\t[m2] "
+          end
+          sense.print_out(@headword)
+        }
+      end
+    }
+  end
+end
+
+class IdxEntry
+  attr_accessor :offsets, :lemma, :senses
+  def initialize(str)
+    @senses = []
+    @str = str
+    data = str.split
+    @lemma = data[0]
+    @pos = data[1]
+    @synset_cnt = data[2].to_i
+    @p_cnt = data[3]
+    @pointers = ""
+    i = 3
+    Integer(@p_cnt).times {
+      i += 1
+      @pointers << data[i]
+    }
+    i += 1
+    @sense_cnt = data[i]
+    i += 1
+    @tagsense_cnt = data[i]
+    i += 1
+    @offsets = []
+    (i..data.size-1).each { |idx|
+      @offsets << data[idx].to_i
+    }
+    if (@offsets.size != @synset_cnt)
+      $stderr.puts "ERROR #1: size mismatch"
+      exit
+    end
+  end
+  def to_s
+    "#{@lemma}" # : POS: #{@pos}" #, Senses: #{@synset_cnt}"
+  end
+  def add_sense(sense)
+    sense.idx = self
+    @senses << sense
+    sense.each_headword { |hw|
+      ($CARDS[hw] ||= Card.new(hw)) << sense
+    }
+  end
+end
+
+class DataEntry
+  attr_accessor :words, :str, :pos, :idx, :offset, :lex_ids
+  def initialize(str)
+    @str = str
+    data = str.split
+    @offset = data[0].to_i
+    @lex_filenum = data[1]
+    @pos = data[2]
+    @w_cnt = [data[3]].pack('H2')[0]
+    @words = []
+    i = 4
+    @lex_ids = []
+    @w_cnt.times {
+      @words << data[i].gsub(/_/, ' ').gsub(/\s*\((p|a|ip)\)\s*$/, '')
+      i += 1
+      @lex_ids << [data[i]].pack('h')[0]
+      i += 1
+    }
+
+    @p_cnt = data[i].to_i
+    i += 1
+    @pointers = []
+    @p_cnt.times {
+      pointer = []
+      pointer << data[i]
+      pointer << data[i + 1]
+      pointer << data[i + 2]
+      pointer << data[i + 3]
+      i += 4
+      @pointers << pointer
+    }
+
+    @frames = []
+    # everything from this point up to the "|" is verb frames data
+    if data[i] != "|" # we found a verb frame
+      f_cnt = data[i].to_i
+      i += 1
+      if (f_cnt == 0)
+        $stderr.puts "ERROR: 0 number of verb frames specified"
+        exit
+      end
+      
+      f_cnt.times {
+        if (data[i] != "+")
+          $stderr.puts "ERROR: wrong verb frame format!"
+          exit
+        end
+        i += 1
+        @frames << [data[i], data[i + 1]]
+        i += 2
+      }
+    end
+
+    if data[i] != "|"
+      $stderr.puts "ERROR: expected '|' separator, but got: #{data[i]}"
+      exit
+    end
+    i += 1
+
+    @gloss = data[i, data.size - i].join(" ").gsub(/\[/, '\[').gsub(/\]/, '\]')
+    @gloss_str = ""
+  end
+  def == (other)
+    @str == other.str
+  end
+  def each_headword
+    @words.each { |w|
+      yield w
+    }
+  end
+  def to_s
+    "Set: #{@words.inspect}, P_CNT: #{@p_cnt}, Pointers: #{@pointers.inspect}, Gloss: #{@gloss}"
+  end
+  def get_pointer_data(headword, other, src_target)
+    if (src_target == "0000")
+      return other.words
+    else
+      src = [src_target[0, 2]].pack('H2')[0]
+      target = [src_target[2, 2]].pack('H2')[0]
+      h_src = words[src - 1]
+      if (h_src == headword)
+        return [other.words[target - 1]]
+      else
+        return ["#{make_link(other.words[target - 1])} [c darkgray](for: #{make_link(words[src - 1])})[/c]"]
+      end
+    end
+  end
+  def get_frame_data(headword, frame)
+    f_num = frame[0].to_i
+    w_num = [frame[1]].pack('H2')[0]
+    if (w_num == 0)
+      return [$frames[f_num]]
+    else
+      if (w_num < 1)
+        $stderr.puts "ERROR: w_num is invalid!"
+        exit
+      end
+      h_src = words[w_num - 1]
+      if (h_src == headword)
+        return [$frames[f_num]]
+      else
+        return ["[*][ex]#{$frames[f_num]}[/ex][/*]  [c darkgray](for: #{make_link(h_src)})[/c]"]
+      end
+    end
+  end
+  def sense_key(headword)
+    i = @words.index(headword)
+    if (i.nil?)
+      $stderr.puts "ERROR: can't find index for the headword: #{headword}"
+      exit
+    end
+    res = "#{headword.downcase.gsub(/\s+/, '_')}%#{$POS_NUM[@pos]}:#{@lex_filenum}:#{sprintf('%02d', @lex_ids[i])}"
+    if (@pos != 's')
+      res << "::"
+    else
+      @pointers.each {|ptr|
+        if (ptr[0] == "&") # similar to
+          similars = get_data(ptr[1], ptr[2])
+          res << ":#{similars.words[0]}:#{sprintf('%02d',similars.lex_ids[0])}"
+        end
+      }
+    end
+    res
+  end
+  def freq_count(headword)
+    $SENSE_COUNTS[sense_key(headword)] || 0
+  end
+  def print_out(headword)
+    $headword = headword
+
+    str1 = ""
+    exa = false
+    extra = ""
+    freq = if (freq_count(headword) > 0)
+      " [com][c darkgray]([p]Freq.[/p] #{freq_count(headword)})[/c][/com]"
+    else
+      ""
+    end
+
+    @gloss.split(';').each { |s|
+      s = "#{extra}; #{s}" unless extra.empty?
+      extra = ""
+      
+      # detect broken quotations
+      if s.gsub(/[^"]/, '').size % 2 != 0
+        extra = s
+        next
+      end
+      
+      if s =~ /^\s*(".*)$/ # example
+        unless freq.empty?
+          str1 << freq
+          freq = ""
+        end
+        example = $1.gsub(/^"(.*)"$/, '\1')
+        str1 << "[/m]\n\t[m3]- [*][ex]#{example}[/ex][/*]"
+        exa = true
+      else
+        if (exa)
+          str1 << "[/m]\n\t[m3]"
+        end
+        s = "[trn]#{s.strip.gsub(/(\(.*?\))/, '[i]\1[/i]')}[/trn]"
+        if (str1.empty?)
+          str1 << s
+        else
+          if (exa)
+            str1 << s
+          else
+            str1 << "; #{s}"
+          end
+        end
+        exa = false
+      end
+    }
+
+    puts "#{str1}#{freq}[/m]"
+
+    print_array(@words, 'Syn', "[c blue]•[/c]")
+
+    antonyms = []
+    pertainyms = []
+    derivs = []
+    deriv_rels = []
+    topics = []
+    regions = []
+    usages = []
+    m_topics = []
+    m_regions = []
+    m_usages = []
+    hypers = []
+    inst_hypers = []
+    hypos = []
+    inst_hypos = []
+    m_holos = []
+    s_holos = []
+    p_holos = []
+    m_meros = []
+    s_meros = []
+    p_meros = []
+    attribs = []
+    verb_group = []
+    ents = []
+    alsos = []
+    causes = []
+    similars = []
+    part_verbs = []
+    @pointers.each {|ptr|
+      if (ptr[0] == '!')     # antonym
+        antonyms += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == "\\") # pertainym or deriv. from adjective
+        if (@pos == 'r') # adverb
+          derivs += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+        elsif (@pos == 'a' || @pos == 's') # adjective
+          pertainyms += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+        else
+          $stderr.puts "ERROR: unexpected POS for slash: #{@pos}"
+          exit
+        end
+      elsif (ptr[0] == "=") # attributes
+        attribs += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == ";c") # topics domain
+        topics += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == ";r") # regions domain
+        regions += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == ";u") # usage domain
+        usages += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == "-c") # topics domain
+        m_topics += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == "-r") # regions domain
+        m_regions += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == "-u") # usage domain
+        m_usages += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == '$') # verb group
+        verb_group += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == '*') # entailment
+        ents += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == '^') # see also
+        alsos += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == '>') # cause
+        causes += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == '+')  # deriv related form
+        deriv_rels += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == "@") # hypernyms
+        hypers += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == "@i") # instance hypernyms
+        inst_hypers += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == "~") # hyponyms
+        hypos += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == "~i") # instance hyponyms
+        inst_hypos += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == "#m") # m holonyms
+        m_holos += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == "#s") # s holonyms
+        s_holos += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == "#p") # p holonyms
+        p_holos += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == "%m") # m meronyms
+        m_meros += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == "%s") # s meronyms
+        s_meros += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == "%p") # p meronyms
+        p_meros += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == "&") # similar to
+        similars += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == "<") # participle of verb
+        part_verbs += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      else
+        $stderr.puts "WARN #8: Unknown pointer type #{ptr[0]}"
+      end
+    }
+
+    print_array(antonyms, 'Ant', "[c red]•[/c]")
+    print_array(derivs, 'Derived from adjective', "[c deepskyblue]•[/c]")
+    print_array(pertainyms, 'Pertains to noun', "[c deepskyblue]•[/c]")
+    print_array(similars, 'Similar to', "[c darkturquoise]•[/c]")
+    print_array(alsos, 'See Also', "[c darkturquoise]•[/c]")
+    print_array(deriv_rels, 'Derivationally related forms', "[c dodgerblue]•[/c]")
+
+    print_array(usages, 'Usage Domain', "[c darkorchid]•[/c]")
+    print_array(topics, 'Topics', "[c darkorchid]•[/c]")
+    print_array(regions, 'Regions', "[c darkorchid]•[/c]")
+    print_array(m_usages, 'Members of this Usage Domain')
+    print_array(m_topics, 'Members of this Topic')
+    print_array(m_regions, 'Members of this Region')
+
+    print_array(hypers, 'Hypernyms')
+    print_array(inst_hypers, 'Instance Hypernyms')
+
+    print_array(hypos, 'Hyponyms')
+    print_array(inst_hypos, 'Instance Hyponyms')
+    
+    print_array(m_holos, 'Member Holonyms')
+    print_array(s_holos, 'Substance Holonyms')
+    print_array(p_holos, 'Part Holonyms')
+
+    print_array(m_meros, 'Member Meronyms')
+    print_array(s_meros, 'Substance Meronyms')
+    print_array(p_meros, 'Part Meronyms')
+
+    print_array(attribs, 'Attributes', "[c yellow]•[/c]")
+
+    print_array(verb_group, 'Verb Group', "[c maroon]•[/c]")    
+    print_array(ents, 'Entailment')
+    print_array(causes, 'Cause')
+
+    print_array(part_verbs, "Participle of verb")
+    
+    verb_sentences = []
+    unless (@frames.empty?)
+      puts "\t[m3][com][c maroon]•[/c] [p]Verb Frames[/p]:[/com][/m]"
+      @frames.each {|frame|
+        verb_sentences += get_frame_data(headword, frame)
+      }
+    end
+
+    if @pos == 'v' # only for verbs
+      key = sense_key(headword)
+      values = $VERB_IDX[key]
+      if (values)
+        values.split(/,/).each { |value|
+          verb_sentences << $VERB_PTRNS[value].gsub(/%s/, headword)
+        }
+      end
+    end
+
+    verb_sentences.each { |sentence|
+      if sentence =~ /\[ex\]/
+        puts "\t[m4]- #{sentence}[/m]"
+      else
+        puts "\t[m4]- [*][ex]#{sentence}[/ex][/*][/m]"
+      end
+    }
+  end
+  def print_array(a, label, prefix = "[c darkgray]•[/c]")
+    a -= [$headword]
+    a.uniq!
+    separator = if (a.size > 6)
+      "[/m]\n\t[m4]"
+    else
+      ""
+    end
+    puts "\t[m3][com]#{prefix} [p]#{label}[/p]:#{separator} #{a.collect{|x| make_link(x)}.join(', ')}[/com][/m]" unless a.empty?
+  end
+  def make_link(target)
+    target = target.strip
+    if (target =~ /<<.+>>/)
+      target
+    else
+      # no need to validate links, the format is good, no broken links
+      "<<#{target}>>"
+    end
+  end
+end
+
+count = 0
+
+File.foreach($index_file_noun) { |idx_line|
+  next if idx_line =~ /^\s\s/
+  entry = IdxEntry.new(idx_line)
+  entry.offsets.each { |offset|
+    d_entry = get_data(offset, :n)
+    entry.add_sense(d_entry)
+  }
+  count += 1
+  break if count == 600 && $short
+  progress(count)
+}
+progress($index_file_noun + " was processed");
+
+File.foreach($index_file_verb) { |idx_line|
+  next if idx_line =~ /^\s\s/
+  entry = IdxEntry.new(idx_line)
+  entry.offsets.each { |offset|
+    d_entry = get_data(offset, :v)
+    entry.add_sense(d_entry)
+  }
+  count += 1
+  break if count == 1200 && $short
+  progress(count)
+}
+progress($index_file_verb + " was processed");
+
+File.foreach($index_file_adj) { |idx_line|
+  next if idx_line =~ /^\s\s/
+  entry = IdxEntry.new(idx_line)
+  entry.offsets.each { |offset|
+    d_entry = get_data(offset, :a)
+    entry.add_sense(d_entry)
+  }
+  count += 1
+  break if count == 1800 && $short
+  progress(count)
+}
+progress($index_file_adj + " was processed");
+
+File.foreach($index_file_adv) { |idx_line|
+  next if idx_line =~ /^\s\s/
+  entry = IdxEntry.new(idx_line)
+  entry.offsets.each { |offset|
+    d_entry = get_data(offset, :r)
+    entry.add_sense(d_entry)
+  }
+  count += 1
+  break if count == 2400 && $short
+  progress(count)
+}
+progress($index_file_adv + " was processed");
+
+card_count = 0
+$CARDS.values.sort.each { |card|
+  card.print_out
+  card_count += 1
+  progress(card_count)
+}
+progress("CARDS were processed");
+
+$noun_data.close
+$verb_data.close
+$adj_data.close
+$adv_data.close
+
+$stderr.puts "TOTAL CARDS: #{$CARDS.size}" 
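
For context, the script's `sense_key` method assembles standard WordNet sense keys (`lemma%ss_type:lex_filenum:lex_id` plus a head-word suffix), which key the lookups into cntlist (frequency counts) and sentidx.vrb (verb example sentences). The non-satellite case can be sketched standalone (the `simple_sense_key` name and sample values are illustrative):

```ruby
# Standalone sketch of the sense-key scheme for non-satellite senses:
# lowercase the lemma, replace spaces with '_', then append
# %<pos number>:<lex_filenum>:<2-digit lex_id>:: as in the script above.
def simple_sense_key(lemma, pos_num, lex_filenum, lex_id)
  "#{lemma.downcase.gsub(/\s+/, '_')}%#{pos_num}:#{lex_filenum}:#{format('%02d', lex_id)}::"
end

simple_sense_key('Morse code', 1, '06', 0)  # => "morse_code%1:06:00::"
```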

More information about the debian-science-commits mailing list