r35997 - in /packages/wordnet/trunk/debian: changelog control goldendict-wordnet.install goldendict-wordnet_abrv.dsl rules wn-for-goldendict.rb
tille at users.alioth.debian.org
Wed Nov 18 07:24:12 UTC 2009
Author: tille
Date: Wed Nov 18 07:24:12 2009
New Revision: 35997
URL: http://svn.debian.org/wsvn/debian-science/?sc=1&rev=35997
Log:
Provide package with dictionary preformatted for goldendict
Added:
packages/wordnet/trunk/debian/goldendict-wordnet.install
packages/wordnet/trunk/debian/goldendict-wordnet_abrv.dsl
packages/wordnet/trunk/debian/wn-for-goldendict.rb
Modified:
packages/wordnet/trunk/debian/changelog
packages/wordnet/trunk/debian/control
packages/wordnet/trunk/debian/rules
Modified: packages/wordnet/trunk/debian/changelog
URL: http://svn.debian.org/wsvn/debian-science/packages/wordnet/trunk/debian/changelog?rev=35997&op=diff
==============================================================================
--- packages/wordnet/trunk/debian/changelog (original)
+++ packages/wordnet/trunk/debian/changelog Wed Nov 18 07:24:12 2009
@@ -1,3 +1,13 @@
+wordnet (1:3.0-19) unstable; urgency=low
+
+ * Added goldendict-wordnet package: it is generated from the
+ wordnet database by a script written specially for
+ goldendict and other GUI dictionaries, closes: #555707.
+ * Added myself to the uploaders list, thanks to Andreas Tille
+ for the permission.
+
+ -- Dmitry E. Oboukhov <unera at debian.org> Thu, 12 Nov 2009 21:55:25 +0300
+
wordnet (1:3.0-18) unstable; urgency=low
* debian/patches/20_adj.all_fix.patch
Modified: packages/wordnet/trunk/debian/control
URL: http://svn.debian.org/wsvn/debian-science/packages/wordnet/trunk/debian/control?rev=35997&op=diff
==============================================================================
--- packages/wordnet/trunk/debian/control (original)
+++ packages/wordnet/trunk/debian/control Wed Nov 18 07:24:12 2009
@@ -2,11 +2,12 @@
Section: text
Build-Depends: cdbs (>= 0.4.23-1.1), autotools-dev, debhelper (>= 7), quilt,
tk8.5-dev, tcl8.5-dev, libxaw7-dev, flex, dictzip, python, groff, gs-common,
- autoconf, automake, libtool, bison, man-db, libxss-dev, libxft-dev
+ autoconf, automake, libtool, bison, man-db, libxss-dev, libxft-dev, ruby
Priority: optional
Maintainer: Debian Science Team <debian-science-maintainers at lists.alioth.debian.org>
DM-Upload-Allowed: yes
-Uploaders: Andreas Tille <tille at debian.org>
+Uploaders: Andreas Tille <tille at debian.org>,
+ Dmitry E. Oboukhov <unera at debian.org>
Standards-Version: 3.8.3
Vcs-Browser: http://svn.debian.org/wsvn/debian-science/packages/wordnet/trunk/?rev=0&sc=0
Vcs-Svn: svn://svn.debian.org/svn/debian-science/packages/wordnet/trunk/
@@ -149,3 +150,22 @@
.
This package will be of limited use without the server found in the
dictd package.
+
+Package: goldendict-wordnet
+Conflicts: wordnet-goldendict
+Architecture: all
+Depends: ${misc:Depends}
+Recommends: goldendict
+Description: electronic lexical database of English language for dict
+ WordNet(C) is an on-line lexical reference system whose design is
+ inspired by current psycholinguistic theories of human lexical
+ memory. English nouns, verbs, adjectives and adverbs are organized
+ into synonym sets, each representing one underlying lexical
+ concept. Different relations link the synonym sets.
+ .
+ WordNet was developed by the Cognitive Science Laboratory
+ (http://www.cogsci.princeton.edu/) at Princeton University under the
+ direction of Professor George A. Miller (Principal Investigator).
+ .
+ This package contains an adaptation of the WordNet database for dictionaries
+ such as goldendict.
Added: packages/wordnet/trunk/debian/goldendict-wordnet.install
URL: http://svn.debian.org/wsvn/debian-science/packages/wordnet/trunk/debian/goldendict-wordnet.install?rev=35997&op=file
==============================================================================
--- packages/wordnet/trunk/debian/goldendict-wordnet.install (added)
+++ packages/wordnet/trunk/debian/goldendict-wordnet.install Wed Nov 18 07:24:12 2009
@@ -1,0 +1,2 @@
+goldendict-wordnet.dsl.dz /usr/share/goldendict-wordnet/
+goldendict-wordnet_abrv.dsl /usr/share/goldendict-wordnet/
Added: packages/wordnet/trunk/debian/goldendict-wordnet_abrv.dsl
URL: http://svn.debian.org/wsvn/debian-science/packages/wordnet/trunk/debian/goldendict-wordnet_abrv.dsl?rev=35997&op=file
==============================================================================
--- packages/wordnet/trunk/debian/goldendict-wordnet_abrv.dsl (added)
+++ packages/wordnet/trunk/debian/goldendict-wordnet_abrv.dsl Wed Nov 18 07:24:12 2009
@@ -1,0 +1,64 @@
+#NAME "Abbreviations for WordNet 3.0 (En-En)"
+#INDEX_LANGUAGE "English"
+#CONTENTS_LANGUAGE "English"
+
+Freq.
+ Frequency count. The number of times each semantically tagged sense occurs in the Semantic Concordance files.
+Syn
+ Synonyms - words with the same meaning
+Ant
+ Antonyms - words with the opposite meaning
+Pertains to noun
+ Only for relational adjectives. For example, "medical" pertains to "medicine" and "musical" pertains to "music".
+Derived from adjective
+ Only for adverbs.
+Similar to
+ Similar to ...
+See Also
+ See Also ...
+Derivationally related forms
+ For example, a derivationally related form of "meter" is "metrical".
+Usage Domain
+ Usage Domains for this entry
+Topics
+ Topic Domains for this entry
+Regions
+ Region Domains for this entry
+Members of this Usage Domain
+ Members of this Usage Domain
+Members of this Topic
+ Members of this Topic
+Members of this Region
+ Members of this Region
+Hypernyms
+ The generic term used to designate a whole class of specific instances. Y is a hypernym of X if X is a (kind of) Y. E.g., "tree" is a hypernym of "oak".
+Instance Hypernyms
+ E.g., the instance hypernym of "Mississippi River" is "river".
+Hyponyms
+ The specific term used to designate a member of a class. X is a hyponym of Y if X is a (kind of) Y. E.g., "oak" is a hyponym of "tree".
+Instance Hyponyms
+ Instance hyponyms represent specific instances of something. E.g., "Amazon River" is an instance hyponym of "river".
+Member Holonyms
+ X is a member holonym of Y if Y is a member of X. E.g., "forest" is a member holonym of "tree".
+Substance Holonyms
+ X is a substance holonym of Y if Y is a substance of X. E.g., "air" is a substance holonym of "oxygen".
+Part Holonyms
+ X is a part holonym of Y if Y is a part of X. E.g., "bird" is a part holonym of "wing".
+Member Meronyms
+ X is a member meronym of Y if X is a member of Y. E.g., "tree" is a member meronym of "forest".
+Substance Meronyms
+ X is a substance meronym of Y if X is a substance of Y. E.g., "oxygen" is a substance meronym of "air".
+Part Meronyms
+ X is a part meronym of Y if X is a part of Y. E.g., "wing" is a part meronym of "bird".
+Attrubites
+ Attribute is a noun for which adjectives express values. The noun "weight" is an attribute, for which the adjectives "light" and "heavy" express values.
+Verb Group
+ Verb Group
+Entailment
+ A verb X entails Y if X cannot be done unless Y is, or has been, done. E.g., "snore" entails "sleep".
+Cause
+ A verb X causes Y if X denotes the causation of the state or activity referred to by Y. E.g., "scare" causes "fear".
+Participle of verb
+ Participle of verb
+Verb Frames
+ Generic sentence frames illustrating the types of simple sentences in which the verb can be used.
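The abbreviation file above follows a simple layout: header lines start with "#", each label sits on an unindented line, and its explanation follows on an indented line. As a rough illustration only, the Ruby sketch below reads that layout into a hash. The helper name parse_dsl and the sample text are hypothetical and not part of this commit; the sketch assumes the text has already been decoded to a Ruby string and that bodies are indented with whitespace.

```ruby
# Hypothetical helper (not part of the commit): parse DSL-style text into a
# Hash of label => explanation.
def parse_dsl(text)
  entries = {}
  current = nil
  text.each_line do |line|
    line = line.chomp
    # Skip blank lines and "#NAME"-style header lines.
    next if line.strip.empty? || line.start_with?('#')
    if line =~ /\A\s/
      # Indented line: body text belonging to the current label.
      next if current.nil?
      entries[current] = [entries[current], line.strip].reject(&:empty?).join(' ')
    else
      # Unindented line: a new label.
      current = line.strip
      entries[current] = ''
    end
  end
  entries
end

# Made-up sample in the same layout as the abbreviation file.
sample = [
  '#NAME "Abbreviations for WordNet 3.0 (En-En)"',
  '',
  'Syn',
  "\tSynonyms - words with the same meaning",
  'Ant',
  "\tAntonyms - words with the opposite meaning"
].join("\n")

abbreviations = parse_dsl(sample)
puts abbreviations['Syn']
```

A lookup by label is all the sketch offers; how GoldenDict itself resolves abbreviations is not shown here.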
Modified: packages/wordnet/trunk/debian/rules
URL: http://svn.debian.org/wsvn/debian-science/packages/wordnet/trunk/debian/rules?rev=35997&op=diff
==============================================================================
--- packages/wordnet/trunk/debian/rules (original)
+++ packages/wordnet/trunk/debian/rules Wed Nov 18 07:24:12 2009
@@ -28,4 +28,17 @@
rm -rf src/grind/grind-wnparse.[ch] src/grind/grind-wnlex.c
# Make sure that really all Makefiles in doc are deleted
rm -f `find doc -name Makefile`
+ rm -f goldendict-wordnet.dsl goldendict-wordnet.dsl.dz
+ rm -f goldendict-wordnet_abrv.dsl
+build/goldendict-wordnet:: goldendict-wordnet.dsl.dz goldendict-wordnet_abrv.dsl
+
+goldendict-wordnet_abrv.dsl: debian/goldendict-wordnet_abrv.dsl
+ printf '\377\376' > $@
+ iconv -t utf-16le $< >> $@
+
+goldendict-wordnet.dsl.dz: goldendict-wordnet.dsl
+ dictzip -k $<
+
+goldendict-wordnet.dsl:
+ ruby debian/wn-for-goldendict.rb > $@
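The goldendict-wordnet_abrv.dsl rule above writes the two bytes FF FE (the UTF-16LE byte order mark) and then appends the abbreviation file converted to UTF-16LE. The same transformation can be sketched in Ruby as follows. This is only an illustration of the byte layout, not what the package build runs; the constant and method names are made up for the example, and String#b requires Ruby 2.0 or later.

```ruby
# Sketch only: mirrors the BOM + iconv steps of the rule in Ruby.
# UTF16LE_BOM and to_utf16le_with_bom are hypothetical names.
UTF16LE_BOM = "\xFF\xFE".b

def to_utf16le_with_bom(utf8_text)
  # Convert the text to UTF-16LE, then treat it as raw bytes and prepend the BOM.
  UTF16LE_BOM + utf8_text.encode('UTF-16LE').b
end

bytes = to_utf16le_with_bom("Syn\n")
# Print the result as hex so the BOM and the two-byte code units are visible.
puts bytes.unpack('C*').map { |b| format('%02x', b) }.join(' ')
```

Each ASCII character becomes two bytes, low byte first, so the four-character input yields ten bytes including the BOM.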
Added: packages/wordnet/trunk/debian/wn-for-goldendict.rb
URL: http://svn.debian.org/wsvn/debian-science/packages/wordnet/trunk/debian/wn-for-goldendict.rb?rev=35997&op=file
==============================================================================
--- packages/wordnet/trunk/debian/wn-for-goldendict.rb (added)
+++ packages/wordnet/trunk/debian/wn-for-goldendict.rb Wed Nov 18 07:24:12 2009
@@ -1,0 +1,704 @@
+#!/usr/bin/env ruby
+
+# A script to convert WordNet 3.0 dictionary from original
+# format (http://wordnet.princeton.edu/wordnet/download/)
+# to DSL format, suitable for Lingvo and GoldenDict.
+#
+# This script is released into public domain with no
+# conditions. Use it as you see appropriate.
+
+# This script was adapted to build the Debian package from the existing Debian
+# source package (some paths were changed).
+
+# When true, generate only a small part of the dictionary,
+# for testing purposes.
+$short = false
+
+$CARDS = {}
+$CARDS_COUNT = 0
+
+
+# INPUT FILES
+$data_file_noun = 'dict/dbfiles/data.noun'
+$data_file_verb = 'dict/dbfiles/data.verb'
+$data_file_adj = 'dict/dbfiles/data.adj'
+$data_file_adv = 'dict/dbfiles/data.adv'
+$data_file_sentidx = 'dict/sentidx.vrb'
+$data_file_sent = 'dict/sents.vrb'
+$data_file_cntlist = 'dict/dbfiles/cntlist'
+$index_file_noun = 'dict/dbfiles/index.noun'
+$index_file_verb = 'dict/dbfiles/index.verb'
+$index_file_adj = 'dict/dbfiles/index.adj'
+$index_file_adv = 'dict/dbfiles/index.adv'
+
+# print UTF-8 BOM first
+print "\xEF\xBB\xBF"
+
+# Dictionary Header
+DIC_NAME = "WordNet 3.0. \(En-En\)"
+ABBR_DIC_NAME = "Abbreviations for #{DIC_NAME}"
+puts "\#NAME \"#{DIC_NAME}\""
+puts %q{#INDEX_LANGUAGE "English"
+#CONTENTS_LANGUAGE "English"}
+
+$noun_data = File.open($data_file_noun, 'rb')
+$verb_data = File.open($data_file_verb, 'rb')
+$adj_data = File.open($data_file_adj, 'rb')
+$adv_data = File.open($data_file_adv, 'rb')
+
+$LEMMA_IDX = {}
+
+$VERB_IDX = {}
+File.open($data_file_sentidx, 'rb') { |sentidx|
+ sentidx.each_line { |line|
+ d = line.split()
+ if (d.size != 2)
+ $stderr.puts "WARNING: sentidx.vrb format error: #{d.inspect}"
+ end
+ $VERB_IDX[d[0]] = d[1]
+ }
+}
+
+$VERB_PTRNS = {}
+File.open($data_file_sent, 'rb') { |f|
+ f.each_line { |line|
+ d = line.strip.split(/\s+/, 2)
+ if (d.size != 2)
+ $stderr.puts "WARNING: sents.vrb format error: #{d.inspect}"
+ end
+ $VERB_PTRNS[d[0]] = d[1]
+ }
+}
+
+$SENSE_COUNTS = {}
+File.open($data_file_cntlist, 'rb') { |f|
+ f.each_line { |line|
+ d = line.strip.split(/\s+/)
+ if (d.size != 3)
+ $stderr.puts "WARNING: cntlist format error: #{d.inspect}"
+ end
+ sense = d[1].gsub(/\((p|a|ip)\)/, '')
+ $SENSE_COUNTS[sense] = d[0].to_i
+ }
+}
+
+$POS = {'n'=> 'noun', 'v' => 'verb', 'a' => 'adjective', 's' => 'adjective', 'r' => 'adverb'}
+$POS_NUM = {'n'=> '1', 'v' => '2', 'a' => '3', 's' => '5', 'r' => '4'}
+$ROME = ['I', 'II', 'III', 'IV']
+
+$frames = [ nil,
+ "Something ----s",
+ "Somebody ----s",
+ "It is ----ing",
+ "Something is ----ing PP",
+ "Something ----s something Adjective/Noun",
+ "Something ----s Adjective/Noun",
+ "Somebody ----s Adjective",
+ "Somebody ----s something",
+ "Somebody ----s somebody",
+ "Something ----s somebody",
+ "Something ----s something",
+ "Something ----s to somebody",
+ "Somebody ----s on something",
+ "Somebody ----s somebody something",
+ "Somebody ----s something to somebody",
+ "Somebody ----s something from somebody",
+ "Somebody ----s somebody with something",
+ "Somebody ----s somebody of something",
+ "Somebody ----s something on somebody",
+ "Somebody ----s somebody PP",
+ "Somebody ----s something PP",
+ "Somebody ----s PP",
+ 'Somebody\'s (body part) ----s',
+ "Somebody ----s somebody to INFINITIVE",
+ "Somebody ----s somebody INFINITIVE",
+ "Somebody ----s that CLAUSE",
+ "Somebody ----s to somebody",
+ "Somebody ----s to INFINITIVE",
+ "Somebody ----s whether INFINITIVE",
+ "Somebody ----s somebody into V-ing something",
+ "Somebody ----s something with something",
+ "Somebody ----s INFINITIVE",
+ "Somebody ----s VERB-ing",
+ "It ----s that CLAUSE",
+ "Something ----s INFINITIVE"
+]
+
+def progress(count)
+ if count == 'done'
+ $stderr.puts("\n")
+ elsif count =~ /\D/
+ $stderr.puts(" " + count)
+ elsif (count % 10000 == 0)
+ $stderr.print "."
+ end
+end
+
+def get_data(offset, pos)
+ data_file = nil
+ case pos
+ when :n, 'n'
+ data_file = $noun_data
+ when :v, 'v'
+ data_file = $verb_data
+ when :a, 'a'
+ data_file = $adj_data
+ when :r, 'r'
+ data_file = $adv_data
+ else
+ $stderr.puts "WARN #7: get_data for unknown pos: #{pos}"
+ exit
+ end
+ data_file.seek(offset.to_i)
+ DataEntry.new(data_file.gets)
+end
+
+class Card
+ attr_reader :headword, :senses
+ def initialize(headword)
+ @headword = headword
+ @all_senses = []
+ adjectives = []
+ @senses = {'n'=>[], 'v' =>[], 'a' => adjectives, 's' => adjectives, 'r' => []}
+ end
+ def << (sense)
+ unless @all_senses.include?(sense)
+ @all_senses << sense
+ @senses[sense.pos] << sense
+ end
+ end
+ def <=> (card)
+ @headword.downcase <=> card.headword.downcase
+ end
+ def print_out
+ puts @headword
+ poses = 0
+ ['n', 'v', 'a', 'r'].each { |pos|
+ poses += 1 unless @senses[pos].empty?
+ }
+ pos_count = 0
+ ['n', 'v', 'a', 'r'].each { |pos|
+ pos_senses = @senses[pos]
+ if (pos_senses.size > 0)
+ if (poses > 1)
+ puts "\t[m0][b]#{$ROME[pos_count]}[/b][/m]"
+ pos_count += 1
+ end
+ puts "\t[m1][p]#{$POS[pos]}[/p][/m]"
+ sense_count = 1
+ pos_senses_total = pos_senses.size
+ pos_senses.sort {|x, y|
+ next 0 if $short
+
+ val1 = x.sense_key(@headword)
+ val2 = y.sense_key(@headword)
+ count1 = $SENSE_COUNTS[val1] || 0
+ count2 = $SENSE_COUNTS[val2] || 0
+
+ if (count1 + count2 > 0)
+ comp = count2 <=> count1 # reverse comparison here!
+ if comp != 0
+ next comp
+ end
+ end
+
+ idxEntry = x.idx
+ if (idxEntry.nil?)
+ $stderr.puts "No idxEntry for headword: #{@headword}"
+ exit
+ end
+ val1 = idxEntry.offsets.index(x.offset)
+ val2 = idxEntry.offsets.index(y.offset)
+ if (val1.nil? || val2.nil?)
+ idxEntry = y.idx
+ if (idxEntry.nil?)
+ $stderr.puts "No idxEntry for headword: #{@headword}"
+ exit
+ end
+ val1 = idxEntry.offsets.index(x.offset)
+ val2 = idxEntry.offsets.index(y.offset)
+ end
+
+ if (val1.nil? || val2.nil?) # can't compare for some reasons...
+ 0
+ else
+ idxEntry.offsets.index(x.offset) <=> idxEntry.offsets.index(y.offset)
+ end
+ }.each { |sense|
+ if (pos_senses_total > 1)
+ print "\t[m2][b]#{sense_count}.[/b] "
+ sense_count += 1
+ else
+ print "\t[m2] "
+ end
+ sense.print_out(@headword)
+ }
+ end
+ }
+ end
+end
+
+class IdxEntry
+ attr_accessor :offsets, :lemma, :senses
+ def initialize(str)
+ @senses = []
+ @str = str
+ data = str.split
+ @lemma = data[0]
+ @pos = data[1]
+ @synset_cnt = data[2].to_i
+ @p_cnt = data[3]
+ @pointers = ""
+ i = 3
+ Integer(@p_cnt).times {
+ i += 1
+ @pointers << data[i]
+ }
+ i += 1
+ @sense_cnt = data[i]
+ i += 1
+ @tagsense_cnt = data[i]
+ i += 1
+ @offsets = []
+ (i..data.size-1).each { |idx|
+ @offsets << data[idx].to_i
+ }
+ if (@offsets.size != @synset_cnt)
+ $stderr.puts "ERROR #1: size mismatch"
+ exit
+ end
+ end
+ def to_s
+ "#{@lemma}" # : POS: #{@pos}" #, Senses: #{@synset_cnt}"
+ end
+ def add_sense(sense)
+ sense.idx = self
+ @senses << sense
+ sense.each_headword { |hw|
+ ($CARDS[hw] ||= Card.new(hw)) << sense
+ }
+ end
+end
+
+class DataEntry
+ attr_accessor :words, :str, :pos, :idx, :offset, :lex_ids
+ def initialize(str)
+ @str = str
+ data = str.split
+ @offset = data[0].to_i
+ @lex_filenum = data[1]
+ @pos = data[2]
+ @w_cnt = data[3].hex # w_cnt is a two-digit hexadecimal count
+ @words = []
+ i = 4
+ @lex_ids = []
+ @w_cnt.times {
+ @words << data[i].gsub(/_/, ' ').gsub(/\s*\((p|a|ip)\)\s*$/, '')
+ i += 1
+ @lex_ids << data[i].hex # lex_id is a one-digit hexadecimal number
+ i += 1
+ }
+
+ @p_cnt = data[i].to_i
+ i += 1
+ @pointers = []
+ @p_cnt.times {
+ pointer = []
+ pointer << data[i]
+ pointer << data[i + 1]
+ pointer << data[i + 2]
+ pointer << data[i + 3]
+ i += 4
+ @pointers << pointer
+ }
+
+ @frames = []
+ # everything from this point up to the "|" is verb frames data
+ if data[i] != "|" # we found a verb frame
+ f_cnt = data[i].to_i
+ i += 1
+ if (f_cnt == 0)
+ $stderr.puts "ERROR: 0 number of verb frames specified"
+ exit
+ end
+
+ f_cnt.times {
+ if (data[i] != "+")
+ $stderr.puts "ERROR: wrong verb frame format!"
+ exit
+ end
+ i += 1
+ @frames << [data[i], data[i + 1]]
+ i += 2
+ }
+ end
+
+ if data[i] != "|"
+ $stderr.puts "ERROR: expected '|' separator, but got: #{data[i]}"
+ exit
+ end
+ i += 1
+
+ @gloss = data[i, data.size - i].join(" ").gsub(/\[/, '\[').gsub(/\]/, '\]')
+ @gloss_str = ""
+ end
+ def == (other)
+ @str == other.str
+ end
+ def each_headword
+ @words.each { |w|
+ yield w
+ }
+ end
+ def to_s
+ "Set: #{@words.inspect}, P_CNT: #{@p_cnt}, Pointers: #{@pointers.inspect}, Gloss: #{@gloss}"
+ end
+ def get_pointer_data(headword, other, src_target)
+ if (src_target == "0000")
+ return other.words
+ else
+ src = src_target[0, 2].hex
+ target = src_target[2, 2].hex
+ h_src = words[src - 1]
+ if (h_src == headword)
+ return [other.words[target - 1]]
+ else
+ return ["#{make_link(other.words[target - 1])} [c darkgray](for: #{make_link(words[src - 1])})[/c]"]
+ end
+ end
+ end
+ def get_frame_data(headword, frame)
+ f_num = frame[0].to_i
+ w_num = frame[1].hex
+ if (w_num == 0)
+ return [$frames[f_num]]
+ else
+ if (w_num < 1)
+ $stderr.puts "ERROR: w_num is invalid!"
+ exit
+ end
+ h_src = words[w_num - 1]
+ if (h_src == headword)
+ return [$frames[f_num]]
+ else
+ return ["[*][ex]#{$frames[f_num]}[/ex][/*] [c darkgray](for: #{make_link(h_src)})[/c]"]
+ end
+ end
+ end
+ def sense_key(headword)
+ i = @words.index(headword)
+ if (i.nil?)
+ $stderr.puts "ERROR: can't find index for the headword: #{headword}"
+ exit
+ end
+ res = "#{headword.downcase.gsub(/\s+/, '_')}%#{$POS_NUM[@pos]}:#{@lex_filenum}:#{sprintf('%02d', @lex_ids[i])}"
+ if (@pos != 's')
+ res << "::"
+ else
+ @pointers.each {|ptr|
+ if (ptr[0] == "&") # similar to
+ similars = get_data(ptr[1], ptr[2])
+ res << ":#{similars.words[0]}:#{sprintf('%02d',similars.lex_ids[0])}"
+ end
+ }
+ end
+ res
+ end
+ def freq_count(headword)
+ $SENSE_COUNTS[sense_key(headword)] || 0
+ end
+ def print_out(headword)
+ $headword = headword
+
+ str1 = ""
+ exa = false
+ extra = ""
+ freq = if (freq_count(headword) > 0)
+ " [com][c darkgray]([p]Freq.[/p] #{freq_count(headword)})[/c][/com]"
+ else
+ ""
+ end
+
+ @gloss.split(';').each { |s|
+ s = "#{extra}; #{s}" unless extra.empty?
+ extra = ""
+
+ # detect broken quotations
+ if s.gsub(/[^"]/, '').size % 2 != 0
+ extra = s
+ next
+ end
+
+ if s =~ /^\s*(".*)$/ # example
+ unless freq.empty?
+ str1 << freq
+ freq = ""
+ end
+ example = $1.gsub(/^"(.*)"$/, '\1')
+ str1 << "[/m]\n\t[m3]- [*][ex]#{example}[/ex][/*]"
+ exa = true
+ else
+ if (exa)
+ str1 << "[/m]\n\t[m3]"
+ end
+ s = "[trn]#{s.strip.gsub(/(\(.*?\))/, '[i]\1[/i]')}[/trn]"
+ if (str1.empty?)
+ str1 << s
+ else
+ if (exa)
+ str1 << s
+ else
+ str1 << "; #{s}"
+ end
+ end
+ exa = false
+ end
+ }
+
+ puts "#{str1}#{freq}[/m]"
+
+ print_array(@words, 'Syn', "[c blue]•[/c]")
+
+ antonyms = []
+ pertainyms = []
+ derivs = []
+ deriv_rels = []
+ topics = []
+ regions = []
+ usages = []
+ m_topics = []
+ m_regions = []
+ m_usages = []
+ hypers = []
+ inst_hypers = []
+ hypos = []
+ inst_hypos = []
+ m_holos = []
+ s_holos = []
+ p_holos = []
+ m_meros = []
+ s_meros = []
+ p_meros = []
+ attribs = []
+ verb_group = []
+ ents = []
+ alsos = []
+ causes = []
+ similars = []
+ part_verbs = []
+ @pointers.each {|ptr|
+ if (ptr[0] == '!') # antonym
+ antonyms += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+ elsif (ptr[0] == "\\") # pertainym or deriv. from adjective
+ if (@pos == 'r') # adverb
+ derivs += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+ elsif (@pos == 'a' || @pos == 's') # adjective
+ pertainyms += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+ else
+ $stderr.puts "ERROR: unexpected POS for slash: #{@pos}"
+ exit
+ end
+ elsif (ptr[0] == "=") # attributes
+ attribs += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+ elsif (ptr[0] == ";c") # topics domain
+ topics += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+ elsif (ptr[0] == ";r") # regions domain
+ regions += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+ elsif (ptr[0] == ";u") # usage domain
+ usages += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+ elsif (ptr[0] == "-c") # members of this topic domain
+ m_topics += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+ elsif (ptr[0] == "-r") # members of this region domain
+ m_regions += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+ elsif (ptr[0] == "-u") # members of this usage domain
+ m_usages += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+ elsif (ptr[0] == '$') # verb group
+ verb_group += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+ elsif (ptr[0] == '*') # entailment
+ ents += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+ elsif (ptr[0] == '^') # see also
+ alsos += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+ elsif (ptr[0] == '>') # cause
+ causes += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+ elsif (ptr[0] == '+') # deriv related form
+ deriv_rels += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+ elsif (ptr[0] == "@") # hypernyms
+ hypers += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+ elsif (ptr[0] == "@i") # instance hypernyms
+ inst_hypers += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+ elsif (ptr[0] == "~") # hyponyms
+ hypos += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+ elsif (ptr[0] == "~i") # instance hyponyms
+ inst_hypos += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+ elsif (ptr[0] == "#m") # m holonyms
+ m_holos += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+ elsif (ptr[0] == "#s") # s holonyms
+ s_holos += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+ elsif (ptr[0] == "#p") # p holonyms
+ p_holos += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+ elsif (ptr[0] == "%m") # m meronyms
+ m_meros += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+ elsif (ptr[0] == "%s") # s meronyms
+ s_meros += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+ elsif (ptr[0] == "%p") # p meronyms
+ p_meros += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+ elsif (ptr[0] == "&") # similar to
+ similars += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+ elsif (ptr[0] == "<") # participle of verb
+ part_verbs += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+ else
+ $stderr.puts "WARN #8: Unknown pointer type #{ptr[0]}"
+ end
+ }
+
+ print_array(antonyms, 'Ant', "[c red]•[/c]")
+ print_array(derivs, 'Derived from adjective', "[c deepskyblue]•[/c]")
+ print_array(pertainyms, 'Pertains to noun', "[c deepskyblue]•[/c]")
+ print_array(similars, 'Similar to', "[c darkturquoise]•[/c]")
+ print_array(alsos, 'See Also', "[c darkturquoise]•[/c]")
+ print_array(deriv_rels, 'Derivationally related forms', "[c dodgerblue]•[/c]")
+
+ print_array(usages, 'Usage Domain', "[c darkorchid]•[/c]")
+ print_array(topics, 'Topics', "[c darkorchid]•[/c]")
+ print_array(regions, 'Regions', "[c darkorchid]•[/c]")
+ print_array(m_usages, 'Members of this Usage Domain')
+ print_array(m_topics, 'Members of this Topic')
+ print_array(m_regions, 'Members of this Region')
+
+ print_array(hypers, 'Hypernyms')
+ print_array(inst_hypers, 'Instance Hypernyms')
+
+ print_array(hypos, 'Hyponyms')
+ print_array(inst_hypos, 'Instance Hyponyms')
+
+ print_array(m_holos, 'Member Holonyms')
+ print_array(s_holos, 'Substance Holonyms')
+ print_array(p_holos, 'Part Holonyms')
+
+ print_array(m_meros, 'Member Meronyms')
+ print_array(s_meros, 'Substance Meronyms')
+ print_array(p_meros, 'Part Meronyms')
+
+ print_array(attribs, 'Attrubites', "[c yellow]•[/c]")
+
+ print_array(verb_group, 'Verb Group', "[c maroon]•[/c]")
+ print_array(ents, 'Entailment')
+ print_array(causes, 'Cause')
+
+ print_array(part_verbs, "Participle of verb")
+
+ verb_sentences = []
+ unless (@frames.empty?)
+ puts "\t[m3][com][c maroon]•[/c] [p]Verb Frames[/p]:[/com][/m]"
+ @frames.each {|frame|
+ verb_sentences += get_frame_data(headword, frame)
+ }
+ end
+
+ if @pos == 'v' # only for verbs
+ key = sense_key(headword)
+ values = $VERB_IDX[key]
+ if (values)
+ values.split(/,/).each { |value|
+ verb_sentences << $VERB_PTRNS[value].gsub(/%s/, headword)
+ }
+ end
+ end
+
+ verb_sentences.each { |sentence|
+ if sentence =~ /\[ex\]/
+ puts "\t[m4]- #{sentence}[/m]"
+ else
+ puts "\t[m4]- [*][ex]#{sentence}[/ex][/*][/m]"
+ end
+ }
+ end
+ def print_array(a, label, prefix = "[c darkgray]•[/c]")
+ a -= [$headword]
+ a.uniq!
+ separator = if (a.size > 6)
+ "[/m]\n\t[m4]"
+ else
+ ""
+ end
+ puts "\t[m3][com]#{prefix} [p]#{label}[/p]:#{separator} #{a.collect{|x| make_link(x)}.join(', ')}[/com][/m]" unless a.empty?
+ end
+ def make_link(target)
+ target = target.strip
+ if (target =~ /<<.+>>/)
+ target
+ else
+ # no need to validate links, the format is good, no broken links
+ "<<#{target}>>"
+ end
+ end
+end
+
+count = 0
+
+File.foreach($index_file_noun) { |idx_line|
+ next if idx_line =~ /^\s\s/
+ entry = IdxEntry.new(idx_line)
+ entry.offsets.each { |offset|
+ d_entry = get_data(offset, :n)
+ entry.add_sense(d_entry)
+ }
+ count += 1
+ break if count == 600 && $short
+ progress(count)
+}
+progress($index_file_noun + " was processed");
+
+File.foreach($index_file_verb) { |idx_line|
+ next if idx_line =~ /^\s\s/
+ entry = IdxEntry.new(idx_line)
+ entry.offsets.each { |offset|
+ d_entry = get_data(offset, :v)
+ entry.add_sense(d_entry)
+ }
+ count += 1
+ break if count == 1200 && $short
+ progress(count)
+}
+progress($index_file_verb + " was processed");
+
+File.foreach($index_file_adj) { |idx_line|
+ next if idx_line =~ /^\s\s/
+ entry = IdxEntry.new(idx_line)
+ entry.offsets.each { |offset|
+ d_entry = get_data(offset, :a)
+ entry.add_sense(d_entry)
+ }
+ count += 1
+ break if count == 1800 && $short
+ progress(count)
+}
+progress($index_file_adj + " was processed");
+
+File.foreach($index_file_adv) { |idx_line|
+ next if idx_line =~ /^\s\s/
+ entry = IdxEntry.new(idx_line)
+ entry.offsets.each { |offset|
+ d_entry = get_data(offset, :r)
+ entry.add_sense(d_entry)
+ }
+ count += 1
+ break if count == 2400 && $short
+ progress(count)
+}
+progress($index_file_adv + " was processed");
+
+card_count = 0
+$CARDS.values.sort.each { |card|
+ card.print_out
+ card_count += 1
+ progress(card_count)
+}
+progress("CARDS were processed");
+
+$noun_data.close
+$verb_data.close
+$adj_data.close
+$adv_data.close
+
+$stderr.puts "TOTAL CARDS: #{$CARDS.size}"
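The DataEntry class above walks a WordNet data line field by field: synset offset, lexicographer file number, part of speech, a hexadecimal word count, word and lex_id pairs, a decimal pointer count, four-field pointers, and finally the gloss after the "|" separator. The standalone sketch below shows the same walk on a single line. It is a simplified illustration, not the script's code: the SynsetLine struct, the parse_data_line name and the sample line are all made up for the example, and verb frame fields are ignored.

```ruby
# Hypothetical, simplified stand-in for DataEntry (verb frames are ignored).
SynsetLine = Struct.new(:offset, :lex_filenum, :pos, :words, :lex_ids, :pointers, :gloss)

def parse_data_line(line)
  # Everything after the first " | " is the gloss.
  fields, gloss = line.split(' | ', 2)
  data = fields.split
  offset = data[0].to_i
  lex_filenum = data[1]
  pos = data[2]
  w_cnt = data[3].hex          # word count is two hexadecimal digits
  i = 4
  words = []
  lex_ids = []
  w_cnt.times do
    words << data[i].tr('_', ' ')   # underscores stand for spaces
    lex_ids << data[i + 1].hex      # lex_id is one hexadecimal digit
    i += 2
  end
  p_cnt = data[i].to_i         # pointer count is decimal
  i += 1
  pointers = Array.new(p_cnt) do
    ptr = data[i, 4]           # symbol, target offset, target pos, source/target
    i += 4
    ptr
  end
  SynsetLine.new(offset, lex_filenum, pos, words, lex_ids, pointers, gloss.to_s.strip)
end

# Made-up line in the data file layout; it is not taken from WordNet.
line = '00001740 03 n 02 sample_word 0 other_word 1 001 @ 00001930 n 0000 | ' \
       'an illustrative gloss; "an example sentence"'
entry = parse_data_line(line)
puts entry.words.inspect
puts entry.pointers.inspect
```

Reading the counts with String#hex and String#to_i avoids depending on how a particular Ruby version indexes packed strings.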