Ve

A linguistic framework for anyone. No degree required.

Read all about it on kimtaro.github.com/ve.

Getting Started

Ve relies on the FreeLing and MeCab language parsers. Install FreeLing for English, MeCab for Japanese, or both.

Installation instructions for FreeLing can be found here.

Installation instructions for MeCab can be found here.

Installing with Homebrew

If you are on OS X, you can install both FreeLing and MeCab with Homebrew.

$ brew install freeling
$ brew install mecab-ipadic

Building the Gem

You can build the Ve gem with the following:

$ gem build ve.gemspec

To install the newly built gem:

$ gem install ve-<version>.gem

Be sure to substitute <version> with the version of the newly built gem, for example ve-0.0.3.gem.

Ruby

require 've'
words = Ve.in(:en).words('I like melons.')
# => [#<Ve::Word:0x8ee00cc @word="I", @lemma="i", @part_of_speech=Ve::PartOfSpeech::Pronoun, @tokens=[{:raw=>"I i PRP 1", :type=>:parsed, :literal=>"I", :lemma=>"i", :pos=>"PRP", :accuracy=>"1", :characters=>0..0}], @extra={:grammar=>:personal}, @info={}>,
#     #<Ve::Word:0x8edff28 @word="like", @lemma="like", @part_of_speech=Ve::PartOfSpeech::Preposition, @tokens=[{:raw=>"like like IN 0.815649", :type=>:parsed, :literal=>"like", :lemma=>"like", :pos=>"IN", :accuracy=>"0.815649", :characters=>2..5}], @extra={:grammar=>nil}, @info={}>,
#     #<Ve::Word:0x8edfe24 @word="melons", @lemma="melon", @part_of_speech=Ve::PartOfSpeech::Noun, @tokens=[{:raw=>"melons melon NNS 1", :type=>:parsed, :literal=>"melons", :lemma=>"melon", :pos=>"NNS", :accuracy=>"1", :characters=>7..12}], @extra={:grammar=>:plural}, @info={}>,
#     #<Ve::Word:0x8edfcbc @word=".", @lemma=".", @part_of_speech=Ve::PartOfSpeech::Symbol, @tokens=[{:raw=>". . Fp 1", :type=>:parsed, :literal=>".", :lemma=>".", :pos=>"Fp", :accuracy=>"1", :characters=>13..13}], @extra={:grammar=>nil}, @info={}>]

words.collect(&:lemma) # => ["i", "like", "melon", "."]
words.collect(&:part_of_speech) # => [Ve::PartOfSpeech::Pronoun, Ve::PartOfSpeech::Preposition, Ve::PartOfSpeech::Noun, Ve::PartOfSpeech::Symbol]
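
Since each result responds to `word`, `lemma`, and `part_of_speech`, ordinary Enumerable methods cover most post-processing. The sketch below is only an illustration: it uses a hypothetical `FakeWord` struct and plain symbols in place of real `Ve::Word` objects and `Ve::PartOfSpeech` constants, so it runs without FreeLing or MeCab installed. With Ve installed, the same calls should presumably work on the array returned by `Ve.in(:en).words(...)`.

```ruby
# Stand-in for Ve::Word (an assumption for illustration); it exposes the
# same three readers used in the example above.
FakeWord = Struct.new(:word, :lemma, :part_of_speech)

# Symbols stand in for the Ve::PartOfSpeech::* constants.
words = [
  FakeWord.new('I',      'i',     :pronoun),
  FakeWord.new('like',   'like',  :preposition),
  FakeWord.new('melons', 'melon', :noun),
  FakeWord.new('.',      '.',     :symbol)
]

# Drop punctuation, then collect the lemmas of the remaining words.
content_lemmas = words.reject { |w| w.part_of_speech == :symbol }.map(&:lemma)

# Group surface forms by part of speech.
by_pos = words.group_by(&:part_of_speech)
              .transform_values { |ws| ws.map(&:word) }

p content_lemmas # => ["i", "like", "melon"]
p by_pos
```

With real Ve output, the comparison would be against the constants, for example `Ve::PartOfSpeech::Symbol` instead of `:symbol`.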

JavaScript

<script type="text/javascript" charset="utf-8" src="ve.js"></script>
<script type="text/javascript" charset="utf-8">
  new Ve('ja').words('ビールがおいしかった', function(words) {
    // [{"_class":"Word","word":"ビール","lemma":"ビール","part_of_speech":"noun","tokens":[{"raw":"ビール\t名詞,一般,*,*,*,*,ビール,ビール,ビール","type":"parsed","literal":"ビール","pos":"名詞","pos2":"一般","pos3":"*","pos4":"*","inflection_type":"*","inflection_form":"*","lemma":"ビール","reading":"ビール","hatsuon":"ビール","characters":"0..2"}],"extra":{"reading":"ビール","transcription":"ビール","grammar":null},"info":{"reading_script":"kata","transcription_script":"kata"}},{"_class":"Word","word":"が","lemma":"が","part_of_speech":"postposition","tokens":[{"raw":"が\t助詞,格助詞,一般,*,*,*,が,ガ,ガ","type":"parsed","literal":"が","pos":"助詞","pos2":"格助詞","pos3":"一般","pos4":"*","inflection_type":"*","inflection_form":"*","lemma":"が","reading":"ガ","hatsuon":"ガ","characters":"3..3"}],"extra":{"reading":"ガ","transcription":"ガ","grammar":null},"info":{"reading_script":"kata","transcription_script":"kata"}},{"_class":"Word","word":"おいしい","lemma":"おいしい","part_of_speech":"adjective","tokens":[{"raw":"おいしい\t形容詞,自立,*,*,形容詞・イ段,基本形,おいしい,オイシイ,オイシイ","type":"parsed","literal":"おいしい","pos":"形容詞","pos2":"自立","pos3":"*","pos4":"*","inflection_type":"形容詞・イ段","inflection_form":"基本形","lemma":"おいしい","reading":"オイシイ","hatsuon":"オイシイ","characters":"4..7"}],"extra":{"reading":"オイシイ","transcription":"オイシイ","grammar":null},"info":{"reading_script":"kata","transcription_script":"kata"}}]
    
    // Iterate by index; for...in on an array also visits inherited keys.
    for (var i = 0; i < words.length; i++) {
      var word = words[i];
      console.log(word.lemma + "/" + word.part_of_speech);
    }
    
    // ビール/noun
    // が/postposition
    // おいしい/adjective
  });
</script>

Structure

  • Ve::LocalInterface - Main interface that gives access to functionality in providers that exist locally
  • Ve::XInterface - Allows for different ways of accessing Ve providers: locally, through an HTTP API, over a binary protocol, and so on
  • Ve::Manager - Keeps track of providers and what they can do
  • Ve::Provider::X - Talks to the underlying parser
  • Ve::Parse::X - Takes the output from the Provider and turns it into functions the end user can use
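
The layering above can be pictured with a toy model. This is a minimal sketch, not Ve's actual code: every class and method name below (`ToyProvider`, `ToyParse`, `ToyManager`, `ToyLocalInterface`, `register`, `provider_for`) is hypothetical, and a whitespace split stands in for a real parser such as FreeLing or MeCab.

```ruby
# Toy model of the layering described above. All class and method names
# here are hypothetical; this is not Ve's actual implementation.

# Provider: talks to the underlying parser. Here the "parser" is a
# whitespace split standing in for FreeLing or MeCab.
class ToyProvider
  def languages
    [:en]
  end

  def parse(text)
    ToyParse.new(text.split)
  end
end

# Parse: wraps raw provider output in methods the end user can call.
class ToyParse
  def initialize(tokens)
    @tokens = tokens
  end

  def words
    @tokens
  end
end

# Manager: keeps track of providers and the languages they support.
class ToyManager
  def initialize
    @providers = {}
  end

  def register(provider)
    provider.languages.each { |lang| @providers[lang] = provider }
  end

  def provider_for(language)
    @providers.fetch(language) do
      raise ArgumentError, "no provider for #{language}"
    end
  end
end

# Interface: the user-facing entry point; it delegates to the manager.
class ToyLocalInterface
  def initialize(manager, language)
    @manager = manager
    @language = language
  end

  def words(text)
    @manager.provider_for(@language).parse(text).words
  end
end

manager = ToyManager.new
manager.register(ToyProvider.new)
result = ToyLocalInterface.new(manager, :en).words('I like melons.')
p result # => ["I", "like", "melons."]
```

The point of the split is that the interface never touches a parser directly, so a remote interface could reuse the same manager and providers.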

Todo

  • Expose more through the Sinatra server
  • Alias lemma to base, so people don't need to know what lemmas are
  • Break out into separate projects for each component: Ve-ruby, Ve-js
  • Better UTF-8 handling for FreeLing
  • See all the TODOs in the code
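
As an illustration of the lemma-to-base item, Ruby's `alias_method` can expose one reader under a second name. The `ToyWord` class below is a hypothetical stand-in for `Ve::Word`; how the alias would actually be added to Ve is left open.

```ruby
# Hypothetical stand-in for Ve::Word; only the lemma reader is modeled.
class ToyWord
  attr_reader :lemma

  def initialize(lemma)
    @lemma = lemma
  end

  # Expose the same value under a friendlier name.
  alias_method :base, :lemma
end

word = ToyWord.new('melon')
p word.base  # => "melon"
p word.lemma # => "melon"
```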

License

(c) Kim Ahlström 2011

This is under the MIT license.






