Thoughts on libraries and language mechanisms in programming languages

programming

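How should one use the various features of high-level languages that let a small amount of code do a lot of work (the so-called "powerful" features)? In Java's case, the libraries are the Java API (java.lang, java.util, java.awt, ...), and the mechanisms are inheritance, access control, reflection, …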

2006-02-14

Building a global language model using Dirichlet mixture multinomial distributions (混合ディレクレ多項分布を用いた大域的言語モデルの構築)

stat lm net

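http://www.milab.is.tsukuba.ac.jp/~sadamitsu/archive/ It is interesting that an effect similar to N-gram smoothing can be obtained in a more systematic way. Smoothing abandons maximum likelihood estimation on the grounds that it is unreliable when data is scarce. In Bayesian estimation, …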
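As a rough illustration of the contrast between maximum likelihood estimation and a Bayesian estimate, the sketch below compares the two on a toy unigram count table. It assumes a symmetric Dirichlet prior with a hypothetical concentration `alpha`, whose posterior mean coincides with additive smoothing. This is not the Dirichlet mixture model of the paper, and the function names are made up for this note.

```python
from collections import Counter


def mle(counts, vocab):
    """Maximum likelihood estimate: relative frequency, zero for unseen words."""
    total = sum(counts.values())
    return {w: counts.get(w, 0) / total for w in vocab}


def dirichlet_posterior_mean(counts, vocab, alpha=1.0):
    """Posterior mean under a symmetric Dirichlet(alpha) prior (assumed prior).

    Equivalent to adding alpha pseudo-counts to every word in the vocabulary.
    """
    total = sum(counts.values())
    denom = total + alpha * len(vocab)
    return {w: (counts.get(w, 0) + alpha) / denom for w in vocab}


counts = Counter(["a", "a", "a", "b"])  # toy data: "c" is never observed
vocab = ["a", "b", "c"]

p_mle = mle(counts, vocab)
p_bayes = dirichlet_posterior_mean(counts, vocab, alpha=1.0)
print(p_mle["c"], p_bayes["c"])
```

With so little data the MLE gives the unseen word a probability of exactly zero, while the prior keeps it small but positive. The gap between the two estimates shrinks as the counts grow.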

2006-02-14

An Introduction to Variational Methods for Graphical Models (1998)

stat learning net

http://citeseer.ist.psu.edu/jordan98introduction.html

2006-02-14

ECMAScript - on Surface of the Depth -

javascript net

http://www.kmonos.net/alang/etc/ecmascript.php

2006-02-14

Hokkaido University Collection of Scholarly and Academic Papers (北海道大学学術成果コレクション)

net

http://eprints.lib.hokudai.ac.jp/index.ja.jsp

2006-02-13

Technological singularity - Wikipedia, the free encyclopedia

neta net

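http://en.wikipedia.org/wiki/technological_singularity The evolution of computers keeps accelerating → there is a critical point at which they surpass human intelligence → an intelligence that surpasses humans designs an even better intelligence. Apparently Good, of the Good-Turing method, said something similar back in 1965…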

2006-02-13

Vesa Siivola

lm people net

http://www.cis.hut.fi/vsiivola/ Growing an n-gram model, among others.

2006-02-13

Stanley F. Chen's Columbia Home Page

lm net

http://www.ee.columbia.edu/~stanchen/ A report surveying and comparing smoothing methods, Modified Kneser-Ney, and more. Absolute Discounting: P(w_i|w_{i-n+1}^{i-1}) = \cfrac{ \delta + c(w_{i-n+1}^{i-1}) }{ \delta + c(w_{i-n+1}^{i-2}) } = \cfrac{ \delta + c(w…
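For reference, here is a minimal sketch of interpolated absolute discounting for bigrams in its common textbook form. It is not necessarily the formula in the truncated note above. The function name, the fixed `discount` value, and the unigram lower-order model are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_absolute_discounting(tokens, discount=0.5):
    """Return prob(w, v) estimating P(w | v); assumes 0 < discount <= 1."""
    unigram = Counter(tokens)
    total = len(tokens)
    bigram = defaultdict(Counter)  # bigram[v][w] = c(v, w)
    for v, w in zip(tokens, tokens[1:]):
        bigram[v][w] += 1

    def prob(w, v):
        p_uni = unigram[w] / total  # lower-order (unigram) distribution
        followers = bigram.get(v)
        if not followers:  # history never seen with a successor
            return p_uni
        c_v = sum(followers.values())  # count of v as a history
        discounted = max(followers[w] - discount, 0.0) / c_v
        # Probability mass removed by discounting, handed to the unigram model.
        backoff_weight = discount * len(followers) / c_v
        return discounted + backoff_weight * p_uni

    return prob


tokens = "a b a b a c".split()
p = train_bigram_absolute_discounting(tokens, discount=0.5)
print(p("b", "a"), p("c", "a"), p("a", "a"))
```

Each seen bigram gives up a fixed amount `discount`, and the freed mass goes to the lower-order distribution, so the conditional probabilities for a given history still sum to one.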

2006-02-13

PS Trimming

postscript howto linux

Trim 1/10 off the margins and enlarge by 10/9:
alias pstrim='pstops "1:0@1.111(-10.5mm,-14.65mm)"'
Trim 1/20 and enlarge by 20/19:
alias pstrim='pstops "1:0@1.0526(-5.25mm,-7.325mm)"'

2006-02-13

Yee Whye Teh

learning lm net

http://www.cs.utoronto.ca/~ywteh/ A Bayesian Interpretation of Interpolated Kneser-Ney, among others. Also things like nonparametric graphical models. (I don't really understand them.)

2006-02-13

Reinhard Kneser and Hermann Ney, Improved Clustering Techniques for Class-based Statistical Language Modelling

lm

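Includes a derivation of a closed-form expression for leaving-one-out perplexity. The goal here is word clustering for class-based language models. To evaluate how good a given clustering is, it uses the leaving-one-out perplexity under the class-based language model…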
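To make the idea of a closed-form leaving-one-out perplexity concrete, here is a small sketch on a deliberately simpler model than the paper's: an add-delta unigram model rather than a class-based bigram model. The function name and the `delta` and `vocab_size` parameters are assumptions for illustration only.

```python
import math
from collections import Counter


def leaving_one_out_perplexity(tokens, vocab_size, delta=1.0):
    """Leaving-one-out perplexity of an add-delta unigram model.

    Each occurrence of word w is scored by a model estimated from the
    remaining N - 1 tokens, in which w has count c(w) - 1. Because the
    score depends only on c(w), the sum over tokens collapses to a sum
    over word types, which gives a closed form in the counts.
    """
    counts = Counter(tokens)
    n = len(tokens)
    denom = (n - 1) + delta * vocab_size
    log_lik = sum(c * math.log((c - 1 + delta) / denom) for c in counts.values())
    return math.exp(-log_lik / n)


print(leaving_one_out_perplexity(["a", "a", "b"], vocab_size=2))
```

No held-out data is needed, since the training corpus plays both roles. That is what makes the quantity usable as a criterion for comparing clusterings.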

2006-02-13

natural language processing research blog

Bayesian Methods for NLP (summary) [nlp][stat][net]: http://nlpers.blogspot.com/2005/12/bayesian-methods-for-nlp-summary.html A meeting of Bayesians and NLPers. A few papers and other materials are also posted.

2006-02-13

Hal Daumé III - about me

nlp people stat programming net

http://www.isi.edu/~hdaume/ Yet Another Haskell Tutorial, Why Not C?, and tutorials on Support Vector Machines for NLP and Bayes for NLP. Apparently a textbook is also in the works.

2006-02-13

Automating Knowledge Acquisition for Machine Translation

mt net

http://www.isi.edu/natural-language/mt/aimag97.ps The overview in the first half is said to be easy to follow.

2006-02-13

Dynamic Bayesian estimation of context with a Particle Filter (2005)

lm segmentation net

http://chasen.org/~daiti-m/paper/nl165pf.pdf Slides are available too. The state of the art in topic language models and long-distance language models.

2006-02-13

Teaching a statistics course 2005.7 2006.1

stat net

http://hosho.ees.hokudai.ac.jp/~kubo/stat/2005/index.html

2006-02-13

R Tips

stat net

http://cse.naro.affrc.go.jp/takezawa/r-tips/r.html

2006-02-13

Statistics self-study notes

stat net

http://aoki2.si.gunma-u.ac.jp/lecture/

2006-02-10

Yamazaki@BinaryTechnology

cxx programming net

http://www.01-tec.com/

2006-02-10

Towards Better Language Models For Spontaneous Speech (1994)

lm net

http://citeseer.ist.psu.edu/suhm94toward.html

2006-02-10

An Automatic Method For Learning A Japanese Lexicon For Recognition Of Spontaneous Speech (1998)

lm net

http://citeseer.ist.psu.edu/tomokiyo98automatic.html In the end, is it Class-Phrase for Japanese too, or Phrase only? Since the approach is mora-based there is no need to build classes, so I think it is probably Phrase only.

2006-02-10

Bi-directional Conversion Between Graphemes and Phonemes Using a Joint N-gram Model (2001)

lm net

http://citeseer.ist.psu.edu/galescu01bidirectional.html A method for converting between graphemes and phonemes. It probably only works for phonemic scripts.

2006-02-10

Leaving-one-out Perplexity

segmentation lm

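Overview of Ries's method: first, cluster the words. From then on, count class occurrences instead of word occurrences. Enumerate all 2-grams and keep a table of how the overall log-likelihood changes when each one is joined into a single unit. The log-likelihood is determined solely by the 2-gram frequencies, by cases…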

2006-02-10

Alternating bottom-up clustering and chunking

lm segmentation

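The idea is to acquire "words (Unit, Phrase)" and "groups of similar words (Class)" together. Initially, both words and classes are single characters (morphemes would also do). To acquire words, repeatedly join the adjacent pair that maximizes the change in likelihood. With sequences of the words themselves…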

2006-02-09

Deriving Phrase-based Language Models (1997)

lm net

http://citeseer.ist.psu.edu/270515.html

2006-02-09

Clustering and Chunking

lm segmentation

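In Ries's series of studies related to Class-Phrase language models: ・The aim is to build a language model from a relatively small domain-specific corpus (still on the order of tens of thousands of words for German) → to estimate the population from a small sample, aggressive pattern extraction (clustering)…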

2006-02-08

Jianfeng Gao, The Use of Clustering Techniques for Asian Language Modeling ()

lm net

http://citeseer.ist.psu.edu/471362.html Includes an overview of cluster (class) based language models. ・Basically, the technique is clustering ・There is also a simple clustering by "same occurrence frequency". Predictive Clustering: P(w_i|w_{i-2} w_{i-1}) = P(c_i|w…

2006-02-08

On the Estimation of 'Small' Probabilities by Leaving-One-Out

lm nlp net

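http://doi.ieeecomputersociety.org/10.1109/34.476512 The paper from [2006-01-24]. The first half is fairly general probability theory, covering count equivalence classes of events and probability smoothing by leaving-one-out. The second half concerns language modeling. Count equivalence class: "frequency…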

2006-02-08

Proceedings of the 9th Conference on Computational Natural Language Learning (2005)

segmentation net

http://citeseer.ist.psu.edu/734816.html A recent paper on morphological segmentation. Perhaps a search for word transformation rules?

2006-02-08

Class-Based n-gram Models of Natural Language (1992)

nlp lm net

http://citeseer.ist.psu.edu/577345.html It states explicitly that increasing n in an n-gram model may not improve accuracy unless the amount of data is increased accordingly.

mtbr's diary

Articles for the month starting 2006-02-01
