Revisiting "Optimizing the Units of N-gram Language Models for Speech Recognition"

segmentation lm

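What should the minimal unit be? Characters? Morphemes? Something else? Even if the unit is not a morpheme, its reading must be determined. Readings are also useful for speech recognition and for disambiguating homographs. On the need for units that take acoustic confusability into account: for words consisting of a single mora, the acoustic model …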
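To make the one-mora concern concrete, here is a minimal sketch in Perl (the language of the scripts in this diary) that counts morae in a reading and flags very short units. It assumes readings are given in katakana, for example from a morphological analyzer's reading field; `count_morae` and `is_short_unit` are hypothetical helper names, and the 1-mora threshold is simply the case mentioned above.

```perl
#!/usr/bin/env perl
# Sketch: count morae in a katakana reading and flag very short units.
# Assumption: readings are plain katakana; helper names are hypothetical.
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

# Small kana merge with the preceding kana into one mora (キョ = 1 mora).
# ッ (geminate), ン (moraic nasal) and ー (long vowel mark) each count as one.
sub count_morae {
    my ($reading) = @_;
    my $small = () = $reading =~ /[ァィゥェォャュョヮ]/g;
    return length($reading) - $small;
}

# True if the reading has at most $max_morae morae (default 1).
sub is_short_unit {
    my ($reading, $max_morae) = @_;
    $max_morae //= 1;
    return count_morae($reading) <= $max_morae;
}

for my $r (qw(キ ト キョウ ガッコウ シンブン)) {
    my $flag = is_short_unit($r) ? "\t<- 1 mora" : '';
    printf "%s\t%d%s\n", $r, count_morae($r), $flag;
}
```

Units flagged this way are candidates for merging with a neighbor when the unit inventory is built.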

2006-02-24

Critique of "Optimizing the Units of N-gram Language Models for Speech Recognition"

segmentation lm

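To begin with, shouldn't the context length be chosen adaptively instead of being fixed as in an N-gram (a PPM*-style language model?). With morpheme/inflection units, the information carried by a 2-gram context is fairly uniform (a 1-gram context is clearly insufficient). 3-gram and 2-gram, however, differ quite a lot.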
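One way to check the claim about context lengths on one's own data is to measure the empirical conditional entropy of the next token given the previous k tokens. A minimal sketch, assuming whitespace-separated tokens (e.g. morphemes) and maximum-likelihood estimates; it only measures fixed context lengths and is not an implementation of PPM*.

```perl
#!/usr/bin/env perl
# Sketch: empirical conditional entropy H(W_i | W_{i-k} .. W_{i-1}), k = 0, 1, 2.
# Maximum-likelihood estimates on the tokens themselves; not PPM*, no
# adaptive context selection. Assumption: '<s>' pads the first k positions.
use strict;
use warnings;

sub cond_entropy {
    my ($tokens, $k) = @_;
    my (%joint, %ctx);
    my @padded = (('<s>') x $k, @$tokens);
    for my $i ($k .. $#padded) {
        # context = the k tokens before position $i (empty string when k = 0)
        my $h = join ' ', @padded[$i - $k .. $i - 1];
        $joint{"$h\t$padded[$i]"}++;
        $ctx{$h}++;
    }
    my $n = scalar @$tokens;
    my $entropy = 0;
    for my $key (keys %joint) {
        my ($h) = split /\t/, $key;
        my $p_joint = $joint{$key} / $n;         # P(h, w)
        my $p_cond  = $joint{$key} / $ctx{$h};   # P(w | h)
        $entropy -= $p_joint * log($p_cond) / log(2);
    }
    return $entropy;    # bits per token
}

my @toks = split ' ', 'the cat sat on the mat the cat ran';
printf "k=%d  H=%.3f bits\n", $_, cond_entropy(\@toks, $_) for 0 .. 2;
```

Training-set entropies like these shrink as k grows partly through overfitting, so a held-out set would be needed before drawing conclusions about 2-gram versus 3-gram contexts.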

2006-02-24

Introduction to Variational Bayesian Learning Theory

stat learning net

http://watanabe-www.pi.titech.ac.jp/~swatanab/var-bay.pdf

2006-02-24

Clay Mathematics Institute

Workshop on Algebraic Statistics and Computational Biology [bio][stat][net]: http://www.claymath.org/programs/cmiworkshops/ascb/ Algebraic geometry and biological sequence analysis?

2006-02-24

Optimizing Word Units Based on Linguistic and Acoustic Attributes for Large-Vocabulary Continuous Speech Recognition

lm net

http://www.furui.cs.titech.ac.jp/publication/2003/asj2003s_135.pdf

2006-02-23

AITO's HomePage

lm net

http://homepage2.nifty.com/aito/ Author of Palmkit, and also of w3m. I am much indebted to him.

2006-02-23

C++ Programming

cxx net

http://winnie.kuis.kyoto-u.ac.jp/~yoshii/cpp.html By Yoshii, a collaborator of Goto.

2006-02-22

Plone - A user-friendly and powerful open source Content Management System

net

http://plone.org/ CMS, Zope, and the like.

2006-02-22

Kenshi Muto

debian linux people net

http://kmuto.jp/ Author of many programming books.

2006-02-22

How to Build a Dictionary and Language Model for Julius

lm net

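http://www.symbio.jst.go.jp/~nakadai/linguistic/ The dictionary is generated by a script from morphological analysis output. The language model is built with palmkit: a 2-gram and a backward (right-to-left) 3-gram.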
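A backward N-gram predicts each word from the words that follow it, so one generic way to get training text for it is to reverse the token order of every sentence and train an ordinary forward model. A minimal sketch, assuming one whitespace-tokenized sentence per line with optional `<s>`/`</s>` markers; `reverse_sentence` is a hypothetical helper, and whether palmkit needs this step or offers its own option is not checked here.

```perl
#!/usr/bin/env perl
# Sketch: make training text for a backward N-gram by reversing each sentence.
# Assumptions: one sentence per line, whitespace-separated tokens, optional
# <s> ... </s> boundary markers that stay at the line ends.
use strict;
use warnings;

sub reverse_sentence {
    my ($line) = @_;
    my @w = split ' ', $line;
    my $bos = (@w && $w[0] eq '<s>')   ? shift @w : undef;
    my $eos = (@w && $w[-1] eq '</s>') ? pop @w   : undef;
    my @r = reverse @w;
    unshift @r, $bos if defined $bos;
    push @r, $eos if defined $eos;
    return join ' ', @r;
}

print reverse_sentence($_), "\n" for ('<s> kyou wa hare </s>', 'a b c');
# prints "<s> hare wa kyou </s>" then "c b a"
```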

2006-02-22

A Study of Language Models Using Kana/Kanji Character Strings as Units

lm net

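http://lbc21.jp/temp/hokoku14/psn/k/kato_m.htm 金野弘明, 加藤正治, 小坂哲夫, 好田正紀, 伊藤彰則: "A Study of Language Models Using Kana/Kanji Character Strings as Units", IEICE Technical Report SP2002-148, pp. 1-6 (2002-12). Research on language-model units other than morphemes…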

2006-02-21

Bayesian Extension to the Language Model for Ad Hoc Information Retrieval - Zaragoza et al. (2003)

stat lm net

http://citeseer.ist.psu.edu/zaragoza03bayesian.html A 1-gram (topic) language model using Bayesian estimation. ... one of the best smoothing techniques used today in LMs is Bayes-Smoothing or Dirichlet Smoothing [2006-02-21-4].
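Dirichlet (Bayes) smoothing of a document unigram model is usually written p(w|d) = (c(w;d) + μ p(w|C)) / (|d| + μ), where p(w|C) is a collection model. A minimal sketch of that formula; the value of μ in the example is purely illustrative, and the helper names are hypothetical.

```perl
#!/usr/bin/env perl
# Sketch: Dirichlet-prior ("Bayes") smoothing of a document unigram model,
#   p(w | d) = (c(w; d) + mu * p(w | C)) / (|d| + mu)
# p(w | C) is a collection (background) unigram; mu below is illustrative.
use strict;
use warnings;
use List::Util qw(sum);

# Maximum-likelihood unigram model: word => relative frequency.
sub unigram_ml {
    my @tokens = @_;
    my (%count, %p);
    $count{$_}++ for @tokens;
    $p{$_} = $count{$_} / @tokens for keys %count;
    return \%p;
}

# Smoothed document model p(w | d).
sub dirichlet_prob {
    my ($w, $doc_tokens, $p_coll, $mu) = @_;
    my $c  = grep { $_ eq $w } @$doc_tokens;   # c(w; d)
    my $pc = $p_coll->{$w} // 0;               # p(w | C); 0 if unseen
    return ($c + $mu * $pc) / (@$doc_tokens + $mu);
}

my @collection = qw(a a b c);
my @doc        = qw(a b);
my $p_coll     = unigram_ml(@collection);
my $mu         = 2;
for my $w (qw(a b c)) {
    printf "p(%s|d) = %.3f\n", $w, dirichlet_prob($w, \@doc, $p_coll, $mu);
}
# p(a|d) = 0.500, p(b|d) = 0.375, p(c|d) = 0.125
printf "sum = %.3f\n", sum(map { dirichlet_prob($_, \@doc, $p_coll, $mu) } qw(a b c));
```

The word c never occurs in the document but still gets probability mass from the collection model; larger μ pulls every estimate further toward p(w|C).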

2006-02-21

Two-Stage Language Models for Information Retrieval (2002)

stat lm net

http://citeseer.ist.psu.edu/717851.html

2006-02-21

Bayesian smoothing and Information Geometry (2002)

stat math geom net

http://www.esat.kuleuven.ac.be/sista/natoasi/kulhavy.pdf

2006-02-21

A Hierarchical Dirichlet Language Model - David MacKay et al. (1994)

stat lm net

http://citeseer.ist.psu.edu/mackay94hierarchical.html A justification of linear-interpolation smoothing between 2-gram and 1-gram models. "Any rational predictive procedure can be made Bayesian. ..., the aim of this paper is to discover what implicit probabilistic model …"
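The interpolation reading can be made concrete in generic notation (a sketch, not the paper's own symbols): if each conditional distribution P(·|v) gets a Dirichlet prior with mean vector m and concentration α, the posterior predictive probability given bigram counts c(v,w), with c(v) = Σ_w c(v,w), is

$$
P(w \mid v) = \frac{c(v,w) + \alpha\, m_w}{c(v) + \alpha}
            = \lambda_v \, \frac{c(v,w)}{c(v)} + (1 - \lambda_v)\, m_w,
\qquad
\lambda_v = \frac{c(v)}{c(v) + \alpha}
$$

that is, linear interpolation between the maximum-likelihood bigram estimate and m, which plays the role of the 1-gram, with a weight that grows with the context count c(v). When c(v) = 0 the first form reduces to m_w directly.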

2006-02-20

Class-based language models and global/topic language models

lm

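Both are kinds of clustering. What is done under the former name is basically the formation of grammatical classes, i.e. "part-of-speech" clustering: the criterion is which words can be adjacent, so it falls under local language models. What is done under the latter name is the formation of semantic classes. Hard clusterin…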
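For reference, the two kinds of clustering correspond to two standard factorizations (a sketch, not taken from the note): a hard class bigram with a word-to-class map C, where c_i = C(w_i), and a topic mixture over latent topics z for a document d:

$$
P(w_i \mid w_{i-1}) \approx P(w_i \mid c_i)\, P(c_i \mid c_{i-1}),
\qquad
P(w \mid d) = \sum_{z} P(w \mid z)\, P(z \mid d)
$$

The first conditions on the class of the neighboring word (local); the second conditions on the whole document (global).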

2006-02-20

A Bayesian Approach to DNA Sequence Segmentation

bio segmentation stat

http://www.mas.ncl.ac.uk/~njnsm/seminars/seminars0405/abstracts/boys.pdf

2006-02-20

Bayesian Bioinformatics

bio stat net

http://www.wadsworth.org/resnres/bioinfo/tut1/

2006-02-19

List of publications in computer science - Wikipedia, the free encyclopedia

math net

http://en.wikipedia.org/wiki/list_of_important_publications_in_computer_science

2006-02-19

Stephen A. Cook -- Home page

math net

http://www.cs.toronto.edu/~sacook/ The Cook of computational complexity theory.

2006-02-17

Learning a Syntagmatic and Paradigmatic Structure from Language (1998)

net

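http://www.cs.mu.oz.au/acl/p/p98/p98-1047.pdf A multigram-based 2-gram model. Unlike phrase-based models, it keeps the internal structure of the merged units and examines it later. A class model is estimated at the same time. The title, glossed in Japanese, means "learning syntactic and lexical structure" [20…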

2006-02-17

Converting alphanumeric text to letters only

perl

#! /usr/bin/env perl
use strict;
use warnings;
my @a=qw(ZERO ONE TWO THREE FOUR FIVE SIX SEVEN EIGHT NINE);
my $s = '';
while ( <> ) {
    chomp;
    s|[0-9]+|join ' ', @a[split //, $&]|exg;
    tr/a-z/A-Z/;
    s/[^A-Z ]//g;
    $s .= "$_ ";
}
$s =~ s/\n/ /g…
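The script above is cut off, so here is a minimal self-contained version of its visible per-line steps, wrapped in a hypothetical `alphabetize` function so they can be tested; how the original finally prints `$s` is not reproduced.

```perl
#!/usr/bin/env perl
# Sketch: the per-line steps of the truncated script, as a function.
use strict;
use warnings;

my @DIGIT_NAMES = qw(ZERO ONE TWO THREE FOUR FIVE SIX SEVEN EIGHT NINE);

sub alphabetize {
    my ($line) = @_;
    # "42" -> "FOUR TWO": each digit of a run is spelled out separately.
    $line =~ s{([0-9]+)}{my $d = $1; join ' ', @DIGIT_NAMES[split //, $d]}ge;
    $line =~ tr/a-z/A-Z/;     # uppercase ASCII letters
    $line =~ s/[^A-Z ]//g;    # keep only A-Z and spaces
    return $line;
}

# Lines joined with a space, like the accumulation into $s above.
my @input = ('Room 42b!', 'call 110');
print join(' ', map { alphabetize($_) } @input), "\n";
# prints "ROOM FOUR TWOB CALL ONE ONE ZERO"
```

As in the original, no space is inserted between a digit run and an adjacent letter, so "42b" becomes "FOUR TWOB".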

2006-02-17

Insert line breaks so that each line holds as many words as possible within X characters

perl

#! /usr/bin/env perl
# convert a long line to fixed-length lines by adding line breaks
use strict;
use warnings;
use Getopt::Long;
my $line_length = 60;
GetOptions('line-length=i' => \$line_length);
my @to_print = ();
my $to_print_length =…
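Since the script is truncated, here is a minimal sketch of greedy line filling, which puts as many words as fit on each line in order. It is not necessarily what the original does: `wrap_greedy` is a hypothetical name, and placing an over-long word on a line of its own is an assumption.

```perl
#!/usr/bin/env perl
# Sketch: greedy line filling. Words are appended to the current line while
# it stays within $width characters; otherwise a new line starts.
# Assumption: a word longer than $width goes on a line of its own.
use strict;
use warnings;

sub wrap_greedy {
    my ($text, $width) = @_;
    $width //= 60;    # same default as the script's --line-length
    my @lines;
    my $cur = '';
    for my $word (split ' ', $text) {
        if ($cur eq '') {
            $cur = $word;
        } elsif (length($cur) + 1 + length($word) <= $width) {
            $cur .= " $word";
        } else {
            push @lines, $cur;
            $cur = $word;
        }
    }
    push @lines, $cur if $cur ne '';
    return @lines;
}

my @out = wrap_greedy('the quick brown fox jumps over the lazy dog', 10);
print "$_\n" for @out;
# the quick / brown fox / jumps over / the lazy / dog
```

`length` counts characters only for decoded strings; for raw UTF-8 Japanese input it counts bytes unless the input is decoded first.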

2006-02-17

Caesar cipher

perl

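perl -pe'tr/ A-Z/H-Z A-G/'
perl -pe'tr/ A-Z/P-Z A-O/'
perl -pe'tr/ A-Z/D-Z A-C/'
Each one-liner rotates the 27-symbol alphabet (space, then A-Z) forward by 8, 16, and 4 positions respectively; lowercase letters and other characters pass through unchanged.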
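The one-liners above hard-code one shift each. A minimal sketch that generalizes them to any shift `$k` over the same 27-symbol alphabet, with an inverse for decryption; `caesar` and `uncaesar` are hypothetical names.

```perl
#!/usr/bin/env perl
# Sketch: Caesar shift over the alphabet (space, A-Z), as in the tr/// one-liners.
# Characters outside the alphabet pass through unchanged.
use strict;
use warnings;

my @ALPHA = (' ', 'A' .. 'Z');    # index 0 = space, 1..26 = A..Z
my %INDEX = map { ($ALPHA[$_] => $_) } 0 .. $#ALPHA;

sub caesar {
    my ($text, $k) = @_;
    my $n = @ALPHA;
    my @out;
    for my $ch (split //, $text) {
        # Perl's % with a positive modulus is non-negative, so negative $k works.
        push @out, exists($INDEX{$ch}) ? $ALPHA[ ($INDEX{$ch} + $k) % $n ] : $ch;
    }
    return join '', @out;
}

sub uncaesar { my ($text, $k) = @_; return caesar($text, -$k) }

print caesar('HELLO WORLD', 8), "\n";    # prints "PMTTWHDWZTL"
```

With `$k = 8` this matches `tr/ A-Z/H-Z A-G/`; 16 and 4 correspond to the other two one-liners.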

2006-02-17

Random line sampling

perl

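perl -e'@a = <>; srand; print @a[map int rand()*scalar(@a), (1 .. 100)]'
This prints 100 lines drawn at random with replacement, so the same line can appear more than once.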
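If distinct lines are wanted instead, a different technique is needed: reservoir sampling (Algorithm R) picks k items uniformly without replacement in one pass, without holding the whole input in memory. A minimal sketch; `reservoir_sample` and its iterator argument are hypothetical conventions.

```perl
#!/usr/bin/env perl
# Sketch: reservoir sampling (Algorithm R), k items without replacement.
# $next is a code ref that returns the next item, or undef at the end.
use strict;
use warnings;

sub reservoir_sample {
    my ($k, $next) = @_;
    my @res;
    my $seen = 0;
    while (defined(my $item = $next->())) {
        $seen++;
        if (@res < $k) {
            push @res, $item;                 # fill the reservoir first
        } else {
            my $j = int rand $seen;           # uniform in 0 .. $seen-1
            $res[$j] = $item if $j < $k;      # keep with probability k/seen
        }
    }
    return @res;
}

# Usage: sample 3 of the numbers 1..10.
my @items = (1 .. 10);
my $i = 0;
my @pick = reservoir_sample(3, sub { $i < @items ? $items[$i++] : undef });
print "@pick\n";
```

To sample input lines, pass an iterator such as `sub { scalar <STDIN> }`; if the input has fewer than k lines, all of them are returned.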

2006-02-17

Unsupervised fusion of global and local language models

lm

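Global language model: which vocabulary is used; a unigram. Local language model: which word sequences are allowed; bigrams and up. Strictly speaking, language model adaptation should be done with respect to the topic. The topic is not known before seeing the input data. Speech…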

2006-02-17

A reading list on Bayesian methods

stat net

http://cog.brown.edu/~gruffydd/bayes.html

2006-02-17

Latent Dirichlet Allocation

stat nlp net

http://citeseer.ist.psu.edu/blei03latent.html A text model based on Bayesian estimation (a long-distance language model).

2006-02-17

Numerical Prefixes

lx net

http://phrontistery.info/numbers.html Greek prefixes: mono- di- tri- tetra- penta- hex- hept- oct- ennea- dec- deca- Latin prefixes uni- bi- duo- tri- quadri- quart- quinque- quint- sex- sept- oct- nonus- novem- dec- deca-

2006-02-16

Computational Complexity by Sanjeev Arora

book math net

http://www.cs.princeton.edu/~arora/book/book.html

mtbr's diary
