How to generate n-grams in Scala?

Question

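I am trying to write the Dissociated Press algorithm, which is based on n-grams, in Scala. How do I generate n-grams for a large file? For example, for a file containing "the bee is the bee of the bees":

First, it has to pick a random n-gram. For example, "the bee".
Then it has to look for an n-gram that starts with the last (n-1) words. For example, one starting with "bee".
It prints the last word of that n-gram. Then it repeats.

Could you give me some hints on how to do this? Sorry for the inconvenience.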

Answer 1

Your question could be a bit more specific, but here is my attempt.

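val words = "the bee is the bee of the bees"
// sliding(2) yields every pair of adjacent words, i.e. the 2-grams
// mkString(" ") keeps a space between the words when printing
words.split(' ').sliding(2).foreach(p => println(p.mkString(" ")))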
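
Building on the sliding call above, the loop described in the question (pick a random n-gram, then repeatedly look up an n-gram that starts with the last n-1 words and emit its last word) could be sketched roughly as follows. The object name DissociatedPress, the helpers successors and generate, and the maxWords cut-off are illustrative assumptions, not part of the original answer.

```scala
import scala.annotation.tailrec
import scala.util.Random

// Hypothetical helper object; all names here are illustrative.
object DissociatedPress {

  /** Map each (n-1)-word prefix to the words observed right after it. */
  def successors(words: Vector[String], n: Int): Map[Vector[String], Vector[String]] =
    words.sliding(n).filter(_.size == n).toVector
      .groupBy(_.init)
      .map { case (prefix, grams) => prefix -> grams.map(_.last) }

  /** Start from a random n-gram and extend it one word at a time. */
  def generate(words: Vector[String], n: Int, maxWords: Int, rng: Random): Vector[String] = {
    require(n >= 2, "n must be at least 2")
    val grams = words.sliding(n).filter(_.size == n).toVector
    if (grams.isEmpty) Vector.empty
    else {
      val table = successors(words, n)

      @tailrec
      def loop(out: Vector[String]): Vector[String] =
        if (out.size >= maxWords) out
        else table.get(out.takeRight(n - 1)) match {
          // pick one of the observed next words at random
          case Some(nexts) => loop(out :+ nexts(rng.nextInt(nexts.size)))
          // no n-gram starts with this prefix: stop
          case None => out
        }

      loop(grams(rng.nextInt(grams.size))).take(maxWords)
    }
  }
}
```

For example, DissociatedPress.generate("the bee is the bee of the bees".split(' ').toVector, 2, 10, new Random()) returns at most 10 words. Generation ends early once the last word is "bees", because no 2-gram in the sample starts with it.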

Answer 2

You can try this with a parameter n:

val words = "the bee is the bee of the bees"
val w = words.split(" ")

val n = 4
// collect the k-grams for every k from 1 to n
val ngrams = (for (i <- 1 to n) yield w.sliding(i).map(p => p.toList)).flatMap(x => x)
ngrams foreach println

Output:

List(the)
List(bee)
List(is)
List(the)
List(bee)
List(of)
List(the)
List(bees)
List(the, bee)
List(bee, is)
List(is, the)
List(the, bee)
List(bee, of)
List(of, the)
List(the, bees)
List(the, bee, is)
List(bee, is, the)
List(is, the, bee)
List(the, bee, of)
List(bee, of, the)
List(of, the, bees)
List(the, bee, is, the)
List(bee, is, the, bee)
List(is, the, bee, of)
List(the, bee, of, the)
List(bee, of, the, bees)
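
Since the question mentions large files, here is a minimal sketch of the same sliding idea applied to an Iterator of lines, so that the whole text does not have to be held in an array. The object name NGramIterator and its ngrams method are hypothetical; the sketch assumes that words are separated by whitespace and that n-grams may span line breaks.

```scala
// Hypothetical helper; names are illustrative.
object NGramIterator {

  /** Lazily turn lines of text into n-grams, crossing line boundaries. */
  def ngrams(lines: Iterator[String], n: Int): Iterator[Seq[String]] =
    lines
      .flatMap(_.split("\\s+"))   // split each line into words
      .filter(_.nonEmpty)         // drop empty tokens from leading whitespace
      .sliding(n)                 // windows of n consecutive words
      .withPartial(false)         // skip a final window shorter than n
}
```

With a real file, the lines could come from scala.io.Source.fromFile(path).getLines(), where path is whatever file you want to read; remember to close the Source once the iterator has been consumed.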

Answer 3

Here is a stream-based approach. It does not need much memory while computing the n-grams.

object ngramstream extends App {

  def process(st: Stream[Array[String]])(f: Array[String] => Unit): Stream[Array[String]] = st match {
    case x #:: xs => {
      f(x)
      process(xs)(f)
    }
    case _ => Stream[Array[String]]()
  }

  def ngrams(n: Int, words: Array[String]) = {
    // exclude 1-grams
    (2 to n).map { i => words.sliding(i).toStream }
      .foldLeft(Stream[Array[String]]()) {
        (a, b) => a #::: b
      }
  }

  val words = "the bee is the bee of the bees"
  val n = 4
  val ngrams2 = ngrams(n, words.split(" "))

  process(ngrams2) { x =>
    println(x.toList)
  }

}

Output:

List(the, bee)
List(bee, is)
List(is, the)
List(the, bee)
List(bee, of)
List(of, the)
List(the, bees)
List(the, bee, is)
List(bee, is, the)
List(is, the, bee)
List(the, bee, of)
List(bee, of, the)
List(of, the, bees)
List(the, bee, is, the)
List(bee, is, the, bee)
List(is, the, bee, of)
List(the, bee, of, the)
List(bee, of, the, bees)
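
One caveat: a Stream memoizes the elements it has already evaluated, so a val that holds on to the head (such as ngrams2 above) keeps them reachable, and Stream is deprecated in favour of LazyList as of Scala 2.13. As a possible alternative, the sketch below produces the same n-grams with a plain Iterator, which does not cache anything. The object name NGramOrders is hypothetical, and the length filter is an added assumption that drops the short window sliding returns when there are fewer than i words.

```scala
// Hypothetical helper; names are illustrative.
object NGramOrders {

  /** All n-grams of sizes 2 to n, produced on demand and not cached. */
  def ngrams(n: Int, words: Array[String]): Iterator[List[String]] =
    (2 to n).iterator.flatMap { i =>
      words.sliding(i)
        .filter(_.length == i)   // ignore a partial window on short input
        .map(_.toList)
    }
}
```

Usage mirrors the answer above: NGramOrders.ngrams(4, words.split(" ")).foreach(println). An Iterator can be traversed only once, so call ngrams again if a second pass is needed.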

3 answers

Answer 1
13 2011-11-24 15:08:46

Answer 2
4 2013-05-24 09:58:58

Answer 3
3 2013-12-17 12:48:58
