
Pairs of two consequent words pyspark

Original post: 2018-05-20 02:44:52 8 1 python/ apache-spark/ pyspark

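I am working on a language model and want to count the pairs of two consecutive words. I found examples of this problem in Scala that use a slicing function, but I have not managed to find an analogue in PySpark.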

data.splicing(2).map(lambda pair: (pair, 1)).reduceByKey(lambda x, y: x + y)

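I suppose it should look something like this. A workaround could be to write a function that finds the next word in the array, but I suppose there should be a built-in solution.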

1 Answer

Maybe this will help. You can find other splitting approaches here: Is there a way to split a string by every nth separator in Python?

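from pyspark import SparkContext

sc = SparkContext.getOrCreate()  # reuses the shell's context if one already exists

# The two sentences are joined without a space, so "words.I" ends up as a single token.
text = ("I'm working on language model and want to count the number pairs of two consequent words."
        "I found an examples of such problem on language model and want to count the number pairs")

# Zipping one iterator with itself yields non-overlapping pairs: (w1, w2), (w3, w4), ...
i = iter(text.split())

# In Python 3 the built-in zip replaces itertools.izip.
rdd = sc.parallelize([" ".join(x) for x in zip(i, i)])

print(rdd.map(lambda x: (x, 1)).reduceByKey(lambda x, y: x + y).collect())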

[('found an', 1), ('count the', 2), ('want to', 2), ('examples of', 1), ('model and', 2), ('on language', 2), ('number pairs', 2), ("I'm working", 1), ('consequent words.I', 1), ('such problem', 1), ('of two', 1)]
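Zipping one iterator with itself, as above, yields non-overlapping pairs (words 1-2, 3-4, and so on). If overlapping pairs of consecutive words (bigrams) are what is needed, the following is a minimal sketch in plain Python, not part of the original answer. The helper names `bigrams` and `count_bigrams` and the sample `lines` are hypothetical, and each line is assumed to be tokenized on whitespace independently, so pairs that span a line break are not counted.

```python
from collections import Counter


def bigrams(line):
    """Return the overlapping pairs of consecutive words in one line."""
    words = line.split()
    # zip(words, words[1:]) pairs every word with its successor.
    return list(zip(words, words[1:]))


def count_bigrams(lines):
    """Count how often each consecutive word pair occurs across all lines."""
    counts = Counter()
    for line in lines:
        counts.update(bigrams(line))
    return counts


# Hypothetical sample input, used only for illustration.
lines = ["want to count the number", "want to count pairs"]
counts = count_bigrams(lines)
```

Assuming an RDD of text lines named `lines_rdd` (a hypothetical name), the same helper should plug into Spark as `lines_rdd.flatMap(bigrams).map(lambda p: (p, 1)).reduceByKey(lambda x, y: x + y)`. For DataFrames, `pyspark.ml.feature.NGram` with `n=2` is a built-in transformer that produces such pairs.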

