In Python, how do you convert a CSV file of two columns into bigrams?
I'd like to turn the given CSV file into bigrams:
demo.csv:
words class
hi my name is Jeff. brown
Wow, I am awesome. red
I am a professional. red
Will you marry me? red
How are you today? brown
Today, I woke up with a smile on my face. red
My day today has been amazing. brown
First, make sure to read your data and select the words column. You can use pandas.read_csv for that. Since I don't have your .csv file, I have recreated the data like this:
import pandas as pd
df = pd.DataFrame(
["hi my name is Jeff.",
"Wow, I am awesome.",
"I am a professional.",
"Will you marry me?",
"How are you today?",
"Today, I woke up with a smile on my face.",
"My day today has been amazing."], columns=['words'])
which looks like this:
words
0 hi my name is Jeff.
1 Wow, I am awesome.
2 I am a professional.
3 Will you marry me?
4 How are you today?
5 Today, I woke up with a smile on my face.
6 My day today has been amazing.
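If the file itself is available, reading it directly might look like the sketch below. The tab separator is an assumption, since the delimiter of demo.csv is not shown (the sentences contain commas, so a comma separator would need quoting). An in-memory stand-in replaces the real file here, and the name df_from_file is only illustrative.

```python
import io

import pandas as pd

# Stand-in for demo.csv; the tab delimiter is an assumption, because the
# original file's separator is not shown in the question.
sample = io.StringIO(
    "words\tclass\n"
    "hi my name is Jeff.\tbrown\n"
    "Wow, I am awesome.\tred\n"
)

# With a real file this would be pd.read_csv("demo.csv", sep="\t").
df_from_file = pd.read_csv(sample, sep="\t")

# Select the column that holds the sentences.
words = df_from_file["words"]
```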
A library you can use to create bigrams is nltk. In this example I create a function that returns the bigrams as a list.
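import nltk

# nltk.word_tokenize needs NLTK's tokenizer data, e.g. nltk.download('punkt')
# (newer NLTK releases may ask for 'punkt_tab' instead).
def bigrams(words):
    # Tokenize the sentence, then pair each token with its successor.
    return list(nltk.bigrams(nltk.word_tokenize(words)))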
Then apply this function to the DataFrame and assign the result to a new column called bigrams, like this:
df["bigrams"] = df.words.apply(bigrams)
The new column now looks like this:
0 [(hi, my), (my, name), (name, is), (is, Jeff),...
1 [(Wow, ,), (,, I), (I, am), (am, awesome), (aw...
2 [(I, am), (am, a), (a, professional), (profess...
3 [(Will, you), (you, marry), (marry, me), (me, ?)]
4 [(How, are), (are, you), (you, today), (today,...
5 [(Today, ,), (,, I), (I, woke), (woke, up), (u...
6 [(My, day), (day, today), (today, has), (has, ...
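To see what the pairing step does without installing nltk, here is a minimal dependency-free sketch. It is not the nltk approach used above: it splits on whitespace only, so punctuation stays attached to the words instead of becoming separate tokens, and the function name simple_bigrams is made up for illustration.

```python
def simple_bigrams(text):
    """Return adjacent word pairs from a whitespace-split string."""
    tokens = text.split()
    # zip stops at the shorter sequence, so each token is paired
    # with the one that follows it.
    return list(zip(tokens, tokens[1:]))


print(simple_bigrams("Will you marry me?"))
# [('Will', 'you'), ('you', 'marry'), ('marry', 'me?')]
```

It can be applied to the column the same way, for example df.words.apply(simple_bigrams).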
I hope this helps. Feel free to ask any questions or tell me if you want to change something :)