How can I apply a bag-of-words model to each column of a data frame, grouped by Year-Week?
Year-Week Job_list Comments_2
2019-01 Doc-1 Doc-1
2019-01 Doc-2 Doc-2
2019-01 Doc-3 Doc-3
2019-02 Doc-4 Doc-4
2019-03 Doc-5 Doc-5
I want the output separately for each column, grouped by year-week, after applying the bag-of-words model.
You can use .groupby to group the rows according to year-week.
After that you can use .apply(list) to collect each group's values into a list.
import pandas as pd

df = pd.DataFrame([['2019-01', 'Doc-1', 'Doc-1'], ['2019-01', 'Doc-2', 'Doc-2'], ['2019-01', 'Doc-3', 'Doc-3'], ['2019-02', 'Doc-4', 'Doc-4'], ['2019-03', 'Doc-5', 'Doc-5']], columns=['Year-Week', 'Job_list', 'Comments_2'])
# Group by Year-Week, then collect each group's Job_list values into a list
job_list_grouped = df.groupby('Year-Week')['Job_list'].apply(list)
print(job_list_grouped)
The output looks like this:
Year-Week
2019-01 [Doc-1, Doc-2, Doc-3]
2019-02 [Doc-4]
2019-03 [Doc-5]
You can simply do the same for the other column. From there you can transform the result into whatever you need.
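To make "the same for the other column" concrete, here is a minimal sketch that repeats the groupby/.apply(list) step for both text columns and puts the results side by side. The names text_columns and grouped are only illustrative; they are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["2019-01", "Doc-1", "Doc-1"],
        ["2019-01", "Doc-2", "Doc-2"],
        ["2019-01", "Doc-3", "Doc-3"],
        ["2019-02", "Doc-4", "Doc-4"],
        ["2019-03", "Doc-5", "Doc-5"],
    ],
    columns=["Year-Week", "Job_list", "Comments_2"],
)

# Illustrative name: the columns that should be grouped per week.
text_columns = ["Job_list", "Comments_2"]

# Run the same groupby + apply(list) for every column, then combine the
# resulting Series into one frame indexed by Year-Week.
grouped = pd.DataFrame(
    {col: df.groupby("Year-Week")[col].apply(list) for col in text_columns}
)
print(grouped)
```

Each cell of grouped then holds the list of documents for that week and column, so the later steps can be applied column by column.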
EDIT:
You can use the Counter class from the collections module for that!
Here is my new code:
import pandas as pd
from collections import Counter
df = pd.DataFrame([['2019-01', 'Doc-1', 'Doc-1'], ['2019-01', 'Doc-2', 'Doc-2'],['2019-01','Doc-3','Doc-3'],['2019-02','Doc-4','Doc-4'],['2019-03','Doc-5','Doc-5']],columns= ['Year-Week', 'Job_list', 'Comments_2'])
job_list_grouped = df.groupby('Year-Week')['Job_list'].apply(list).apply(Counter)
print(job_list_grouped)
print(job_list_grouped.to_dict())
Note that I only added another apply to the end of the groupby chain.
The first print outputs:
Year-Week
2019-01 {'Doc-1': 1, 'Doc-2': 1, 'Doc-3': 1}
2019-02 {'Doc-4': 1}
2019-03 {'Doc-5': 1}
If you need this in dictionary format, you can simply call to_dict() on the result:
{'2019-01': Counter({'Doc-1': 1, 'Doc-2': 1, 'Doc-3': 1}), '2019-02': Counter({'Doc-4': 1}), '2019-03': Counter({'Doc-5': 1})}
Don't worry about the Counter wrapped around your dictionaries. Counter is a subclass of dict, so it behaves like one; the main difference is that looking up a missing key returns 0 instead of raising a KeyError.
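The code above counts whole documents, because Doc-1 and friends are placeholders. If the cells hold real text, the same chain can count individual words instead. The following is only a sketch under assumptions: the sample sentences are made up, bag_of_words is a hypothetical helper, and tokens are produced by lower-casing and splitting on whitespace, which may be too crude for real data.

```python
from collections import Counter

import pandas as pd

# Made-up sample text standing in for the Doc-N placeholders.
df = pd.DataFrame(
    [
        ["2019-01", "data engineer python"],
        ["2019-01", "python developer"],
        ["2019-02", "data analyst"],
    ],
    columns=["Year-Week", "Job_list"],
)


def bag_of_words(texts):
    """Hypothetical helper: count lower-cased, whitespace-split tokens."""
    counts = Counter()
    for text in texts:
        counts.update(str(text).lower().split())
    return counts


# Same pattern as above: group, collect to a list, then count.
bow = df.groupby("Year-Week")["Job_list"].apply(list).apply(bag_of_words)
print(bow.to_dict())
```

Keeping the intermediate .apply(list) step mirrors the answer's own chain, so each week's Counter stays a single cell in the resulting Series.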