How can I apply a bag-of-words model to each column of a data frame, grouped by Year-Week?

Year-Week  Job_list  Comments_2
2019-01    Doc-1     Doc-1
2019-01    Doc-2     Doc-2
2019-01    Doc-3     Doc-3
2019-02    Doc-4     Doc-4
2019-03    Doc-5     Doc-5

After applying a bag-of-words model, I want separate output for each column, grouped by year-week.

You can use .groupby to group the rows by year-week.

After that, you can use .apply(list) to collect each group's values into a list.

import pandas as pd

df = pd.DataFrame(
    [['2019-01', 'Doc-1', 'Doc-1'], ['2019-01', 'Doc-2', 'Doc-2'], ['2019-01', 'Doc-3', 'Doc-3'],
     ['2019-02', 'Doc-4', 'Doc-4'], ['2019-03', 'Doc-5', 'Doc-5']],
    columns=['Year-Week', 'Job_list', 'Comments_2'])

# Group the rows by Year-Week, then collect each group's Job_list values into a list
job_list_grouped = df.groupby('Year-Week')['Job_list'].apply(list)
print(job_list_grouped)

The output looks like this:

Year-Week
2019-01    [Doc-1, Doc-2, Doc-3]
2019-02                  [Doc-4]
2019-03                  [Doc-5]

You can do the same for the other column. From there, you can transform the result into whatever you need.
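As a rough sketch of doing the same for both columns in one step, the two columns can be selected together and aggregated with list. The variable name grouped is only illustrative, and the data is the same sample as above.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame(
    [['2019-01', 'Doc-1', 'Doc-1'], ['2019-01', 'Doc-2', 'Doc-2'],
     ['2019-01', 'Doc-3', 'Doc-3'], ['2019-02', 'Doc-4', 'Doc-4'],
     ['2019-03', 'Doc-5', 'Doc-5']],
    columns=['Year-Week', 'Job_list', 'Comments_2'])

# Select both text columns and collect each week's values into one list per column.
grouped = df.groupby('Year-Week')[['Job_list', 'Comments_2']].agg(list)
print(grouped)
```

The result is a data frame indexed by Year-Week, with one list per week in each column.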

EDIT:

For that, you can use the Counter class from the collections module in the standard library.
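As a quick illustration of what Counter does on its own, here is a minimal sketch with made-up values: it takes an iterable and counts how often each item occurs.

```python
from collections import Counter

# Made-up sample values: 'Doc-1' appears twice, 'Doc-2' once.
counts = Counter(['Doc-1', 'Doc-2', 'Doc-1'])

print(counts)           # Counter({'Doc-1': 2, 'Doc-2': 1})
print(counts['Doc-1'])  # 2
```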

Here is my new code:

import pandas as pd
from collections import Counter

df = pd.DataFrame([['2019-01', 'Doc-1', 'Doc-1'], ['2019-01', 'Doc-2', 'Doc-2'],['2019-01','Doc-3','Doc-3'],['2019-02','Doc-4','Doc-4'],['2019-03','Doc-5','Doc-5']],columns= ['Year-Week', 'Job_list', 'Comments_2'])

job_list_grouped = df.groupby('Year-Week')['Job_list'].apply(list).apply(Counter)
print(job_list_grouped)
print(job_list_grouped.to_dict())

Note that I only added another apply to the end of the groupby chain.

The first print outputs:

Year-Week
2019-01    {'Doc-1': 1, 'Doc-2': 1, 'Doc-3': 1}
2019-02                            {'Doc-4': 1}
2019-03                            {'Doc-5': 1}

If you need this in dictionary format, simply add to_dict(), as in the second print, which outputs:

{'2019-01': Counter({'Doc-1': 1, 'Doc-2': 1, 'Doc-3': 1}), '2019-02': Counter({'Doc-4': 1}), '2019-03': Counter({'Doc-5': 1})}

Don't worry about the Counter wrapped around your dictionaries. Counter is a subclass of dict, so it supports the same operations. The main difference is that looking up a missing key returns 0 instead of raising a KeyError.
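The sample cells above are single labels, so every count is 1. If the real columns hold free text, the same groupby chain can count words instead. The following is only a sketch under assumptions: the sentences are invented, bag_of_words is a hypothetical helper, and tokenization is plain lowercasing plus whitespace splitting, which a real pipeline would likely replace.

```python
import pandas as pd
from collections import Counter

# Hypothetical free-text data; the real columns would hold actual documents.
df = pd.DataFrame(
    [['2019-01', 'fix the pump', 'pump was noisy'],
     ['2019-01', 'replace the filter', 'filter was clogged'],
     ['2019-02', 'fix the valve', 'valve was stuck']],
    columns=['Year-Week', 'Job_list', 'Comments_2'])

def bag_of_words(texts):
    """Count lowercase, whitespace-separated tokens across all given texts."""
    counts = Counter()
    for text in texts:
        counts.update(str(text).lower().split())
    return counts

# For each column: group by week, collect the texts, then count their words.
bow = {
    col: df.groupby('Year-Week')[col].apply(list).apply(bag_of_words).to_dict()
    for col in ['Job_list', 'Comments_2']
}

print(bow['Job_list']['2019-01']['the'])   # 2
print(bow['Job_list']['2019-02']['pump'])  # 0, missing words count as zero
```

Here bow maps each column name to a dictionary of Year-Week to Counter, so each column's output stays separate.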
