
Pandas groupby multiple columns, count, and resample

Given the following dataframe:

                     UserID TweetLanguage
2014-08-25 21:00:00  001        english
2014-08-27 21:04:00  001        arabic
2014-08-29 22:07:00  001        espanish
2014-08-25 22:09:00  002        english
2014-08-26 22:09:00  002        espanish
2014-08-25 22:09:00  003        english 

I need to plot the weekly number of users who have posted in more than one language.

For example, in the dataframe above, users 001 and 002 have each tweeted in more than one language, so the value plotted for this week should be 2. The same applies to every other week.
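To try the snippets in this thread, the sample frame can be rebuilt roughly as follows. This is a sketch under two assumptions the question does not state explicitly: the timestamps form the index (a DatetimeIndex), and UserID is stored as a string so the leading zeros survive.

```python
import pandas as pd

# Rebuild the sample data from the question.
# Assumption: timestamps are the index, UserID is a string column.
df = pd.DataFrame(
    {
        "UserID": ["001", "001", "001", "002", "002", "003"],
        "TweetLanguage": [
            "english", "arabic", "espanish",
            "english", "espanish", "english",
        ],
    },
    index=pd.to_datetime(
        [
            "2014-08-25 21:00:00",
            "2014-08-27 21:04:00",
            "2014-08-29 22:07:00",
            "2014-08-25 22:09:00",
            "2014-08-26 22:09:00",
            "2014-08-25 22:09:00",
        ]
    ),
)

print(df)
```

All six rows fall in the same calendar week, which is why the question expects a single value of 2.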

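# Languages used per user per week, one plotted line per user
df.groupby([pd.Grouper(freq='W'), 'UserID'])['TweetLanguage'].nunique().unstack().plot()

# The same weekly grouping with nested groupbys; no key is needed because the timestamps are the index
df.groupby(pd.Grouper(freq='W')).apply(lambda df:
    df.groupby('UserID').apply(lambda df: len(df.TweetLanguage.value_counts())))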

This is a one-liner that splits the data by week and gets the number of languages each user used within each week.

df.groupby('UserID').apply(lambda df: len(df.TweetLanguage.value_counts()))

This returns a Series indexed by user ID whose values are the number of languages that user tweeted in; the outer groupby applies it to each week.
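As a rough illustration of that per-user step, the sketch below uses `nunique()`, which should give the same counts as `len(value_counts())` here. The frame is rebuilt from the question's data, assuming a DatetimeIndex and string user IDs; the names `per_user` and `multilingual` are only illustrative.

```python
import pandas as pd

# Sample data from the question (assumed: timestamps as index, UserID as string).
df = pd.DataFrame(
    {
        "UserID": ["001", "001", "001", "002", "002", "003"],
        "TweetLanguage": [
            "english", "arabic", "espanish",
            "english", "espanish", "english",
        ],
    },
    index=pd.to_datetime(
        [
            "2014-08-25 21:00", "2014-08-27 21:04", "2014-08-29 22:07",
            "2014-08-25 22:09", "2014-08-26 22:09", "2014-08-25 22:09",
        ]
    ),
)

# Number of distinct languages per user.
per_user = df.groupby("UserID")["TweetLanguage"].nunique()

# Users who posted in more than one language.
multilingual = per_user[per_user > 1].index.tolist()

print(per_user)
print(multilingual)
```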

Use two groupbys. The first finds the users who posted in more than one language in each week; the second counts how many such users there are per week.

(df.groupby([df.index.year.rename('year'), df.index.isocalendar().week, 'UserID']).TweetLanguage.nunique() > 1).groupby(level=[0,1]).sum()

#year  week
#2014  35      2.0
#Name: TweetLanguage, dtype: float64
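The two-groupby idea can also be written with `pd.Grouper(freq='W')`, which keeps a real timestamp on the weekly axis and may be more convenient for plotting. This is a minimal sketch rather than the answerer's own code; it assumes the timestamps are the index and rebuilds the question's data, and the names `langs` and `weekly` are illustrative.

```python
import pandas as pd

# Sample data from the question (assumed: timestamps as index, UserID as string).
df = pd.DataFrame(
    {
        "UserID": ["001", "001", "001", "002", "002", "003"],
        "TweetLanguage": [
            "english", "arabic", "espanish",
            "english", "espanish", "english",
        ],
    },
    index=pd.to_datetime(
        [
            "2014-08-25 21:00", "2014-08-27 21:04", "2014-08-29 22:07",
            "2014-08-25 22:09", "2014-08-26 22:09", "2014-08-25 22:09",
        ]
    ),
)

# First groupby: distinct languages per (week, user).
# freq="W" bins weeks that end on Sunday and labels each bin with that Sunday.
langs = df.groupby([pd.Grouper(freq="W"), "UserID"])["TweetLanguage"].nunique()

# Second groupby: per week, count the users with more than one language.
weekly = (langs > 1).groupby(level=0).sum()

print(weekly)
```

Calling `weekly.plot()` afterwards should draw the weekly counts, provided matplotlib is installed. Weeks with no tweets at all do not appear in this result, so a reindex over the full date range may be needed if gaps should show as zero.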


 