Pandas: group by multiple columns, count, and resample

Question

Given the following dataframe:

                     UserID TweetLanguage
2014-08-25 21:00:00  001        english
2014-08-27 21:04:00  001        arabic
2014-08-29 22:07:00  001        espanish
2014-08-25 22:09:00  002        english
2014-08-26 22:09:00  002        espanish
2014-08-25 22:09:00  003        english

I need to plot the weekly number of users who have posted in more than one language.

For example, in the above dataframe, users 001 and 002 have tweeted in more than one language, so the corresponding value for this week in the plot should be 2. The same applies to the other weeks.

Answer 1

# Distinct languages per user per week (weeks end on Sunday), one column per user.
df.groupby([pd.Grouper(freq='W'), 'UserID'])['TweetLanguage'].nunique().unstack().plot()
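
The one-liner above plots one line per user (distinct languages per week). To get the single weekly number the question asks for, one possible extension is to threshold that result and count the users. This is a minimal sketch, not part of the original answer: the sample frame is rebuilt from the question, and the names `langs_per_user` and `weekly_multilingual` are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question; the index holds the tweet timestamps.
df = pd.DataFrame(
    {
        "UserID": ["001", "001", "001", "002", "002", "003"],
        "TweetLanguage": ["english", "arabic", "espanish",
                          "english", "espanish", "english"],
    },
    index=pd.to_datetime([
        "2014-08-25 21:00:00", "2014-08-27 21:04:00", "2014-08-29 22:07:00",
        "2014-08-25 22:09:00", "2014-08-26 22:09:00", "2014-08-25 22:09:00",
    ]),
).sort_index()

# Distinct languages per (week, user); freq="W" bins into weeks ending on Sunday.
langs_per_user = (
    df.groupby([pd.Grouper(freq="W"), "UserID"])["TweetLanguage"].nunique()
)

# Flag users with more than one language, then count the flagged users per week.
weekly_multilingual = (langs_per_user > 1).groupby(level=0).sum()
print(weekly_multilingual)
```

Calling `weekly_multilingual.plot()` on the result should then draw the weekly counts, assuming matplotlib is installed.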

Answer 2

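# Assumes the timestamps are in a column named 'datetime'
# (with a DatetimeIndex, as in the question, use pd.Grouper(freq='W') instead).
df.groupby(pd.Grouper(key='datetime', freq='W')).apply(
    lambda week: week.groupby('UserID').apply(lambda g: len(g.TweetLanguage.value_counts())))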

This is a one-liner that separates the data by week and gets the number of languages each user used within that week. The inner step is:

df.groupby('UserID').apply(lambda df: len(df.TweetLanguage.value_counts()))

This returns a Series indexed by user ID whose values are the number of languages that user tweeted in. The outer groupby applies it to each week.
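
As a small check of the inner step, the sketch below compares `len(value_counts())` with the built-in `nunique()`, which should give the same per-user counts because both ignore missing values by default. It is only an illustration under stated assumptions: the rows are copied from the question with the timestamps left out (so the weekly split is ignored), and the names `via_value_counts` and `via_nunique` are hypothetical.

```python
import pandas as pd

# Rows copied from the question; timestamps are not needed for this step.
df = pd.DataFrame({
    "UserID": ["001", "001", "001", "002", "002", "003"],
    "TweetLanguage": ["english", "arabic", "espanish",
                      "english", "espanish", "english"],
})

# The answer's approach: the length of value_counts() is the number of
# distinct languages for that user.
via_value_counts = (
    df.groupby("UserID")["TweetLanguage"].apply(lambda s: len(s.value_counts()))
)

# Built-in alternative: nunique() counts distinct non-null values directly.
via_nunique = df.groupby("UserID")["TweetLanguage"].nunique()

print(via_value_counts.to_dict())
print(via_nunique.to_dict())
```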

Answer 3

Use two groupbys. The first finds the users who post in more than one language each week; the second counts how many such users there are per week.

(df.groupby([df.index.year.rename('year'), df.index.week.rename('week'), 'UserID']).TweetLanguage.nunique() > 1).groupby(level=[0,1]).sum()

#year  week
#2014  35      2.0
#Name: TweetLanguage, dtype: float64
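
Note that `DatetimeIndex.week` was deprecated in pandas 1.1 and removed in pandas 2.0, so the snippet above may fail on recent versions. A possible adaptation, assuming pandas 1.1 or later, takes the week number from `isocalendar()` instead. This is a sketch rather than the answerer's code: the sample frame is rebuilt from the question, and `keyed`, `is_multilingual`, and `weekly_counts` are illustrative names. It also groups by the ISO year, which can differ from the calendar year for dates near January 1.

```python
import pandas as pd

# Sample data copied from the question; the index holds the tweet timestamps.
df = pd.DataFrame(
    {
        "UserID": ["001", "001", "001", "002", "002", "003"],
        "TweetLanguage": ["english", "arabic", "espanish",
                          "english", "espanish", "english"],
    },
    index=pd.to_datetime([
        "2014-08-25 21:00:00", "2014-08-27 21:04:00", "2014-08-29 22:07:00",
        "2014-08-25 22:09:00", "2014-08-26 22:09:00", "2014-08-25 22:09:00",
    ]),
)

# isocalendar() returns a frame with ISO 'year', 'week' and 'day' columns.
iso = df.index.isocalendar()
keyed = df.assign(
    year=iso["year"].astype(int).to_numpy(),
    week=iso["week"].astype(int).to_numpy(),
)

# First groupby: does each user have more than one language in a given week?
is_multilingual = (
    keyed.groupby(["year", "week", "UserID"])["TweetLanguage"].nunique() > 1
)

# Second groupby: count the users flagged True per (year, week).
weekly_counts = is_multilingual.groupby(level=["year", "week"]).sum()
print(weekly_counts)
```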

Answer 1: score 3, accepted, 2018-10-22 21:16:51

Answer 2: score 2, 2018-10-22 21:08:42

Answer 3: score 2, 2018-10-22 21:29:54
