Counting Multiple Values in a Python pandas DataFrame Column
I'm trying to count unique values in a pandas DataFrame column that contains multiple space-separated values. I could do this with value_counts() if this were a Series, but how do I do it on a DataFrame? It seems like a DataFrame should make this easier.
Data:
ID Tags
Created at
2016-03-10 09:46:00 3074 tag_a
2016-04-13 11:50:00 3524 tag_a tag_b
2016-05-18 15:22:00 3913 tag_a tag_b tag_c
Code:
%matplotlib inline
import pandas as pd
# read csv into the data dataframe
allData = r'myData.csv'
tickets_df = pd.read_csv(allData, usecols=['Id', 'Created at', 'Tags'], parse_dates=['Created at'], index_col=['Created at'])
tickets_df.fillna(0,inplace=True)
tickets_df['2016':'2016']
# this would work with a series:
tickets_df[tickets_df['Tags'].str.split().apply(lambda x: pd.Series(x).value_counts()).sum()]
Error:
KeyError: '[ 3. 2. 3. 5. 2. 102. 9. 5. 1. 4. 1. 161.\n 4. 4. 1. 6. 4. 34. 1. 1. 1. 6. 2. 5.\n 1. 1. 1. 1. 11. 2. 1. 1. 3. 1. 1. 1.\n 1. 1. 1. 1. 2. 1. 1. 2. 2. 6. 1. 4.\n 2. 1. 1. 2. 1. 1. 1. 3. 2. 1. 4. 35.\n 11. 2. 1. 13. 3. 8. 63. 87. 2. 2. 1. 1.\n 1. 1. 1. 1. 150. 1. 24. 3. 7. 5. 1. 1.\n 3. 4. 2. 6. 1. 2. 3. 5. 2. 5. 15. 1.\n 42. 1. 14. 1. 1. 1. 6. 13. 13. 9. 2. 11.\n 3. 1. 1.] not in index'
Desired Output:
tag_a 3
tag_b 2
tag_c 1
Use str.split with expand=True to separate each string into its own column, then use stack followed by value_counts:
df['Tags'].str.split(expand=True).stack().value_counts()
The resulting output:
tag_a 3
tag_b 2
tag_c 1
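A minimal, self-contained sketch of this approach, using made-up data that mirrors the Tags column from the question (no CSV needed):

```python
import pandas as pd

df = pd.DataFrame({
    "Tags": ["tag_a", "tag_a tag_b", "tag_a tag_b tag_c"],
})

# str.split(expand=True) puts each tag in its own column (padding short
# rows with NaN), stack() drops the NaNs and flattens the result into a
# single Series, and value_counts() tallies each unique tag.
counts = df["Tags"].str.split(expand=True).stack().value_counts()
print(counts)
# tag_a    3
# tag_b    2
# tag_c    1
```

Note that stack() drops the NaN padding automatically, which is why rows with fewer tags don't distort the counts.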