Counting Multiple Values in a Python pandas DataFrame Column
I'm trying to count unique values in a pandas DataFrame column that contains multiple space-separated values. I could do this with value_counts() if this were a Series, but how do I do it on a DataFrame? It seems like a DataFrame should make this easier.
Data:
ID Tags
Created at
2016-03-10 09:46:00 3074 tag_a
2016-04-13 11:50:00 3524 tag_a tag_b
2016-05-18 15:22:00 3913 tag_a tag_b tag_c
Code:
%matplotlib inline
import pandas as pd
# read csv into the data dataframe
allData = r'myData.csv'
tickets_df = pd.read_csv(allData, usecols=['Id', 'Created at', 'Tags'], parse_dates=['Created at'], index_col=['Created at'])
tickets_df.fillna(0,inplace=True)
tickets_df['2016':'2016']
# this would work with a series:
tickets_df[tickets_df['Tags'].str.split().apply(lambda x: pd.Series(x).value_counts()).sum()]
Error:
KeyError: '[ 3. 2. 3. 5. 2. 102. 9. 5. 1. 4. 1. 161.\n 4. 4. 1. 6. 4. 34. 1. 1. 1. 6. 2. 5.\n 1. 1. 1. 1. 11. 2. 1. 1. 3. 1. 1. 1.\n 1. 1. 1. 1. 2. 1. 1. 2. 2. 6. 1. 4.\n 2. 1. 1. 2. 1. 1. 1. 3. 2. 1. 4. 35.\n 11. 2. 1. 13. 3. 8. 63. 87. 2. 2. 1. 1.\n 1. 1. 1. 1. 150. 1. 24. 3. 7. 5. 1. 1.\n 3. 4. 2. 6. 1. 2. 3. 5. 2. 5. 15. 1.\n 42. 1. 14. 1. 1. 1. 6. 13. 13. 9. 2. 11.\n 3. 1. 1.] not in index'
Desired Output:
tag_a 3
tag_b 2
tag_c 1
Use str.split with expand=True to separate each string into its own column, then use stack followed by value_counts:
df['Tags'].str.split(expand=True).stack().value_counts()
The resulting output:
tag_a 3
tag_b 2
tag_c 1
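A minimal, self-contained sketch of this approach, using made-up data that mirrors the Tags column from the question (no CSV needed):

```python
import pandas as pd

df = pd.DataFrame({
    "Tags": ["tag_a", "tag_a tag_b", "tag_a tag_b tag_c"],
})

# str.split(expand=True) puts each tag in its own column (padding short
# rows with NaN), stack() drops the NaNs and flattens the result into a
# single Series, and value_counts() tallies each unique tag.
counts = df["Tags"].str.split(expand=True).stack().value_counts()
print(counts)
# tag_a    3
# tag_b    2
# tag_c    1
```

Note that stack() drops the NaN padding automatically, which is why rows with fewer tags don't distort the counts.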