Corresponding indices of pandas value_counts() method
I often use the value_counts() method in pandas to get statistics.
For example, I can get a value_counts() result like the one below.
male 7825
female 6764
Is there a built-in function to get the indices of the DataFrame rows corresponding to the two labels (male and female)?
Expected result: male_indices = [1,3,5,6,7, ..., 14589], where len(male_indices) = 7825
This is what groupby does. Consider the example dataframe df:
import numpy as np
import pandas as pd
np.random.seed([3,1415])
df = pd.DataFrame(dict(sex=np.random.choice(('male', 'female'), 10)))
print(df)
sex
0 male
1 female
2 male
3 female
4 male
5 male
6 female
7 male
8 female
9 female
Use groupby.groups
df.groupby('sex').groups
{'female': Int64Index([1, 3, 6, 8, 9], dtype='int64'),
'male': Int64Index([0, 2, 4, 5, 7], dtype='int64')}
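As a minimal sketch of how this maps back to the question, the groups dictionary can be converted to plain lists and checked against value_counts(). The small DataFrame and the names male_indices and female_indices below are illustrative assumptions, not the asker's actual data.

```python
import pandas as pd

# Illustrative data (an assumption; not the asker's real DataFrame)
df = pd.DataFrame({"sex": ["male", "female", "male", "female", "male"]})

# Mapping from each label to the index labels of the rows that hold it
groups = df.groupby("sex").groups

# Convert the pandas Index objects to plain Python lists
male_indices = groups["male"].tolist()
female_indices = groups["female"].tolist()

# The number of indices per label should match value_counts()
counts = df["sex"].value_counts()
print(male_indices)                         # [0, 2, 4]
print(len(male_indices) == counts["male"])  # True
```

Since groups is keyed by label, the same lookup works for any number of distinct values in the column, not just two.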
Here's a minimal, somewhat-robust function that returns the indices corresponding to a given group within a given column in a DataFrame:
import pandas as pd
# create some data
d = pd.DataFrame({'sex': ['male', 'male', 'female', 'male', 'female', 'female', 'male'], 'age': [23, 24, 20, 32, 45, 43, 32]})
# returns a dictionary with group names as keys and the indices corresponding
# to those groups as values (wrap the values in `list` or `set` to avoid pandas indexes)
def get_indices(df, col):
    return {group: df[df[col] == group].index for group in set(df[col])}
# test it out
get_indices(d, 'sex')
Out[178]:
{'female': Int64Index([2, 4, 5], dtype='int64'),
'male': Int64Index([0, 1, 3, 6], dtype='int64')}
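If only one label is needed, a boolean mask applied to the index avoids building the whole dictionary. The following is a sketch on the same example data; the helper name indices_for is hypothetical and not part of pandas.

```python
import pandas as pd

d = pd.DataFrame({
    "sex": ["male", "male", "female", "male", "female", "female", "male"],
    "age": [23, 24, 20, 32, 45, 43, 32],
})

def indices_for(df, col, value):
    """Return the index labels of rows where df[col] equals value (hypothetical helper)."""
    return df.index[df[col] == value].tolist()

male_indices = indices_for(d, "sex", "male")
print(male_indices)  # [0, 1, 3, 6]
```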