
I want to count the number of observations within each subject in a pandas DataFrame

I am quite new to pandas, and to Python in general.

I have a hierarchical data set with several subjects, each of whom has some number of observations. The total df is about half a million rows.

I want to calculate the observation number...

## toy problem
from pandas import DataFrame, Series

d = {'one': Series(['a', 'a', 'a', 'b', 'b', 'b'], index=[0, 1, 2, 3, 4, 5]),
     'two': Series([1.1, 2.5, 3.3, 2.5, 3.3, 9.5], index=[0, 1, 2, 3, 4, 5])}
df = DataFrame(d)

# j is the 0-based position of each row within its subject
for i in df.one.unique():
    for j in range(len(df[df.one == i])):
        print(j)

So I want to assign j to a column for each row. I have no problem calculating j, but I cannot figure out how to assign it. I have tried using iloc, which is incredibly slow, and writing to a list and then joining this to the df, which is also really slow (currently running for over 30 minutes and counting...). I understand that Python works best with vectorised problems, but I cannot think of a vectorised solution for this case.

What is the best way to do this? It is really easy and quick in R. I am currently migrating to Python and pandas in the expectation that it is faster, but this doesn't appear to be the case here.

Any advice, please?

You could use the GroupBy.cumcount method:

In [14]: df['j'] = df.groupby('one').cumcount()

In [15]: df
Out[15]: 
  one  two  j
0   a  1.1  0
1   a  2.5  1
2   a  3.3  2
3   b  2.5  0
4   b  3.3  1
5   b  9.5  2
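
For anyone who wants to try this outside an IPython session, here is a minimal self-contained sketch of the same approach on the toy data from the question. The extra columns `j_from_1` and `n_obs` are names made up for illustration, on the assumption that a 1-based observation number or the total number of observations per subject might also be useful; they are not part of the original answer.

```python
import pandas as pd

# Toy data from the question: two subjects with three observations each
df = pd.DataFrame({
    'one': ['a', 'a', 'a', 'b', 'b', 'b'],
    'two': [1.1, 2.5, 3.3, 2.5, 3.3, 9.5],
})

# 0-based position of each row within its subject (same as the answer above)
df['j'] = df.groupby('one').cumcount()

# 1-based variant (hypothetical column name), if numbering should start at 1
df['j_from_1'] = df.groupby('one').cumcount() + 1

# Total number of rows in each row's subject (hypothetical column name)
df['n_obs'] = df.groupby('one')['two'].transform('size')

print(df)
```

All three columns are computed per group without an explicit Python loop over rows, which is what makes this approach practical on a frame with hundreds of thousands of rows.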

