Numbering subsequences in a Pandas DataFrame
I've got a readings DataFrame that consists of two columns, experiment and value. experiment keys into an experiments DataFrame; there are 500 consecutive rows with the same experiment and different value, representing 500 readings on the same experiment, where the order in the DataFrame is the order in which the data was taken. Then 500 for the next experiment, etc.
I want to look for time-based trends in the experiments, so I assume that I want to label each point with a pos in 0-499 and then groupby('pos'). How do I create that pos column, an incrementing value that resets to 0 every time experiment changes? Which is, I guess, the same as the number of rows that experiment has been constant for.
If I understand you correctly...
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'Experiment': [1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
...                    'Value': np.random.randn(10)})
>>> df
   Experiment     Value
0           1 -0.924851
1           1 -0.599875
2           1  0.069982
3           2 -1.106909
4           2  0.463922
5           2  0.210568
6           2 -0.171456
7           3 -0.768618
8           3 -0.269928
9           3  0.055613
You can use groupby followed by cumcount() to get the desired effect:
>>> df['Position'] = df.groupby('Experiment').cumcount()
>>> df
   Experiment     Value  Position
0           1 -0.924851         0
1           1 -0.599875         1
2           1  0.069982         2
3           2 -1.106909         0
4           2  0.463922         1
5           2  0.210568         2
6           2 -0.171456         3
7           3 -0.768618         0
8           3 -0.269928         1
9           3  0.055613         2
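One caveat: groupby('Experiment') counts every row that shares an experiment ID, wherever it sits in the frame. If the same ID could show up in more than one separate run, and you really want "the number of rows that experiment has been constant for", a possible sketch is to build a run label first and group on that. The data, the run_id name, and the trend variable below are made up for illustration; the final line shows the kind of groupby on the position column that the question mentions.

```python
import pandas as pd

# Hypothetical data: experiment 1 appears in two separate runs.
df = pd.DataFrame({
    "Experiment": [1, 1, 1, 2, 2, 1, 1],
    "Value": [0.5, 0.7, 0.9, 1.0, 1.2, 0.4, 0.8],
})

# A new run starts wherever Experiment differs from the previous row.
run_id = (df["Experiment"] != df["Experiment"].shift()).cumsum()

# Count rows within each run, restarting at 0 when a new run begins.
df["Position"] = df.groupby(run_id).cumcount()

# Average the readings at each position to look for a time-based trend.
trend = df.groupby("Position")["Value"].mean()
```

If every experiment ID occurs in exactly one contiguous block, as described in the question, this gives the same Position column as the plain groupby('Experiment').cumcount() above.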