
Numbering subsequences in a Pandas DataFrame

I've got a readings DataFrame with two columns, experiment and value. experiment keys into an experiments DataFrame. Each experiment occupies 500 consecutive rows, one per reading, each with its own value; the row order in the DataFrame is the order in which the data was taken. The next 500 rows belong to the next experiment, and so on.

I want to look for time-based trends across the experiments, so I assume I should label each point with a pos from 0 to 499 and then groupby('pos'). How do I create that pos column, an incrementing value that resets to 0 every time experiment changes? That is, I guess, the same as the number of rows for which experiment has been constant.

If I understand you correctly...

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'Experiment': [1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
...                    'Value': np.random.randn(10)})
>>> df

   Experiment     Value
0           1 -0.924851
1           1 -0.599875
2           1  0.069982
3           2 -1.106909
4           2  0.463922
5           2  0.210568
6           2 -0.171456
7           3 -0.768618
8           3 -0.269928
9           3  0.055613

You can use groupby followed by cumcount(), which numbers the rows within each group starting from 0:

>>> df['Position'] = df.groupby('Experiment').cumcount()
>>> df

   Experiment     Value  Position
0           1 -0.924851         0
1           1 -0.599875         1
2           1  0.069982         2
3           2 -1.106909         0
4           2  0.463922         1
5           2  0.210568         2
6           2 -0.171456         3
7           3 -0.768618         0
8           3 -0.269928         1
9           3  0.055613         2
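
To tie this back to the goal in the question (looking for time-based trends), here is a minimal sketch of how the Position column might then be used. The data is made up (3 experiments of 4 readings rather than 500), and the names trend, run_id and RunPosition are illustrative, not from the original post. The last two lines show a variant that follows the "number of rows that experiment has been constant for" reading of the question: it assumes an experiment ID could reappear later as a separate run, which the question does not say happens.

```python
import numpy as np
import pandas as pd

# Illustrative data: 3 experiments with 4 readings each (the question has 500).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Experiment": np.repeat([1, 2, 3], 4),
    "Value": rng.standard_normal(12),
})

# Position of each reading within its experiment: 0, 1, 2, 3, 0, 1, ...
df["Position"] = df.groupby("Experiment").cumcount()

# Summarise the readings that share a position across all experiments.
trend = df.groupby("Position")["Value"].agg(["mean", "std", "count"])

# Variant (assumption: an experiment ID may reappear in a later, separate run).
# A new run starts whenever the ID differs from the previous row.
run_id = (df["Experiment"] != df["Experiment"].shift()).cumsum()
df["RunPosition"] = df.groupby(run_id).cumcount()
```

When every experiment appears as a single contiguous block, as in the question, Position and RunPosition are identical, so the plain groupby('Experiment').cumcount() is enough.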

