Numbering subsequences in a Pandas DataFrame

Question

I've got a readings DataFrame that consists of two columns, experiment and value. experiment keys into an experiments DataFrame. Each experiment occupies 500 consecutive rows with the same experiment and different value entries, representing 500 readings from that experiment, and the row order in the DataFrame is the order in which the data was taken. Then come 500 rows for the next experiment, and so on.

I want to look for time-based trends in the experiments, so I assume I want to label each point with a pos from 0 to 499 and then groupby('pos'). How do I create that pos column, an incrementing value that resets to 0 every time experiment changes? That is, I guess, the same as the number of rows for which experiment has been constant.

Answer 1

If I understand you correctly...

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'Experiment' : [1,1,1,2,2,2,2,3,3,3],
...                    'Value' : np.random.randn(10)})
>>> df

   Experiment     Value
0           1 -0.924851
1           1 -0.599875
2           1  0.069982
3           2 -1.106909
4           2  0.463922
5           2  0.210568
6           2 -0.171456
7           3 -0.768618
8           3 -0.269928
9           3  0.055613

Use groupby followed by cumcount() to number the rows within each group, starting from 0:

>>> df['Position'] = df.groupby('Experiment').cumcount()
>>> df

   Experiment     Value  Position
0           1 -0.924851         0
1           1 -0.599875         1
2           1  0.069982         2
3           2 -1.106909         0
4           2  0.463922         1
5           2  0.210568         2
6           2 -0.171456         3
7           3 -0.768618         0
8           3 -0.269928         1
9           3  0.055613         2
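
One caveat: cumcount() numbers rows per Experiment value, so it only matches "the number of rows that experiment has been constant for" if each experiment's rows are contiguous. If the same ID could show up again later in the frame, a rough sketch like the following numbers each consecutive run separately and then averages by position, which is the groupby the question was heading toward. The toy data and the names run_id and trend are made up for illustration, not taken from the question.

```python
import numpy as np
import pandas as pd

# Toy data: experiment 1 appears in two separate runs.
df = pd.DataFrame({'Experiment': [1, 1, 1, 2, 2, 1, 1],
                   'Value': np.arange(7, dtype=float)})

# A new run starts whenever Experiment differs from the previous row;
# the cumulative sum of those change flags gives each run its own label.
run_id = (df['Experiment'] != df['Experiment'].shift()).cumsum()

# Count rows within each run rather than within each Experiment value.
df['Position'] = df.groupby(run_id).cumcount()

# Mean Value at each position across all runs, for spotting
# position-dependent trends.
trend = df.groupby('Position')['Value'].mean()
print(df)
print(trend)
```

With contiguous experiments, as in the question, this gives the same Position column as the plain groupby('Experiment').cumcount() above.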
