I have a DataFrame with several thousand rows that looks something like:
Index Chan Pick
1 1 0.001
2 2 0.001
3 3 0.001
4 4 0.001
5 1 0.003
6 2 0.003
7 3 0.003
8 1 0.006
9 2 0.006
10 1 0.002
11 2 0.002
12 3 0.002
13 4 0.002
14 5 0.002
15 6 0.002
The Chan column has values that can range from 1 to 24 (sometimes all 24 values are present, sometimes only 2 or 6, etc., as shown above). The values in the Pick column will usually be the same within each group of channel values.
I need the average value of the Pick column for each channel block (i.e. the first block averages to 0.001 and the second to 0.003; here the Pick values within a block are all the same, but sometimes they may not be).
I know I need to use something similar to:
df.groupby('Chan')['Pick'].mean()
but I don't know how to handle the fact that Chan can run from 1 up to 24 and then the pattern starts over (i.e. the Chan column can go from 1 to 4, or 1 to 22, or 1 to 17, etc.).
A channel block essentially starts when the Chan value is exactly 1. We can exploit this property to accomplish the task.
Let channel_id be a variable that labels each block with a unique, increasing identifier. We can define it as follows:
channel_id = (df["Chan"] == 1).cumsum()
where (df["Chan"] == 1) creates a boolean mask that is True on the row where each block starts, and cumsum then propagates the identifier over the block, increasing it by one each time a new block starts.
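To make the mask and the running count concrete, here is a minimal sketch on a made-up five-row Chan column (two blocks, 1 to 3 and then 1 to 2). The names chan and starts are only illustrative and do not appear in the question.

```python
import pandas as pd

# Hypothetical miniature of the question's data: two blocks (1-3, then 1-2).
chan = pd.Series([1, 2, 3, 1, 2], name="Chan")

# True on the first row of each block, False everywhere else.
starts = chan == 1

# Cumulative sum of the mask: True counts as 1, so the value increases
# by one at every block start and stays constant inside a block.
channel_id = starts.cumsum()

print(starts.tolist())      # [True, False, False, True, False]
print(channel_id.tolist())  # [1, 1, 1, 2, 2]
```

Every row of the first block gets the label 1 and every row of the second block gets the label 2, which is what makes the labels usable as a grouping key.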
Now we just have to group by this identifier and take the mean of the Pick column:
df.groupby(channel_id)["Pick"].mean()
You can do everything in one line without supplementary variables.
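A minimal sketch of the one-line form, run on the sample rows from the question. The DataFrame is rebuilt by hand here, and the names block and block_means are assumptions added for readability. The rename call is optional: without it the result's index would simply be called Chan.

```python
import pandas as pd

# Sample data copied from the question: four channel blocks of
# lengths 4, 3, 2 and 6, each with a constant Pick value.
df = pd.DataFrame({
    "Chan": [1, 2, 3, 4, 1, 2, 3, 1, 2, 1, 2, 3, 4, 5, 6],
    "Pick": [0.001] * 4 + [0.003] * 3 + [0.006] * 2 + [0.002] * 6,
})

# Build the block label inline and group by it in a single expression.
# "block" is just an illustrative name for the resulting index.
block_means = df.groupby((df["Chan"] == 1).cumsum().rename("block"))["Pick"].mean()

print(block_means)
```

The result has one row per block (labels 1 to 4), with means of approximately 0.001, 0.003, 0.006 and 0.002. Note that this relies on every block starting at Chan == 1, as stated above; if a block could start at another value, the labels would be wrong.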