
Python Pandas: calculate median for every row over every n rows (like overlapping groups)

I have a dataframe like this, but much bigger:

Index    Duration
1        100          
2        300
3        350
4        200
5        500
6        1000
7        350
8        200
9        400

I want to add a new column that holds, for every row, the median of a 3-row window. Like this:

Index    Duration     Median
1        100          
2        300          300
3        350          300
4        200          350
5        500          500
6        1000         500
7        350          350
8        200          350
9        400

So each median is computed over 3 rows: the row itself, the one before it, and the one after it. The row where the median is written therefore sits in the middle of its window. Because of that, the first and last rows have to stay empty. The rest of the dataframe has to stay as it is.

Here is an example picture of what it looks like in Excel:

[two Excel screenshots, images not available]

In Excel it's easy: you can just drag the formula down the rows. I tried many solutions I found here, but none of them does what I want.

For example, I tried this, but it just gives me 3 separate groups, which I don't want:

import numpy as np
df.groupby(np.arange(len(df)) // 3).median()

#output:

Index     Duration     Median
1         100          500
2         300          350
3         350          350

I want the groups to overlap as shown above. I hope the problem is clear and that you can help me with it.
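For context on why the groupby attempt cannot overlap: integer division gives each row exactly one group label, so the rows fall into disjoint blocks of 3. The sketch below rebuilds only the Duration column from the question (the variable names `labels` and `block_medians` are illustrative, not from the original post) and prints the labels and the per-block medians.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Duration": [100, 300, 350, 200, 500, 1000, 350, 200, 400]})

# Integer division maps rows 0-2 to label 0, rows 3-5 to label 1, rows 6-8 to label 2.
labels = np.arange(len(df)) // 3
print(labels.tolist())

# One median per label: the result has 3 rows, not one value per input row.
block_medians = df.groupby(labels)["Duration"].median()
print(block_medians.tolist())
```

Since every row belongs to a single label, no row can contribute to two medians, which is what an overlapping window needs.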

Use Series.rolling with the center=True parameter:

df['Median'] = df['Duration'].rolling(3, center=True).median()
print(df)
   Index  Duration  Median
0      1       100     NaN
1      2       300   300.0
2      3       350   300.0
3      4       200   350.0
4      5       500   500.0
5      6      1000   500.0
6      7       350   350.0
7      8       200   350.0
8      9       400     NaN

Another idea is to use a trailing window and shift the result up by 1 row:

df['Median'] = df['Duration'].rolling(3).median().shift(-1)
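The two approaches should give the same column for an odd window. The sketch below checks this on the question's data. The helper `centered_by_shift` is a hypothetical name introduced here for illustration, and the generalization to a shift of `-(window // 2)` is an assumption that only holds for odd window sizes.

```python
import pandas as pd

duration = pd.Series([100, 300, 350, 200, 500, 1000, 350, 200, 400], name="Duration")

def centered_by_shift(series, window):
    # Hypothetical helper: trailing rolling median, then moved up by half the window.
    # Assumes an odd window, so that the window has a single middle row.
    return series.rolling(window).median().shift(-(window // 2))

for window in (3, 5):
    centered = duration.rolling(window, center=True).median()
    shifted = centered_by_shift(duration, window)
    # Series.equals treats NaN values in the same positions as equal.
    print(window, centered.equals(shifted))
```

With center=True the intent is visible in the call itself, while the shift variant needs the offset to be kept in sync with the window size by hand.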

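You should consider the rolling functionality of pandas.

The syntax is: df.rolling(window, center=True or False).median(), where window is the number of rows and center defaults to False.

In your case, select the column and center the window, so that each row's median covers the row before it, the row itself, and the row after it:

df['Duration'].rolling(3, center=True).median()

I hope it helps.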
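One thing to keep in mind when calling rolling on the whole DataFrame instead of a single column: the median is computed for every numeric column. The sketch below assumes the frame has the Index and Duration columns from the question (the name `all_columns` is illustrative) and contrasts the two calls.

```python
import pandas as pd

df = pd.DataFrame({
    "Index": range(1, 10),
    "Duration": [100, 300, 350, 200, 500, 1000, 350, 200, 400],
})

# Rolling over the whole frame returns a median for every numeric column, Index included.
all_columns = df.rolling(3, center=True).median()
print(all_columns.columns.tolist())

# Selecting the column first limits the result to the one series that is needed.
df["Median"] = df["Duration"].rolling(3, center=True).median()
print(df)
```

Assigning the single-column result back to df leaves the existing columns unchanged, which matches the requirement that the rest of the dataframe stays as it is.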


 