
Pandas dataframe slicing

I have the following dataframe:

    2012   2013   2014   2015  2016   2017   2018                 Kategorie
0   5.31   5.27   5.61   4.34   4.54   5.02   7.07  Gewinn pro Aktie in EUR
1  13.39  14.70  12.45  16.29  15.67  14.17  10.08                      KGV
2 -21.21  -0.75   6.45 -22.63  -7.75   9.76  47.52           Gewinnwachstum
3 -17.78   2.27  -0.55   3.39   1.48   0.34    NaN                      PEG

Now, I am selecting only the KGV row with:

df[df["Kategorie"] == "KGV"]

Which outputs:

    2012  2013   2014   2015  2016   2017   2018  Kategorie
1  13.39  14.7  12.45  16.29  15.67  14.17  10.08       KGV

How do I calculate the mean() of the last five years (2016, 2015, 2014, 2013 and 2012 in this example)?
I tried:

df[df["Kategorie"] == "KGV"]["2016":"2012"].mean()

but this throws a TypeError. Why can't I slice the columns here?

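Plain [] with a slice selects rows, not columns, so the string bounds are looked up in the integer row index, which raises the TypeError. loc supports that type of slicing on the columns (from left to right, both endpoints included):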

df.loc[df["Kategorie"] == "KGV", "2012":"2016"].mean(axis=1)
Out: 
1    14.5
dtype: float64

Note that this does not necessarily mean 2012, 2013, 2014, 2015 and 2016. The column labels are strings, so the slice selects every column positioned between df['2012'] and df['2016'], inclusive. If a column named foo sat in between, it would be selected too.
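To make the caveat concrete, here is a minimal sketch that rebuilds the table from the question and then inserts a hypothetical column named foo between "2013" and "2014". The names is_kgv, df_foo and selected are only illustrative, and the column labels are assumed to be strings as in the question.

```python
import pandas as pd

# Rebuild the question's table; column labels are strings.
df = pd.DataFrame({
    "2012": [5.31, 13.39, -21.21, -17.78],
    "2013": [5.27, 14.70, -0.75, 2.27],
    "2014": [5.61, 12.45, 6.45, -0.55],
    "2015": [4.34, 16.29, -22.63, 3.39],
    "2016": [4.54, 15.67, -7.75, 1.48],
    "2017": [5.02, 14.17, 9.76, 0.34],
    "2018": [7.07, 10.08, 47.52, float("nan")],
    "Kategorie": ["Gewinn pro Aktie in EUR", "KGV", "Gewinnwachstum", "PEG"],
})

is_kgv = df["Kategorie"] == "KGV"

# Label slice on the column axis: both endpoints are included.
kgv_mean = df.loc[is_kgv, "2012":"2016"].mean(axis=1)
print(kgv_mean)

# Hypothetical extra column placed between "2013" and "2014".
df_foo = df.copy()
df_foo.insert(2, "foo", 0.0)

# The slice is positional between the two labels, so "foo" is picked up too.
selected = list(df_foo.loc[is_kgv, "2012":"2016"].columns)
print(selected)
```

With foo in the frame, the mean over that slice would silently include it, so it is worth checking the selected columns before averaging.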

Not sure why the last five years are 2012-2016 (they seem to be the first five years). Regardless, to find the mean for 2012-2016 for 'KGV', you can select the year columns by their numeric value:

df[df['Kategorie'] == 'KGV'][[c for c in df.columns if c != 'Kategorie' and 2012 <= int(c) <= 2016]].mean(axis=1)
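The same idea can be wrapped in a small helper. This is only a sketch: mean_over_years is a hypothetical name, and it assumes every year column has a label made up of digits only, so any other non-year column is skipped instead of breaking int().

```python
import pandas as pd

# Rebuild the question's table; column labels are strings.
df = pd.DataFrame({
    "2012": [5.31, 13.39, -21.21, -17.78],
    "2013": [5.27, 14.70, -0.75, 2.27],
    "2014": [5.61, 12.45, 6.45, -0.55],
    "2015": [4.34, 16.29, -22.63, 3.39],
    "2016": [4.54, 15.67, -7.75, 1.48],
    "2017": [5.02, 14.17, 9.76, 0.34],
    "2018": [7.07, 10.08, 47.52, float("nan")],
    "Kategorie": ["Gewinn pro Aktie in EUR", "KGV", "Gewinnwachstum", "PEG"],
})


def mean_over_years(frame, category, first, last):
    """Row-wise mean of the year columns in [first, last] for one category."""
    # Keep only labels that look like years and fall inside the range.
    year_cols = [c for c in frame.columns
                 if str(c).isdigit() and first <= int(c) <= last]
    return frame.loc[frame["Kategorie"] == category, year_cols].mean(axis=1)


result = mean_over_years(df, "KGV", 2012, 2016)
print(result)
```

Because the columns are chosen by value rather than by position, a stray column sitting between two year columns does not end up in the average.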

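I used filter and iloc to take the last five year columns in the table (2014 to 2018 here):

row = df[df.Kategorie == 'KGV']

# keep the four-digit year columns, sort them by label, average the last five
row.filter(regex=r'\d{4}').sort_index(axis=1).iloc[:, -5:].mean(axis=1)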

1    13.732
dtype: float64
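For reference, a self-contained sketch of this approach, written with keyword arguments since recent pandas versions no longer accept sort_index(1). The anchored regex and the intermediate names (years, last_five, up_to_2016) are choices made for this example, not part of the original answer. The last lines show one possible way to cap the window at 2016 so it matches the years named in the question.

```python
import pandas as pd

# Rebuild the question's table; column labels are strings.
df = pd.DataFrame({
    "2012": [5.31, 13.39, -21.21, -17.78],
    "2013": [5.27, 14.70, -0.75, 2.27],
    "2014": [5.61, 12.45, 6.45, -0.55],
    "2015": [4.34, 16.29, -22.63, 3.39],
    "2016": [4.54, 15.67, -7.75, 1.48],
    "2017": [5.02, 14.17, 9.76, 0.34],
    "2018": [7.07, 10.08, 47.52, float("nan")],
    "Kategorie": ["Gewinn pro Aktie in EUR", "KGV", "Gewinnwachstum", "PEG"],
})

row = df[df["Kategorie"] == "KGV"]

# Keep columns whose label is exactly four digits, then sort them by label.
years = row.filter(regex=r"^\d{4}$").sort_index(axis=1)

# Last five year columns by position: 2014 to 2018.
last_five = years.iloc[:, -5:]
result = last_five.mean(axis=1)
print(result)

# Cap the window at "2016" first to get 2012 to 2016 instead.
up_to_2016 = years.loc[:, :"2016"].iloc[:, -5:]
print(up_to_2016.mean(axis=1))
```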
