
How to rename a header and add values (to this column) based on other header name?

I have multiple Pandas dataframes like this one (for different years):

df1=

        Unnamed: 0           b      c     Monthly Flow (2018)     
1              nan   -0.041619  43.91               -0.041619
2              nan    0.011913  43.91               -0.041619
3              nan   -0.048801  43.91               -0.041619
4              nan    0.002857  43.91               -0.041619
5              nan    0.002204  43.91               -0.041619
6              nan   -0.007692  43.91               -0.041619
7              nan   -0.014992  43.91               -0.041619
8              nan   -0.035381  43.91               -0.041619

I would like to rename the Unnamed: 0 column to Year and replace its nan values with the year that appears in the name of the Monthly Flow (2018) column, to get this output:

       Year           b      c     Monthly Flow (2018)     
1      2018   -0.041619  43.91               -0.041619
2      2018    0.011913  43.91               -0.041619
3      2018   -0.048801  43.91               -0.041619
4      2018    0.002857  43.91               -0.041619
5      2018    0.002204  43.91               -0.041619
6      2018   -0.007692  43.91               -0.041619
7      2018   -0.014992  43.91               -0.041619
8      2018   -0.035381  43.91               -0.041619

I know how to replace these nan by a specific year, one dataframe at a time.

But since I have a lot of dataframes (and will have more in the future), I would like a way to do this automatically, for example by extracting the year from the name of the Monthly Flow (2018) column.

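Assuming the Monthly Flow column is always the 5th column (position 4, counting from 0), you can do it like this. In the sample frame from the question it is the 4th column, so use index 3 there: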

import re
df = df.rename(columns={'Unnamed: 0': 'Year'})
# df.columns[4] is the name of the fifth column; take the first 4-digit run in it
df['Year'] = int(re.search(r'\d{4}', df.columns[4]).group(0))

Explanation:

re.search finds the first run of four consecutive digits in the name of the fifth column, and .group(0) returns the matched text.

The Unnamed: 0 column is renamed to Year.
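If the position of the Monthly Flow column is not guaranteed, one alternative is to look the column up by name instead of by index. The following is only a sketch under that assumption: the helper name year_from_columns and its prefix parameter are made up for illustration, and the sample data is a shortened copy of the frame in the question.

```python
import re

import numpy as np
import pandas as pd


def year_from_columns(df, prefix="Monthly Flow"):
    """Return the 4-digit year in the first column name that starts with `prefix`.

    Hypothetical helper, not part of pandas.
    """
    for name in df.columns:
        if str(name).startswith(prefix):
            match = re.search(r"\d{4}", str(name))
            if match:
                return int(match.group(0))
    raise ValueError(f"no column starting with {prefix!r} contains a year")


# Shortened copy of the sample frame from the question
df = pd.DataFrame({
    "Unnamed: 0": [np.nan, np.nan],
    "b": [-0.041619, 0.011913],
    "c": [43.91, 43.91],
    "Monthly Flow (2018)": [-0.041619, -0.041619],
})

df = df.rename(columns={"Unnamed: 0": "Year"})
# Assigning a scalar replaces the whole column, so Year holds integers
df["Year"] = year_from_columns(df)
print(df)
```

Because the lookup is by name, the same call works whether the flow column is the 4th or the 5th column.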

Working code:

import pandas as pd
import numpy as np
import re
df = pd.DataFrame({'Unnamed: 0': {0: np.nan},
 'a': {0: 1},
 'a2': {0: 1},
 'a3': {0: 1},
 'Monthly Flow (2018)': {0: 'b'}})
df = df.rename(columns={'Unnamed: 0': 'Year'})
# raw string avoids the invalid-escape warning for \d; int() stores the year as a number
df['Year'] = int(re.search(r'\d{4}', df.columns[4]).group(0))

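Using re and fillna (Year comes out as float, e.g. 2018.0, because the column originally held nan; append .astype(int) to the fillna call if you want integers):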

import re
def find_year(column):
    year = column.name
    return int(re.search(r'\d{4}',year).group(0))


df = df.rename(columns={'Unnamed: 0' : 'Year'})
# change 3 to match the column location of your target column
df['Year'] = df['Year'].fillna(find_year(df.iloc[:,3]))

print(df)
     Year         b      c  Monthly Flow (2018)
0  2018.0 -0.041619  43.91            -0.041619
1  2018.0  0.011913  43.91            -0.041619
2  2018.0 -0.048801  43.91            -0.041619
3  2018.0  0.002857  43.91            -0.041619
4  2018.0  0.002204  43.91            -0.041619
5  2018.0 -0.007692  43.91            -0.041619
6  2018.0 -0.014992  43.91            -0.041619
7  2018.0 -0.035381  43.91            -0.041619
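Since the question mentions many dataframes, either approach can be wrapped in a function and applied to each frame in turn. This is a rough sketch, not a definitive implementation: add_year and make_frame are hypothetical names, the sample values are invented, and it assumes each frame has exactly one column whose name contains a year in parentheses.

```python
import re

import numpy as np
import pandas as pd


def add_year(df):
    """Return a copy with 'Unnamed: 0' renamed to 'Year' and filled with the
    year taken from the first column name containing '(YYYY)'.

    Hypothetical helper; frames without such a column are returned unchanged.
    """
    flow_cols = [c for c in df.columns if re.search(r"\(\d{4}\)", str(c))]
    if not flow_cols:
        return df.copy()
    year = int(re.search(r"\d{4}", str(flow_cols[0])).group(0))
    out = df.rename(columns={"Unnamed: 0": "Year"})  # rename returns a new frame
    out["Year"] = year
    return out


def make_frame(year):
    # Invented sample data standing in for one yearly dataframe
    return pd.DataFrame({
        "Unnamed: 0": [np.nan, np.nan],
        "b": [0.1, 0.2],
        f"Monthly Flow ({year})": [0.1, 0.1],
    })


frames = [make_frame(2018), make_frame(2019)]
frames = [add_year(f) for f in frames]

for f in frames:
    print(f)
```

Returning a new frame rather than modifying the input keeps the original dataframes intact, which can be convenient when the same frames are reused elsewhere.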

