简体   繁体   中英

How to extract information from one column to create a new column in a pandas data frame

I have a lot of excel files, I want to combine, but in a first step, I'm trying to manipulate the files. My data more or less looks like this:

session type role
parliament: 12
1 standing member
1 standing member
parliament: 13
1 standing member
2 standing member

Now, what I'm trying to do, is to add a new column containing the parliament information from the session column, while at the same time keeping all the other information as it is. So my final excel should look like this:

session type role parliament
1 standing member 12
1 standing member 12
1 standing member 13
2 standing member 13

Can you guys please help me understanding how to solve this?

You can groupby each partliament group using cumsum() , and then just restructure the data in the apply function to match the final output you want:

(df.groupby(df.session.str.contains('parliament').cumsum())
   .apply(lambda s: s[1:].assign(parliament=s.head(1).session.item().strip('parliament: ')))
   .reset_index(drop=True))

  session      type    role parliament
0       1  standing  member         12
1       1  standing  member         12
2       1  standing  member         13
3       2  standing  member         13

here is one way to do it

df[['txt','parliament']]=df['session'].str.split(':', expand=True).ffill()
df=df[(df['txt']!='parliament')]
df.drop(columns='txt')
    session     type        role    txt     parliament
1         1     standing    member  1         12
2         1     standing    member  1         12
4         1     standing    member  1         13
5         2     standing    member  2         13

You can extract the number after parliament: then front fill the value:

out = (df[~df['session'].str.startswith('parliament')]
           .join(df['session'].str.extract(r':\s(?P<parliament>\d+)').ffill()))
print(out)

# Output
  session      type    role parliament
1       1  standing  member         12
2       1  standing  member         12
4       1  standing  member         13
5       2  standing  member         13

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM