
How to iterate over pandas DataFrame rows, find a string, and separate it into columns?

Here is my issue: I have a DataFrame df with a column "Info" like this:

0 US[edit]  
1 Boston(B1)  
2 Washington(W1)  
3 Chicago(C1)  
4 UK[edit]  
5 London(L2)   
6 Manchester(L2) 

I would like to put all the strings containing "[edit]" into a separate column df['state'], and the remaining strings into another column df['city']. I also want to clean up the values by removing everything in [] and (). This is what I tried:

for ind, row in df.iterrows():
    if df['Info'].str.contains('[ed', regex=False):
        df['state']=df['info'].str.split('\[|\(').str[0]
    else:
        df['city']=df['info'].str.split('\[|\(').str[0]

In the end I would like to have something like this:

US Boston  
US Washington  
US Chicago  
UK London     
UK Manchester  

When I try this I always get "The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()"

Any help? Thanks!!

Use Series.where with the mask m and forward-fill the missing values to build the state column. For city, assign the cleaned Series s, then drop the header rows by boolean indexing with the mask inverted by ~:

m = df['Info'].str.contains('[ed', regex=False)
s = df['Info'].str.split(r'\[|\(').str[0]

df['state'] = s.where(m).ffill()
df['city'] = s

df = df[~m]
print(df)
             Info state        city
1      Boston(B1)    US      Boston
2  Washington(W1)    US  Washington
3     Chicago(C1)    US     Chicago
5      London(L2)    UK      London
6  Manchester(L2)    UK  Manchester
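
To make the individual steps easier to follow, here is a minimal self-contained sketch. The sample frame is rebuilt from the rows shown in the question (an assumption about how df was created), and the names state_only, state and out are introduced only for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed construction of df).
df = pd.DataFrame({'Info': ['US[edit]', 'Boston(B1)', 'Washington(W1)',
                            'Chicago(C1)', 'UK[edit]', 'London(L2)',
                            'Manchester(L2)']})

# True on header rows such as 'US[edit]'; regex=False means a literal match.
m = df['Info'].str.contains('[ed', regex=False)

# Text before the first '[' or '(' on every row.
s = df['Info'].str.split(r'\[|\(').str[0]

# Keep the cleaned text only on header rows (NaN elsewhere) ...
state_only = s.where(m)
# ... then copy each header value down to the rows that follow it.
state = state_only.ffill()

# Attach both columns and drop the header rows themselves.
out = df.assign(state=state, city=s)[~m]
print(out[['state', 'city']])
```

Printing state_only before the ffill step shows why forward filling is needed: only the rows that held a "[edit]" label carry a value at that point.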

If you also want to remove the original column, use DataFrame.pop when building s:

m = df['Info'].str.contains('[ed', regex=False)
s = df.pop('Info').str.split(r'\[|\(').str[0]

df['state'] = s.where(m).ffill()
df['city'] = s

df = df[~m]
print(df)
  state        city
1    US      Boston
2    US  Washington
3    US     Chicago
5    UK      London
6    UK  Manchester
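
As for the error message in the question: df['Info'].str.contains(...) returns a boolean Series for the whole column, while if needs a single True or False, so pandas raises a ValueError. The sketch below reproduces this on a shortened sample frame (my own, based on the question's rows) and shows a per-row test, which is presumably what the loop intended. The names mask, message and flags are illustrative only.

```python
import pandas as pd

# Shortened sample based on the rows in the question (assumption).
df = pd.DataFrame({'Info': ['US[edit]', 'Boston(B1)', 'UK[edit]', 'London(L2)']})

# One boolean per row, not a single boolean.
mask = df['Info'].str.contains('[ed', regex=False)

message = None
try:
    if mask:  # a Series has no single truth value, so this raises
        pass
except ValueError as err:
    message = str(err)
    print(message)

# Per-row test: each value is a plain str, so `in` yields one bool per row.
flags = ['[ed' in value for value in df['Info']]
print(flags)
```

The per-row version is shown only for comparison; the vectorized mask is usually the better choice because it avoids a Python-level loop.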

I would do:

s = df.Info.str.extract(r'([\w\s]+)(\[edit\])?')

df['city'] = s[0]
df['state'] = s[0].mask(s[1].isna()).ffill()
df = df[s[1].isna()]

Output:

             Info        city state
1      Boston(B1)      Boston    US
2  Washington(W1)  Washington    US
3     Chicago(C1)     Chicago    US
5      London(L2)      London    UK
6  Manchester(L2)  Manchester    UK
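
For readers unfamiliar with str.extract, this small sketch shows what the pattern captures. It uses a shortened sample frame of my own based on the question's rows; parts and is_city are hypothetical names.

```python
import pandas as pd

# Shortened sample based on the rows in the question (assumption).
df = pd.DataFrame({'Info': ['US[edit]', 'Boston(B1)', 'UK[edit]', 'London(L2)']})

# Column 0: leading run of word/space characters.
# Column 1: the optional '[edit]' tag, NaN when it is absent.
parts = df['Info'].str.extract(r'([\w\s]+)(\[edit\])?')
print(parts)

# Rows without the tag are cities; rows with it are state headers.
is_city = parts[1].isna()
print(is_city.tolist())
```

Because the second group is optional, a missing "[edit]" shows up as NaN in column 1, which is what the mask and the final row filter rely on.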

