
Pandas: How to remove words in string which appear before a certain word from another column

I have a large CSV file with a column containing strings. Near the beginning of each string there is an ID number, which also appears in another column, as shown below.

0      Home /buy /York /Warehouse /P000166770Ou...             P000166770
1      Home /buy /York /Plot /P000165923A plot of la...        P000165923
2      Home /buy /London /Commercial /P000165504A str...       P000165504
                         ...                        
804    Brand new apartment on the first floor, situat...       P000185616

I want to remove all text up to and including the ID number, so here we would get:

0      Ou...             
1      A plot of la...        
2      A str...       
                         ...                        
804    Brand new apartment on the first floor, situat...       

I tried the following, without success:

df['column_one'].str.split(df['column_two'])

and

df['column_one'].str.replace(df['column_two'],'')

You could replace the pattern using regex as follows:

>>> import pandas as pd
>>> my_pattern = r"^(Alpha|Beta|QA|Prod)\s[A-Z0-9]{7}"  # raw string keeps \s intact
>>> my_series = pd.Series(['Alpha P17089OText starts here'])
>>> my_series.str.replace(my_pattern, '', regex=True)
0    Text starts here
dtype: object

You will need to work out the exact pattern for your own data. I would suggest experimenting with https://regex101.com/
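As a rough sketch of how that pattern might look for the data in the question: this assumes every ID is the letter "P" followed by exactly nine digits, which is only inferred from the sample rows. The names `descriptions`, `id_prefix` and `cleaned` are made up for illustration. Note that this version does not look at the second column at all; it relies purely on the ID format.

```python
import pandas as pd

# Sample rows modelled on the question; the last one has no ID prefix.
descriptions = pd.Series([
    "Home /buy /York /Warehouse /P000166770Ou...",
    "Home /buy /York /Plot /P000165923A plot of la...",
    "Brand new apartment on the first floor, situat...",
])

# Assumption: an ID is "P" followed by exactly nine digits.
# ^.*? lazily consumes everything up to and including the first such ID.
id_prefix = r"^.*?P\d{9}"

# Rows that do not match the pattern are left unchanged.
cleaned = descriptions.str.replace(id_prefix, "", regex=True)
print(cleaned.tolist())
```

If the real IDs vary in length or prefix, the pattern would need adjusting, and matching against the ID column row by row is the safer route.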

To extend your split() idea:

df.apply(lambda x: x['column_one'].split(x['column_two'])[1], axis=1)

0    Text starts here

I managed to get it to work using:

df.apply(lambda x: x['column_one'].split(x['column_two'], 1)[1] if x['column_two'] in x['column_one'] else x['column_one'], axis=1)

This also works when the ID is not in the description. Thanks for the help!
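For what it's worth, the same logic can also be written with `str.partition` and a plain list comprehension, which avoids the row-wise `apply`. This is only a sketch: the column names follow the question, while the helper `text_after_id` and the output column `cleaned` are hypothetical names, and the sample frame is made up from the rows shown above.

```python
import pandas as pd

# Small frame modelled on the question's data; the last row has no ID in the text.
df = pd.DataFrame({
    "column_one": [
        "Home /buy /York /Warehouse /P000166770Ou...",
        "Home /buy /London /Commercial /P000165504A str...",
        "Brand new apartment on the first floor, situat...",
    ],
    "column_two": ["P000166770", "P000165504", "P000185616"],
})

def text_after_id(text, id_code):
    """Return the text after the first occurrence of id_code, or text unchanged if absent."""
    _, found, tail = text.partition(id_code)
    return tail if found else text

# Pair each description with its own ID and strip the prefix row by row.
df["cleaned"] = [
    text_after_id(text, id_code)
    for text, id_code in zip(df["column_one"], df["column_two"])
]
print(df["cleaned"].tolist())
```

`partition` splits on the first occurrence only, so an ID that happens to be repeated later in the description stays in the result.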

Here is one way to do it: apply a regex to each row, built from that row's code.

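import re

def ext(row):
    # Escape the code so any regex metacharacters in it are matched literally;
    # DOTALL lets (.*) capture text that spans several lines.
    mch = re.search(re.escape(row['code']) + r"(.*)", row['txt'], flags=re.DOTALL)
    # Keep the original text when the code does not appear in it.
    return mch.group(1) if mch else row['txt']

df['ext'] = df.apply(ext, axis=1)
df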
0                                               Ou...
1                                     A plot of la...
2                                            A str...
3    Brand new apartment on the first floor situat...


     x  txt                                                code        ext
0    0  Home /buy /York /Warehouse / P000166770 Ou...      P000166770  Ou...
1    1  Home /buy /York /Plot /P000165923A plot of la...   P000165923  A plot of la...
2    2  Home /buy /London /Commercial /P000165504A str...  P000165504  A str...
3  804  Brand new apartment on the first floor situat...   P000185616  Brand new apartment on the first floor situat...


 