Python/Pandas remove specific string from ending

I am trying to remove a trailing 'OF' from a column in a pandas DataFrame. I tried 'rstrip' and 'split', but rstrip also removes a lone trailing 'O' or 'F'; I only want to remove the exact suffix 'OF'. How can I do that? I am not sure why rstrip removes 'O' and 'F' individually when I specifically passed 'OF'. Sorry if this has been asked before; I couldn't find it. Thanks.

Sample Data:

import pandas as pd

l1 = [1, 2, 3, 4]
l2 = ['UNIVERSITY OF CONN. OF', 'ONTARIO', 'UNIV. OF TORONTO', 'ALASKA DEPT.OF']
df = pd.DataFrame({'some_id': l1, 'org': l2})
df

   some_id                     org
0        1  UNIVERSITY OF CONN. OF
1        2                 ONTARIO
2        3        UNIV. OF TORONTO
3        4          ALASKA DEPT.OF

Tried:

df.org.str.rstrip('OF')
# df.org.str.split('OF')[0] # Not what I am looking for

Results:

0    UNIVERSITY OF CONN. # works
1                  ONTARI # 'O' was removed
2         UNIV. OF TORONT # 'O' was removed
3            ALASKA DEPT. # works

Final output needed:

0    UNIVERSITY OF CONN. 
1                  ONTARIO
2         UNIV. OF TORONTO
3            ALASKA DEPT.

You can try this regex:

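# regex=True is required on pandas 2.0+, where str.replace defaults to literal matching
df.org = df.org.str.replace(r'OF$', '', regex=True)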

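where $ anchors the match to the end of the string. Note that

df.org.str.rstrip('(OF)')

is not an equivalent alternative: rstrip treats its argument as a set of characters, so it strips any trailing '(', 'O', 'F' or ')' and still turns 'ONTARIO' into 'ONTARI'.

Output of the str.replace call: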

0    UNIVERSITY OF CONN. 
1                 ONTARIO
2        UNIV. OF TORONTO
3            ALASKA DEPT.
Name: org, dtype: object
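
To answer the "why" part of the question: rstrip takes a set of characters to strip, not a suffix. Below is a minimal plain-Python sketch of the difference; pandas' .str.rstrip applies the same per-element rule. The helper name strip_suffix is made up for illustration and is not part of pandas or the standard library.

```python
def strip_suffix(text, suffix):
    """Remove `suffix` once from the end of `text`, if it is present."""
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


orgs = ['UNIVERSITY OF CONN. OF', 'ONTARIO', 'UNIV. OF TORONTO', 'ALASKA DEPT.OF']

# rstrip('OF') strips any trailing run made of the characters 'O' and 'F',
# so a lone trailing 'O' is removed as well.
by_rstrip = [s.rstrip('OF') for s in orgs]

# strip_suffix removes only the exact trailing substring 'OF'.
by_suffix = [strip_suffix(s, 'OF') for s in orgs]

print(by_rstrip)
print(by_suffix)

# Adding parentheses only adds '(' and ')' to the character set.
print('ONTARIO'.rstrip('(OF)'))
```

On Python 3.9 or later the built-in str.removesuffix does the same job as the helper, and pandas 1.4 or later exposes it as df.org.str.removesuffix('OF').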

str.extract

Capture everything up to, but not including, a single optional 'OF' at the end of the string. I added a few more rows for test cases.

# expand=False returns a Series rather than a one-column DataFrame
df['extract'] = df.org.str.extract(r'(.*?)(?=(?:OF$)|$)', expand=False)

#   some_id                     org               extract
#0        1  UNIVERSITY OF CONN. OF  UNIVERSITY OF CONN. 
#1        2                 ONTARIO               ONTARIO
#2        3        UNIV. OF TORONTO      UNIV. OF TORONTO
#3        4          ALASKA DEPT.OF          ALASKA DEPT.
#4        5            fooOFfooOFOF            fooOFfooOF
#5        6                      fF                    fF
#6        7                   Seven                 Seven
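
The pattern can also be checked outside pandas. The sketch below runs the same regex with the standard re module on the test strings from the table above, on the assumption that str.extract takes the first match in each element the way re.search does. It also compares the result with a plain substitution of one trailing 'OF'. The variable names are illustrative only.

```python
import re

cases = ['UNIVERSITY OF CONN. OF', 'ONTARIO', 'UNIV. OF TORONTO',
         'ALASKA DEPT.OF', 'fooOFfooOFOF', 'fF', 'Seven']

# Lazy capture that stops right before a final 'OF', or at the end of the string.
pattern = re.compile(r'(.*?)(?=(?:OF$)|$)')

# group(1) is the text the capture group would yield for each string.
extracted = [pattern.search(s).group(1) for s in cases]

# Simpler alternative: delete one 'OF' anchored at the end of the string.
substituted = [re.sub(r'OF$', '', s) for s in cases]

for original, result in zip(cases, extracted):
    print(repr(original), '->', repr(result))

# Both approaches should agree on every test case.
print(extracted == substituted)
```

Only one trailing 'OF' is removed in either approach, which is why 'fooOFfooOFOF' keeps its second-to-last 'OF'.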
