Python/Pandas remove specific string from ending
I am trying to remove a trailing 'OF' from a column in a pandas DataFrame. I tried 'rstrip' and 'split', but rstrip also removes a lone trailing 'O' or 'F'; I only need to remove 'OF'. How can I do that? I'm not sure why rstrip removes 'O' and 'F' when I specifically passed 'OF'. Sorry if this question was asked before; I couldn't find one. Thanks.
Sample Data:
import pandas as pd

l1 = [1,2,3,4]
l2 = ['UNIVERSITY OF CONN. OF','ONTARIO','UNIV. OF TORONTO','ALASKA DEPT.OF']
df = pd.DataFrame({'some_id':l1,'org':l2})
df
some_id  org
1        UNIVERSITY OF CONN. OF
2        ONTARIO
3        UNIV. OF TORONTO
4        ALASKA DEPT.OF
Tried:
df.org.str.rstrip('OF')
# df.org.str.split('OF')[0] # Not what I am looking for
Results:
0 UNIVERSITY OF CONN. # works
1 ONTARI # 'O' was removed
2 UNIV. OF TORONT # 'O' was removed
3 ALASKA DEPT. # works
Final output needed:
0 UNIVERSITY OF CONN.
1 ONTARIO
2 UNIV. OF TORONTO
3 ALASKA DEPT.
You can try this regex:
df.org = df.org.str.replace(r'OF$', '', regex=True)
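where $ anchors the match to the end of the string. In pandas 2.0 and later, str.replace treats the pattern as a literal string unless regex=True is passed. Note that
df.org.str.rstrip('(OF)')
is not an alternative: rstrip treats its argument as a set of characters, so this strips any trailing '(', 'O', 'F' or ')' and still turns 'ONTARIO' into 'ONTARI'.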
Output of the regex replacement:
0 UNIVERSITY OF CONN.
1 ONTARIO
2 UNIV. OF TORONTO
3 ALASKA DEPT.
Name: org, dtype: object
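To make the difference concrete, here is a minimal sketch that runs rstrip and the anchored regex side by side on the sample data from the question. The variable names stripped and by_regex are only illustrative, and regex=True is passed explicitly on the assumption that a recent pandas version is in use.

```python
import pandas as pd

df = pd.DataFrame({
    'some_id': [1, 2, 3, 4],
    'org': ['UNIVERSITY OF CONN. OF', 'ONTARIO',
            'UNIV. OF TORONTO', 'ALASKA DEPT.OF'],
})

# rstrip treats its argument as a set of characters: every trailing
# 'O' or 'F' is stripped, in any order, until another character is hit.
stripped = df['org'].str.rstrip('OF')

# The anchored regex removes only a literal 'OF' at the very end.
by_regex = df['org'].str.replace(r'OF$', '', regex=True)

print(stripped.tolist())
print(by_regex.tolist())
```

Note that the first row keeps the space that preceded the trailing 'OF'; chaining .str.rstrip() with no argument would remove it. If your pandas version is 1.4 or newer, df['org'].str.removesuffix('OF') should give the same result as the regex without needing a pattern.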
str.extract
Capture everything up to, but not including, a single optional 'OF' at the end of the string. I added a few more rows as test cases.
df['extract'] = df.org.str.extract('(.*?)(?=(?:OF$)|$)')
#    some_id  org                     extract
#0   1        UNIVERSITY OF CONN. OF  UNIVERSITY OF CONN.
#1   2        ONTARIO                 ONTARIO
#2   3        UNIV. OF TORONTO        UNIV. OF TORONTO
#3   4        ALASKA DEPT.OF          ALASKA DEPT.
#4   5        fooOFfooOFOF            fooOFfooOF
#5   6        fF                      fF
#6   7        Seven                   Seven
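For anyone who wants to reproduce the table, the following is a self-contained sketch of the same str.extract approach. The extra rows are taken from the table above; expand=False is my addition so that the single capture group comes back as a Series rather than a one-column DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'some_id': [1, 2, 3, 4, 5, 6, 7],
    'org': ['UNIVERSITY OF CONN. OF', 'ONTARIO', 'UNIV. OF TORONTO',
            'ALASKA DEPT.OF', 'fooOFfooOFOF', 'fF', 'Seven'],
})

# Lazily capture from the start of the string until the lookahead sees
# either a final 'OF' or the end of the string itself.
pattern = r'(.*?)(?=(?:OF$)|$)'

# expand=False returns a Series when the pattern has one capture group.
df['extract'] = df['org'].str.extract(pattern, expand=False)

print(df)
```

Only one trailing 'OF' is dropped per value, so 'fooOFfooOFOF' becomes 'fooOFfooOF' rather than 'fooOFfoo', and a lone 'F' or 'O' at the end is left alone.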