Searching for a substring in a dataframe and replacing it

I have a situation where spurious data is created, and I am trying to clean it up.

For example:

www.one@foxturn.com/!ut/5 # Real link
www.one@foxturn.com/ut1/5_RTFDEERERTGFEFD # System adds junk to it
www.one@foxturn.com/ut1/5_dvkerfddfrejermsdkasmf # System adds junk to it

I am trying to clean this up by dropping everything after !ut.

So far I have tried:

SPA_MX = Mexico['Page URL'].str.startswith("http://www.www.one@foxturn.com/ut1")

but this only returns a boolean Series (True or False for each row) rather than the cleaned URLs.

I would like advice on the most efficient way to achieve this.

You can do this by using apply on the column, calling find to get the index of the pattern, and slicing the string if the pattern is found:

In[69]:

df['url'].apply(lambda x: x[:x.find('!ut') + 3] if x.find('!ut') != -1 else x)

Out[69]: 
0                             www.one@foxturn.com/!ut
1           www.one@foxturn.com/ut1/5_RTFDEERERTGFEFD
2    www.one@foxturn.com/ut1/5_dvkerfddfrejermsdkasmf
Name: url, dtype: object
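
The same result can probably also be obtained without a Python-level lambda, using the vectorized string methods of the column. The following is only a sketch, assuming the column is named 'url' as in the answer above and that the marker is the literal text '!ut'; the sample rows are copied from the question.

```python
import pandas as pd

# Sample data copied from the question; the column name 'url' follows the answer above.
df = pd.DataFrame({
    "url": [
        "www.one@foxturn.com/!ut/5",
        "www.one@foxturn.com/ut1/5_RTFDEERERTGFEFD",
        "www.one@foxturn.com/ut1/5_dvkerfddfrejermsdkasmf",
    ]
})

# Capture the marker '!ut' and discard whatever follows it.
# Rows without the marker do not match the pattern and are left unchanged.
cleaned = df["url"].str.replace(r"(!ut).*", r"\1", regex=True)
print(cleaned.tolist())
```

As with the apply version, only the first row contains '!ut', so the other two rows come back untouched.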

Alternatively, for a single string you can split on the marker and keep the first part:

my_string = "www.one@foxturn.com/!ut/5"
final = my_string.split("!ut")[0]

Output:

www.one@foxturn.com/
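
Note that splitting drops the marker itself, whereas the apply answer keeps it. A small helper can make that choice explicit. This is a sketch only: strip_after_marker and its keep_marker parameter are hypothetical names introduced for illustration, and it uses str.partition in place of split.

```python
def strip_after_marker(url, marker="!ut", keep_marker=False):
    """Return url cut at the first occurrence of marker (hypothetical helper)."""
    head, sep, _tail = url.partition(marker)
    # When the marker is absent, partition returns (url, '', ''),
    # so the original string is returned unchanged in either mode.
    return head + sep if keep_marker else head


# Sample strings copied from the question.
urls = [
    "www.one@foxturn.com/!ut/5",
    "www.one@foxturn.com/ut1/5_RTFDEERERTGFEFD",
]

dropped = [strip_after_marker(u) for u in urls]
kept = [strip_after_marker(u, keep_marker=True) for u in urls]
print(dropped)
print(kept)
```

To apply it to a whole column, something like df['url'].map(strip_after_marker) should work, assuming every value in the column is a string.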
