How to remove first part of URL string in column value with Pandas?

I'm struggling to remove the first part of the URLs in the myID column of a CSV file.

my.csv:

myID

https://mybrand.com/trigger:open?Myservice=Email&recipient=brn:zib:b1234567-9ee6-11b7-b4a2-7b8c2344daa8d

Desired output for myID:

b1234567-9ee6-11b7-b4a2-7b8c2344daa8d

My code:

df['myID'] = df['myID'].map(lambda x: x.lstrip('https://mybrand.com/trigger:open?Myservice=Email&recipient=brn:zib:'))

Output in myID (the first letter 'b' is missing from the front of the string):

1234567-9ee6-11b7-b4a2-7b8c2344daa8d

The above code removes https://mybrand.com/trigger:open?Myservice=Email&recipient=brn:zib: as intended. However, it also removes the first character of the ID when that character is a letter; if it is a number, the ID remains unchanged.

Could someone help with this? Thanks!

You could try a regex replacement here:

df['myID'] = df['myID'].str.replace('^.*:', '', regex=True)

This approach simply removes all content from the start of myID up to and including the final colon. This leaves behind the UUID you want to keep.
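As a minimal sketch of the replacement above, the following builds a small DataFrame by hand instead of reading my.csv. The second row and the column names by_regex and by_prefix are invented for illustration. It also shows a variant that removes only the known literal prefix, which may be preferable if an ID could itself contain a colon.

```python
import re

import pandas as pd

# Sample data mirroring the question; the second row is an invented variant
# whose ID starts with a digit.
prefix = "https://mybrand.com/trigger:open?Myservice=Email&recipient=brn:zib:"
df = pd.DataFrame({"myID": [
    prefix + "b1234567-9ee6-11b7-b4a2-7b8c2344daa8d",
    prefix + "91234567-9ee6-11b7-b4a2-7b8c2344daa8d",
]})

# Greedy ^.*: consumes everything up to and including the last colon.
df["by_regex"] = df["myID"].str.replace(r"^.*:", "", regex=True)

# Variant: strip only the exact prefix, escaped so that characters such as
# '.' and '?' are matched literally rather than as regex syntax.
df["by_prefix"] = df["myID"].str.replace("^" + re.escape(prefix), "", regex=True)

print(df[["by_regex", "by_prefix"]])
```

Both columns hold the bare IDs for this sample. They would differ only for rows that do not start with the expected prefix or that contain extra colons.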

With lstrip you remove every leading character that appears in the set of characters you pass as an argument; the argument is not treated as a prefix. So:

string = 'abcd'
test = string.lstrip('ad')
print(test)  # prints: bcd ('a' is stripped, then 'b' is not in the set, so stripping stops)
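Applied to the URL from the question, the same behaviour explains the missing 'b': that letter occurs in the argument (for example in "mybrand"), so it is stripped too, while a digit such as '1' is not in the set and stops the stripping. A small sketch, using plain strings rather than a DataFrame, that compares lstrip with removing the prefix as a whole substring:

```python
prefix = "https://mybrand.com/trigger:open?Myservice=Email&recipient=brn:zib:"
url = prefix + "b1234567-9ee6-11b7-b4a2-7b8c2344daa8d"

# lstrip treats its argument as a set of characters, not as a prefix,
# so the leading 'b' of the ID is removed as well.
stripped = url.lstrip(prefix)

# Removing the prefix as a whole: check for it, then slice by its length.
if url.startswith(prefix):
    sliced = url[len(prefix):]
else:
    sliced = url

print(stripped)  # 1234567-9ee6-11b7-b4a2-7b8c2344daa8d
print(sliced)    # b1234567-9ee6-11b7-b4a2-7b8c2344daa8d
```

On Python 3.9 or later, url.removeprefix(prefix) does the same as the startswith-and-slice step.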

If you want to drop a fixed number of characters from the string, you can just slice it like an array. The ID here is 37 characters long, so keeping only the last 37 characters would be something like:

df['myID'] = df['myID'].map(lambda x: x[-37:])

However, for this to work, the part you want to get from the string must always have the same length.

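You can use re (if the part before what you want to extract is always the same):

import re

# Example value taken from the question
myID = 'https://mybrand.com/trigger:open?Myservice=Email&recipient=brn:zib:b1234567-9ee6-11b7-b4a2-7b8c2344daa8d'

# Find the ':zib:' marker and keep everything after it
idx = re.search(r':zib:', myID)
myNewID = myID[idx.end():]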

Then you will have:

myNewID

'b1234567-9ee6-11b7-b4a2-7b8c2344daa8d'
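Note that re.search returns None when the marker is absent, so calling idx.end() would then raise an AttributeError. A possible way to guard against that is to wrap the logic in a small helper; the name extract_id and its marker parameter are made up for this sketch, and returning the input unchanged on a miss is just one reasonable choice.

```python
import re


def extract_id(url, marker=":zib:"):
    """Return the text after the first occurrence of marker.

    If the marker is not found, the input is returned unchanged.
    """
    match = re.search(re.escape(marker), url)
    if match is None:
        return url
    return url[match.end():]


example = ("https://mybrand.com/trigger:open?Myservice=Email"
           "&recipient=brn:zib:b1234567-9ee6-11b7-b4a2-7b8c2344daa8d")
print(extract_id(example))          # b1234567-9ee6-11b7-b4a2-7b8c2344daa8d
print(extract_id("no marker here")) # no marker here
```

With a DataFrame, the helper could be applied to the column with df['myID'] = df['myID'].map(extract_id).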
