
How to remove first part of URL string in column value with Pandas?

I'm struggling to remove the first part of the URLs in the myID column of my CSV file.

my.csv

myID

https://mybrand.com/trigger:open?Myservice=Email&recipient=brn:zib:b1234567-9ee6-11b7-b4a2-7b8c2344daa8d

desired output for myID

b1234567-9ee6-11b7-b4a2-7b8c2344daa8d

my code:

df['myID'] = df['myID'].map(lambda x: x.lstrip('https://mybrand.com/trigger:open?Myservice=Email&recipient=brn:zib:'))

output in myID (first letter 'b' is missing in front of the string):

1234567-9ee6-11b7-b4a2-7b8c2344daa8d

The code above removes https://mybrand.com/trigger:open?Myservice=Email&recipient=brn:zib: as intended. However, it also removes the first character of the ID when that character is a letter; if the ID starts with a digit, it is left unchanged.

Could someone help with this? thanks!

You could try a regex replacement here:

df['myID'] = df['myID'].str.replace('^.*:', '', regex=True)

This approach simply removes everything from the start of myID up to and including the final colon, leaving behind the UUID you want to keep.
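As a quick check, the replacement can be tried on a small DataFrame. This is only a sketch: the first row is the URL from the question, and the second row is a made-up ID that starts with a digit.

```python
import pandas as pd

prefix = "https://mybrand.com/trigger:open?Myservice=Email&recipient=brn:zib:"

# First row is the question's sample; the second ID is invented for illustration.
df = pd.DataFrame({
    "myID": [
        prefix + "b1234567-9ee6-11b7-b4a2-7b8c2344daa8d",
        prefix + "91234567-9ee6-11b7-b4a2-7b8c2344daa8d",
    ]
})

# Greedy '^.*:' matches from the start of each value through its LAST colon.
df["myID"] = df["myID"].str.replace(r"^.*:", "", regex=True)

print(df["myID"].tolist())
```

Because the match is greedy, this relies on the ID itself never containing a colon.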

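lstrip does not remove a prefix string. It removes every leading character that belongs to the set of characters you pass as an argument, and stops at the first character that is not in that set. The 'b' at the start of your ID also occurs in the prefix, so it gets stripped too. For example:

string = 'abcd'
test = string.lstrip('ad')  # strips leading 'a'; stops at 'b', which is not in the set
print(test)  # bcd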
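To see why the leading 'b' disappears in the question's code, here is a minimal sketch using the URL from the question. It contrasts lstrip with str.removeprefix, which is only available in Python 3.9 and later; the variable names are illustrative.

```python
prefix = "https://mybrand.com/trigger:open?Myservice=Email&recipient=brn:zib:"
url = prefix + "b1234567-9ee6-11b7-b4a2-7b8c2344daa8d"

# lstrip treats its argument as a set of characters, so the leading 'b'
# of the ID (which also occurs in the prefix) is stripped as well.
stripped = url.lstrip(prefix)

# removeprefix (Python 3.9+) removes the exact leading substring only.
exact = url.removeprefix(prefix)

print(stripped)
print(exact)
```

If the installed pandas is recent enough (1.4 or later), the column-level equivalent should be df['myID'].str.removeprefix(prefix).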

If the part you want always has the same length, you can just slice the string like an array. In your example the ID is the last 37 characters, so that would be something like:

df['myID'] = df['myID'].map(lambda x: x[-37:])

However, for this to work, the part you want to keep must have a constant length.

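You can also use re, as long as the text just before the part you want to extract (here :zib:) is always the same. In the snippet below, myID is a single URL string, not the whole column: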

import re

idx = re.search(r':zib:', myID)
myNewID = myID[idx.end():]

Then you will have:

myNewID

'b1234567-9ee6-11b7-b4a2-7b8c2344daa8d'
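The same idea can be wrapped in a function so it works on many values and tolerates rows without the marker. This is a sketch only: extract_id and its marker parameter are hypothetical names, not part of the original answer, and the second sample value is made up.

```python
import re

def extract_id(url, marker=":zib:"):
    """Return the text after the first occurrence of marker, or url unchanged if absent."""
    match = re.search(re.escape(marker), url)
    # re.search returns None when the marker is missing, so guard before slicing.
    return url[match.end():] if match else url

urls = [
    "https://mybrand.com/trigger:open?Myservice=Email&recipient=brn:zib:b1234567-9ee6-11b7-b4a2-7b8c2344daa8d",
    "already-clean-value",  # invented example with no marker
]
ids = [extract_id(u) for u in urls]
print(ids)
```

With a DataFrame, the function could be applied as df['myID'] = df['myID'].map(extract_id).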
