
Remove last part of string in column if strings match pattern in pandas

I'm trying to remove the end part of strings if the strings match a specified pattern.

These are the two string formats I'm working with:

ColA
2-OX-1011054-LWJ04-HT-01
2-VH-0611052-LWJ04-HT-001

I'd like to remove the -01 and -001 from both.

I know I could write something that removes everything after the last -, but there are a lot of other strings in that column with different patterns, and they would get messed up if I did that.

So I'd only like to remove the last part if the string exactly matches the pattern.

I've used something like this before. I'm not 100% sure how it works, but I'd guess it could be adapted to my purpose:

report['ColA'] = report['ColA'].str.replace(r'(?<=^\w{2}-\d{5}-\d{3})(-\d+)', '', regex=True)

EDIT: I should specify that not all the end numbers are 01 or 001. It could be any number from 000 to 999.

report['ColA'] = report['ColA'].str.replace(r'(?<=^\d{1}-\w{2}-\d{7}-\w{3}\d{2}-\w{2})(-\d+)', '', regex=True)
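Since the question mentions not being sure how the look-behind works, here is a minimal sketch using the standard `re` module on plain strings (pandas' `str.replace(..., regex=True)` applies the same regex syntax). The third sample string is a hypothetical "other pattern" and is not from the original post. The look-behind `(?<=...)` only checks that the text before the current position has the expected shape; it is not part of the match, so only the trailing `-\d+` gets removed.

```python
import re

# Look-behind version of the pattern: the prefix must be present
# immediately before "-<digits>", but only "-<digits>" is matched.
PATTERN = re.compile(r'(?<=^\d{1}-\w{2}-\d{7}-\w{3}\d{2}-\w{2})(-\d+)')

samples = [
    "2-OX-1011054-LWJ04-HT-01",   # from the question
    "2-VH-0611052-LWJ04-HT-001",  # from the question
    "AB-12345-678-01",            # hypothetical string with a different shape
]

# Replace the matched suffix with an empty string.
cleaned = [PATTERN.sub('', s) for s in samples]
print(cleaned)
```

Note that Python's `re` requires a look-behind to have a fixed width, which is why every quantifier inside it is an exact count such as `{2}` or `{7}`.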

Try

report['ColA'] = report['ColA'].str.replace(r'-0{1,2}1', '', regex=True)

It only works for -01 and -001. Is that what you wanted?
EDIT
If the suffix can be any number, this should work:

report['ColA'] = report['ColA'].str.replace(r'-\d+$', '', regex=True)
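One caveat worth checking against your data: `-\d+$` matches any string that ends in a hyphen followed by digits, not only strings of the format in the question. A small sketch, where the second value is a hypothetical string of a different pattern:

```python
import pandas as pd

report = pd.DataFrame({'ColA': [
    '2-OX-1011054-LWJ04-HT-01',  # from the question
    'AB-12345-678',              # hypothetical string with a different shape
]})

# Strip a trailing "-<digits>" from every value, whatever its format.
stripped = report['ColA'].str.replace(r'-\d+$', '', regex=True)
print(stripped.tolist())
```

Here both values lose their last segment, so this approach is only safe if no other strings in the column end in `-<digits>`.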

How about str.rsplit to split from the right side but use a boolean mask to decide which one to delete and which one to keep?

splits = report['ColA'].str.rsplit('-', n=1)  # pass n by keyword; newer pandas versions reject it positionally
mask = splits.str[-1].isin(['01','001'])
report.loc[mask, 'ColA'] = splits[mask].str[0]

Output:

                    ColA
0  2-OX-1011054-LWJ04-HT
1  2-VH-0611052-LWJ04-HT
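To cover the edit in the question (any suffix from 000 to 999), the same mask idea could be driven by a full-string regex check instead of `isin`. This is a sketch under assumptions: `full_pattern` is a name introduced here, the uppercase-letter classes are a guess at the intended format, and the third row is a hypothetical string of a different pattern.

```python
import pandas as pd

report = pd.DataFrame({'ColA': [
    '2-OX-1011054-LWJ04-HT-01',   # from the question
    '2-VH-0611052-LWJ04-HT-999',  # same format, different suffix
    'AB-12345-678',               # hypothetical string with a different shape
]})

# Assumed description of the whole target format, suffix included.
full_pattern = r'\d-[A-Z]{2}-\d{7}-[A-Z]{3}\d{2}-[A-Z]{2}-\d{1,3}'

# True only where the entire string has the target format.
mask = report['ColA'].str.fullmatch(full_pattern, na=False)

# Drop the last hyphen-separated part for the masked rows only.
report.loc[mask, 'ColA'] = report.loc[mask, 'ColA'].str.rsplit('-', n=1).str[0]
print(report)
```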

"There's a lot of other strings in that column of different patterns"

You can use:

report['ColA'] = report['ColA'].str.replace(r'^(\d+-[A-Z]{2}-\d{7}-[A-Z]{3}\d\d-[A-Z]{2})-\d{1,3}$', r'\1', regex=True)

The pattern ^(\d+-[A-Z]{2}-\d{7}-[A-Z]{3}\d\d-[A-Z]{2})-\d{1,3}$ means:

  • ^ - Start-line anchor;
  • ( - Open 1st capture group;
    • \d+ - 1+ digits (remove the '+' if you are certain this will always be a single digit);
    • -[A-Z]{2} - A literal hyphen followed by two uppercase alpha-chars;
    • -\d{7} - A literal hyphen followed by 7 digits;
    • -[A-Z]{3}\d\d - A literal hyphen followed by three uppercase alpha-chars and two digits;
    • -[A-Z]{2} - A literal hyphen followed by two uppercase alpha-chars;
    • ) - Close 1st capture group;
  • -\d{1,3} - A literal hyphen followed by one to three digits;
  • $ - End-line anchor.

We replace the whole matched string with the content of the 1st capture group. Because the pattern is anchored at both ends, the trailing digits are removed only from strings that fit the full format; every other string is left untouched.
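As a quick check of this behavior, here is a minimal sketch on a small frame. The last two rows are hypothetical examples of "other patterns" and are not from the original post.

```python
import pandas as pd

report = pd.DataFrame({'ColA': [
    '2-OX-1011054-LWJ04-HT-01',    # from the question
    '2-VH-0611052-LWJ04-HT-001',   # from the question
    'AB-12345-678-01',             # hypothetical: different shape
    '2-OX-1011054-LWJ04-HT-1234',  # hypothetical: four trailing digits
]})

pattern = r'^(\d+-[A-Z]{2}-\d{7}-[A-Z]{3}\d\d-[A-Z]{2})-\d{1,3}$'

# Keep only capture group 1 for strings that match the full pattern.
report['ColA'] = report['ColA'].str.replace(pattern, r'\1', regex=True)
print(report)
```

Only the first two rows are shortened. The third does not fit the prefix, and the fourth has more than three trailing digits, so both stay as they were.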
