
Match capture group to given pattern in pandas column

I have a DataFrame 'tdf' with a column "Cars" whose values are 4 letters followed by 1 to 6 digits, sometimes with whitespace in between.

Cars
JAXT450678
KYXS 56746
LMOP01456
...

I have compiled the regex to match it:
r'(?=[a-zA-Z]{4}\s*\d{1,6}\b)([a-zA-Z]{4})(\s?\d+)'

What I want to do is strip out the whitespace and then, if the second group has fewer than 6 digits, left-pad it with 0's until it is 6 digits long, so that the result is:

Cars
JAXT450678
KYXS056746
LMOP001456
...

Any help is appreciated. I have tried playing around with .replace and .sub and can get them to replace the entire match, but I don't know how to reference the groups and build the replacement dynamically.

tdf = tdf.replace(r'(?=[a-zA-Z]{4}\s*\d{1,6}\b)([a-zA-Z]{4})(\s?\d+)', '000000', regex=True)
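On the group question: in a regex replacement string, \1 and \2 refer to the captured groups, but a static string can only reassemble them, not transform them. So it can drop the whitespace but cannot pad the digits. A minimal sketch, assuming pandas and a DataFrame rebuilt from the three sample rows shown above:

```python
import pandas as pd

# DataFrame rebuilt from the sample rows in the question (assumption)
tdf = pd.DataFrame({"Cars": ["JAXT450678", "KYXS 56746", "LMOP01456"]})

# \1 and \2 in the replacement string insert capturing groups 1 and 2.
# \s* sits outside both groups, so the whitespace is dropped, but the
# digits come back exactly as matched (no zero padding is possible here).
stripped = tdf["Cars"].str.replace(
    r"^([a-zA-Z]{4})\s*(\d{1,6})$", r"\1\2", regex=True
)
print(stripped.tolist())
# ['JAXT450678', 'KYXS56746', 'LMOP01456']
```

Padding needs code that runs per match, which is why the answers below pass a callable or use zfill.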

You may use

df['Cars'] = df['Cars'].str.replace(r'^([a-zA-Z]{4})\s*(\d{1,6})$', lambda x: "{}{}".format(x.group(1), x.group(2).zfill(6)), regex=True)

Details

  • ^ - matches the start of the string
  • ([a-zA-Z]{4}) - Group 1: four letters
  • \s* - zero or more whitespace characters
  • (\d{1,6}) - Group 2: 1 to 6 digits
  • $ - matches the end of the string

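The lambda x: "{}{}".format(x.group(1), x.group(2).zfill(6)) callable concatenates the Group 1 value with the Group 2 value left-padded with zeros to 6 characters. A callable replacement requires regex=True (in pandas 2.x, str.replace defaults to regex=False).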

Since \s* is outside the capturing groups, the whitespace it matches is omitted from the result.
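A self-contained sketch of the regex approach above, assuming pandas 2.x and the three sample rows from the question. The named function pad_digits is a hypothetical stand-in for the lambda, used only so the comments have somewhere to live:

```python
import pandas as pd

df = pd.DataFrame({"Cars": ["JAXT450678", "KYXS 56746", "LMOP01456"]})

def pad_digits(m):
    # m is a re.Match: group 1 holds the four letters, group 2 the digits.
    # zfill(6) left-pads the digit string with "0" up to 6 characters.
    return m.group(1) + m.group(2).zfill(6)

# regex=True is required when the replacement is a callable
df["Cars"] = df["Cars"].str.replace(
    r"^([a-zA-Z]{4})\s*(\d{1,6})$", pad_digits, regex=True
)
print(df["Cars"].tolist())
# ['JAXT450678', 'KYXS056746', 'LMOP001456']
```

Values that do not match the anchored pattern (for example, ones with 7 or more digits) are left unchanged, because no substitution takes place for them.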

Use replace to remove the whitespace and zfill to pad the numeric part of the string:

df['Cars'].str.replace(' ', '').apply(lambda x: x[:4] + x[4:].zfill(6))

0    JAXT450678
1    KYXS056746
2    LMOP001456
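A variation of the same idea, sketched under the assumption that the separator might be a tab or several spaces rather than a single space: a \s+ regex removes any whitespace run, and vectorized .str slicing replaces apply. The fourth row, "ABCD\t12", is a hypothetical value added only to exercise the tab case:

```python
import pandas as pd

# First three rows from the question; "ABCD\t12" is hypothetical
df = pd.DataFrame({"Cars": ["JAXT450678", "KYXS 56746", "LMOP01456", "ABCD\t12"]})

# \s+ removes any run of whitespace (spaces, tabs), not only single spaces
compact = df["Cars"].str.replace(r"\s+", "", regex=True)

# First 4 characters are the letters; the rest is the number, padded to 6
padded = compact.str[:4] + compact.str[4:].str.zfill(6)
print(padded.tolist())
# ['JAXT450678', 'KYXS056746', 'LMOP001456', 'ABCD000012']
```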

Not quite a one-liner, but you can avoid apply:

s = df.Cars.str.slice(4).str.strip().str.zfill(6)
df.Cars.str.slice(0,4) + s

Output:

0    JAXT450678
1    KYXS056746
2    LMOP001456
Name: Cars, dtype: object
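If you would rather keep the regex from the question's approach but still avoid a callable, one swapped-in option is str.extract, which returns the capturing groups as DataFrame columns. A minimal sketch, assuming the same sample rows; the group names letters and digits are illustrative choices:

```python
import pandas as pd

df = pd.DataFrame({"Cars": ["JAXT450678", "KYXS 56746", "LMOP01456"]})

# Named groups become the column names of the DataFrame returned by str.extract
parts = df["Cars"].str.extract(r"^(?P<letters>[a-zA-Z]{4})\s*(?P<digits>\d{1,6})$")

# Rejoin the letters with the zero-padded digits
result = parts["letters"] + parts["digits"].str.zfill(6)
print(result.tolist())
# ['JAXT450678', 'KYXS056746', 'LMOP001456']
```

Unlike str.replace, rows that fail to match come out as missing values here rather than being left unchanged, which can make malformed entries easier to spot.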


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM