Match capture group to given pattern in pandas column

Question

I have a DataFrame 'tdf' with a column "Cars" where each value is 4 letters followed by 1-6 digits, sometimes with whitespace in between.

Cars
JAXT450678
KYXS 56746
LMOP01456
...

I have compiled the regex to match it:
r'(?=[a-zA-Z]{4}\s*\d{1,6}\b)([a-zA-Z]{4})(\s?\d+)'

What I want to do is strip out the whitespace and then, if the second group has fewer than 6 digits, left-pad it with 0's until it has 6 digits, so that the result is:

Cars
JAXT450678
KYXS056746
LMOP001456
...

Any help is appreciated. I have tried playing around with .replace and .sub and can get it to replace the entire match, but I don't know how to reference the group and have it dynamically match.

tdf = tdf.replace(r'(?=[a-zA-Z]{4}\s*\d{1,6}\b)([a-zA-Z]{4})(\s?\d+)', '000000', regex=True)

Answer 1

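You may use str.replace with a callable replacement (regex=True is needed on pandas 2.0 and later, where str.replace treats the pattern as a literal string by default and rejects a callable):

df['Cars'] = df['Cars'].str.replace(r'^([a-zA-Z]{4})\s*(\d{1,6})$', lambda x: "{}{}".format(x.group(1), x.group(2).zfill(6)), regex=True)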

Details

^ - matches the start of a string
([a-zA-Z]{4}) - Group 1: four letters
\s* - zero or more whitespace characters
(\d{1,6}) - Group 2: 1 to 6 digits
$ - end of string.

The lambda x: "{}{}".format(x.group(1), x.group(2).zfill(6)) callable concatenates the Group 1 value with the Group 2 value, left-padded with zeros to 6 digits.

Since \s* sits outside the capturing groups, the whitespace it matches is dropped from the result.
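
To see what the callable does for each match, here is a minimal sketch using only the standard re module on the three sample values from the question. The helper name pad_car_code is made up for illustration; the pattern is the one from this answer.

```python
import re

# Same pattern as in the answer: four letters, optional whitespace, 1 to 6 digits.
PATTERN = re.compile(r'^([a-zA-Z]{4})\s*(\d{1,6})$')

def pad_car_code(value):
    """Hypothetical helper: drop the whitespace and left-pad the digits to 6."""
    # The callable receives a match object: group(1) is the letters,
    # group(2) is the digits. Whitespace is not captured, so it disappears.
    return PATTERN.sub(lambda m: m.group(1) + m.group(2).zfill(6), value)

samples = ["JAXT450678", "KYXS 56746", "LMOP01456"]
padded = [pad_car_code(s) for s in samples]
print(padded)  # ['JAXT450678', 'KYXS056746', 'LMOP001456']
```

A value that does not match the pattern (for example "AB12") is returned unchanged, because sub only rewrites matches.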

Answer 2

Use replace to remove the whitespace and zfill to pad the numeric part of the string:

df['Cars'].str.replace(' ', '').apply(lambda x: x[:4] + x[4:].zfill(6))

0    JAXT450678
1    KYXS056746
2    LMOP001456
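
One thing to keep in mind with slicing: it assumes every value really is 4 letters followed by digits, and it only removes plain space characters. The sketch below applies the same per-value logic with plain Python strings (pad_by_slicing is a hypothetical name) to show what happens on a value that does not fit that shape.

```python
def pad_by_slicing(value):
    """Hypothetical helper mirroring the lambda above on a single string."""
    compact = value.replace(" ", "")           # removes spaces only, not tabs
    return compact[:4] + compact[4:].zfill(6)  # first 4 chars + padded remainder

well_formed = pad_by_slicing("KYXS 56746")
malformed = pad_by_slicing("AB12")

print(well_formed)  # KYXS056746
print(malformed)    # AB12000000 (the empty remainder is padded to six zeros)
```

If the column may contain values like that, a regex that only rewrites matching values is the safer choice.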

Answer 3

Not quite a one-liner, but you can avoid apply:

s = df.Cars.str.slice(4).str.strip().str.zfill(6)
df.Cars.str.slice(0,4) + s

Output:

0    JAXT450678
1    KYXS056746
2    LMOP001456
Name: Cars, dtype: object
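
For anyone who wants to run this end to end, here is a self-contained sketch that builds the sample column from the question and writes the result back. It assumes the column is called Cars, as above, and that the letter prefix is always exactly 4 characters.

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({"Cars": ["JAXT450678", "KYXS 56746", "LMOP01456"]})

# Everything after the first four characters: strip surrounding whitespace,
# then left-pad with zeros to six characters.
digits = df["Cars"].str.slice(4).str.strip().str.zfill(6)

# Re-attach the four-letter prefix and write the result back to the column.
df["Cars"] = df["Cars"].str.slice(0, 4) + digits

print(df["Cars"].tolist())  # ['JAXT450678', 'KYXS056746', 'LMOP001456']
```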

3 answers

solution1
2 ACCEPTED 2019-05-27 19:26:32

solution2
2 2019-05-27 19:26:43

solution3
1 2019-05-27 19:37:33
