I have a DataFrame 'tdf' with a column "Cars" whose values are 4 letters followed by 1-6 digits, sometimes with whitespace in between.
Cars
JAXT450678
KYXS 56746
LMOP01456
...
I have compiled the regex to match it:
r'(?=[a-zA-Z]{4}\s*\d{1,6}\b)([a-zA-Z]{4})(\s?\d+)'
What I want to do is strip out the whitespace and then, if the second group has fewer than 6 digits, pad it with leading 0's until it has 6 digits, so that the result is:
Cars
JAXT450678
KYXS056746
LMOP001456
...
Any help is appreciated. I have tried playing around with .replace and .sub and can get it to replace the entire match, but I don't know how to reference the groups so that the replacement is built dynamically from what was matched.
tdf = tdf.replace(r'(?=[a-zA-Z]{4}\s*\d{1,6}\b)([a-zA-Z]{4})(\s?\d+)','000000', regex = True)
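You may use Series.str.replace with a callable replacement (in pandas 2.0 and later, pass regex=True explicitly, because the default is regex=False and a callable replacement is rejected without it):
df['Cars'] = df['Cars'].str.replace(r'^([a-zA-Z]{4})\s*(\d{1,6})$', lambda x: "{}{}".format(x.group(1), x.group(2).zfill(6)), regex=True)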
Details
^ - matches the start of the string
([a-zA-Z]{4}) - Group 1: four letters
\s* - 0+ whitespaces
(\d{1,6}) - Group 2: 1 to 6 digits
$ - end of string.
The lambda x: "{}{}".format(x.group(1), x.group(2).zfill(6)) callable concatenates the Group 1 value and the Group 2 value padded with zeros up to 6 positions.
Since \s* is outside the parentheses, the whitespace matched by this pattern is omitted from the result.
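To see how the callable receives the match object and reads each group, the same idea can be sketched on plain strings with the standard-library re module, without a DataFrame. The helper name pad_car_id and the extra "not a match" sample are assumptions added for illustration; they are not part of the question or the answer.

```python
import re

# Same pattern as in the answer: four letters, optional whitespace, 1-6 digits.
PATTERN = re.compile(r'^([a-zA-Z]{4})\s*(\d{1,6})$')

def pad_car_id(value):
    """Hypothetical helper: drop the whitespace and zero-pad the digits to 6."""
    # The callable gets a match object; group(1) is the letters, group(2) the digits.
    return PATTERN.sub(lambda m: m.group(1) + m.group(2).zfill(6), value)

samples = ["JAXT450678", "KYXS 56746", "LMOP01456", "not a match"]
for s in samples:
    print(repr(s), "->", repr(pad_car_id(s)))
# 'JAXT450678' -> 'JAXT450678'
# 'KYXS 56746' -> 'KYXS056746'
# 'LMOP01456' -> 'LMOP001456'
# 'not a match' -> 'not a match'
```

Strings that do not match the pattern are returned unchanged, which is the same behaviour the pandas call has for non-matching rows.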
Use replace to remove the whitespace and zfill to pad the numeric part of the string:
df['Cars'].str.replace(' ', '').apply(lambda x: x[:4] + x[4:].zfill(6))
0 JAXT450678
1 KYXS056746
2 LMOP001456
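As a rough sketch of what that lambda does to a single value, here is the same slice-and-zfill logic on plain strings. The helper name pad_by_position and the "AB12" input are assumptions chosen for illustration. Unlike the regex approach, this version does no validation: it simply treats the first four characters as the prefix and pads whatever follows.

```python
def pad_by_position(value):
    """Hypothetical helper mirroring the lambda on one string."""
    compact = value.replace(' ', '')            # removes spaces only, not tabs
    return compact[:4] + compact[4:].zfill(6)   # first 4 chars + zero-padded rest

print(pad_by_position("KYXS 56746"))  # KYXS056746
print(pad_by_position("LMOP01456"))   # LMOP001456
print(pad_by_position("AB12"))        # AB12000000 (no format check is performed)
```

This is fine when every value is known to follow the 4-letters-plus-digits layout; if the column may contain other shapes, the anchored regex is the safer choice because it leaves non-matching values untouched.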
Not quite a one-liner, but you can avoid apply:
s = df.Cars.str.slice(4).str.strip().str.zfill(6)
df.Cars.str.slice(0,4) + s
Output:
0 JAXT450678
1 KYXS056746
2 LMOP001456
Name: Cars, dtype: object