I want to replace some characters within a string in pandas (based on a match to the entirety of the string), while leaving the rest of the string unchanged.
For instance, replace dashes with decimal points in a number string only if the dash isn't at the start of the string:
'26.15971' -> '26.15971'
'1030899' -> '1030899'
'26-404700' -> '26.404700'
'-26-403268' -> '-26.403268'
Code:
import pandas as pd
# --- simple dataframe
df = pd.DataFrame({'col1':['26.15971','1030899','26-404700']})
# --- regex that only matches items of interest
regex_match = r'^\d{1,2}-\d{1,8}'
df.col1.str.match(regex_match)
# --- not sure how to replace only the middle hyphens?
# something like df.col1.str.replace('^\d{1,2}(-)\d{1,8}','^\d{1,2}\.\d{1,8}') ??
# unclear how to get a repl that only alters a capture group and leaves the rest
# of the string unchanged
You could try using a regex replacement with lookarounds:
df["col1"] = df["col1"].str.replace(r"(?<=\d)-(?=\d)", ".", regex=True)
The regex pattern (?<=\d)-(?=\d) targets every dash that sits between two digits and replaces it with a dot. A dash at the start of the string has no digit before it, so it is left unchanged.
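As a quick check of the lookaround approach, here is a minimal sketch run against the four sample values from the question. It assumes a recent pandas version, where str.replace treats the pattern as a literal string unless regex=True is passed; the column name "fixed" is only for illustration.

```python
import pandas as pd

# The four sample values from the question
df = pd.DataFrame({"col1": ["26.15971", "1030899", "26-404700", "-26-403268"]})

# Replace a dash only when it has a digit on both sides.
# regex=True is needed in recent pandas, where the default is a literal match.
df["fixed"] = df["col1"].str.replace(r"(?<=\d)-(?=\d)", ".", regex=True)

print(df)
```

The leading dash in '-26-403268' survives because the lookbehind (?<=\d) cannot match at the start of the string.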
We could also approach this using capture groups:
df["col1"] = df["col1"].str.replace(r"(\d{2,3})-(\d{4,8})", r"\1.\2", regex=True)
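If the replacement should only happen when the whole string has the expected shape, one possible variation is to anchor the capture-group pattern. This is a sketch, not the only way to do it: the digit counts are borrowed from the regex in the question, the optional leading minus sign is an assumption based on the '-26-403268' example, and the value '12-34-56' is a made-up input added to show a string that is left alone.

```python
import pandas as pd

# Sample values from the question, plus one hypothetical value ('12-34-56')
# that should NOT be changed because it does not have the expected shape.
s = pd.Series(["26.15971", "1030899", "26-404700", "-26-403268", "12-34-56"])

# Group 1: optional leading minus and 1-2 digits; group 2: 1-8 digits.
# ^ and $ make the pattern apply only when the entire string matches.
pattern = r"^(-?\d{1,2})-(\d{1,8})$"

# \1 and \2 put the captured parts back, with a dot in place of the dash.
result = s.str.replace(pattern, r"\1.\2", regex=True)

print(result.tolist())
```

Strings that do not match the full pattern come back unchanged, which is the difference from the lookaround version: that one would turn '12-34-56' into '12.34.56'.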