Pandas: replace characters in a string based on a regex?

Question

I want to replace some characters within a string in pandas (based on a match against the entire string), while leaving the rest of the string unchanged.

For instance, replace a dash with a decimal point in a numeric string, but only if the dash is not at the start of the string:

'26.15971' -> '26.15971'

'1030899' -> '1030899'

'26-404700' -> '26.404700'

'-26-403268' -> '-26.403268'

Code:

import pandas as pd

# --- simple dataframe
df = pd.DataFrame({'col1': ['26.15971', '1030899', '26-404700']})

# --- regex that only matches items of interest (raw string so \d is not treated as a string escape)
regex_match = r'^\d{1,2}-\d{1,8}'
df.col1.str.match(regex_match)

# --- not sure how to only replace the middle hyphens?
# something like  df.col1.str.replace(r'^\d{1,2}(-)\d{1,8}', r'^\d{1,2}\.\d{1,8}') ??
# unclear how to get a repl that only alters a capture group and leaves the rest
# of the string unchanged

Answer 1

You could try a regex replacement with lookarounds:

df["col1"] = df["col1"].str.replace(r"(?<=\d)-(?=\d)", ".", regex=True)

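The regex pattern (?<=\d)-(?=\d) matches every dash that sits between two digits, and str.replace swaps each match for a dot. A leading dash has no digit before it, so the lookbehind fails and that dash is left alone. Passing regex=True matters on recent pandas versions, where str.replace treats the pattern as a literal string by default.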
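As a quick check of the lookaround approach, here is a minimal sketch that runs it over the four sample values from the question. The output column name col1_fixed is only an illustrative choice, not something from the original post.

```python
import pandas as pd

# Sample values from the question, including the leading-dash case.
df = pd.DataFrame({"col1": ["26.15971", "1030899", "26-404700", "-26-403268"]})

# Replace only dashes that have a digit on both sides.
# regex=True is explicit because newer pandas defaults to literal matching.
# "col1_fixed" is a hypothetical column name used for illustration.
df["col1_fixed"] = df["col1"].str.replace(r"(?<=\d)-(?=\d)", ".", regex=True)

print(df)
```

Note that this pattern rewrites every digit-dash-digit occurrence, so a value with several inner dashes would have all of them replaced.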

We could also approach this using capture groups, referring back to them in the replacement as \1 and \2:

df["col1"] = df["col1"].str.replace(r"(\d{2,3})-(\d{4,8})", r"\1.\2", regex=True)
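If the replacement should apply only when the whole string has the expected shape, as the question asks, one possible variation is to anchor the capture-group pattern. The sketch below is an assumption built on the question's own quantifiers (\d{1,2} and \d{1,8}), with an optional leading minus sign; the column name col1_fixed and the extra date-like value are hypothetical and included only to show a string that is left untouched.

```python
import pandas as pd

# Question's sample values plus one date-like string that should NOT change.
df = pd.DataFrame(
    {"col1": ["26.15971", "1030899", "26-404700", "-26-403268", "2020-10-08"]}
)

# ^ and $ force the pattern to describe the entire string.
# Group 1: optional sign plus 1-2 digits; group 2: 1-8 digits.
pattern = r"^(-?\d{1,2})-(\d{1,8})$"

# \1 and \2 re-insert the captured groups, so only the inner dash is altered.
df["col1_fixed"] = df["col1"].str.replace(pattern, r"\1.\2", regex=True)

print(df)
```

Strings that do not match the full pattern are returned unchanged, which is what keeps the date-like value intact here.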

Solution 1 (accepted, 2020-10-08 02:48:58)
