Python: replace a regex match with itself, without the spaces

Question

I basically want to 'join' numbers that clearly belong together. I want to replace each regex match with itself, but without any spaces.

I have:

df
               a
'Fraxiparine 9 500 IU (anti-Xa)/1 ml'
'Colobreathe 1 662 500 IU inhalačný prášok v tvrdej kapsule'

I want to have:

df
               a
'Fraxiparine 9500 IU (anti-Xa)/1 ml'
'Colobreathe 1662500 IU inhalačný prášok v tvrdej kapsule'

I'm using r'\d+\s+\d+\s*\d+' to match the numbers, and I've written the following function to remove the spaces within the match:

def spaces(x):
    match = re.findall(r'\d+\s+\d+\s*\d+', x)
    return match.replace(" ","")

Now I'm having trouble applying that function to the full dataframe, and I also don't know exactly how to replace the original match with the version that has no spaces.

Answer 1

Try using the following code:

import re

def spaces(s):
    # Remove any single space that has a digit on both sides.
    return re.sub(r'(?<=\d) (?=\d)', '', s)

df['a'] = df['a'].apply(spaces)

The regex will match:

any space
preceded by a digit (?<=\d)
and followed by a digit (?=\d).

Then pandas.Series.apply calls your function on every value in the column.

Output:

0   Fraxiparine 9500 IU (anti-Xa)/1 ml
1   Colobreathe 1662500 IU inhalačný prášok v tvrd...
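
To see what the two lookarounds do outside pandas, here is a minimal sketch that uses only the standard library re module. The sample strings are the ones from the question; the name join_digits is a hypothetical one chosen for illustration. Because lookarounds are zero-width, the digits themselves are never consumed, so every space inside a run such as "1 662 500" is removed in one pass.

```python
import re

def join_digits(s):
    # Remove a single space only when a digit sits on both sides of it.
    # The lookbehind/lookahead check the neighbours without consuming them.
    return re.sub(r'(?<=\d) (?=\d)', '', s)

samples = [
    'Fraxiparine 9 500 IU (anti-Xa)/1 ml',
    'Colobreathe 1 662 500 IU inhalačný prášok v tvrdej kapsule',
]

cleaned = [join_digits(s) for s in samples]
for line in cleaned:
    print(line)
```

Spaces next to letters, such as the one between "500" and "IU" or between "1" and "ml", are left alone because one of the two lookarounds fails there.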

Answer 2

I believe your problem can be solved by tweaking your function slightly so that the replacement is applied to the whole string rather than to the match alone, as follows:

import pandas as pd
import re

df = pd.DataFrame({'a' : ['Fraxiparine 9 500 IU (anti-Xa)/1 ml','Colobreathe 1 662 500 IU inhalačný prášok v tvrdej kapsule']})

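# your function, adjusted to return the full string
def spaces(x):
    # re.findall returns a list of matched substrings (empty if none),
    # so loop over it instead of indexing, which would fail on no match
    for match in re.findall(r'\d+\s+\d+\s*\d+', x):
        x = x.replace(match, match.replace(" ", ""))
    return x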

# now apply it on the whole dataframe, row per row
df['a'] = df['a'].apply(spaces)
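
The function in the question fails because re.findall returns a list, not a string, so calling .replace on its result raises an AttributeError. The short sketch below, using only the standard library, shows what findall returns for the question's pattern; the sample strings are illustrative.

```python
import re

pattern = r'\d+\s+\d+\s*\d+'

# With no capture groups in the pattern, findall returns the matched substrings.
matches = re.findall(pattern, 'Colobreathe 1 662 500 IU')
print(matches)

# When nothing matches, the result is an empty list rather than None.
no_matches = re.findall(pattern, 'no grouped numbers here')
print(no_matches)
```

This is why the list has to be indexed or iterated before any string method is used, and why rows without such a number need to be handled explicitly.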

Answer 3

Use

df['a'] = df['a'].str.replace(r'(?<=\d)\s+(?=\d)', '', regex=True)

EXPLANATION

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (?<=                     look behind to see if there is:
--------------------------------------------------------------------------------
    \d                       digits (0-9)
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    \d                       digits (0-9)
--------------------------------------------------------------------------------
  )                        end of look-ahead
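
The practical difference between a literal single space and \s+ shows up when the digit groups are separated by more than one space or by a tab. The following is a small sketch with a made-up input string, using re.sub from the standard library; the same patterns can be passed to pandas str.replace with regex=True.

```python
import re

# Hypothetical input: two spaces, then a tab, between the digit groups.
text = 'Dose 1  662\t500 IU'

# A literal single space only matches when digits touch it on both sides,
# so neither the double space nor the tab is removed.
single_space = re.sub(r'(?<=\d) (?=\d)', '', text)

# \s+ matches any run of whitespace between two digits.
any_whitespace = re.sub(r'(?<=\d)\s+(?=\d)', '', text)

print(repr(single_space))
print(repr(any_whitespace))
```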

If your plan is to remove whitespace only inside matches of \d+\s+\d+\s*\d+ (the callback below needs import re):

df['a'] = df['a'].str.replace(r'\d+\s+\d+\s*\d+', lambda m: re.sub(r'\s+', '', m.group()), regex=True)

See the pandas.Series.str.replace documentation:

repl : str or callable
Replacement string or a callable. The callable is passed the regex match object and must return a replacement string to be used. See re.sub().
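
The two approaches in this answer do not always give the same result. As a rough sketch, the comparison below uses re.sub from the standard library, which accepts a callable replacement in the same way; the input string and the name strip_inner_whitespace are made up for illustration.

```python
import re

def strip_inner_whitespace(m):
    # m is the regex match object; return its text with all whitespace removed.
    return re.sub(r'\s+', '', m.group())

# Hypothetical input with a two-digit case and a four-group number.
text = 'Pack of 1 2 tablets, 1 662 500 000 IU'

# Restricted pattern: only spans matching \d+\s+\d+\s*\d+ are rewritten.
restricted = re.sub(r'\d+\s+\d+\s*\d+', strip_inner_whitespace, text)

# Lookaround pattern: every whitespace run between two digits is removed.
lookaround = re.sub(r'(?<=\d)\s+(?=\d)', '', text)

print(restricted)
print(lookaround)
```

The restricted pattern needs at least three digits in total, so "1 2" is left untouched, and a single match covers at most three groups, so the trailing "000" stays separate. Which behaviour is correct depends on the data, so it is worth checking both against a few real rows.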
