python - Remove whitespace between two characters using re.sub

Question

I have a pair of columns, like so:

import pandas as pd

x = ["a b williams", "e g", "z z specialists"]
y = ["j j winston", "hb d party supplies", "t t ice cream"]
df = pd.DataFrame(x, y)  # y is used as the index

I would like to remove the whitespace between two single characters using re.sub. I have tried the following:

re.sub("(?<=\\w\\b)"\\s"(?=\\w\\b)", "", df)

However, when I run the code, I get the following error:

SyntaxError: unexpected character after line continuation character

I'm unsure of what I am doing wrong. The desired result is:

jj winston             ab williams
hb d party supplies              eg
tt ice cream           zz specialists

Please advise. Any advice is appreciated.

Answer 1

You can use either of these patterns:

(?<=\b[^\W\d_])\s(?=[^\W\d_]\b)
(?<=\b\w)\s(?=\w\b)

See the regex demo. Note that the [^\W\d_] pattern matches any Unicode letter in Python re, while \w is broader: it also matches digits and the underscore (any Unicode alphanumeric character plus _).

Details

(?<=\b[^\W\d_]) - a positive lookbehind that matches a location immediately preceded by a single letter that forms a whole word (the letter itself is preceded by a word boundary)
\s - a whitespace char
(?=[^\W\d_]\b) - a positive lookahead that matches a location immediately followed by a single letter that forms a whole word (the letter is followed by a word boundary).
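To make the difference between the two patterns concrete, here is a minimal sketch using plain re on a few strings. The sample inputs containing a digit and an underscore are made up for illustration and are not from the question.

```python
import re

# Letters-only variant and the broader \w variant from the answer.
letters_only = re.compile(r'(?<=\b[^\W\d_])\s(?=[^\W\d_]\b)')
word_chars = re.compile(r'(?<=\b\w)\s(?=\w\b)')

# Hypothetical inputs: the last two contain a digit and an underscore.
samples = ["a b williams", "a 1 b", "x _ y"]

letters_result = [letters_only.sub('', s) for s in samples]
word_result = [word_chars.sub('', s) for s in samples]

print(letters_result)  # ['ab williams', 'a 1 b', 'x _ y']
print(word_result)     # ['ab williams', 'a1b', 'x_y']
```

If single digits or underscores should stay separated, the letters-only pattern is the safer choice; otherwise the shorter \w form behaves the same on purely alphabetic data.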

Here is a Pandas demo:

import pandas as pd

x = ["a b williams", "e g", "z z specialists"]
y = ["j j winston", "h d party supplies", "t t ice cream"]
df = pd.DataFrame(x,y)
rx = r'(?<=\b[^\W\d_])\s(?=[^\W\d_]\b)'
df.index = df.index.to_series().replace(rx, '', regex=True)
df = df.replace(rx, '', regex=True)
# => df
#                                 0
# jj winston            ab williams
# hd party supplies              eg
# tt ice cream       zz specialists

Since DataFrame.replace with regex=True does not touch the index, the index must be handled separately. That is why the df.index = df.index.to_series().replace(rx, '', regex=True) line is there.
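The point about the index can be checked with a small sketch. It assumes pandas is installed and uses a shortened, two-row version of the data; the variable names values_only and cleaned are illustrative.

```python
import pandas as pd

rx = r'(?<=\b[^\W\d_])\s(?=[^\W\d_]\b)'
df = pd.DataFrame(["a b williams", "e g"], index=["j j winston", "t t ice cream"])

# DataFrame.replace rewrites the cell values only; index labels are left alone.
values_only = df.replace(rx, '', regex=True)
print(list(values_only.index))  # ['j j winston', 't t ice cream']

# The index is cleaned in a separate step.
cleaned = values_only.copy()
cleaned.index = cleaned.index.to_series().replace(rx, '', regex=True)
print(list(cleaned.index))  # ['jj winston', 'tt ice cream']
```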

Answer 2

Your regex is pretty close to what is required and can be slightly modified as follows:

r'(?<=\b\w)(\s)(?=\w\b)'

Note the raw string r'...': with it you don't need to double the backslashes in the regex.

Regex Demo

It is better to compile the regex to speed up processing, since it is used multiple times:

import re

pattern = re.compile(r'(?<=\b\w)(\s)(?=\w\b)')

Then reuse your code:

import pandas as pd

x = ["a b williams", "e g", "z z specialists"]
y = ["j j winston", "h d party supplies", "t t ice cream"]
df = pd.DataFrame(x,y)

Convert the index:

# regex=True is passed explicitly; pandas 2.0+ defaults to regex=False
df.index = df.index.to_series().str.replace(pattern, '', regex=True)

Convert the data column:

df[0] = df[0].str.replace(pattern, '', regex=True)
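Outside pandas, the same compiled pattern can be reused through its own sub method. The following is a small sketch on a plain list, not part of the original answer; the variable names are illustrative.

```python
import re

# Compile once, then reuse the pattern object for every string.
pattern = re.compile(r'(?<=\b\w)(\s)(?=\w\b)')

names = ["a b williams", "e g", "z z specialists"]
cleaned = [pattern.sub('', name) for name in names]

print(cleaned)  # ['ab williams', 'eg', 'zz specialists']
```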

Explanation of your errors:

re.sub works on a single string, so it cannot be applied directly to a whole pandas DataFrame.
Your regex contains four quotation marks ("). The second one ends the string literal, so the \\s that follows sits outside any string. Python treats the first backslash there as a line continuation character, and the character after it is invalid in that position, which produces the SyntaxError.
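Both points can be reproduced in isolation. This sketch keeps the question's call as source text and passes it to the built-in compile so the SyntaxError can be caught and inspected, then shows a corrected call on a single string. The names broken_source and syntax_error are illustrative.

```python
import re

# The question's call, stored as text so that parsing it can be tested safely.
broken_source = r'''re.sub("(?<=\\w\\b)"\\s"(?=\\w\\b)", "", "a b williams")'''

try:
    compile(broken_source, "<question>", "eval")
    syntax_error = None
except SyntaxError as exc:
    # The stray backslash outside the string literals triggers this.
    syntax_error = exc

print(type(syntax_error).__name__)  # SyntaxError

# A single raw string, applied to one string rather than a DataFrame.
fixed = re.sub(r"(?<=\b\w)\s(?=\w\b)", "", "a b williams")
print(fixed)  # ab williams
```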

Answer 3

Using re.sub, I suggest the following:

import re
import pandas as pd

# your lists
x = ["a b williams", "e g", "z z specialists"]
y = ["j j winston", "hb d party supplies", "t t ice cream"]

# replacements
x = [re.sub(r'(\b\w)(\s)(\w\b)', r'\1\3', el) for el in x]
y = [re.sub(r'(\b\w)(\s)(\w\b)', r'\1\3', el) for el in y]

# pd dataframe after the process
df = pd.DataFrame(x,y)
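One behavioral difference is worth knowing. The capture-group pattern consumes both letters, so in a run of three single letters only the first gap is removed, while the lookaround patterns from the other answers remove every gap. A minimal sketch with a made-up input:

```python
import re

text = "a b c industries"  # hypothetical input with three single letters in a row

# Capture groups consume "a b", so the scan resumes after "b" and skips " c".
consuming = re.sub(r'(\b\w)(\s)(\w\b)', r'\1\3', text)

# Lookarounds consume only the whitespace, so each gap is tested independently.
lookaround = re.sub(r'(?<=\b\w)\s(?=\w\b)', '', text)

print(consuming)   # ab c industries
print(lookaround)  # abc industries
```

For the data in the question, which has at most two single letters in a row, both approaches give the same result.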
