How do I add spaces between words and punctuation in a column?

Question

I have a string column in a dataframe with inconsistent spacing between words and punctuation.
I need to:

Add a space on each side of every punctuation character
Remove duplicated spaces

The punctuation characters I am looking for are /, + and -.

My dataframe:

col A
'this/is a+ string'
'this+is+a    string'

The output I expect:

col B
'this / is a + string'
'this + is + a string'

Answer 1

I solved this in two steps: first, add a space on each side of every punctuation character; then collapse any runs of consecutive spaces into one. For the first step I wrote a function called punctuation_space and passed it as the "repl" argument to re.sub(). When "repl" is a function, re.sub() calls it with each match object and uses the returned string as the replacement.

import re

def punctuation_space(match_obj):
    """ return whatever matched surrounded by spaces """

    return ' ' + match_obj.group() + ' '

def fn(string):

    # first step
    string = re.sub(r'[+/-]', punctuation_space, string)

    # second step
    return re.sub(r' {2,}', ' ', string)

To check the above code:

import pandas as pd
original_col = ['this/is a+ string', 'this+is+a    string']

s = pd.Series(original_col)
print(s)
print(s.apply(fn))

Output:

0      this/is a+ string
1    this+is+a    string
dtype: object
0    this / is a + string
1    this + is + a string
dtype: object
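
The two steps above can also be sketched with a backreference in place of a replacement function. This is only an illustrative variant, not the code from the answer: the name space_punctuation is hypothetical, and unlike fn it also absorbs whitespace that already surrounds a symbol, collapses any whitespace (not just spaces), and trims the ends of the string.

```python
import re

def space_punctuation(text):
    """Hypothetical helper (not from the original answer)."""
    # Match a symbol together with any whitespace around it and
    # rewrite it as the symbol with exactly one space on each side.
    spaced = re.sub(r'\s*([+/-])\s*', r' \1 ', text)
    # Collapse any remaining runs of whitespace and trim the ends.
    return re.sub(r'\s+', ' ', spaced).strip()

print(space_punctuation('this/is a+ string'))
print(space_punctuation('this+is+a    string'))
```

Because the result is stripped, a string that starts or ends with one of the symbols will not keep a leading or trailing space, which may or may not be what you want.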

Answer 2

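You can try the following. Note that Series.apply takes no axis argument, so the function is the only argument needed:

import re

df['col A'] = df['col A'].apply(lambda y: " ".join(re.sub(r'([+/-])', lambda x: ' ' + x.group() + ' ', y).split()))

Or: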

df['col A'] = df['col A'].str.replace(r'([+/-])',  lambda x: ' ' + x.group()+' ', regex=True).apply(lambda x: ' '.join(x.split()))
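
If you would rather avoid apply and lambdas entirely, the same idea can be sketched with chained .str methods. This is a minimal sketch under stated assumptions: the sample frame is rebuilt here from the question's data, and the result is written to a new column named col B (as in the expected output) instead of overwriting col A.

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({'col A': ['this/is a+ string', 'this+is+a    string']})

df['col B'] = (
    df['col A']
    # Put exactly one space on each side of /, + or -.
    .str.replace(r'\s*([+/-])\s*', r' \1 ', regex=True)
    # Collapse runs of whitespace into a single space.
    .str.replace(r'\s+', ' ', regex=True)
    # Drop leading and trailing whitespace.
    .str.strip()
)

print(df['col B'].tolist())
```

For the two sample rows this yields the col B values shown in the question. Missing values stay missing, because the .str accessor skips them.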

Answer 1
0 2021-05-26 22:55:44

Answer 2
0 2021-05-27 04:46:44