How to add space between words and punctuation in a column?

I have a string column in a DataFrame with inconsistent spacing between words and punctuation.
I need to:

  1. Add a space on each side of every punctuation character
  2. Remove duplicated spaces

The punctuation characters I am looking for are /, + and -.

My DataFrame:

col A
'this/is a+ string'
'this+is+a    string'

The output I expect:

col B
'this / is a + string'
'this + is + a string'

I solved this in two steps: first, add a space on each side of the punctuation, then collapse any run of consecutive spaces into one. For the first step I used a function called punctuation_space, passed as the "repl" argument to re.sub().

import re

def punctuation_space(match_obj):
    """ return whatever matched surrounded by spaces """

    return ' ' + match_obj.group() + ' '

def fn(string):

    # first step
    string = re.sub(r'[+/-]', punctuation_space, string)

    # second step
    return re.sub(r' {2,}', ' ', string)

To check the above code:

import pandas as pd
original_col = ['this/is a+ string', 'this+is+a    string']

s = pd.Series(original_col)
print(s)
print(s.apply(fn))

Output:

0      this/is a+ string
1    this+is+a    string
dtype: object
0    this / is a + string
1    this + is + a string
dtype: object
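
As a possible variation on the two-step approach above, the sketch below lets the pattern also consume any whitespace already sitting next to the punctuation, so a backreference substitution leaves exactly one space on each side. The helper name normalize_punctuation is hypothetical and not part of the original code. Unlike fn, it also strips leading and trailing whitespace, which is an assumption about the desired result.

```python
import re

# Hypothetical helper, not from the original post.
def normalize_punctuation(text):
    # Replace each of / + - together with any adjacent whitespace
    # by the same character padded with one space on each side.
    text = re.sub(r'\s*([+/-])\s*', r' \1 ', text)
    # Collapse remaining runs of whitespace and trim both ends.
    return re.sub(r'\s{2,}', ' ', text).strip()

print(normalize_punctuation('this/is a+ string'))
print(normalize_punctuation('this+is+a    string'))
```

It can be applied to a Series in the same way as fn, for example s.apply(normalize_punctuation).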

You can try:

import re

# Series.apply takes no axis argument, so the stray trailing ", 1" is dropped.
df['col A'] = df['col A'].apply(lambda y: " ".join(re.sub(r'([+/-])', lambda x: ' ' + x.group() + ' ', y).split()))

Or:

df['col A'] = df['col A'].str.replace(r'([+/-])',  lambda x: ' ' + x.group()+' ', regex=True).apply(lambda x: ' '.join(x.split()))
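
Both one-liners assume a DataFrame with a 'col A' column already exists. The sketch below builds one from the sample data in the question and applies the str.replace variant. The name df and the choice to write into a new 'col B' column (so the original stays available for comparison) are assumptions made for illustration.

```python
import pandas as pd

# Sample data taken from the question; `df` is an assumed name.
df = pd.DataFrame({'col A': ['this/is a+ string', 'this+is+a    string']})

# Pad each of / + - with spaces, then rebuild every string from its
# whitespace-separated tokens so repeated spaces collapse to one.
df['col B'] = (
    df['col A']
    .str.replace(r'([+/-])', lambda m: ' ' + m.group() + ' ', regex=True)
    .apply(lambda x: ' '.join(x.split()))
)

print(df['col B'].tolist())
```

Because ' '.join(x.split()) splits on any whitespace, this also normalizes tabs and trims leading and trailing spaces, which goes slightly beyond what the question asks for.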
