How to add space between words and punctuation in a column?
I have a string column in a dataframe with multiple spaces between words and punctuation.
I need to put a single space between words and punctuation.
The punctuation I am looking for is /+-.
My dataframe:
col A
'this/is a+ string'
'this+is+a string'
The output I expect:
col B
'this / is a + string'
'this + is + a string'
The way I solved this is in two steps: first, add a space on each side of the punctuation, then collapse any runs of consecutive spaces. For the first step I used a function called punctuation_space, passed as the "repl" argument to re.sub().
import re

def punctuation_space(match_obj):
    """Return whatever matched, surrounded by spaces."""
    return ' ' + match_obj.group() + ' '

def fn(string):
    # first step: pad each punctuation mark with a space on both sides
    string = re.sub(r'[+/-]', punctuation_space, string)
    # second step: collapse runs of two or more spaces into one
    return re.sub(r' {2,}', ' ', string)
To check the above code:
import pandas as pd
original_col = ['this/is a+ string', 'this+is+a string']
s = pd.Series(original_col)
print(s)
print(s.apply(fn))
Output:
0 this/is a+ string
1 this+is+a string
dtype: object
0 this / is a + string
1 this + is + a string
dtype: object
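One case the two-step approach above does not cover is a punctuation mark at the very start or end of a string: the padding from the first step then leaves a leading or trailing space. The following is a minimal sketch of one way to handle that, assuming surrounding whitespace should simply be trimmed. The name normalize_spacing is hypothetical and not part of the original code.

```python
import re

def normalize_spacing(text):
    """Pad /, + and - with single spaces, collapse runs, trim the ends."""
    # Absorb any spaces already around the punctuation, then re-pad it
    # with exactly one space on each side via the \1 backreference.
    padded = re.sub(r' *([+/-]) *', r' \1 ', text)
    # Collapse remaining runs of spaces between words and trim the ends
    # (trimming is an assumption; drop .strip() to keep edge spaces).
    return re.sub(r' {2,}', ' ', padded).strip()
```

With the series s from the check above, s.apply(normalize_spacing) should give the same two rows as s.apply(fn) for the sample data; the results only differ when a string begins or ends with one of the punctuation marks.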
You can try:
df['col A'] = df['col A'].apply(lambda y: " ".join(re.sub(r'([+/-])', lambda x: ' ' + x.group() + ' ', y).split()))
Or:
df['col A'] = df['col A'].str.replace(r'([+/-])', lambda x: ' ' + x.group() + ' ', regex=True).apply(lambda x: ' '.join(x.split()))