Remove single letters from strings in a Pandas DataFrame
I have a DataFrame where one column is filled with strings. I want to remove every occurrence of a single letter from that column. So far, I have tried:
df['STRI'] = df['STRI'].map(lambda x: " ".join(x.split() if len(x) >1)
For example, given the input ABCD X WYZ, I want to get ABCD WYZ.
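You can use str.replace with a regex. The pattern \b\w\b matches a single word character that has a word boundary on each side, i.e. a one-character word, and replacing it with '' deletes it. A second replace then collapses the leftover runs of whitespace into single spaces. Note that \w matches digits and underscores as well as letters, and that on pandas 2.0 and later you need to pass regex=True, because str.replace no longer treats the pattern as a regular expression by default. See the working examples below: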
Example using a Series:
import pandas as pd
s = pd.Series(['Katherine', 'Katherine and Bob', 'Katherine I', 'Katherine', 'Robert', 'Anne', 'Fred', 'Susan', 'other'])
s.str.replace(r'\b\w\b', '', regex=True).str.replace(r'\s+', ' ', regex=True).str.strip()
0 Katherine
1 Katherine and Bob
2 Katherine
3 Katherine
4 Robert
5 Anne
6 Fred
7 Susan
8 other
dtype: object
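To see exactly which characters the pattern removes, it can be tried with Python's built-in re module on a sample string. The string below is made up for illustration; it suggests that \b\w\b also fires inside words containing an apostrophe or hyphen (the t in don't, the e in e-mail), because those characters count as word boundaries. Whether that is wanted depends on your data.

```python
import re

# Made-up sample covering a few edge cases for the word-boundary pattern
text = "ABCD X WYZ don't e-mail 7 x_y"

# Every one-character word: a word character with a boundary on both sides
matches = re.findall(r'\b\w\b', text)
print(matches)

# What the first replacement step would leave behind (spaces not yet collapsed)
cleaned = re.sub(r'\b\w\b', '', text)
print(repr(cleaned))
```

Note that x_y is left alone: the underscore is itself a word character, so there is no boundary next to the x or the y.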
Another example with your test data:
s = pd.Series(['ABCD','X','WYZ'])
0 ABCD
1 X
2 WYZ
dtype: object
s.str.replace(r'\b\w\b', '', regex=True).str.replace(r'\s+', ' ', regex=True).str.strip()
0 ABCD
1
2 WYZ
dtype: object
With your data it is:
df['STRI'].str.replace(r'\b\w\b', '', regex=True).str.replace(r'\s+', ' ', regex=True).str.strip()
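A self-contained version of the same chain, written as a small helper so the pattern can be swapped. The helper name drop_single_chars and the sample rows are hypothetical; the sketch assumes a pandas version that accepts regex=True and adds .str.strip() to trim the space left behind when the removed word was first or last in the string. The letters-only pattern \b[A-Za-z]\b is one possible alternative if digits such as 7 should be kept.

```python
import pandas as pd

# Hypothetical sample frame; the first row is the example from the question
df = pd.DataFrame({'STRI': ['ABCD X WYZ', 'A quick test 7']})


def drop_single_chars(col: pd.Series, pattern: str = r'\b\w\b') -> pd.Series:
    """Hypothetical helper: remove one-character words matching `pattern`, then tidy spaces."""
    return (
        col.str.replace(pattern, '', regex=True)  # delete the one-character words
        .str.replace(r'\s+', ' ', regex=True)     # collapse the gaps they leave behind
        .str.strip()                              # trim leading/trailing spaces
    )


any_word_char = drop_single_chars(df['STRI'])
letters_only = drop_single_chars(df['STRI'], pattern=r'\b[A-Za-z]\b')
print(any_word_char.tolist())
print(letters_only.tolist())
```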
Try this:
df['STRI'] = df['STRI'].str.replace(r'\b\w\b', '', regex=True).str.replace(r'\s+', ' ', regex=True).str.strip()
For example:
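import pandas as pd
df = pd.DataFrame(data=['X ABCD X X WEB X'], columns=['c1'])
print(df, '\n')
# drop one-character words, collapse runs of whitespace, then trim the ends
df.c1 = df.c1.str.replace(r'\b\w\b', '', regex=True).str.replace(r'\s+', ' ', regex=True).str.strip()
print(df)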
Output:
c1
0 X ABCD X X WEB X
c1
0 ABCD WEB
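To see what each step of that chain contributes, this sketch runs the replacements one at a time on the same example row and prints the intermediate strings with repr so the spaces are visible. The .str.strip() step trims the leading and trailing space that the first two steps leave behind.

```python
import pandas as pd

df = pd.DataFrame(data=['X ABCD X X WEB X'], columns=['c1'])

# Step 1: delete one-character words; the spaces around them stay behind
step1 = df.c1.str.replace(r'\b\w\b', '', regex=True)
# Step 2: collapse each run of whitespace into a single space
step2 = step1.str.replace(r'\s+', ' ', regex=True)
# Step 3: remove the space left at the start and end of the string
step3 = step2.str.strip()

for name, col in [('step1', step1), ('step2', step2), ('step3', step3)]:
    print(name, repr(col[0]))
```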
Using a list comprehension:
df['STRI'] = [
    ' '.join([i for i in s.split() if len(i) > 1])
    for s in df.STRI.values.tolist()
]
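The original map/lambda attempt from the question works along the same lines once the length test is applied to each word instead of to the whole string. A minimal sketch with a hypothetical sample frame:

```python
import pandas as pd

# Hypothetical sample rows; the first is the example from the question
df = pd.DataFrame({'STRI': ['ABCD X WYZ', 'X ABCD X X WEB X', 'Q', 'ABCD X, WYZ']})

# Split each string on whitespace, keep only words longer than one character,
# and join the survivors back together with single spaces.
df['STRI'] = df['STRI'].map(lambda x: ' '.join(w for w in x.split() if len(w) > 1))
print(df['STRI'].tolist())
```

Because this splits on whitespace, a token such as X, (a letter followed by a comma) has length 2 and is kept, whereas the \b\w\b regex would remove the X and leave the comma.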
Using str.split:
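# one row per word, indexed by (original row, word position)
s = df.STRI.str.split(expand=True).stack()
# keep words longer than one character and re-join them per original row
s[s.str.len() > 1].groupby(level=0).apply(' '.join)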
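One thing to watch with the stack approach: a row whose words are all single letters has nothing left after filtering, so it is missing from the groupby result. A sketch, assuming the result is meant to be written back to df['STRI'] (the sample rows are hypothetical), that restores such rows as empty strings with reindex:

```python
import pandas as pd

df = pd.DataFrame({'STRI': ['ABCD X WYZ', 'X Y', 'X ABCD X X WEB X']})

# One row per word; the index is (original row label, word position)
s = df.STRI.str.split(expand=True).stack()
# Keep words longer than one character and re-join them per original row
joined = s[s.str.len() > 1].groupby(level=0).apply(' '.join)
# Row 1 contained only single letters, so it is absent from `joined`;
# reindex against the original index and fill the gap with ''
df['STRI'] = joined.reindex(df.index, fill_value='')
print(df['STRI'].tolist())
```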