Python pandas extract: how to extract the remaining part of a string
I have been looking for hours, and this should be simple. I am trying to extract all the letters from a string that contains a mixture of digits and letters. Here is an example:
import pandas as pd

df = pd.Series(['ENGLANDSR11SW'])
df = df.to_frame('column')
# str.extract returns only the first match of the pattern
df['ValueAfterExtract'] = df['column'].str.extract("(?P<letter>[a-zA-Z]+)")
print(df)
From the string value ENGLANDSR11SW in the dataframe, the result is ENGLANDSR, but I also want to keep the trailing letters of the string, SW. The result should be ENGLANDSRSW, meaning that only the digits 11 are removed.
How can I do this?
Replace all digits (\d) with empty strings:
In [6]: df['column'].str.replace(r'\d', '', regex=True)
Out[6]:
0    ENGLANDSRSW
Name: column, dtype: object
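As a self-contained sketch of the digit-removal approach, the snippet below rebuilds the frame from the question and writes the cleaned value into the ValueAfterExtract column. It passes regex=True explicitly, because in pandas 2.0 and later Series.str.replace treats the pattern as a literal string by default. Using \d+ rather than \d is a small variation of mine that removes each run of digits in one substitution; the result is the same.

```python
import pandas as pd

# Rebuild the one-row frame from the question.
df = pd.Series(['ENGLANDSR11SW']).to_frame('column')

# regex=True makes pandas interpret r'\d+' as a regular expression;
# without it, pandas 2.0+ would search for the literal text "\d+".
df['ValueAfterExtract'] = df['column'].str.replace(r'\d+', '', regex=True)

print(df['ValueAfterExtract'].tolist())  # ['ENGLANDSRSW']
```

The original column is left untouched, so both the raw and the cleaned value stay available side by side.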
Or, to remove everything that is not in [a-zA-Z], use the regexp [^a-zA-Z]. This would remove, for instance, whitespace and punctuation marks as well as digits:
In [20]: df['column'].str.replace(r'[^a-zA-Z]', '', regex=True)
Out[20]:
0    ENGLANDSRSW
Name: column, dtype: object
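To make the difference between the two patterns visible, here is a small sketch on a made-up value that also contains spaces and a hyphen (the sample string and the variable names are assumptions for illustration, not from the question). It also shows str.findall followed by str.join, an alternative that stays closer to the "extract" idea of the original attempt by collecting every run of letters instead of only the first.

```python
import pandas as pd

# Hypothetical sample with whitespace and punctuation, not from the question.
s = pd.Series(['ENGLAND SR-11 SW'])

# Removing only digits keeps the spaces and the hyphen.
digits_removed = s.str.replace(r'\d', '', regex=True)

# Removing everything that is not an ASCII letter drops them as well.
letters_only = s.str.replace(r'[^a-zA-Z]', '', regex=True)

# Alternative: find every run of letters, then join the pieces back together.
joined = s.str.findall(r'[a-zA-Z]+').str.join('')

print(digits_removed[0])  # ENGLAND SR- SW
print(letters_only[0])    # ENGLANDSRSW
print(joined[0])          # ENGLANDSRSW
```

Which pattern fits depends on whether separators such as spaces should survive in the result.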