Delete rows containing numeric values in strings from a pandas DataFrame
I have a pandas DataFrame with 2 columns, `type` and `text`. The `text` column contains string values. How can I delete the rows whose `text` value contains numeric characters? For example:
`ABC 1.3.2`, `ABC12`, `2.2.3`, `ABC 12 1`
I have tried the following, but I get an error. Any idea why?
df.drop(df[bool(re.match('^(?=.*[0-9]$)', df['text'].str))].index)
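As for the error itself: `re.match` expects a single string, but `df['text'].str` is pandas' `.str` accessor object, not a string, so the call raises a `TypeError`. Even with a real string, `bool(...)` would produce one True/False for the whole column instead of one value per row, and the pattern `^(?=.*[0-9]$)` only matches text that ends in a digit. Below is a minimal sketch of the same `drop`-based idea using the vectorized `Series.str.contains`. The `type` values and the two digit-free rows are hypothetical, added only for illustration:

```python
import pandas as pd

# The strings from the question plus two hypothetical digit-free rows.
df = pd.DataFrame({
    'type': ['a', 'b', 'a', 'b', 'a', 'b'],
    'text': ['ABC 1.3.2', 'ABC12', '2.2.3', 'ABC 12 1', 'ABC', 'DEF'],
})

# Vectorized check: one True/False per row, True if the text has any digit.
has_digit = df['text'].str.contains(r'[0-9]')

# Same drop() approach as in the question, but fed a proper per-row mask.
cleaned = df.drop(df[has_digit].index)
print(cleaned)
```

Only the rows at index 4 and 5 (`ABC` and `DEF`) remain.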
In your case, I think it's better to use simple boolean indexing rather than `drop`. For example:
>>> df
text type
0 abc b
1 abc123 a
2 cde a
3 abc1.2.3 b
4 1.2.3 a
5 xyz a
6 abc123 a
7 9999 a
8 5text a
9 text a
>>> df[~df.text.str.contains(r'[0-9]')]
text type
0 abc b
2 cde a
5 xyz a
9 text a
That selects the rows whose text contains no digits at all.
To explain:
df.text.str.contains(r'[0-9]')
returns a boolean Series that is True wherever the string contains at least one digit:
0 False
1 True
2 False
3 True
4 True
5 False
6 True
7 True
8 True
9 False
You can then invert it with `~` and use the result to index your DataFrame, keeping only the rows where it is False.
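The two steps above (build the digit mask, then invert it with `~`) can be sketched end to end as follows, rebuilding the example frame from this answer. The `na=False` argument is an addition not in the original answer: if the column could contain missing values, `str.contains` would return NaN for them and the mask would no longer be purely boolean, so `na=False` treats missing text as "no digit" (and therefore keeps those rows). Use `na=True` instead if you would rather drop them.

```python
import pandas as pd

# Rebuild the example frame shown in this answer.
df = pd.DataFrame({
    'text': ['abc', 'abc123', 'cde', 'abc1.2.3', '1.2.3',
             'xyz', 'abc123', '9999', '5text', 'text'],
    'type': ['b', 'a', 'a', 'b', 'a', 'a', 'a', 'a', 'a', 'a'],
})

# Step 1: True wherever the string contains at least one digit.
# na=False keeps the mask boolean even if some text values are missing.
has_digit = df['text'].str.contains(r'[0-9]', na=False)

# Step 2: invert with ~ so True now means "no digits", then index with it.
no_digits = df[~has_digit]
print(no_digits)
```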
Using the data from jpp's answer:
s[s.str.isalpha()]
Out[261]:
0 ABC
2 DEF
6 GHI
dtype: object
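Note that `str.isalpha()` is stricter than "contains no digits": it is False for any string with a space or punctuation, even when no digit is present. The sketch below compares the two filters; the values `'ABC DEF'`, `'GHI-2'` and `'XYZ!'` are hypothetical, chosen only to show where they differ:

```python
import pandas as pd

# Hypothetical values chosen to show where the two filters disagree.
s = pd.Series(['ABC', 'ABC 1.3.2', 'ABC DEF', 'GHI-2', 'XYZ!'])

# Letters only: spaces and punctuation also cause a row to be dropped.
alpha_only = s[s.str.isalpha()]

# Digits only are disallowed; spaces and punctuation are kept.
no_digits = s[~s.str.contains(r'[0-9]')]

print(alpha_only.tolist())
print(no_digits.tolist())
```

Here `alpha_only` keeps only `'ABC'`, while `no_digits` also keeps `'ABC DEF'` and `'XYZ!'`.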
Assuming you define numeric as any character for which `isdigit()` evaluates to `True`, you can use `any` with a generator expression and create a Boolean mask via `pd.Series.apply`:
import pandas as pd

s = pd.Series(['ABC', 'ABC 1.3.2', 'DEF', 'ABC12', '2.2.3', 'ABC 12 1', 'GHI'])
mask = s.apply(lambda x: not any(i.isdigit() for i in x))
print(s[mask])
0 ABC
2 DEF
6 GHI
dtype: object
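For comparison, the character loop above can be checked against the vectorized `~s.str.contains(r'[0-9]')` used in the first answer; for ordinary ASCII digits both masks agree. One difference to be aware of: `str.isdigit` also accepts compatibility digits such as the superscript `'²'`, which the ASCII class `[0-9]` does not match. The extra value `'x²'` is hypothetical, added only to show that case:

```python
import pandas as pd

s = pd.Series(['ABC', 'ABC 1.3.2', 'DEF', 'ABC12', '2.2.3', 'ABC 12 1', 'GHI'])

# Character-by-character check (a Python-level loop per row).
mask_apply = s.apply(lambda x: not any(ch.isdigit() for ch in x))

# Vectorized regex check handled inside pandas.
mask_regex = ~s.str.contains(r'[0-9]')

# For these ASCII strings the two masks are identical.
print(mask_apply.tolist() == mask_regex.tolist())
print(s[mask_regex].tolist())

# Hypothetical edge case: superscript two counts as a digit for isdigit(),
# but is not matched by the ASCII character class [0-9].
extra = pd.Series(['x²'])
print(extra.apply(lambda x: any(ch.isdigit() for ch in x)).tolist())
print(extra.str.contains(r'[0-9]').tolist())
```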
Well, as I asked in the comment, what is your definition of numeric? If we apply Python's `isnumeric` to each token produced by `split()`, we get the following:
import pandas as pd
df = pd.DataFrame({
'col1': ['ABC', 'ABC 1.3.2', 'DEF', 'ABC12', '2.2.3', 'ABC 12 1', 'GHI']
})
m1 = df['col1'].apply(lambda x: not any(i.isnumeric() for i in x.split()))
m2 = df['col1'].str.isalpha()
m3 = df['col1'].apply(lambda x: not any(i.isdigit() for i in x))
m4 = ~df['col1'].str.contains(r'[0-9]')
print(df.assign(hasnonumeric=m1, isalpha=m2, isdigit=m3, contains=m4))
# Opting for the hasnonumeric definition
df = df[m1]
prints:
col1 hasnonumeric isalpha isdigit contains
0 ABC True True True True
1 ABC 1.3.2 True False False False
2 DEF True True True True
3 ABC12 True False False False
4 2.2.3 True False False False
5 ABC 12 1 False False False False
6 GHI True True True True
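The choice of definition changes which rows survive. This sketch filters with `m1` (drop a row only if some whitespace-separated token is numeric) and with `m4` (drop a row if it contains any digit) on the same data, to show the difference in the resulting rows:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['ABC', 'ABC 1.3.2', 'DEF', 'ABC12', '2.2.3', 'ABC 12 1', 'GHI']
})

# Definition 1: a row is numeric only if a whole token is numeric ('12', '1').
m1 = df['col1'].apply(lambda x: not any(tok.isnumeric() for tok in x.split()))

# Definition 2: a row is numeric if it contains any digit anywhere.
m4 = ~df['col1'].str.contains(r'[0-9]')

print(df[m1]['col1'].tolist())
print(df[m4]['col1'].tolist())
```

With `m1`, only `'ABC 12 1'` is removed, because tokens such as `'1.3.2'` and `'ABC12'` are not numeric as a whole. With `m4`, every row from the question's examples is removed.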