
Delete rows containing numeric values in strings from pandas dataframe

I have a pandas data frame with 2 columns, type and text. The text column contains string values. How can I delete rows that contain numeric values in the text column? E.g.:

`ABC 1.3.2`, `ABC12`, `2.2.3`, `ABC 12 1`

I have tried the code below, but get an error. Any idea why this gives an error?

df.drop(df[bool(re.match('^(?=.*[0-9]$)', df['text'].str))].index)

In your case, I think it's better to use simple indexing rather than drop. For example:

>>> df
       text type
0       abc    b
1    abc123    a
2       cde    a
3  abc1.2.3    b
4     1.2.3    a
5       xyz    a
6    abc123    a
7      9999    a
8     5text    a
9      text    a


>>> df[~df.text.str.contains(r'[0-9]')]
   text type
0   abc    b
2   cde    a
5   xyz    a
9  text    a

That selects the rows whose text contains no digits.

To explain:

df.text.str.contains(r'[0-9]')

returns a boolean Series that is True wherever the text contains a digit:

0    False
1     True
2    False
3     True
4     True
5    False
6     True
7     True
8     True
9    False

and you can negate this with ~ to index your dataframe, keeping only the rows where it is False.
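As for why the attempt in the question raises an error: re.match works on a single string, while df['text'].str is pandas' string accessor object, so the call fails with a TypeError before drop is ever reached. The sketch below rebuilds the sample frame from this answer and shows the boolean-mask approach next to an equivalent drop call. The na=False argument is an assumption added here so that missing values count as "no digit"; it is not part of the original answer.

```python
import pandas as pd

# Sample data copied from the answer above.
df = pd.DataFrame({
    "text": ["abc", "abc123", "cde", "abc1.2.3", "1.2.3",
             "xyz", "abc123", "9999", "5text", "text"],
    "type": ["b", "a", "a", "b", "a", "a", "a", "a", "a", "a"],
})

# True wherever the text contains at least one ASCII digit.
# na=False (an assumption) treats missing values as "no digit".
has_digit = df["text"].str.contains(r"[0-9]", regex=True, na=False)

# Boolean indexing: keep only the rows without digits.
kept = df[~has_digit]

# Same result with drop, passing the index labels of the digit rows.
dropped = df.drop(df[has_digit].index)

print(kept)
```

Both kept and dropped hold the same four rows; the indexing form simply avoids the extra lookup of index labels.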

Using the data from jpp's answer:

s[s.str.isalpha()]
Out[261]: 
0    ABC
2    DEF
6    GHI
dtype: object
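One caveat worth checking against your own data: str.isalpha() is stricter than "contains no digits", because it also rejects spaces and punctuation. A small sketch with hypothetical values (not taken from the question) that makes the difference visible:

```python
import pandas as pd

# Hypothetical values chosen to show where the two tests disagree.
s = pd.Series(["ABC", "ABC DEF", "A-B", "ABC12"])

# True only when every character is a letter.
only_letters = s.str.isalpha()

# True when the string contains no ASCII digit.
no_digits = ~s.str.contains(r"[0-9]", regex=True)

print(pd.DataFrame({"value": s, "isalpha": only_letters, "no_digits": no_digits}))
```

If the text column can hold multi-word strings, the isalpha mask would drop them as well, so the regex mask may be the closer match to what the question asks for.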

Assuming you define numeric as x.isdigit() evaluating to True, you can use any with a generator expression and create a Boolean mask via pd.Series.apply:

import pandas as pd

s = pd.Series(['ABC', 'ABC 1.3.2', 'DEF', 'ABC12', '2.2.3', 'ABC 12 1', 'GHI'])

mask = s.apply(lambda x: not any(i.isdigit() for i in x))

print(s[mask])

0    ABC
2    DEF
6    GHI
dtype: object
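The definition matters because str.isdigit() and a regex character class do not always agree on non-ASCII input. The following sketch uses a hypothetical helper, digit_checks, to compare three tests on a few sample strings; the samples are illustrative and not from the question.

```python
import re

def digit_checks(text):
    # Returns three booleans: the any(isdigit) test, the [0-9] class, and the \d class.
    by_isdigit = any(ch.isdigit() for ch in text)
    by_ascii_class = re.search(r"[0-9]", text) is not None
    by_backslash_d = re.search(r"\d", text) is not None
    return by_isdigit, by_ascii_class, by_backslash_d

# 'x\u00b2' ends in a superscript two; '\u0663' is the Arabic-Indic digit three.
for sample in ["abc", "abc1", "x\u00b2", "\u0663"]:
    print(repr(sample), digit_checks(sample))
```

For plain ASCII data such as the examples in the question, all three tests give the same answer, so the choice only matters if the column may contain other scripts or symbols.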

Well, as I asked in the comment, what is your definition of numeric? If we follow Python's isnumeric with split(), we get the following:

import pandas as pd


df = pd.DataFrame({
    'col1': ['ABC', 'ABC 1.3.2', 'DEF', 'ABC12', '2.2.3', 'ABC 12 1', 'GHI']
})

m1 = df['col1'].apply(lambda x: not any(i.isnumeric() for i in x.split()))
m2 = df['col1'].str.isalpha()
m3 = df['col1'].apply(lambda x: not any(i.isdigit() for i in x))
m4 = ~df['col1'].str.contains(r'[0-9]')

print(df.assign(hasnonumeric=m1, isalpha=m2, isdigit=m3, contains=m4))

# Opting for hasnonumeric
df = df[m1]

prints:

        col1  hasnonumeric  isalpha  isdigit  contains
0        ABC          True     True     True      True
1  ABC 1.3.2          True    False    False     False
2        DEF          True     True     True      True
3      ABC12          True    False    False     False
4      2.2.3          True    False    False     False
5   ABC 12 1         False    False    False     False
6        GHI          True     True     True      True
