Remove non-numeric rows in one column with pandas

Question

I have a DataFrame like the one below. Its "id" column is dirty; it is supposed to be a numeric column.

id, name
1,  A
2,  B
3,  C
tt, D
4,  E
5,  F
de, G

Is there a concise way to remove the rows where id is not a numeric value, i.e. tt and de:

tt,D
de,G

so that the DataFrame comes out clean?

id, name
1,  A
2,  B
3,  C
4,  E
5,  F

Answer 1

Use pd.to_numeric:

In [1079]: df[pd.to_numeric(df['id'], errors='coerce').notnull()]
Out[1079]:
  id  name
0  1     A
1  2     B
2  3     C
4  4     E
5  5     F
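
As a self-contained sketch of the same idea (the sample data is rebuilt here from the question, and the variable names numeric_id and clean are only illustrative), the following also stores the parsed numbers back so that the id column ends up with an integer dtype:

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["1", "2", "3", "tt", "4", "5", "de"],
    "name": ["A", "B", "C", "D", "E", "F", "G"],
})

# Values that cannot be parsed become NaN instead of raising an error.
numeric_id = pd.to_numeric(df["id"], errors="coerce")

# Keep only the rows whose id parsed successfully.
clean = df[numeric_id.notna()].copy()

# Optionally replace the string ids with the parsed integers.
clean["id"] = numeric_id[numeric_id.notna()].astype("int64")
```

Without the last step the filtered column still holds strings, which may or may not be what you want.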

Answer 2

You can use the standard string method isnumeric and apply it to each value in the id column:

import pandas as pd
from io import StringIO

data = """
id,name
1,A
2,B
3,C
tt,D
4,E
5,F
de,G
"""

df = pd.read_csv(StringIO(data))

In [55]: df
Out[55]: 
   id name
0   1    A
1   2    B
2   3    C
3  tt    D
4   4    E
5   5    F
6  de    G

In [56]: df[df.id.apply(lambda x: x.isnumeric())]
Out[56]: 
  id name
0  1    A
1  2    B
2  3    C
4  4    E
5  5    F

Or, if you want to use id as the index, you can do:

In [61]: df[df.id.apply(lambda x: x.isnumeric())].set_index('id')
Out[61]: 
   name
id     
1     A
2     B
3     C
4     E
5     F
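
One thing worth checking before relying on isnumeric: it only accepts strings made up entirely of numeric characters, so signed and decimal values are rejected. A small sketch comparing it with pd.to_numeric (the sample values are made up for illustration):

```python
import pandas as pd

ids = pd.Series(["1", "42", "-1", "1.5", "tt"])

# True only for strings consisting purely of numeric characters.
by_isnumeric = ids.str.isnumeric()

# True for anything pandas can parse as a number, including "-1" and "1.5".
by_to_numeric = pd.to_numeric(ids, errors="coerce").notna()
```

If the ids are known to be non-negative integers, the two checks agree; otherwise pd.to_numeric is probably the safer filter.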

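Edit: added timings

Although the pd.to_numeric approach does not use apply, it is almost twice as slow as applying str.isnumeric to the string column. I have also added the pandas str.isnumeric option, which is less typing than pd.to_numeric and faster as well. However, pd.to_numeric is more general because it works with any data type, not only strings.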

In [3]: df_big = pd.concat([df]*10000)

In [4]: df_big.shape
Out[4]: (70000, 2)

In [5]: %timeit df_big[df_big.id.apply(lambda x: x.isnumeric())]
15.3 ms ± 2.02 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [6]: %timeit df_big[df_big.id.str.isnumeric()]
20.3 ms ± 171 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [7]: %timeit df_big[pd.to_numeric(df_big['id'], errors='coerce').notnull()]
29.9 ms ± 682 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Answer 3

Given that df is your DataFrame:

import numpy as np
df[df['id'].apply(lambda x: isinstance(x, (int, np.int64)))]

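This passes each value in the id column to isinstance and checks whether it is an int. The result is a boolean Series, and indexing df with it returns only the rows where the value is True.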

If you also need to account for float values, another option is:

import numpy as np
df[df['id'].apply(lambda x: type(x) in [int, np.int64, float, np.float64])]

Note that neither approach works in place, so you need to assign the result back to the original df or create a new one:

df = df[df['id'].apply(lambda x: type(x) in [int, np.int64, float, np.float64])]
# or
new_df = df[df['id'].apply(lambda x: type(x) in [int, np.int64, float, np.float64])]
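
Keep in mind that a type check looks at the Python type of each stored value, not at what the value looks like. A sketch with two made-up frames (is_number is a hypothetical helper, not part of pandas) shows the difference between a column of numeric-looking strings and a column of mixed Python objects:

```python
import numpy as np
import pandas as pd

# Strings that look like numbers, as you would get from read_csv on a dirty column.
as_text = pd.DataFrame({"id": ["1", "2", "tt"], "name": ["A", "B", "C"]})

# Real Python objects of mixed type in an object column.
as_objects = pd.DataFrame({"id": [1, 2.5, "tt"], "name": ["A", "B", "C"]})

def is_number(x):
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(x, bool):
        return False
    return isinstance(x, (int, float, np.integer, np.floating))

text_result = as_text[as_text["id"].apply(is_number)]          # no rows survive
object_result = as_objects[as_objects["id"].apply(is_number)]  # rows A and B survive
```

So for the data in the question, where every id is read as a string, this style of check would drop every row.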

Answer 4

x.isnumeric() does not return True when x represents a float, for example '3.14'.

One way to keep only the values that can be converted to float:

def is_float(x):
    # True if x can be converted to float, False otherwise
    try:
        float(x)
    except (TypeError, ValueError):  # TypeError covers values such as None
        return False
    return True

df[df['id'].apply(is_float)]
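
Note that float() also accepts strings such as 'nan' and 'inf', so a plain conversion test lets those through. If that matters for your data, a stricter variant could look like the sketch below (is_finite_number and the sample frame are illustrative, not from the answer):

```python
import math
import pandas as pd

def is_finite_number(x):
    # Convertible to float AND neither NaN nor infinite.
    try:
        return math.isfinite(float(x))
    except (TypeError, ValueError):
        return False

df = pd.DataFrame({
    "id": ["1", "3.14", "nan", "inf", "tt", None],
    "name": ["A", "B", "C", "D", "E", "F"],
})

clean = df[df["id"].apply(is_finite_number)]
```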

Answer 5

How about this? The .str accessor is one of my favorites :)

import pandas as pd


df = pd.DataFrame(
    {
        'id':   {0: '1', 1: '2', 2: '3', 3: 'tt', 4: '4', 5: '5', 6: 'de'},
        'name': {0: 'A', 1: 'B', 2: 'C', 3: 'D',  4: 'E', 5: 'F', 6: 'G'}
    }
)

df_clean = df[df.id.str.isnumeric()]

Addendum (2021-06-22)

If id contains troublesome values (such as float, None, or nan), you can use astype('str') to coerce them to the str data type first.

import numpy as np
import pandas as pd


df = pd.DataFrame(
    {
        'id':   {0: '1', 1: '2', 2: '3', 3: 3.14, 4: '4', 5: '5', 6: None, 7: np.nan},
        'name': {0: 'A', 1: 'B', 2: 'C', 3: 'D',  4: 'E', 5: 'F', 6: 'G',  7: 'H'}
    }
)

df_clean = df[df.id.astype('str').str.isnumeric()]

Crude, but it still works.

Answer 6

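Here is a dynamic approach. Note that it drops non-numeric columns, and it only recognizes int64 and float64; if your DataFrame has other numeric dtypes, be sure to add them to the if statement.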

# make dataframe of column data types
col_types = df.dtypes.to_frame()
col_types.columns = ['dtype']

#make list of zeros
drop_it = [0]*col_types.shape[0]
k = 0

#make it a one if the data isn't numeric
#if you have other numeric types you need to add them to if statement
for t in col_types.dtype:
    if t != 'int64' and t != 'float64':
        drop_it[k] = 1
    k = k + 1

# keep only the columns flagged for dropping (the non-numeric ones)
col_types['drop_it'] = drop_it
col_types = col_types.loc[col_types["drop_it"] == 1]

#finally drop columns that are in drop list
for col_to_drop in col_types.index.values.tolist():
    df = df.drop([col_to_drop], axis = 1)
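
For comparison, pandas has a built-in way to select columns by dtype. This is a different technique from the loop above, shown only as a shorter sketch of the same column-level filtering; the sample frame is made up:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "score": [0.5, 1.5, 2.5],
    "name": ["A", "B", "C"],
})

# Keep every column with a numeric dtype (ints and floats of any width).
numeric_only = df.select_dtypes(include="number")
```

Like the loop, this works on whole columns, so it does not remove individual non-numeric rows from a string column.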

Answer 7

Another option is to use the query method:

In [5]: df.query('id.str.isnumeric()')
Out[5]: 
  id  name
0  1     A
1  2     B
2  3     C
4  4     E
5  5     F
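
Depending on the pandas version and whether numexpr is installed, .str calls inside query may need the Python engine. A self-contained sketch that passes engine="python" explicitly (the data is rebuilt from the question):

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["1", "2", "3", "tt", "4", "5", "de"],
    "name": ["A", "B", "C", "D", "E", "F", "G"],
})

# engine="python" evaluates the expression with plain Python/pandas semantics.
clean = df.query("id.str.isnumeric()", engine="python")
```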

7 answers

Answer 1: 98, 2017-10-04 20:08:46
Answer 2: 48, accepted, 2015-11-28 19:53:08
Answer 3: 14, 2015-11-27 16:10:11
Answer 4: 2, 2019-10-01 08:24:59
Answer 5: 2, 2020-11-15 01:48:52
Answer 6: 0
Answer 7: 0, 2022-05-24 19:28:44
