
How to drop rows from pandas dataframe based on condition that the data type contained is float?

I am working with a dataframe. I am aware that you could do something like:

dataframe[dataframe["column_name"] > some_value]  # any boolean condition on the column

But what I would like is something like:

 dataframe[type(dataframe["column_name"]) == float ]

For instance if we had the following dataset:

A    B    C    D
1    2    3    4
5    6         4
7    2    3    2
1    2    3    4

Then I would like to remove the second row, because the value in column C of that row is missing (or is not a number, which indicates a missing value).

But my attempt isn't working, and I get the following error. Can someone please help?

Warning (from warnings module):
  File "/Users/oishikachaudhury/Desktop/NYU/Risk Econ/Week 6/Hourly/trial.py", line 1
    import matplotlib.pyplot as plt
DtypeWarning: Columns (9,15,20,27,33,34,35,36,38,39,60) have mixed types.Specify dtype option on import or set low_memory=False.
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2646, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: False

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/oishikachaudhury/Desktop/NYU/Risk Econ/Week 6/Hourly/trial.py", line 8, in <module>
    dewpoint = fileObj[type(fileObj["HourlyDewPointTemperature"]) == float]
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/frame.py", line 2800, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2648, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: False

You would want something like:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "B": [5, 2, 54, 3, 2],
    "C": [20, 16, np.nan, 3, 8],
    "D": [14, 3, 17, 2, 6]})

# Keep only the rows with no missing values (same result as df1.dropna())
print(df1.loc[df1.isna().sum(axis=1) == 0])

Output:

   B     C   D
0  5  20.0  14
1  2  16.0   3
3  3   3.0   2
4  2   8.0   6
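
For context on the error in the question: `type(fileObj["HourlyDewPointTemperature"])` is the type of the whole column (a pandas Series), so comparing it to `float` gives the single value `False`, and `fileObj[False]` is then treated as a lookup of a column named `False`, hence `KeyError: False`. The sketch below reproduces that on a small made-up frame and shows one possible way to drop rows whose value in a single column is missing or non-numeric. The data, the `"missing"` placeholder string, and the choice of `pd.to_numeric` with `errors="coerce"` are assumptions for illustration, not something taken from the original file.

```python
import pandas as pd

# Hypothetical data shaped like the example in the question.
# Column C was read as text and has a non-numeric entry in the second row.
df = pd.DataFrame({
    "A": [1, 5, 7, 1],
    "B": [2, 6, 2, 2],
    "C": ["3", "missing", "3", "3"],
    "D": [4, 4, 2, 4],
})

# The type of a column is Series, so this is one bool, not a row mask.
mask_attempt = type(df["C"]) == float
print(mask_attempt)

# Convert the column to numbers; anything that cannot be parsed becomes NaN.
df["C"] = pd.to_numeric(df["C"], errors="coerce")

# Drop the rows where C is NaN, leaving the other columns untouched.
cleaned = df.dropna(subset=["C"])
print(cleaned)
```

Passing `subset=["C"]` restricts the check to one column; calling `dropna()` with no arguments would drop a row if any column is missing.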

Since the OP wants to drop rows that contain float values, rather than columns, here is a solution that does that:

import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'c', 'd'], 'B': ['e', 'f', 1.2, 'g'], 'C': ["asdf", 3.2, "s", "d"]})

# Collect the rows to keep
keeprows = []

# Loop through each row in the DataFrame
for idx, row in df.iterrows():
    validcols = 0  # number of values in this row that are not floats
    for val in row:
        if not isinstance(val, float):
            validcols += 1  # count the value if it is not a float
    # Keep the row only if none of its values is a float
    if validcols == len(df.columns):
        keeprows.append(row)

# Each kept row is a Series named after its index label,
# so building a DataFrame from the list restores the rows.
filtered = pd.DataFrame(keeprows)
print(filtered)

This gives:

   A  B     C
0  a  e  asdf
3  d  g     d

Compared to the original dataframe:

   A    B     C
0  a    e  asdf
1  b    f   3.2
2  c  1.2     s
3  d    g     d

This is unfortunately verbose and slow (since it loops over every row), and can likely be improved.
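
One possible shorter variant, offered as a sketch rather than a definitive answer: build a boolean frame marking which cells hold a Python `float`, then keep the rows where no cell is marked. It still calls a Python function per cell, so it mainly removes the explicit row loop. The names `is_float` and `filtered` are illustrative. Note that `NaN` is itself a float, so rows with missing values would be dropped as well.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["a", "b", "c", "d"],
    "B": ["e", "f", 1.2, "g"],
    "C": ["asdf", 3.2, "s", "d"],
})

# Boolean frame: True where the stored Python object is a float.
is_float = df.apply(lambda col: col.map(lambda v: isinstance(v, float)))

# Keep the rows in which no column holds a float.
filtered = df[~is_float.any(axis=1)]
print(filtered)
```

On this example frame the rows with index labels 0 and 3 are kept, matching the loop-based version.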
