I am working with a pandas DataFrame. I know that you can filter rows with something like:
dataframe[dataframe["column_name"] == some_value]
But what I would like is something like:
dataframe[type(dataframe["column_name"]) == float]
For instance if we had the following dataset:
A  B  C  D
1  2  3  4
5  6     4
7  2  3  2
1  2  3  4
Then I would like to remove the second row, because its value in column C is either missing or not a number (which also indicates a missing value).
But the way I tried it isn't working, and I get the following error. Can someone please help?
Warning (from warnings module):
File "/Users/oishikachaudhury/Desktop/NYU/Risk Econ/Week 6/Hourly/trial.py", line 1
import matplotlib.pyplot as plt
DtypeWarning: Columns (9,15,20,27,33,34,35,36,38,39,60) have mixed types.Specify dtype option on import or set low_memory=False.
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2646, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: False
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/oishikachaudhury/Desktop/NYU/Risk Econ/Week 6/Hourly/trial.py", line 8, in <module>
dewpoint = fileObj[type(fileObj["HourlyDewPointTemperature"]) == float]
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/frame.py", line 2800, in __getitem__
indexer = self.columns.get_loc(key)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2648, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: False
You would want something like:
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "B": [5, 2, 54, 3, 2],
    "C": [20, 16, np.nan, 3, 8],
    "D": [14, 3, 17, 2, 6]})
# Keep only the rows with no missing values (same result as df1.dropna())
print(df1.loc[df1.isna().sum(axis=1) == 0])
Output:
   B     C   D
0  5  20.0  14
1  2  16.0   3
3  3   3.0   2
4  2   8.0   6
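As for the `KeyError: False` in the traceback: `type(fileObj["HourlyDewPointTemperature"])` is the type of the whole Series, so comparing it with `float` yields the single value `False`, and pandas then looks for a column named `False`. The sketch below shows one element-wise alternative. The small frame and its `station` column are made up for illustration, and it assumes that "not a number" means "cannot be parsed as a number"; the real file may need different handling.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the data in the question:
# mostly numbers, one text entry, one missing value
df = pd.DataFrame({
    "station": ["a", "b", "c", "d"],
    "HourlyDewPointTemperature": [12.0, "5s", np.nan, 7],
})

col = df["HourlyDewPointTemperature"]

# type() inspects the Series object itself, not its elements,
# so this comparison produces one plain bool, not a row mask
whole_column_check = type(col) == float

# Element-wise alternative: coerce to numbers; entries that
# cannot be parsed become NaN
numeric = pd.to_numeric(col, errors="coerce")
mask = numeric.notna()  # boolean Series, one entry per row

cleaned = df[mask]
print(cleaned)
```

If the column is already numeric and only contains missing values, `df.dropna(subset=["HourlyDewPointTemperature"])` is probably the shorter route.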
Since the OP wants to drop rows that contain float values, rather than columns, here is a solution that does that:
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'c', 'd'], 'B': ['e', 'f', 1.2, 'g'], 'C': ["asdf", 3.2, "s", "d"]})
# Set up the list of rows to keep
keeprows = []
# Loop through each row in the DataFrame
for idx, row in df.iterrows():
    # Keep the row only if none of its values is a float
    if not any(isinstance(val, float) for val in row):
        keeprows.append(row)
# Each kept row is a Series; build a DataFrame with one row per Series
filtered = pd.DataFrame(keeprows)
print(filtered)
This gives:
   A  B     C
0  a  e  asdf
3  d  g     d
Compared to the original dataframe:
   A    B     C
0  a    e  asdf
1  b    f   3.2
2  c  1.2     s
3  d    g     d
This is unfortunately verbose and slow (since it loops over every row), and can likely be improved.
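One possible improvement, sketched here rather than benchmarked, is to build a boolean table of "is this cell a float?" and filter on it in one step. It uses the same example frame as above; the names `is_float` and `filtered` are only illustrative. The type check still runs once per cell in Python, so it is mainly shorter, not necessarily much faster.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["a", "b", "c", "d"],
    "B": ["e", "f", 1.2, "g"],
    "C": ["asdf", 3.2, "s", "d"],
})

# True wherever a cell holds a float; Series.map applies the
# check to each element of each column
is_float = df.apply(
    lambda column: column.map(lambda value: isinstance(value, float))
)

# Keep only the rows in which no cell is a float
filtered = df[~is_float.any(axis=1)]
print(filtered)
```

Note that `isinstance(value, float)` is also true for `NaN`, so rows with missing values are dropped as well.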