Can't pass iterable into pandas df to drop a row with a specific value

I am trying to load a bunch of CSVs into a database and would like to get rid of any rows from these tables that have the value "-". I'm trying to do the same thing as in the following link, but using an iterable instead of a predetermined column, as I don't know which tables and columns will have these values:

Deleting DataFrame row in Pandas based on column value

My code:

dfs = {}

for doc in fList:
    i = "{}\\{}".format(path, doc)

    df = pd.read_csv(i)

    for col in df.columns:
        df = df[df.col != "-"]

This returns the following error:

AttributeError                            Traceback (most recent call last)
<ipython-input-291-43edac7a4ed7> in <module>()
      8     #print dfs
      9     for col in df:
---> 10         df = df[df.col != "-"]

C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   2968             if name in self._info_axis:
   2969                 return self[name]
-> 2970             return object.__getattribute__(self, name)
   2971 
   2972     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'col'

It seems that I cannot use the iterable in the loop. It would defeat the purpose of writing the script if I had to open each file and change the values. Is there any way to loop through the tables and delete rows with the bad values?

You cannot access a column of df dynamically through attribute syntax the way you are trying; that is what raises the AttributeError. The . operator looks up an attribute literally named col on df, not the attribute whose name is the value stored in col. There's a difference.

If you want to keep the loop, use the __getitem__ accessor instead: df[col]. However, you should avoid loopy solutions where you can. Here are a couple of alternatives.
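For reference, the loop from the question with bracket access might look like the sketch below. The sample frame is hypothetical and stands in for one of the CSV files; it is not taken from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for one of the CSV files.
df = pd.DataFrame({"a": ["1", "-", "3"], "b": ["x", "y", "-"]})

for col in df.columns:
    # df[col] looks up the column whose name is stored in `col`;
    # df.col would look for a column literally named "col".
    df = df[df[col] != "-"]

print(df)
```

Only the first row survives here, because the other two rows each contain "-" in one of the columns.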

Option 1
For your case, eq + any should suffice. The mask marks the rows that contain "-", so negate it with ~ to keep the remaining rows.

df = df[~df.astype(str).eq('-').any(axis=1)]                # `astype` conversion

Or,

df = df[~df.select_dtypes(['object']).eq('-').any(axis=1)]  # `select_dtypes`, thanks MaxU!
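A small worked example of the eq + any approach, using the astype variant on a hypothetical frame with mixed dtypes. The column names and values are made up for illustration, and the mask is negated so that matching rows are dropped rather than kept.

```python
import pandas as pd

# Hypothetical data: one numeric column and two string columns.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["a", "-", "c"],
    "score": ["10", "20", "-"],
})

# True for every row in which at least one cell equals "-".
mask = df.astype(str).eq("-").any(axis=1)

# Keep only the rows where the mask is False.
cleaned = df[~mask]
print(cleaned)
```

Building the mask once for the whole frame avoids filtering the frame again for every column.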

Option 2
Another option is to pass the na_values argument to read_csv, so that these values are converted to NaN as the data is read in, and you can then drop them.

df = pd.read_csv('file.csv', na_values=['-'])

Now call dropna on your data. Note that this also drops rows that have missing values for any other reason:

df.dropna(inplace=True)
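A self-contained sketch of this option, reading from an in-memory string instead of a file so it can be run as is. The CSV content is hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for 'file.csv'.
csv_text = "id,name,score\n1,a,10\n2,-,20\n3,c,-\n"

# Every "-" cell is parsed as NaN while the data is read.
df = pd.read_csv(io.StringIO(csv_text), na_values=["-"])

# Drop every row that contains at least one NaN.
df = df.dropna()
print(df)
```

A side effect worth knowing: because "-" no longer appears in the score column, pandas parses that column as numeric rather than as strings.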
