Can't pass iterable into pandas df to drop a row with a specific value

Question

I am trying to load a bunch of CSVs into a database and would like to get rid of any rows in these tables that have the value "-". I'm trying to do the same thing as in the following link, but using an iterable instead of a predetermined column, as I don't know which tables and columns will have these values:

Deleting DataFrame row in Pandas based on column value

My code:

dfs = {}

for doc in fList:
    i = "{}\\{}".format(path, doc)

    df = pd.read_csv(i)

    for col in df.columns:
        df = df[df.col != "-"]

This returns the following error:

AttributeError                            Traceback (most recent call last)
<ipython-input-291-43edac7a4ed7> in <module>()
      8     #print dfs
      9     for col in df:
---> 10         df = df[df.col != "-"]

C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   2968             if name in self._info_axis:
   2969                 return self[name]
-> 2970             return object.__getattribute__(self, name)
   2971 
   2972     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'col'

It seems that I cannot use the iterable in the loop. It would defeat the purpose of writing the script if I had to open each file and change the values. Is there any way to loop through the tables and delete rows with the bad values?

Answer 1

You cannot dynamically access a column of df through a variable the way you are trying; that leads to an AttributeError. The dot looks for an attribute of df literally named col, not one named after the value stored in col. There's a difference.

If you wanted to, you'd need the __getitem__ accessor: df[col]. However, you should avoid loop-based solutions where you can. Here are a couple of alternatives.
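For comparison, a minimal sketch of the question's loop with the attribute lookup swapped for df[col]. The DataFrame is made-up sample data. It works, but it filters the frame once per column, which is the kind of loop the alternatives that follow avoid.

```python
import pandas as pd

# Made-up sample data; "-" marks a bad value in some cells.
df = pd.DataFrame({
    "name": ["a", "b", "-", "d"],
    "city": ["x", "-", "y", "z"],
    "score": [1, 2, 3, 4],
})

for col in df.columns:
    # df[col] looks up the column whose name is stored in `col`;
    # df.col would look for a column literally named "col".
    df = df[df[col] != "-"]

print(df)
```

Comparing the numeric score column with "-" simply yields True for every row, so that column removes nothing.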

Option 1
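For your case, eq + any should suffice. The ~ inverts the mask, so rows containing "-" are dropped rather than kept:

df = df[~df.astype(str).eq('-').any(axis=1)]                # `astype` conversion

Or, checking only the string (object) columns:

df = df[~df.select_dtypes(include=['object']).eq('-').any(axis=1)]  # `select_dtypes`, thanks MaxU!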
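A small runnable check of the astype variant on made-up data (column names and values are invented for illustration). The mask is True for any row where at least one cell, after conversion to str, equals "-"; ~ keeps every other row.

```python
import pandas as pd

# Made-up sample data; "-" marks a bad value in some cells.
df = pd.DataFrame({
    "name": ["a", "b", "-", "d"],
    "city": ["x", "-", "y", "z"],
    "score": [1, 2, 3, 4],
})

# True for every row where at least one cell equals "-".
has_dash = df.astype(str).eq("-").any(axis=1)

# ~ inverts the mask, so rows containing "-" are dropped rather than kept.
clean = df[~has_dash]

print(clean)
```

Without the ~, df[has_dash] would return only the bad rows (here rows 1 and 2).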

Option 2
Another option is to pass the na_values argument to read_csv, so that these values are converted to NaN when your data is read in, and you can then drop them.

df = pd.read_csv('file.csv', na_values=['-'])

Now call dropna on your data:

df.dropna(inplace=True)
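A minimal sketch of Option 2 applied to the file loop from the question, collecting the results in a dict as the question's unused dfs = {} suggests. load_clean_csvs is a hypothetical helper name, the demo CSV is made up, and os.path.join stands in for the hard-coded "\\" separator.

```python
import os
import tempfile

import pandas as pd


def load_clean_csvs(path, file_names):
    """Hypothetical helper: read each CSV in `file_names` from directory
    `path`, treating "-" as missing, and drop rows with missing values."""
    dfs = {}
    for doc in file_names:
        # os.path.join builds the path without hard-coding a "\\" separator.
        full_path = os.path.join(path, doc)
        # na_values=["-"] turns every "-" cell into NaN while parsing
        # (pandas' default NA markers, such as empty fields, still apply).
        df = pd.read_csv(full_path, na_values=["-"])
        # dropna() removes every row containing any NaN, so rows that were
        # already missing values in the file are dropped as well.
        dfs[doc] = df.dropna()
    return dfs


# Usage with a made-up file written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "a.csv"), "w") as f:
        f.write("name,score\nx,1\n-,2\ny,-\n")
    result = load_clean_csvs(tmp, ["a.csv"])

print(result["a.csv"])
```

If genuinely empty cells should survive, one could pass keep_default_na=False so that only "-" is treated as missing.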

Accepted answer, score 3, posted 2018-01-15 21:57:48