Converting non-numeric values to numeric in some, but not all, columns in pandas dataframe using column numbers rather than column names

Assume a pandas df with many columns. I am trying to convert all non-numeric values into np.nan using pd.to_numeric, as shown below. However, I do not want to apply this to the first two columns; it should only be applied to every column after the first two.

For instance, assume the following:

import pandas as pd
import numpy as np

df = pd.DataFrame({'name': ['Adam', 'Bob', 'Chuck', 'David'],
                   'color': ['blue', 'green', 'red', 'yellow'],
                   'number1': [50, 750, 'ad098', 'baseball'],
                   'number2': [25, 'text', 1000, '200']},
                  )

Generally, I would just call out the names of the two columns that should be excluded. However, in this case, I am trying to create a framework that can be applied to any df regardless of the names of the columns. Hence, I want to exclude the first two columns on the basis of their positions (0 and 1).

I am able to successfully convert all non-numeric values in all columns to np.nan using the following:

df = df.apply(pd.to_numeric, errors='coerce')

However, when I try to exclude the first two columns using either of the two methods below, I get an error.

df = df[df.columns[2:].apply(pd.to_numeric, errors='coerce')]

gives the error: "AttributeError: 'Index' object has no attribute 'apply'"

df = df[df.iloc[:,2:].apply(pd.to_numeric, errors='coerce')]

gives the error: "ValueError: Boolean array expected for the condition, not object"

Clearly I am doing something wrong, but I can't figure out what it is. Any help would be greatly appreciated. Thank you.

Try with:

df.iloc[:, 2:] = df.iloc[:, 2:].apply(pd.to_numeric, errors='coerce')

This reads as "replace every column from position 2 onward with those same columns after applying pd.to_numeric".
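The same idea can be wrapped in a small helper so it works on any df. This is only a sketch: the name coerce_from_position and its start parameter are made up for illustration. It assigns through the column labels instead of .iloc and assumes the column names are unique. The reason for that choice is that, depending on the pandas version, assigning through .iloc may write the converted values into the existing object columns and leave their dtype as object, while assigning by label replaces each column outright.

```python
import pandas as pd


def coerce_from_position(df, start=2):
    """Return a copy of df where every column from position `start` onward
    has been passed through pd.to_numeric(errors='coerce').

    Hypothetical helper for illustration; assumes unique column names.
    """
    out = df.copy()
    # Labels of the columns to convert, picked by position, not by name.
    cols = list(out.columns[start:])
    # Values that cannot be parsed as numbers become NaN.
    out[cols] = out[cols].apply(pd.to_numeric, errors="coerce")
    return out


df = pd.DataFrame({
    "name": ["Adam", "Bob", "Chuck", "David"],
    "color": ["blue", "green", "red", "yellow"],
    "number1": [50, 750, "ad098", "baseball"],
    "number2": [25, "text", 1000, "200"],
})

result = coerce_from_position(df)
print(result)
print(result.dtypes)
```

The first two columns are left untouched, and the original df is not modified because the helper works on a copy.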

Writing df[something] is simply selecting the columns of df using the object something - a sequence of indices or column names, for example.

So when you write an expression like

df[df.iloc[:,2:].apply(pd.to_numeric, errors='coerce')]

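your something is a DataFrame (the value returned from the expression df.iloc[:,2:].apply(pd.to_numeric, errors='coerce') ). When the key is a DataFrame, pandas treats it as a boolean mask, which is why the error complains that it expected a Boolean array and got object.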

Effectively, you were confusing the values used to select columns with the values you wanted to replace those columns with.
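To make that distinction concrete, here is a short sketch that inspects what each of the two failing attempts was actually working with. The variable names labels, converted and selected are chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Adam", "Bob", "Chuck", "David"],
    "color": ["blue", "green", "red", "yellow"],
    "number1": [50, 750, "ad098", "baseball"],
    "number2": [25, "text", 1000, "200"],
})

# First attempt: df.columns[2:] is an Index of column labels, not data,
# so there are no values on it for pd.to_numeric to be applied to.
labels = df.columns[2:]
print(type(labels).__name__, list(labels))

# Second attempt: the expression inside df[...] is a DataFrame of
# converted values, not something that names columns.
converted = df.iloc[:, 2:].apply(pd.to_numeric, errors="coerce")
print(type(converted).__name__)

# Selecting wants labels (or positions via .iloc) ...
selected = df[labels]
print(list(selected.columns))
# ... while the converted values belong on the right-hand side
# of an assignment, as in the answer above.
```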

