
How to apply a function to more than one column with Python?

I have a data frame with "revenue" and "quantity" columns. Both are meant to be numeric, but they contain some garbage, such as ",", that has to be cleaned out before converting them (their dtype is originally "object"). The following two lines do the trick:

data['revenue'] = pd.to_numeric(data['revenue'].apply(lambda x: re.sub("[^0-9]", "", x)))
data['quantity'] = pd.to_numeric(data['quantity'].apply(lambda x: re.sub("[^0-9]", "", x)))
data.dtypes

revenue     int64
quantity    int64

Now I wonder whether there is a one-liner that does the same. I tried the following:

data = data.apply(lambda x: pd.to_numeric(re.sub("[^0-9]", "", x)) if x.name in [['revenue','quantity']] else x)

That didn't change the dtype from object to int. Then I tried:

data[['revenue','quantity']] = pd.to_numeric(data[['revenue','quantity']].apply(lambda x: re.sub("[^0-9]", "", x)))

This raised the error:

TypeError: ('expected string or bytes-like object', 'occurred at index revenue')

Any ideas for something more concise than two lines?

Try this:

data = data.apply(lambda x: pd.to_numeric(x.apply(lambda v: re.sub("[^0-9]", "", v))) if x.name in ['revenue','quantity'] else x)

I usually just do:

for col in ['revenue', 'quantity']:
    data[col] = data[col].apply(function)

It's not a one-liner, but what you lose in lines you gain in readability, in my opinion.
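For completeness, the loop can be written out as a small runnable sketch. The sample data and the helper name strip_non_digits are made up for illustration; the helper stands in for the "function" placeholder and applies the same regex as the question.

```python
import re

import pandas as pd

# Hypothetical sample data; the real frame comes from the asker.
data = pd.DataFrame({
    "revenue": ["1,200", "$350"],
    "quantity": ["3", "1,000"],
    "product": ["a", "b"],
})


def strip_non_digits(value):
    """Remove every character that is not a digit 0-9."""
    return re.sub("[^0-9]", "", value)


# Clean and convert each column in turn; other columns are left alone.
for col in ["revenue", "quantity"]:
    data[col] = pd.to_numeric(data[col].apply(strip_non_digits))

print(data.dtypes)
```

Adding another column to clean then only means adding its name to the list.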

data['revenue'] is a Series, so apply calls the function once per element of that Series. But data[['revenue', 'quantity']] is a DataFrame, and DataFrame.apply calls the function with whole Series objects: once with data['revenue'] and then once with data['quantity']. The x in re.sub("[^0-9]", "", x) is therefore a Series rather than a string, and that's why it fails.
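The difference is easy to check by having the applied function report the type of its argument. This is only a small sketch on made-up sample data:

```python
import re

import pandas as pd

# Hypothetical sample data for illustration.
data = pd.DataFrame({"revenue": ["1,200", "$350"], "quantity": ["3", "1,000"]})

# Series.apply passes one cell value at a time, so the lambda sees strings.
series_arg_types = data["revenue"].apply(lambda x: type(x).__name__)

# DataFrame.apply passes one whole column at a time, so the lambda sees Series.
frame_arg_types = data[["revenue", "quantity"]].apply(lambda x: type(x).__name__)

print(series_arg_types.tolist())
print(frame_arg_types.to_dict())

# re.sub cannot handle a Series, which reproduces the TypeError from the question.
error_message = None
try:
    data[["revenue", "quantity"]].apply(lambda x: re.sub("[^0-9]", "", x))
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```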

You could change your lambda to

lambda s: s.apply(lambda x: re.sub("[^0-9]", "", x))

but the DataFrame.replace method accepts regular expressions, so there is no need for apply at all.

data[['revenue', 'quantity']].replace("[^0-9]", "", regex=True)

to_numeric doesn't work on DataFrames, but astype does. So the full conversion would be (assuming you want int64):

data[['revenue', 'quantity']] = data[['revenue', 'quantity']].replace(
    "[^0-9]", "", regex=True).astype('int64')

My suggestion is:

data[['revenue', 'quantity']] = data[['revenue', 'quantity']].\
    applymap(lambda v: pd.to_numeric(re.sub("[^0-9]", "", v)))

This is actually a one-liner; it is split across two lines only for readability, given limited screen width.
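One caveat: DataFrame.applymap has been deprecated since pandas 2.1 in favor of DataFrame.map. The sketch below picks whichever element-wise method the installed version offers; the sample data and the helper name to_int are made up for illustration.

```python
import re

import pandas as pd

# Hypothetical sample data for illustration.
data = pd.DataFrame({"revenue": ["1,200", "$350"], "quantity": ["3", "1,000"]})
cols = ["revenue", "quantity"]


def to_int(value):
    """Strip non-digit characters from one cell and convert it to int."""
    return int(re.sub("[^0-9]", "", value))


# Use DataFrame.map where it exists (pandas >= 2.1), else fall back to applymap.
subset = data[cols]
elementwise = subset.map if hasattr(pd.DataFrame, "map") else subset.applymap
data[cols] = elementwise(to_int)

print(data.dtypes)
```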
