
TypeError when using `df.apply` (Pandas)

I have a pandas dataframe that looks like this:

[Image: the dataframe]

I want to take the log of each value in the dataframe.

That seemed easy at first, but data.apply(lambda x: math.log(x)) raised a TypeError (cannot convert the series to <class 'float'>).

Okay, fine. Type checking is often frowned upon, but I gave it a shot (I also tried casting x to a float, with the same problem):

isinstance(data['A1BG'][0], np.float64) returns True, so I tried:

data.apply(lambda x: math.log(x) if isinstance(x, np.float64) else x)

That ran without errors, but it didn't change any values in my dataframe.

What am I doing wrong?

Thanks!

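When you call apply on a DataFrame, the function receives a pandas.Series (one column at a time by default), not a float. This is the opposite of calling apply on a Series, where the function receives one element at a time. So instead of math.log, which only accepts a single number, use np.log, which works element-wise on a whole Series.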
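A minimal sketch to see this for yourself (the small frame df is made up for illustration): it records what kind of object each function call receives.

```python
import pandas as pd

# Illustrative frame; any numeric DataFrame behaves the same way.
df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})

# DataFrame.apply calls the function once per column (axis=0 by default),
# passing each column as a pandas Series.
col_is_series = df.apply(lambda x: isinstance(x, pd.Series))
print(col_is_series.tolist())

# Series.apply, by contrast, calls the function once per element.
elem_is_float = df["a"].apply(lambda v: isinstance(v, float))
print(elem_is_float.tolist())
```

Both prints show True for every call: Series objects in the first case, individual floats in the second.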

EDIT:

An example makes this clearer:

import math
import numpy as np
import pandas as pd

test = pd.DataFrame({'a': np.random.random(5), 'b': np.random.random(5)})

    a           b
0   0.430111    0.420516
1   0.367704    0.785093
2   0.034130    0.839822
3   0.310254    0.755089
4   0.098302    0.136995

If you try the following, it won't work:

test.apply(lambda x: math.log(x))

TypeError: ("cannot convert the series to <class 'float'>", 'occurred at index a')

But this will do the job:

test.apply(lambda x: np.log(x))

    a           b
0   -0.843711   -0.866273
1   -1.000476   -0.241953
2   -3.377588   -0.174565
3   -1.170364   -0.280919
4   -2.319708   -1.987811
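Because np.log is a NumPy ufunc, it can also be called on the DataFrame itself, with no apply at all. A minimal sketch, assuming a seeded random frame in place of the unseeded one above so the check is repeatable:

```python
import numpy as np
import pandas as pd

# Seeded generator (an assumption for reproducibility; the original used np.random.random).
rng = np.random.default_rng(0)
test = pd.DataFrame({"a": rng.random(5), "b": rng.random(5)})

# NumPy ufuncs such as np.log work element-wise on a whole DataFrame
# and return a DataFrame with the same index and columns.
direct = np.log(test)
via_apply = test.apply(lambda x: np.log(x))

print(type(direct).__name__)
print(direct.equals(via_apply))
```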

What happens is that df.apply passes a pd.Series object (one column at a time by default) to the lambda. It operates on one Series at a time, not one float at a time.

So, with

data.apply(lambda x: math.log(x) if isinstance(x, np.float64) else x)

isinstance(x, np.float64) is never True (because x is a pd.Series), so the else branch always runs and returns each column unchanged.
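A quick sketch confirming that the conditional lambda is a no-op here (the frame data is invented for the check):

```python
import math

import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's dataframe.
data = pd.DataFrame({"a": [1.0, 2.0, 4.0], "b": [8.0, 16.0, 32.0]})

# Inside DataFrame.apply, x is a whole column (a Series), so the isinstance
# check is False and the lambda hands the column back untouched.
result = data.apply(lambda x: math.log(x) if isinstance(x, np.float64) else x)
print(result.equals(data))
```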

To remedy this, you can operate one element at a time using df.applymap:

data.applymap(math.log)
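Note that pandas 2.1 renamed DataFrame.applymap to DataFrame.map and deprecated the old name. A minimal version-tolerant sketch; the hasattr check is my own choice, not something pandas requires, and the frame values are picked so the logs are easy to verify:

```python
import math

import numpy as np
import pandas as pd

# Values chosen so that math.log gives 0, 1 and 2.
data = pd.DataFrame({"a": [1.0, math.e], "b": [math.e ** 2, 1.0]})

# Use DataFrame.map where it exists (pandas >= 2.1), else fall back to applymap.
# Either way, math.log is called once per scalar cell.
if hasattr(pd.DataFrame, "map"):
    logged = data.map(math.log)
else:
    logged = data.applymap(math.log)

print(logged)
```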

With apply, the solution is similar, but you need a function that accepts a whole Series, such as np.log:

data.apply(lambda x: np.log(x))
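The lambda is only a wrapper here: np.log already accepts a Series, so it can be passed to apply directly. A small sketch comparing the two forms on a made-up frame:

```python
import numpy as np
import pandas as pd

# Illustrative frame.
data = pd.DataFrame({"a": [1.0, 10.0], "b": [100.0, 1000.0]})

# Passing np.log directly gives the same DataFrame as wrapping it in a lambda.
with_lambda = data.apply(lambda x: np.log(x))
without_lambda = data.apply(np.log)
print(with_lambda.equals(without_lambda))
```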

Or, alternatively (pandas 0.20 and later):

data.transform(lambda x: np.log(x))
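The difference from apply is that transform insists the output has the same shape as the input. A minimal sketch on an invented frame: an element-wise log passes, while a column-wise sum is rejected with a ValueError (the exact message varies between pandas versions, so only the exception type is checked).

```python
import numpy as np
import pandas as pd

# Three rows and two columns, so a reduced result cannot accidentally match the row count.
data = pd.DataFrame({"a": [1.0, 10.0, 100.0], "b": [2.0, 20.0, 200.0]})

# For an element-wise function, transform and apply agree.
same = data.transform(np.log).equals(data.apply(np.log))
print(same)

# A reducing function does not preserve the shape, so transform refuses it.
raised = False
try:
    data.transform(lambda x: x.sum())
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```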

As it happens, in my timings df.applymap was the fastest, followed by df.apply and then df.transform.
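Relative speed depends on the frame's shape and the pandas version, so it is worth timing on your own data. A rough benchmark sketch; the frame size is an arbitrary assumption, calling np.log on the frame directly is added for comparison, and no particular ranking is implied:

```python
import math
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark frame; size and values are arbitrary (all >= 1, so logs are finite).
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.random((500, 20)) + 1.0)

# DataFrame.map on pandas >= 2.1, applymap on older versions.
elementwise = data.map if hasattr(pd.DataFrame, "map") else data.applymap

candidates = {
    "applymap/map(math.log)": lambda: elementwise(math.log),
    "apply(np.log)": lambda: data.apply(np.log),
    "transform(np.log)": lambda: data.transform(np.log),
    "np.log(data)": lambda: np.log(data),
}

# Best of 3 repeats, 3 calls each, to reduce timing noise.
timings = {name: min(timeit.repeat(fn, number=3, repeat=3)) for name, fn in candidates.items()}
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:<24} {seconds:.4f} s")
```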

Try this:

import math
data.apply(lambda x: x.map(math.log))
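Converting the column to a list does not help on its own: math.log accepts a single real number, so math.log(list(x)) still raises a TypeError. A short sketch on a made-up frame showing the failure and one way to keep math.log by mapping it over each element with Series.map:

```python
import math

import numpy as np
import pandas as pd

# Illustrative frame.
data = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})

# math.log rejects a list, so this raises TypeError.
raised = False
try:
    data.apply(lambda x: math.log(list(x)))
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Mapping math.log over each element of every column works.
fixed = data.apply(lambda col: col.map(math.log))
print(np.allclose(fixed, np.log(data)))
```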
