TypeError when using `df.apply` (Pandas)

Question

I have a pandas dataframe that looks like this:

I want to take the log of each value in the dataframe.

That seemed like no problem at first, but data.apply(lambda x: math.log(x)) raised a TypeError (cannot convert the series to class 'float').

Okay, fine. Type checking is often frowned upon, but I gave it a shot (I also tried casting x to a float, with the same result):

isinstance(data['A1BG'][0], np.float64) returns True, so I tried:

data.apply(lambda x: math.log(x) if isinstance(x, np.float64) else x). That ran without any errors, but it didn't change any values in my dataframe.

What am I doing wrong?

Thanks!

Answer 1

When you call apply on a DataFrame, the function is passed a pandas.Series (one column at a time), not a float, as opposed to calling apply on a Series, where it receives each value. math.log accepts only a single number, so you should use np.log instead, which works on a whole Series.

EDIT:

With examples it's always better:

import numpy as np
import pandas as pd

# Two columns of five random floats in [0, 1)
test = pd.DataFrame({'a': np.random.random(5), 'b': np.random.random(5)})

    a           b
0   0.430111    0.420516
1   0.367704    0.785093
2   0.034130    0.839822
3   0.310254    0.755089
4   0.098302    0.136995

If you try the following, it won't work:

test.apply(lambda x: math.log(x))

TypeError: ("cannot convert the series to <class 'float'>", 'occurred at index a')

But this will do the job:

test.apply(lambda x: np.log(x))

    a           b
0   -0.843711   -0.866273
1   -1.000476   -0.241953
2   -3.377588   -0.174565
3   -1.170364   -0.280919
4   -2.319708   -1.987811
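
As a side note, np.log is a NumPy ufunc, so it can also be called on the whole frame without apply. A minimal sketch, assuming an all-numeric frame like the one above (the seed and the variable names via_apply and direct are only for illustration):

```python
import numpy as np
import pandas as pd

# Illustrative frame with two float columns, similar to `test` above.
rng = np.random.default_rng(0)  # fixed seed is an arbitrary choice
test = pd.DataFrame({"a": rng.random(5), "b": rng.random(5)})

# apply hands each column (a Series) to np.log, which is vectorized.
via_apply = test.apply(np.log)

# A NumPy ufunc called directly on a DataFrame also returns a DataFrame.
direct = np.log(test)

# Both routes should produce the same values.
print(np.allclose(direct.to_numpy(), via_apply.to_numpy()))
```

This only works when every column is numeric; a frame with string columns would need those columns excluded first.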

Answer 2

What happens is that df.apply passes a pd.Series object to the lambda. It operates on one Series (one column) at a time, not one float at a time.

So, with

data.apply(lambda x: math.log(x) if isinstance(x, np.float64) else x)

isinstance(x, np.float64) is never true (because x is a pd.Series type) and so the else is always executed.
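
This can be checked directly by recording the type of whatever apply passes in. A small sketch, assuming a made-up stand-in for data (the column name A2M and the helper record_type are hypothetical):

```python
import math

import numpy as np
import pandas as pd

# Hypothetical stand-in for `data`; the values are made up.
data = pd.DataFrame({"A1BG": [1.0, 2.0], "A2M": [3.0, 4.0]})

seen_types = []

def record_type(x):
    # Record what apply actually hands to the function, then return x unchanged.
    seen_types.append(type(x).__name__)
    return x

data.apply(record_type)
print(seen_types)  # every entry is the name of the type apply passed in

# A single cell is a np.float64, which is why the check in the question looked promising.
print(isinstance(data["A1BG"][0], np.float64))

# The guarded lambda takes the `else x` branch for every column, so nothing changes.
unchanged = data.apply(lambda x: math.log(x) if isinstance(x, np.float64) else x)
print(unchanged.equals(data))
```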

To remedy this, you can operate on one element at a time, using df.applymap:

data.applymap(math.log)

Using apply, the solution is similar, but the function must accept a whole Series, so use np.log rather than math.log:

data.apply(lambda x: np.log(x))

Or, alternatively (pandas 0.20 and later):

data.transform(lambda x: np.log(x))

Incidentally, df.applymap is the fastest, followed by df.apply and df.transform.
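
The three approaches can be compared side by side to confirm they give the same result. This is only a sketch on a made-up frame. Note that in pandas 2.1 and later applymap is deprecated in favor of DataFrame.map, so the snippet picks whichever exists; the name elementwise is just for illustration.

```python
import math

import numpy as np
import pandas as pd

# Made-up positive values so the logarithm is defined everywhere.
data = pd.DataFrame({"x": [1.0, 10.0, 100.0], "y": [0.5, 2.0, 8.0]})

# DataFrame.map replaced applymap in pandas 2.1; fall back on older versions.
elementwise = data.map if hasattr(data, "map") else data.applymap

per_element = elementwise(math.log)   # math.log is called once per cell
per_column = data.apply(np.log)       # np.log is called once per column
transformed = data.transform(np.log)  # result has the same shape as the input

print(np.allclose(per_element.to_numpy(), per_column.to_numpy()))
print(np.allclose(per_column.to_numpy(), transformed.to_numpy()))
```

Which one is fastest is best measured on your own data (for example with timeit), since the element-wise version makes one Python call per cell while np.log works on a whole column at once.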

Answer 3

Try this

 import math
 # math.log accepts a single number, so map it over each column's values
 data.apply(lambda x: x.map(math.log))

3 answers

solution1
1 2017-08-02 13:53:38

solution2
1 ACCEPTED 2017-08-02 13:54:38

solution3
0 2017-08-02 13:54:12
