Python: What is the difference between df['Col'].apply(lambda row: len(row)) and df.apply(lambda row: len(row['Col']), axis=1)?

Question

import pandas as pd

df = pd.DataFrame([['user1', 'Hey whats up I am Rob', 73],
                   ['user2', 'Hey whats up I am Bob', 44],
                   ['user3', 'Hey whats up I am a Snob', 12]],
                  columns=['User', 'Text', 'Age'])

#Method 1
df['TextLen'] = df['Text'].apply(lambda row: len(row))

#Method 2
df['TextLen2'] = df.apply(lambda row: len(row['Text']), axis=1)

print(df)

Result:

   User                      Text  Age  TextLen  TextLen2
0  user1     Hey whats up I am Rob   73       21        21
1  user2     Hey whats up I am Bob   44       21        21
2  user3  Hey whats up I am a Snob   12       24        24

What is the difference between Method 1 and Method 2?

Which is more Pythonic, and which should I use on large datasets?

Answer 1

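Using the %%timeit magic in a Jupyter notebook, I timed each of your two methods. Method 1 is roughly three times faster than Method 2 (434 µs vs. 1.24 ms). Method 1 calls the function once per value in the 'Text' column, while Method 2 (axis=1) builds a Series for every row and passes that whole row to the function, which adds overhead.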
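Here is a minimal sketch, reusing the question's DataFrame, that records what type of object each call passes to the lambda. The names element_types, row_types and text_len are only illustrative and do not come from the original post.

```python
import pandas as pd

df = pd.DataFrame([['user1', 'Hey whats up I am Rob', 73],
                   ['user2', 'Hey whats up I am Bob', 44],
                   ['user3', 'Hey whats up I am a Snob', 12]],
                  columns=['User', 'Text', 'Age'])

# Series.apply: the function receives one cell value at a time (here a str).
element_types = df['Text'].apply(lambda value: type(value).__name__)

# DataFrame.apply with axis=1: the function receives a whole row as a Series.
row_types = df.apply(lambda row: type(row).__name__, axis=1)

print(element_types.tolist())  # ['str', 'str', 'str']
print(row_types.tolist())      # ['Series', 'Series', 'Series']

# Since each element is already the string, the lambda in Method 1 is optional.
text_len = df['Text'].apply(len)
print(text_len.tolist())       # [21, 21, 24]
```

So the argument called "row" in Method 1 is really a single string, whereas in Method 2 it is an actual row that must be indexed with row['Text'].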

I also found that a list comprehension is about twice as fast as Method 1 (see Method 3 below).

Method 1 (FASTER)

%%timeit
df['TextLen'] = df['Text'].apply(lambda row: len(row))
#434 µs ± 6.73 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Method 2 (SLOWER)

%%timeit
df['TextLen2'] = df.apply(lambda row: len(row['Text']), axis=1)
#1.24 ms ± 19.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Method 3 (FASTEST)

%%timeit
df['TextLen3'] = [len(i) for i in df['Text']]
#202 µs ± 4.48 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Method 4 (FAST but READABLE)

%%timeit
df['TextLen4'] = df['Text'].str.len()
#525 µs ± 53.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
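One practical difference that the timings do not show is how the methods treat missing values. The sketch below uses a hypothetical Series with one missing entry (not part of the question's data) to compare .str.len() with calling len() through apply.

```python
import pandas as pd

# Hypothetical column with a missing entry; not from the original data.
texts = pd.Series(['Hey whats up I am Rob', None])

# .str.len() keeps going and marks the missing entry as missing in the result.
str_len = texts.str.len()
print(str_len.tolist())

# Calling len() on every element raises TypeError at the missing entry.
try:
    texts.apply(len)
    apply_failed = False
except TypeError:
    apply_failed = True
print(apply_failed)
```

If the column may contain missing text, Methods 1 to 3 would need an explicit check, while Method 4 handles it without extra code.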

I used the same example data as you to run these tests.
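Because the numbers above come from a three-row frame, it may be worth repeating the comparison on more data before choosing a method for a large dataset. The following is a rough sketch using the standard timeit module outside a notebook. The row count of 20,000 and the names big and methods are assumptions made for illustration, and the timings will vary by machine and pandas version.

```python
import timeit

import pandas as pd

# Assumption: 20,000 synthetic rows stand in for a larger dataset.
n_rows = 20_000
big = pd.DataFrame({'Text': ['Hey whats up I am Rob'] * n_rows})

# The four approaches from the answer, wrapped so they can be timed uniformly.
methods = {
    'Method 1': lambda: big['Text'].apply(len),
    'Method 2': lambda: big.apply(lambda row: len(row['Text']), axis=1),
    'Method 3': lambda: [len(s) for s in big['Text']],
    'Method 4': lambda: big['Text'].str.len(),
}

for name, func in methods.items():
    # Best of three single runs, reported in milliseconds.
    seconds = min(timeit.repeat(func, number=1, repeat=3))
    print(f'{name}: {seconds * 1000:.1f} ms')
```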

Edit 1: I tried a third method and thought it was faster before I realized it did not work. I have removed it from the answer.

Edit 2: I tried a list comprehension (Method 3), confirmed that it works, and added it to the answer.

Edit 3: Added the method (Method 4) suggested in the comments on the question.

solution1
1 ACCEPTED 2020-05-27 19:03:18
