How to transform Pandas DF to show count of tokens in the original DF?

Question

I am trying to convert a pandas DataFrame containing sentences into one that shows the number of words in each cell, across all columns and rows.

I have tried `apply`, `transform`, lambda functions and nested for loops.

Works beautifully for one column

dat.direction.str.split().str.len()
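The one-column version splits each string on whitespace and counts the pieces. A minimal sketch on a hypothetical `direction` column (the real `dat` is not shown) also shows that `.str` methods propagate missing values as NaN, so they need filling afterwards if a missing cell should count as 0 tokens, as the `else: return 0` branch of Approach 1 suggests:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the question does not show the real `dat`.
dat = pd.DataFrame({"direction": ["turn left at the light", "go straight", np.nan]})

# Split each string on whitespace and count the pieces; NaN stays NaN.
counts = dat.direction.str.split().str.len()

# Treat missing cells as zero tokens, then go back to integer counts.
counts = counts.fillna(0).astype(int)
print(counts.tolist())
```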

Failed Approach 1

def token_count(x):
    if type(x) == str:
        return x.split().str.len()
    else:
        return 0

dat.apply(token_count)
dat.transform(token_count)
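A likely reason Approach 1 returns 0: `DataFrame.apply(func)` calls `func` once per column with a whole Series, and a Series is never a `str`, so every column falls into the `else` branch. There is also a latent bug in the `if` branch: `x.split()` returns a plain Python list, which has no `.str` accessor, so `len()` is needed. A sketch of an element-wise version on a hypothetical frame, using `Series.map` per column so the function sees one cell at a time:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `dat`, including a missing value.
dat = pd.DataFrame({
    "col1": ["this is a sentence", np.nan],
    "col2": ["one more", "this is the last sentence"],
})

def token_count(x):
    # x is a single cell here, so the type check is meaningful.
    if isinstance(x, str):
        # str.split() returns a list, so use len() rather than .str.len().
        return len(x.split())
    return 0

# Series.map calls token_count on each cell of each column.
out = dat.apply(lambda col: col.map(token_count))
print(out)
```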

Failed Approach 2

dat.apply(lambda x:x.str.split().str.len())
dat.apply(lambda x:x.split().str.len())
dat.transform(lambda x:x.str.split().str.len())
dat.transform(lambda x:x.split().str.len())
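The question does not show the error these lines produce, so this is a hedged sketch of two plausible causes on a hypothetical `dat` with one numeric column `n`. The `x.split()` variants fail because `x` is a Series, which has no `.split()` method. The `x.str...` variants work on text columns, but `.str` raises `AttributeError` on a numeric column:

```python
import pandas as pd

# Hypothetical frame: two text columns plus one numeric column `n`.
dat = pd.DataFrame({
    "col1": ["this is a sentence", "this is another sentence"],
    "col2": ["one more", "this is the last sentence"],
    "n": [1, 2],
})

errors = []

# x is a whole column (a Series); Series has no .split() method.
try:
    dat.apply(lambda x: x.split())
except AttributeError as e:
    errors.append(str(e))

# The .str form works on the text columns but raises on the numeric one.
try:
    dat.apply(lambda x: x.str.split().str.len())
except AttributeError as e:
    errors.append(str(e))

# Restricting to the text columns makes the same expression succeed.
counts = dat[["col1", "col2"]].apply(lambda x: x.str.split().str.len())
print(errors)
print(counts)
```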

Failed Approach 3 (Before the nested for loops)

dat.iloc[1,3].split(" ").str.len()

Output for one column

Error for Approach 1 (Shouldn't be 0)


Error for Approach 3

AttributeError: 'list' object has no attribute 'str'
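The error comes from mixing plain Python with the pandas `.str` accessor: `dat.iloc[1,3]` is a single `str`, `.split(" ")` returns a `list`, and a list has no `.str`. A sketch on a hypothetical one-row frame, using `len()` instead. Note also that `split(" ")` produces empty strings for repeated spaces, while `split()` with no argument does not:

```python
import pandas as pd

# Hypothetical frame; the question's `dat` is not shown.
dat = pd.DataFrame({"col1": ["this is a sentence"], "col2": ["one more"]})

cell = dat.iloc[0, 1]   # a plain Python str, not a Series
tokens = cell.split()   # a plain Python list
print(len(tokens))      # len() works; a list has no .str accessor

# split(" ") keeps empty strings between consecutive spaces.
print("one  more".split(" "))
```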

Expected Output

Answer 1

How about this

import pandas as pd

df = pd.DataFrame({
    "col1": ["this is a sentence", "this is another sentence"],
    "col2": ["one more", "this is the last sentence"],
})

pd.concat([df[col].str.split().str.len() for col in df.columns], axis=1)
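The column headers survive `pd.concat(..., axis=1)` because each `.str.split().str.len()` result keeps the name of the column it came from. A sketch comparing it with an equivalent that builds the frame from a dict of column name to counts:

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["this is a sentence", "this is another sentence"],
    "col2": ["one more", "this is the last sentence"],
})

# Each per-column Series keeps its name, so concat along axis=1 restores the headers.
via_concat = pd.concat([df[col].str.split().str.len() for col in df.columns], axis=1)

# Equivalent: build the frame from a dict of column name -> token counts.
via_dict = pd.DataFrame({col: df[col].str.split().str.len() for col in df.columns})

print(via_concat)
```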

Answer 2

`stack`

`stack` to one dimension, do your thing, then `unstack` back.

df.stack().str.split().str.len().unstack()

   col1  col2
0     4     2
1     4     5
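What `stack` does here: it turns the 2-D frame into a Series indexed by (row label, column label) pairs, so a single `.str` chain covers every cell, and `unstack` moves the column level back out. A sketch that prints the intermediate index on the same setup frame:

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["this is a sentence", "this is another sentence"],
    "col2": ["one more", "this is the last sentence"],
})

# stack: 2 x 2 frame -> Series of 4 cells indexed by (row label, column label).
stacked = df.stack()
print(stacked.index.tolist())

# One .str chain now covers every cell.
counts = stacked.str.split().str.len()

# unstack: move the column level back out to recover the 2-D shape.
wide = counts.unstack()
print(wide)
```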

Using `count` instead

df.stack().str.count(r'\s+').unstack() + 1
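Counting whitespace runs and adding 1 matches `split()` only when strings have no leading or trailing whitespace and are non-empty. A sketch on a few hypothetical edge cases comparing the two (the pattern is written as a raw string so `\s` is not treated as a Python escape):

```python
import pandas as pd

# Hypothetical edge cases: normal, leading space, double space, trailing space, empty.
s = pd.Series(["this is a sentence", " leading space", "double  space", "trailing ", ""])

# One plus the number of whitespace runs.
via_count = s.str.count(r"\s+") + 1

# Number of non-empty whitespace-separated tokens.
via_split = s.str.split().str.len()

print(via_count.tolist())
print(via_split.tolist())
```

The `\s+` pattern treats a run such as a double space as one separator, but a leading or trailing space still adds a phantom token, and an empty string counts as 1.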

`applymap`

df.applymap(lambda s: len(s.split()))
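`DataFrame.applymap` was deprecated in pandas 2.1 in favour of `DataFrame.map`, which does the same element-wise call. A version-tolerant sketch, assuming a hypothetical helper `count_tokens` that also returns 0 for non-string cells (the lambda above would raise on a NaN cell):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": ["this is a sentence", "this is another sentence"],
    "col2": ["one more", np.nan],
})

def count_tokens(s):
    # Hypothetical helper: 0 for missing or non-string cells.
    return len(s.split()) if isinstance(s, str) else 0

# DataFrame.map exists from pandas 2.1; fall back to applymap on older versions.
elementwise = df.map if hasattr(df, "map") else df.applymap
counts = elementwise(count_tokens)
print(counts)
```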

`apply`

df.apply(lambda s: s.str.split().str.len())

Setup

Thanks to Ian

df = pd.DataFrame({
    "col1": ["this is a sentence", "this is another sentence"],
    "col2": ["one more", "this is the last sentence"],
})

Answer 3

You can iterate over each column in your DataFrame and apply the one-column expression that already works.

out = pd.DataFrame(index=dat.index)
for col in dat:
    out[col] = dat[col].str.split().str.len()

3 answers

Answer 1: score 1, posted 2019-06-14 20:23:49
Answer 2: score 1, posted 2019-06-14 21:10:07
Answer 3: score 0, posted 2019-06-14 20:23:30
