Tokenizing words into a new column in a pandas dataframe

Question

I am trying to go through a list of comments stored in a pandas DataFrame, tokenize each comment into words, and put those words in a new column of the DataFrame, but I am getting an error when I run the code below.

The error says: AttributeError: 'unicode' object has no attribute 'apwords'

Is there any other way to do this? Thanks

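from nltk.tokenize import word_tokenize

# df is a pandas DataFrame with a 'complaint' column of text


def apwords(words):
    filtered_sentence = []
    words = word_tokenize(words)
    for w in words:
        filtered_sentence.append(w)
    return filtered_sentence

addwords = lambda x: x.apwords()  # raises the AttributeError when apply() calls it
df['words'] = df['complaint'].apply(addwords)
print(df)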

Answer 1

Don't you just want to do this:

   df['words'] = df['complaint'].apply(apwords)

You don't need to define the function addwords at all. If you do define it, it should be:

addwords = lambda x: apwords(x)
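A minimal sketch showing that passing the function directly and wrapping it in a lambda fill the column identically. It assumes pandas is installed and, so that it runs without NLTK's tokenizer data, uses a hypothetical regex stand-in (`simple_tokenize`) in place of `word_tokenize`; the sample complaints are made up.

```python
import re

import pandas as pd


def simple_tokenize(text):
    # Hypothetical stand-in for nltk's word_tokenize: runs of word
    # characters and runs of punctuation become separate tokens.
    return re.findall(r"\w+|[^\w\s]+", text)


def apwords(words):
    # Same role as the question's function, using the stand-in tokenizer.
    return simple_tokenize(words)


df = pd.DataFrame({"complaint": ["Delivery was late!", "Wrong item sent"]})

# Pass the function object itself; pandas calls apwords(value) for each row.
df["words"] = df["complaint"].apply(apwords)

# Equivalent, but the lambda only forwards x to apwords.
addwords = lambda x: apwords(x)
df["words_lambda"] = df["complaint"].apply(addwords)

print(df["words"].tolist())
print(df["words"].tolist() == df["words_lambda"].tolist())
```

Since the lambda adds nothing, `apply(apwords)` is the simpler spelling.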

Answer 2

Your way of applying the lambda function is correct; it is the way you define addwords that doesn't work.

When you define apwords, you define a standalone function, not an attribute (method) of the string, so to apply it use:

addwords = lambda x: apwords(x)

And not:

addwords = lambda x: x.apwords()
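The failure can be reproduced without pandas or NLTK. `x.apwords()` asks the string object `x` for an attribute named `apwords`; a module-level function with that name is never consulted, so the lookup fails. This sketch assumes Python 3, where the message names `str` instead of Python 2's `unicode`; the sample text and the `split()`-based body are illustrative only.

```python
def apwords(words):
    # Illustrative body; the point is only where the name is looked up.
    return words.split()


text = "Delivery was late"

# Function call: the name apwords is found in the module's namespace.
print(apwords(text))

# Method syntax: Python looks for 'apwords' on the str object itself.
try:
    text.apwords()
except AttributeError as exc:
    print("AttributeError:", exc)
```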

If you want to call apwords as a method, you would need to define a class that inherits from the string type and define apwords as a method of that class, and each value in the column would also have to be an instance of that class.
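For comparison, a sketch of that subclass route, assuming Python 3 (`str` rather than `unicode`), a hypothetical class name `TokenStr`, and a regex stand-in for `word_tokenize`. The DataFrame still stores ordinary strings, so every value has to be wrapped before the method exists, which is the extra step the plain function avoids.

```python
import re

import pandas as pd


class TokenStr(str):
    """Hypothetical str subclass that carries an apwords() method."""

    def apwords(self):
        # Regex stand-in for nltk's word_tokenize, so no NLTK data is needed.
        return re.findall(r"\w+|[^\w\s]+", self)


df = pd.DataFrame({"complaint": ["Delivery was late!", "Wrong item sent"]})

# The column holds plain strings, so x.apwords() alone would still fail;
# each value is wrapped in TokenStr before the method is called.
df["words"] = df["complaint"].apply(lambda x: TokenStr(x).apwords())
print(df["words"].tolist())
```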

It is far easier to stay with the function:

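from nltk.tokenize import word_tokenize


def apwords(words):
    # Split one comment into a list of word tokens.
    filtered_sentence = []
    words = word_tokenize(words)
    for w in words:
        filtered_sentence.append(w)
    return filtered_sentence

# Call apwords as a plain function on each value x.
addwords = lambda x: apwords(x)
df['words'] = df['complaint'].apply(addwords)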
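The loop in `apwords` copies every token into a new list unchanged, so the tokenizer's own result can be used as-is and the tokenizer can be passed to `apply` directly. A sketch of that simplification, with a hypothetical regex stand-in (`simple_tokenize`) for `word_tokenize` and made-up data; with NLTK and its tokenizer data installed, `df['complaint'].apply(word_tokenize)` follows the same pattern.

```python
import re

import pandas as pd


def simple_tokenize(text):
    # Hypothetical stand-in for nltk's word_tokenize.
    return re.findall(r"\w+|[^\w\s]+", text)


def apwords_loop(words):
    # The answer's version: copy each token into a fresh list.
    filtered_sentence = []
    for w in simple_tokenize(words):
        filtered_sentence.append(w)
    return filtered_sentence


df = pd.DataFrame({"complaint": ["Delivery was late!", "Wrong item sent"]})

# The tokenizer already returns a list, so it can be applied directly.
df["words"] = df["complaint"].apply(simple_tokenize)

# Same result as the loop-based function.
print(df["words"].tolist() == df["complaint"].apply(apwords_loop).tolist())
```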

Answer 1: score 0, posted 2016-06-30 10:18:20
Answer 2: score 0, accepted, posted 2016-06-30 11:11:44