
Tokenizing words into a new column in a pandas dataframe

I am trying to go through a list of comments collected in a pandas dataframe, tokenize the words in each comment, and put the tokens in a new column of the dataframe, but I get an error when I run the code below.

The error is: AttributeError: 'unicode' object has no attribute 'apwords'

Is there any other way to do this? Thanks

def apwords(words):
    filtered_sentence = []
    words = word_tokenize(words)
    for w in words:
        filtered_sentence.append(w)
    return filtered_sentence
addwords = lambda x: x.apwords()
df['words'] = df['complaint'].apply(addwords)
print df

Don't you just want to do this:

   df['words'] = df['complaint'].apply(apwords)

You don't need to define the function addwords at all. If you do keep it, it should be defined as:

addwords = lambda x: apwords(x)

The way you apply the lambda function is correct; it is the way you define addwords that doesn't work.

When you define apwords, you define a standalone function, not an attribute of the string. So when you want to apply it, pass the string as an argument:

addwords = lambda x: apwords(x)

And not:

addwords = lambda x: x.apwords()

If you want to call apwords as an attribute (x.apwords()), you would need to define a class that inherits from string and define apwords as a method of that class.
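For illustration only, here is a minimal sketch of what that subclass approach could look like. The class name TokenStr is hypothetical, and the method splits on whitespace as a stand-in for word_tokenize so the snippet runs without NLTK. Note that a dataframe column normally holds plain strings, so every value would have to be wrapped in the subclass first.

```python
class TokenStr(str):
    """Hypothetical str subclass that exposes apwords as a method."""

    def apwords(self):
        # Whitespace split used here as a stand-in for nltk's word_tokenize
        return self.split()


# Plain strings do not have .apwords(); only wrapped values do
comment = TokenStr("the service was slow")
tokens = comment.apwords()
print(tokens)
```

The extra wrapping step is the reason the plain function is usually the simpler choice.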

It is far easier to stay with the function:

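from nltk.tokenize import word_tokenize  # needs NLTK's tokenizer data installed

def apwords(words):
    # Tokenize one comment and return its tokens as a list
    filtered_sentence = []
    words = word_tokenize(words)
    for w in words:
        filtered_sentence.append(w)
    return filtered_sentence

# Call apwords as a function on each cell, not as a method of the string
addwords = lambda x: apwords(x)
df['words'] = df['complaint'].apply(addwords)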
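As a self-contained check of the same pattern, the sketch below builds a small dataframe and passes the function directly to apply, without the lambda. The sample comments are made up, and simple_tokenize is a hypothetical regex stand-in for word_tokenize so the snippet runs without NLTK or its data; swap in word_tokenize for real use.

```python
import re

import pandas as pd


def simple_tokenize(text):
    # Hypothetical stand-in for nltk's word_tokenize:
    # runs of word characters, or single punctuation marks
    return re.findall(r"\w+|[^\w\s]", text)


def apwords(text):
    # Return the list of tokens for one comment
    return simple_tokenize(text)


# Made-up sample data for illustration
df = pd.DataFrame({"complaint": ["The bill was wrong.", "No reply, still waiting"]})

# Pass the function itself; apply calls it once per cell
df["words"] = df["complaint"].apply(apwords)
print(df)
```

Each cell of the new column holds a Python list of tokens for the matching comment.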
