Replacing specific values within a dataframe column

Question

I am running the following code in a Jupyter notebook. It checks the strings of text in nametest_df['text'] and returns people's names. I managed to get this working and would like to write these names to the corresponding rows of nametest_df['name'], where all values are currently NaN.

I tried the Series.replace() method, but every entry in the 'name' column ends up showing the same name.

Any clue how I can do this efficiently?

for word in nametest_df['text']:

    for sent in nltk.sent_tokenize(word):
        tokens = nltk.tokenize.word_tokenize(sent)
        tags = st.tag(tokens)

        for tag in tags:
            if tag[1]=='PERSON':
                name = tag[0]
                print(name)

    nametest_df.name = nametest_df.name.replace({"NaN": name})

Sample nametest_df

    text                        name
0   His name is John            NaN
1   I went to the beach         NaN
2   My friend is called Fred    NaN

Expected output

    text                        name
0   His name is John            John
1   I went to the beach         NaN
2   My friend is called Fred    Fred

Answer 1

Don't try to fill the Series values one by one. This is inefficient and prone to error. A better idea is to build a list of names and assign it directly.

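import numpy as np

L = []
for text in nametest_df['text']:
    name = np.nan  # stays NaN when the row contains no PERSON token
    for sent in nltk.sent_tokenize(text):
        tokens = nltk.tokenize.word_tokenize(sent)
        for token, label in st.tag(tokens):
            if label == 'PERSON':
                name = token  # keeps the last PERSON token found in the row
    L.append(name)  # exactly one entry per row

# L has one entry per row, so it can be assigned as the whole column
nametest_df['name'] = L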
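
The same idea can be tried without an NER model installed. The following is only a minimal sketch: tag() is a hypothetical stand-in for st.tag() that labels a hard-coded set of names as PERSON, and first_person() is an illustrative helper, not part of nltk or pandas. Returning the first PERSON token per row is also an assumption; the point is that each row yields exactly one value, with NaN when nothing is found.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for st.tag(): labels tokens from a small
# hard-coded set as PERSON. A real NER tagger would replace this.
KNOWN_NAMES = {"John", "Fred"}

def tag(tokens):
    return [(tok, "PERSON" if tok in KNOWN_NAMES else "O") for tok in tokens]

def first_person(text):
    """Return the first PERSON token in text, or NaN if there is none."""
    for token, label in tag(text.split()):
        if label == "PERSON":
            return token
    return np.nan

nametest_df = pd.DataFrame({
    "text": ["His name is John", "I went to the beach", "My friend is called Fred"],
    "name": np.nan,
})

# apply() calls first_person once per row, so the result has the same
# length and index as the frame and can be assigned as a whole column.
nametest_df["name"] = nametest_df["text"].apply(first_person)
print(nametest_df)
```

Because the result of apply() is aligned with the rows of the frame, rows without a name keep NaN and no length mismatch can occur.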

solution1
1 2018-11-02 13:21:20
