
How to extract an entire data frame row if column value fulfills a statement?

I have a dataframe with 2 columns:

+-----------+----------+
|   Tweet   | Language |
+-----------+----------+
| some text | en       |
| more text | en       |
| ein text  | de       |
+-----------+----------+

(The text in the Tweet column consists of actual tweets.)

I want to apply a language detection algorithm to see how many German (de) tweets I have in my df.

from langdetect import detect 
nlp = detect

This works, but it only adds the tweet text to temp_list:

temp_list = [row for row in df['Tweet'] if nlp(row)=='de']

However, what I want is to add the entire row to temp_list if the language detection algorithm labels it as German. I want to include both columns so I can cross-check against my Language column (which I labeled manually).

If you want the full dataframe output, and your dataframe is called df (nlp in the question is the detection function, not the dataframe), then you should use:

filtered_df = df[df['Language'] == 'de']

If you want only the Tweet column, then:

filtered_df = df[df['Language'] == 'de']['Tweet']

Finally, if you want to make a list out of those values:

df_filtered = df[df['Language'] == 'de']['Tweet'].tolist()

Outputs (these appear to come from the answerer's own sample data, where the text column is named Tweets):

1st:

    Tweets Language
2  Deutsch       de

2nd:

2    Deutsch

3rd:

['Deutsch']
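
The three selections above can also be written with a single boolean mask and .loc, which avoids chained indexing. This is only a minimal sketch: it assumes the column names from the question (Tweet, Language), and the sample dataframe is made up to mirror the question's table.

```python
import pandas as pd

# Made-up sample data mirroring the table in the question.
df = pd.DataFrame({
    "Tweet": ["some text", "more text", "ein text"],
    "Language": ["en", "en", "de"],
})

# Boolean mask: True for rows manually labeled as German.
is_german = df["Language"] == "de"

german_rows = df.loc[is_german]             # full rows (DataFrame)
german_tweets = df.loc[is_german, "Tweet"]  # a single column (Series)
german_tweet_list = german_tweets.tolist()  # plain Python list
```

Note that this filters on the manual Language label, not on the detector's result.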

You could use apply:

df[df['Language']==df['Tweet'].apply(nlp)]

This returns a dataframe containing only the rows where the detected language matches the manually assigned Language value.
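
To get what the question literally asks for (every full row whose tweet is detected as German, whatever the manual label says), the same apply idea can be compared against 'de' instead of the Language column. The sketch below is an illustration under assumptions: rows_detected_as and toy_detect are hypothetical names, and toy_detect is a stand-in so the example runs without langdetect installed. In practice you would pass langdetect's detect (the nlp from the question) as the detector.

```python
import pandas as pd

def rows_detected_as(df, detector, lang, text_col="Tweet"):
    """Return the full rows whose text the detector labels as `lang`."""
    detected = df[text_col].apply(detector)
    return df[detected == lang]

# Hypothetical stand-in detector, NOT a real language model.
# Replace it with langdetect's detect for actual use.
def toy_detect(text):
    return "de" if "ein" in text.split() else "en"

# Made-up sample data mirroring the table in the question.
df = pd.DataFrame({
    "Tweet": ["some text", "more text", "ein text"],
    "Language": ["en", "en", "de"],
})

german_rows = rows_detected_as(df, toy_detect, "de")

# If a list of rows is really needed, each entry is [Tweet, Language].
temp_list = german_rows.values.tolist()
```

Keeping the result as a dataframe is usually more convenient than a list, since both columns stay available for the cross-check. As far as I know, langdetect's documentation describes detection as non-deterministic on short or ambiguous text unless a seed is set, so results on short tweets may vary between runs.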

You could also store the detected language in a new column, such as detected_lang:

df['detected_lang']=df['Tweet'].apply(nlp)
print(df)

       Tweet Language detected_lang
0  some text       en            sv
1  more text       en            en
2   ein text       de            de
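
With both labels side by side, the cross-check the question mentions becomes a column comparison. A minimal sketch, assuming the detected_lang values are exactly those in the printed output above (they are hard-coded here so the example runs without langdetect); the variable names are illustrative only.

```python
import pandas as pd

# Values copied from the printed output above.
df = pd.DataFrame({
    "Tweet": ["some text", "more text", "ein text"],
    "Language": ["en", "en", "de"],
    "detected_lang": ["sv", "en", "de"],
})

# Rows where the detector disagrees with the manual label.
mismatches = df[df["Language"] != df["detected_lang"]]

# Fraction of rows where both labels agree.
agreement = (df["Language"] == df["detected_lang"]).mean()

# Counts of manual label (rows) versus detected label (columns).
confusion = pd.crosstab(df["Language"], df["detected_lang"])
```

Here mismatches holds the single row that was labeled en but detected as sv.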
