How to extract an entire data frame row if column value fulfills a statement?

Question

I have a dataframe with 2 columns:

+-----------+----------+
|   Tweet   | Language |
+-----------+----------+
| some text | en       |
| more text | en       |
| ein text  | de       |
+-----------+----------+

(The entries in the Tweet column are actual tweets.)

I want to apply a language detection algorithm to see how many German (de) tweets I have in my df.

from langdetect import detect 
nlp = detect

This works, but it only adds the tweet text to temp_list:

temp_list = [row for row in df['Tweet'] if nlp(row)=='de']

However, what I want is to add the entire row to temp_list if the language detection algorithm labels it as German. I want to include both columns so I can cross-check against my Language column (which I labeled manually).

Answer 1

If you want the full dataframe output, and your dataframe is called df (here with the columns Tweets and Language), then you should use:

filtered_df = df[df['Language'] == 'de']

If you want only the Tweets column, then:

filtered_df = df[df['Language'] == 'de']['Tweets']

Finally, if you want to make a list out of those values:

df_filtered = df[df['Language'] == 'de']['Tweets'].tolist()

Outputs:

1st:

    Tweets Language
2  Deutsch       de

2nd:

2    Deutsch

3rd:

['Deutsch']

Answer 2

You could use apply:

df[df['Language']==df['Tweet'].apply(nlp)]

That returns a dataframe containing only the rows where the detected language matches the manually assigned Language column.
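To make the boolean mask in that one-liner easier to follow, here is a minimal sketch that builds it step by step. The nlp function below is a hypothetical stand-in for langdetect's detect (a plain dictionary lookup that returns the same labels as the printed table in this answer), used only so the example is deterministic; in practice you would pass detect itself.

```python
import pandas as pd

df = pd.DataFrame({
    "Tweet": ["some text", "more text", "ein text"],
    "Language": ["en", "en", "de"],
})

# Hypothetical stand-in for langdetect.detect, NOT the real detector.
# It simply looks up a fixed label for each example tweet.
example_labels = {"some text": "sv", "more text": "en", "ein text": "de"}

def nlp(text):
    return example_labels[text]

# Step 1: run the detector on every tweet, giving a Series of language codes.
detected = df["Tweet"].apply(nlp)

# Step 2: compare element-wise with the manual labels, giving a boolean Series.
mask = df["Language"] == detected

# Step 3: index the dataframe with the mask to keep whole rows.
agree = df[mask]       # rows where detector and manual label match
disagree = df[~mask]   # rows where they differ

print(agree)
print(disagree)
```

Inverting the mask with ~ gives the rows where the detector and the manual label disagree, which is probably the more useful view for a cross-check.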

You could also create a new column, such as detected_lang:

df['detected_lang']=df['Tweet'].apply(nlp)
print(df)

       Tweet Language detected_lang
0  some text       en            sv
1  more text       en            en
2   ein text       de            de
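Once detected_lang exists, getting the full rows that the detector labels as German (what the question asks for) could look like the sketch below. As an assumption for reproducibility, nlp is again a hypothetical dictionary-based stand-in for langdetect's detect, not the real function.

```python
import pandas as pd

df = pd.DataFrame({
    "Tweet": ["some text", "more text", "ein text"],
    "Language": ["en", "en", "de"],
})

# Hypothetical stand-in for langdetect.detect, NOT the real detector.
example_labels = {"some text": "sv", "more text": "en", "ein text": "de"}

def nlp(text):
    return example_labels[text]

# Store the detector's output next to the manual label.
df["detected_lang"] = df["Tweet"].apply(nlp)

# Keep every column of the rows the detector labels as German.
german_rows = df[df["detected_lang"] == "de"]

# If a plain Python list of rows is really needed, convert at the end.
temp_list = german_rows.values.tolist()

print(german_rows)
print(temp_list)
```

Keeping german_rows as a dataframe is usually more convenient than a list, since len(german_rows) gives the count and the Language column is still there for comparison.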

Answer 1 (1) 2019-12-13 17:55:07
Answer 2 (1, ACCEPTED) 2019-12-13 18:09:47
