Removing retweets from data frame in R based on text column

Question

I pulled tweets from Twitter using the academictwitteR package. I would now like to remove all retweets, i.e. tweets whose "text" column (the first column) starts with "RT" (e.g. the third row). You can download a similar data frame of Trump's tweets from GitHub: https://github.com/cbail/cbail.github.io/blob/master/Trump_Tweets.Rdata

However, unlike that example, my data frame has no column called "is_retweet", which makes this more difficult.

The output from my data frame looks like this (I have removed some redundant columns to make it clearer):

Thank you in advance for any suggestions.

Answer 1

You can use a regular expression to figure out which rows start with 'RT' and keep only the rows that don't. If your data is in a data frame called tweets, maybe something like this?

tweets[grepl("^(?!RT)", tweets$text, perl = TRUE),]

Or if you're using the tidyverse:

library(dplyr)

tweets %>%
  filter(grepl("^(?!RT)", text, perl = TRUE))
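One thing to watch with the "^(?!RT)" pattern: it also drops any tweet that merely begins with the letters RT (say, "RTFM ..."), and because grepl() returns FALSE for NA, it drops rows with missing text as well. If that matters for your data, a slightly stricter variant might look like the sketch below. The toy data frame and the names is_rt and originals are made up for illustration, and the assumption that a retweet always starts with "RT" followed by a space or "@" is mine, so check it against your own tweets first.

```r
# Toy data frame standing in for the real tweets (hypothetical values).
tweets <- data.frame(
  text = c("Hello world", "RT @user: Hello world", "RTFM is good advice", NA),
  stringsAsFactors = FALSE
)

# TRUE only when the text starts with "RT" followed by whitespace or "@".
# grepl() returns FALSE for NA, so rows with missing text are kept here.
is_rt <- grepl("^RT[[:space:]@]", tweets$text)

# Keep everything that is not flagged as a retweet.
originals <- tweets[!is_rt, , drop = FALSE]
```

Here only the second row is flagged as a retweet, so originals keeps the other three rows, including the one with missing text. Add a separate !is.na(tweets$text) condition if you want those gone too.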

solution1
0 2022-02-04 17:32:30