
Removing retweets from data frame in R based on text column

I pulled tweets from Twitter using the academictwitteR package. I would now like to remove all retweets, i.e. tweets whose "text" column (the first column) starts with "RT" (e.g. the third row). You can download a similar data frame from GitHub containing Trump's tweets: https://github.com/cbail/cbail.github.io/blob/master/Trump_Tweets.Rdata

Unlike that data set, my data frame has no column called "is_retweet", which makes this more difficult.

The output from my data frame looks like this (I have removed some redundant columns to make it clearer):

[Image of the data frame not available]

Thank you in advance for any suggestions

You can use a regular expression to find the rows that start with "RT" and keep only the others. If your data is in a data frame called tweets, maybe something like this?

# "^(?!RT)" is a negative lookahead: it matches only text that does NOT start with "RT"
tweets[grepl("^(?!RT)", tweets$text, perl = TRUE),]

Or, if you're using the tidyverse:

library(dplyr)  # provides filter() and the %>% pipe

tweets %>% 
  filter(grepl("^(?!RT)", text, perl = TRUE))
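
If the lookahead feels hard to read, a roughly equivalent approach is to flag the retweets first and then negate the flag. The sketch below is only an illustration: the toy data frame is made up, and the pattern "^RT @" rests on the assumption that retweets in your data look like "RT @user: ...". Matching the "@" as well avoids dropping ordinary tweets that merely begin with the letters "RT". Note that this version keeps rows whose text is NA, whereas the lookahead version above drops them.

```r
# Hypothetical toy data standing in for the real tweets data frame
tweets <- data.frame(
  text = c("Good morning!", "RT @someone: hello", "RTX cards are pricey", NA),
  stringsAsFactors = FALSE
)

# TRUE where the text starts with "RT @" (assumed retweet prefix);
# grepl() returns FALSE for NA, so NA rows are not flagged
is_rt <- grepl("^RT @", tweets$text)

# Keep every row that is not flagged as a retweet;
# drop = FALSE keeps the result a data frame even with a single column
no_retweets <- tweets[!is_rt, , drop = FALSE]

print(no_retweets)
```

With dplyr loaded, the same condition should also work inside filter(), e.g. filter(!grepl("^RT @", text)).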
