
R: match an expression multiple times in the same line

I am working with a set of Tweets (very original, I know) in R and would like to extract the text after each @ sign and after each # and put them into separate variables. For example:

This is a test tweet using #twitter. @johnsmith @joesmith.

Ideally, I would like to create new variables in the data frame that contain twitter, johnsmith, joesmith, etc.

Currently I am using (str_match() is from stringr):

data$at <- str_match(data$tweet_text, "\\s@\\w+")
data$hash <- str_match(data$tweet_text, "\\s#\\w+")

This, of course, only puts the first occurrence of each into the new variable. Any suggestions?
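The limitation comes from str_match() returning only the first match per string. In base R, gregexpr() records every match position and regmatches() extracts them, giving one character vector per tweet. The following is a minimal sketch, assuming the tweets are in a character vector (the name `tweets` and its contents are hypothetical). If you are already using stringr, str_extract_all() (or str_match_all() when you need capture groups) is the all-matches counterpart of str_match().

```r
# Hypothetical example input: one tweet per element
tweets <- c("This is a test tweet using #twitter. @johnsmith @joesmith.",
            "2nd tweet using #second. @jillsmith @joansmith.")

# gregexpr() finds every match (not just the first); regmatches() pulls them out.
# \\w+ stops at the first non-word character, so the trailing "." is not captured.
at_tags   <- regmatches(tweets, gregexpr("@\\w+", tweets, perl = TRUE))
hash_tags <- regmatches(tweets, gregexpr("#\\w+", tweets, perl = TRUE))

at_tags[[1]]    # "@johnsmith" "@joesmith"
hash_tags[[2]]  # "#second"
```

The result is a list with one element per tweet, which keeps tweets with zero, one or several tags aligned with their source row.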

Combining strsplit and grep will work:

x <- strsplit("This is a test tweet using #twitter. @johnsmith @joesmith.", " ")
grep("#|@", unlist(x), value = TRUE)
#[1] "#twitter."  "@johnsmith" "@joesmith."
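The pattern "#|@" matches the symbol anywhere inside a word, so a token such as an e-mail address would also be kept. A hedged variation, assuming you only want tokens that start with # or @, anchors the pattern with ^; splitting on "\\s+" instead of " " also tolerates repeated spaces. The tweet below is a hypothetical example.

```r
# Hypothetical tweet containing an e-mail address as well as tags
tweet <- "Mail me@example.com about #twitter. @johnsmith"

words <- unlist(strsplit(tweet, "\\s+"))   # split on runs of whitespace
grep("#|@", words, value = TRUE)           # "me@example.com" "#twitter." "@johnsmith"
grep("^[#@]", words, value = TRUE)         # "#twitter." "@johnsmith"
```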

If you only want to keep the words themselves, without the #, @ or trailing .:

out <- grep("#|@", unlist(x), value = TRUE)
gsub("#|@|\\.", "", out)
#[1] "twitter"   "johnsmith" "joesmith"
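As an alternative to stripping the symbols afterwards with gsub(), a Perl-compatible lookbehind can match only the word that follows # or @. This is a swapped-in technique, not the answer's method; note that perl = TRUE is required for (?<=...), and because \\w+ excludes the trailing full stop, no separate clean-up of "." is needed.

```r
tweet <- "This is a test tweet using #twitter. @johnsmith @joesmith."

# (?<=@) asserts an "@" immediately before the match without including it in the result
at   <- regmatches(tweet, gregexpr("(?<=@)\\w+", tweet, perl = TRUE))[[1]]
hash <- regmatches(tweet, gregexpr("(?<=#)\\w+", tweet, perl = TRUE))[[1]]

at    # "johnsmith" "joesmith"
hash  # "twitter"
```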

UPDATE: Putting the results in a list:

my_list <- list()

x <- strsplit("This is a test tweet using #twitter. @johnsmith @joesmith.", " ")
my_list$hash <- c(my_list$hash, gsub("#|@|\\.", "", grep("#", unlist(x), value = TRUE)))
my_list$at <- c(my_list$at, gsub("#|@|\\.", "", grep("@", unlist(x), value = TRUE)))

x <- strsplit("2nd tweet using #second. @jillsmith @joansmith.", " ")
my_list$hash <- c(my_list$hash, gsub("#|@|\\.", "", grep("#", unlist(x), value = TRUE)))
my_list$at <- c(my_list$at, gsub("#|@|\\.", "", grep("@", unlist(x), value = TRUE)))

my_list
$hash
[1] "twitter" "second" 

$at
[1] "johnsmith" "joesmith"  "jillsmith" "joansmith"
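To put the results back into the data frame as the question asks (one row per tweet), each tweet's matches can be collapsed into a single space-separated string. This is a sketch, assuming a data frame with a tweet_text column as in the question; the sample rows and the helper extract_tags() are hypothetical. Tweets without any match get an empty string, so the new columns stay aligned with the original rows.

```r
# Hypothetical data frame mirroring the question's data$tweet_text column
data <- data.frame(
  tweet_text = c("This is a test tweet using #twitter. @johnsmith @joesmith.",
                 "2nd tweet using #second. @jillsmith @joansmith.",
                 "No tags here"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: all words that follow `symbol`, one character vector per tweet
extract_tags <- function(text, symbol) {
  pattern <- paste0("(?<=", symbol, ")\\w+")
  regmatches(text, gregexpr(pattern, text, perl = TRUE))
}

# Collapse each tweet's matches into one string; character(0) collapses to ""
data$at   <- vapply(extract_tags(data$tweet_text, "@"), paste, character(1), collapse = " ")
data$hash <- vapply(extract_tags(data$tweet_text, "#"), paste, character(1), collapse = " ")

data[, c("at", "hash")]
```

Collapsing to a string keeps the data frame flat; if you need the individual tags later, strsplit(data$at, " ") recovers them.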
