R: How to add hashtags from a data frame column into a new column
How can I extract the hashtags found in a data frame column into a new column?
This is my data frame:
dataframe <- data.frame(a = c('A', 'B', 'C', 'D', 'E'),
                        b = c("hello friends! #goodday",
                              "the flood getting worse #peoplefirst #sos",
                              "i love adele new song, it is remarkable",
                              "john doe loves judo",
                              "the new variant of covid19 is worrying #staysafe"))
The final data frame should look like this:
a  b                                                 c
A  hello friends! #goodday                           #goodday
B  the flood getting worse #peoplefirst #sos         #peoplefirst #sos
C  i love adele new song, it is remarkable           NA
D  john doe loves judo                               NA
E  the new variant of covid19 is worrying #staysafe  #staysafe
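Using stringr:
library(stringr)
# Extract every hashtag per row and join them with a space; sapply() keeps
# the result a character vector. Rows without hashtags yield "" (not NA).
dataframe$c <- sapply(str_extract_all(dataframe$b, "#\\w+"),
                      function(x) paste(x, collapse = " "))
dataframe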
  a  b                                                 c
1 A  hello friends! #goodday                           #goodday
2 B  the flood getting worse #peoplefirst #sos         #peoplefirst #sos
3 C  i love adele new song, it is remarkable
4 D  john doe loves judo
5 E  the new variant of covid19 is worrying #staysafe  #staysafe
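The output above leaves rows without hashtags as empty strings, while the question asks for NA. A minimal base R sketch of that last step follows. The vector `tags` is a hypothetical stand-in for the new column, not something from the original answer:

```r
# Hypothetical stand-in for the extracted hashtag column shown above
tags <- c("#goodday", "#peoplefirst #sos", "", "", "#staysafe")

# nzchar() is FALSE for empty strings; replace those entries with NA
tags[!nzchar(tags)] <- NA_character_

# tags now holds: "#goodday", "#peoplefirst #sos", NA, NA, "#staysafe"
tags
```

The same assignment should work directly on dataframe$c once that column is a plain character vector.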
A tidier solution using mutate, map, str_extract_all, and na_if is shown below.
library(tidyverse)
dataframe |>
  # For every row, extract all hashtags (letters following a '#')
  # and paste them into a single character string (for multiple matches)
  mutate(c = map_chr(.x = b,
                     .f = function(x) paste0(str_extract_all(x, "#[A-Za-z]+",
                                                             simplify = TRUE),
                                             collapse = " ")),
         # Change empty strings to NA (applied to the new column only)
         c = na_if(c, ""))
#   a  b                                                 c
# 1 A  hello friends! #goodday                           #goodday
# 2 B  the flood getting worse #peoplefirst #sos         #peoplefirst #sos
# 3 C  i love adele new song, it is remarkable           NA
# 4 D  john doe loves judo                               NA
# 5 E  the new variant of covid19 is worrying #staysafe  #staysafe
Another approach is to use gsub:
dataframe$c <- gsub("^[^#]*", "", dataframe$b)
#   a  b                                                 c
# 1 A  hello friends! #goodday                           #goodday
# 2 B  the flood getting worse #peoplefirst #sos         #peoplefirst #sos
# 3 C  i love adele new song, it is remarkable
# 4 D  john doe loves judo
# 5 E  the new variant of covid19 is worrying #staysafe  #staysafe
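The gsub pattern only strips the text before the first '#', so it relies on the hashtags sitting at the end of each string, and it leaves "" rather than NA for rows without any. As a rough base R alternative that does not depend on position, here is a sketch built on gregexpr and regmatches. The helper name `extract_hashtags` and the three-row sample data are made up for illustration:

```r
# Hypothetical helper: collect all hashtags per string, NA when there are none
extract_hashtags <- function(text) {
  # gregexpr() locates every match in each string; regmatches() extracts them
  matches <- regmatches(text, gregexpr("#\\w+", text))
  # Join each row's hashtags with a space; rows with no match become NA
  vapply(matches, function(m) {
    if (length(m) == 0) NA_character_ else paste(m, collapse = " ")
  }, character(1))
}

# Small illustrative data frame, including a hashtag in the middle of the text
dataframe <- data.frame(
  a = c("A", "B", "C"),
  b = c("hello friends! #goodday",
        "#sos the flood is getting worse #peoplefirst",
        "john doe loves judo")
)

dataframe$c <- extract_hashtags(dataframe$b)

# dataframe$c now holds: "#goodday", "#sos #peoplefirst", NA
dataframe$c
```

Using vapply with character(1) keeps the new column a plain character vector instead of a list column.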