
R: How to add the hashtags in a data frame to a new column

How can I extract the hashtags from a data frame column into a new column?

Here is my data frame:

dataframe <- data.frame(a = c('A', 'B', 'C', 'D', 'E'),
                 b = c("hello friends! #goodday", 
                       "the flood getting worse #peoplefirst #sos", 
                       "i love adele new song, it is remarkable", 
                       "john doe loves judo", 
                       "the new variant of covid19 is worrying #staysafe"))

The final data frame should look like this:

a   b                                                 c
A   hello friends! #goodday                           #goodday
B   the flood getting worse #peoplefirst #sos         #peoplefirst #sos              
C   i love adele new song, it is remarkable           NA
D   john doe loves judo                               NA
E   the new variant of covid19 is worrying #staysafe  #staysafe

Using stringr:

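library(stringr)

# Extract every "#word" token per row, then join multiple tags with a space.
# vapply() returns a plain character vector instead of a list column.
dataframe$c <- vapply(str_extract_all(dataframe$b, "#\\w+"),
                      function(x) paste(x, collapse = " "),
                      character(1))
dataframe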

  a                                                b                 c
1 A                          hello friends! #goodday          #goodday
2 B        the flood getting worse #peoplefirst #sos #peoplefirst #sos
3 C          i love adele new song, it is remarkable                  
4 D                              john doe loves judo                  
5 E the new variant of covid19 is worrying #staysafe         #staysafe

A tidier solution uses mutate, purrr's map family, str_extract_all and na_if:

library(tidyverse)

dataframe |>
  # For every row, extract all hashtags ("#" followed by word characters)
  # and paste multiple matches into a single space-separated string
  mutate(c = map_chr(str_extract_all(b, "#\\w+"),
                     ~ paste(.x, collapse = " ")),
         # Rows without a hashtag yield "", so convert those to NA
         c = na_if(c, ""))

#  a                                                b                 c
#1 A                          hello friends! #goodday          #goodday
#2 B        the flood getting worse #peoplefirst #sos #peoplefirst #sos
#3 C          i love adele new song, it is remarkable              <NA>
#4 D                              john doe loves judo              <NA>
#5 E the new variant of covid19 is worrying #staysafe         #staysafe

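Another approach is to use gsub, which deletes everything before the first # in each string. Note that any text following the first hashtag is kept as well, and rows without a hashtag end up as an empty string rather than NA.

# Remove everything from the start of the string up to the first "#"
dataframe$c <- gsub("^[^#]*", "", dataframe$b)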

#   a                                                b                 c
# 1 A                          hello friends! #goodday          #goodday
# 2 B        the flood getting worse #peoplefirst #sos #peoplefirst #sos
# 3 C          i love adele new song, it is remarkable                  
# 4 D                              john doe loves judo                  
# 5 E the new variant of covid19 is worrying #staysafe         #staysafe
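For readers who prefer to avoid packages, the following base R sketch (an alternative, not one of the answers above) pairs gregexpr() with regmatches(). Unlike the gsub approach, it keeps only the hashtags themselves and returns NA for rows without one. It assumes a hashtag is "#" followed by word characters (letters, digits, underscore); the helper name collapse_tags is hypothetical.

```r
# Sample data from the question
dataframe <- data.frame(
  a = c("A", "B", "C", "D", "E"),
  b = c("hello friends! #goodday",
        "the flood getting worse #peoplefirst #sos",
        "i love adele new song, it is remarkable",
        "john doe loves judo",
        "the new variant of covid19 is worrying #staysafe")
)

# gregexpr() locates every "#word" match in each string;
# regmatches() turns those positions into a list of matched strings per row
tags <- regmatches(dataframe$b, gregexpr("#\\w+", dataframe$b, perl = TRUE))

# Hypothetical helper: NA when a row has no hashtag, otherwise space-joined tags
collapse_tags <- function(x) {
  if (length(x) == 0) NA_character_ else paste(x, collapse = " ")
}

dataframe$c <- vapply(tags, collapse_tags, character(1))
dataframe
```

Rows C and D have no match, so regmatches() gives character(0) for them and the helper maps that to NA, matching the desired output in the question.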


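Note: technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM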