R: How to add the hashtags in a data frame to a new column

Question

How can I extract the hashtags from a data frame column and put them in a new column?

Here is my data frame:

dataframe <- data.frame(a = c('A', 'B', 'C', 'D', 'E'),
                 b = c("hello friends! #goodday", 
                       "the flood getting worse #peoplefirst #sos", 
                       "i love adele new song, it is remarkable", 
                       "john doe loves judo", 
                       "the new variant of covid19 is worrying #staysafe"))

The final data frame should look like this:

a   b                                                 c
A   hello friends! #goodday                           #goodday
B   the flood getting worse #peoplefirst #sos         #peoplefirst #sos              
C   i love adele new song, it is remarkable           NA
D   john doe loves judo                               NA
E   the new variant of covid19 is worrying #staysafe  #staysafe

Answer 1

Using stringr:

library(stringr)

# Extract every hashtag in each row and join multiple matches with a space
dataframe$c <- sapply(str_extract_all(dataframe$b, "#\\w+"),
                      function(x) paste(x, collapse = " "))
dataframe

  a                                                b                 c
1 A                          hello friends! #goodday          #goodday
2 B        the flood getting worse #peoplefirst #sos #peoplefirst #sos
3 C          i love adele new song, it is remarkable                  
4 D                              john doe loves judo                  
5 E the new variant of covid19 is worrying #staysafe         #staysafe
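
The output above leaves an empty string where a row has no hashtag, whereas the question asks for NA. One possible way to close that gap is sketched below, assuming stringr is installed. The helper name collapse_tags is made up for illustration, and the data frame is a shortened copy of the one in the question.

```r
library(stringr)

dataframe <- data.frame(
  a = c("A", "B", "C"),
  b = c("hello friends! #goodday",
        "the flood getting worse #peoplefirst #sos",
        "john doe loves judo")
)

# Hypothetical helper: join the hashtags found in one string,
# returning NA when there are none
collapse_tags <- function(tags) {
  if (length(tags) == 0) NA_character_ else paste(tags, collapse = " ")
}

# vapply guarantees a plain character column (one string per row)
dataframe$c <- vapply(str_extract_all(dataframe$b, "#\\w+"),
                      collapse_tags, character(1))
```

Using vapply with character(1) keeps column c an ordinary character vector rather than a list column.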

Answer 2

A tidier solution using mutate, map_chr, str_extract_all and na_if:

library(tidyverse)

dataframe |>
  # For every row, extract all the letters following a hashtag
  # and paste them into a single character string (for multiple matches)
  mutate(c = map_chr(.x = b,
                     .f = function(x) paste0(str_extract_all(x, "#[A-Za-z]+",
                                                             simplify = TRUE),
                                             collapse = " ")),
         # Change empty strings to NA
         c = na_if(c, ""))

#  a                                                b                 c
#1 A                          hello friends! #goodday          #goodday
#2 B        the flood getting worse #peoplefirst #sos #peoplefirst #sos
#3 C          i love adele new song, it is remarkable              <NA>
#4 D                              john doe loves judo              <NA>
#5 E the new variant of covid19 is worrying #staysafe         #staysafe
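
As a smaller illustration of the final step, na_if replaces one given value with NA in a vector. The sketch below assumes dplyr is installed and uses an invented one-column data frame named tags. Applying na_if to a single column inside mutate, rather than to the whole data frame, is the form I would expect to work across dplyr versions.

```r
library(dplyr)

# Invented example data: the middle row has no hashtags
tags <- data.frame(c = c("#goodday", "", "#staysafe"))

# Replace empty strings in column c with NA, leaving other values untouched
tags <- tags |> mutate(c = na_if(c, ""))
```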

Answer 3

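Another approach is to use gsub. It removes everything before the first #, so it relies on the hashtags coming at the end of the text: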

dataframe$c <- gsub("^[^#]*", "", dataframe$b)

#   a                                                b                 c
# 1 A                          hello friends! #goodday          #goodday
# 2 B        the flood getting worse #peoplefirst #sos #peoplefirst #sos
# 3 C          i love adele new song, it is remarkable                  
# 4 D                              john doe loves judo                  
# 5 E the new variant of covid19 is worrying #staysafe         #staysafe
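
For text where other words follow a hashtag, a base-R sketch that pulls out the hashtags themselves with regmatches and gregexpr might look like this. The sample sentences and the names b, tags and c_col are invented for illustration.

```r
b <- c("hello #goodday friends",
       "the flood getting worse #peoplefirst #sos",
       "john doe loves judo")

# The gsub approach keeps any text that comes after the first hashtag
after_first_hash <- gsub("^[^#]*", "", b)

# regmatches/gregexpr return only the matched hashtags for each element
tags <- regmatches(b, gregexpr("#\\w+", b))

# Collapse each element's matches; use NA when there is no hashtag
c_col <- vapply(tags, function(x) {
  if (length(x) == 0) NA_character_ else paste(x, collapse = " ")
}, character(1))
```

Here after_first_hash[1] still contains the trailing word "friends", while c_col[1] holds only the hashtag.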

Answer 1
0 2022-01-04 04:07:28

Answer 2
0 2022-01-04 04:24:06

Answer 3
0 2022-01-04 07:34:20
