
Create new rows in data frame based on multiple values of column

I have edited the question to make it more specific.

I have been looking for a specific answer to my problem, without success.

First, I have a data frame of 48 variables that looks like this:

> df

    Text                                               Screen_Name   ...  
1   a text where @Sam and @Su and @Jim are addressed   Peter
2   a text where @Eric is addressed                    Margret
3   a text where @Sarah and @Adam are addressed        John

Now I extract all strings matching the pattern "@\\S+" and store them in a new column:

library(stringr)
# the column is called "Text" (capital T), as in the printed data frame
df$Addressees <- str_extract_all(df$Text, "@\\S+")
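If stringr is not available, the same extraction can be sketched in base R with `regmatches()` and `gregexpr()`. This is a swapped-in alternative, not the code from the question; it assumes the text lives in a column called `Text`, and the sample data below contains only the two relevant columns, typed in by hand.

```r
# Hand-typed sample data: only the two columns relevant here (the real data has 48)
df <- data.frame(
  Text = c("a text where @Sam and @Su and @Jim are addressed",
           "a text where @Eric is addressed",
           "a text where @Sarah and @Adam are addressed"),
  Screen_Name = c("Peter", "Margret", "John"),
  stringsAsFactors = FALSE
)

# gregexpr() locates every match of "@\\S+" in each string;
# regmatches() extracts them as a list with one character vector per row
df$Addressees <- regmatches(df$Text, gregexpr("@\\S+", df$Text))

str(df$Addressees)
```

Like `str_extract_all()`, this produces a list column, which is why a cell can hold several names such as `c("@Sam", "@Su", "@Jim")`.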

This gives me:

    ...   Screen_Name   Addressees               ...  
1         Peter         c("@Sam", "@Su", "@Jim")
2         Margret       @Eric
3         John          c("@Sarah", "@Adam")

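Now I want to create a new data frame with two columns, in which every addressee gets its own row and the corresponding value of the "Screen_Name" column is repeated: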

> df

    Screen_Name  Addressees
 1  Peter        Sam
 2  Peter        Su
 3  Peter        Jim
 4  Margret      Eric
 5  John         Sarah
 6  John         Adam
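The reshaping described above can be sketched in base R: repeat each `Screen_Name` once per addressee with `rep()` and `lengths()`, flatten the list with `unlist()`, and strip the leading `@` with `sub()`. This assumes `Addressees` is a list column like the one `str_extract_all()` returns; the vectors below are a hand-built stand-in for it.

```r
# Hand-built stand-in for the extracted list column
Screen_Name <- c("Peter", "Margret", "John")
Addressees <- list(c("@Sam", "@Su", "@Jim"), "@Eric", c("@Sarah", "@Adam"))

# One row per addressee: each sender is repeated as often as it has addressees,
# and unlist() flattens the addressees in the same order
result <- data.frame(
  Screen_Name = rep(Screen_Name, lengths(Addressees)),
  Addressees  = sub("^@", "", unlist(Addressees)),  # drop the leading "@"
  stringsAsFactors = FALSE
)

print(result)
```

The order works out because `rep()` and `unlist()` both walk the rows in the same sequence.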

I have tried the solutions to similar questions, but none of them seem to work.

Thanks a lot for your help!

OK, here is a reproducible example:

# create df with a list column of friends
ego <- c("peter", "margaret", "john")
friends <- list(c("sam", "su", "jim"), c("eric"), c("sarah", "adam"))
df <- data.frame(ego, friends = I(friends), stringsAsFactors = FALSE)

# repeat each row once per friend
times <- lengths(df$friends)
df <- df[rep(seq_len(nrow(df)), times), ]
# assign back the unlisted friends (same order as the repeated rows)
df$friends <- unlist(friends)
rownames(df) <- NULL
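Because this approach repeats whole rows, every other column is carried along too, which matters when the data frame has 48 variables. A minimal sketch that wraps it in a reusable function; the name `expand_list_column()` is hypothetical and not part of base R or any package.

```r
# Hypothetical helper: one row per element of a list column,
# with all other columns repeated alongside
expand_list_column <- function(df, col) {
  times <- lengths(df[[col]])                     # elements per row
  out <- df[rep(seq_len(nrow(df)), times), , drop = FALSE]
  out[[col]] <- unlist(df[[col]], use.names = FALSE)
  rownames(out) <- NULL                           # reset "1.1", "1.2", ... row names
  out
}

ego <- c("peter", "margaret", "john")
friends <- list(c("sam", "su", "jim"), c("eric"), c("sarah", "adam"))
df <- data.frame(ego, friends = I(friends), stringsAsFactors = FALSE)

res <- expand_list_column(df, "friends")
print(res)
```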

You can also try data.table, using the df created by @raistlin:

library(data.table)
setDT(df)[, .(friends = unlist(friends)), by = "ego"]

        ego friends
1:    peter     sam
2:    peter      su
3:    peter     jim
4: margaret    eric
5:     john   sarah
6:     john    adam
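One case worth checking: a text with no `@` mention gives `character(0)` from `str_extract_all()`, so its length is 0 and that sender disappears from the result of the row-repetition approach (the data.table version should behave the same, since a group whose `unlist()` is empty contributes no rows). A base-R sketch of the effect, with a hypothetical sender `"anna"` who addresses nobody, plus one way to keep such senders by substituting `NA`:

```r
ego <- c("peter", "anna", "john")   # "anna" is a made-up sender with no mentions
friends <- list(c("sam", "su"), character(0), "adam")

times <- lengths(friends)           # 2 0 1
long <- data.frame(
  ego = rep(ego, times),            # "anna" is repeated zero times
  friends = unlist(friends),
  stringsAsFactors = FALSE
)

# To keep such senders, replace empty elements with NA before expanding
friends_na <- lapply(friends, function(f) if (length(f) == 0) NA_character_ else f)
long_na <- data.frame(
  ego = rep(ego, lengths(friends_na)),
  friends = unlist(friends_na),
  stringsAsFactors = FALSE
)
```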

EDIT

Now, with the additional context provided by the OP, the data.table solution can be simplified to address the underlying problem directly.

To remove the leading @ from the Addressees column, as the OP asked, the regular expression has to be changed to use a positive lookbehind:

library(data.table)

# read data (to make it a reproducible example)
dt <- fread("Text;                                  Screen_Name 
a text where @Sam and @Su and @Jim are addressed;   Peter
a text where @Eric is addressed;                    Margret
a text where @Sarah and @Adam are addressed;        John")

# use str_extract_all with modified regex
dt[, .(Addressees = unlist(stringr::str_extract_all(Text, "(?<=@)\\S+"))), 
   by = .(Screen_Name)]

#   Screen_Name Addressees
#1:       Peter        Sam
#2:       Peter         Su
#3:       Peter        Jim
#4:     Margret       Eric
#5:        John      Sarah
#6:        John       Adam
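The lookbehind `(?<=@)` matches the position right after an `@` without consuming the `@` itself, so only the name is returned. For a base-R sketch without stringr (a swapped-in alternative), note that the default regex engine of `gregexpr()` does not support lookbehind, so `perl = TRUE` is needed:

```r
texts <- c("a text where @Sam and @Su and @Jim are addressed",
           "a text where @Eric is addressed")

# perl = TRUE switches to the PCRE engine, which supports lookbehind
names_only <- regmatches(texts, gregexpr("(?<=@)\\S+", texts, perl = TRUE))

str(names_only)
```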

Does this help?

Input:

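Screen_Name <- c("Peter", "Margaret", "John")
# a list keeps each sender's addressees together (c() would flatten them)
Addressees <- list(c("@Sam", "@Su", "@Jim"), "@Eric", c("@Sarah", "@Adam"))

The tidyverse way:

library(tidyr)  # also provides the %>% pipe
# one row per addressee, with Screen_Name repeated
df <- tibble::tibble(Screen_Name, Addressees) %>% unnest(Addressees)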
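Note that this input needs a `list()`: `c()` flattens nested vectors, so `c(c("@Sam", "@Su", "@Jim"), "@Eric", c("@Sarah", "@Adam"))` is just a character vector of six elements and the grouping by sender is lost. A small base-R check of the difference:

```r
flat <- c(c("@Sam", "@Su", "@Jim"), "@Eric", c("@Sarah", "@Adam"))
nested <- list(c("@Sam", "@Su", "@Jim"), "@Eric", c("@Sarah", "@Adam"))

length(flat)     # 6: one element per addressee, no grouping left
lengths(nested)  # 3 1 2: addressees per sender are preserved
```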
