Create new rows in data frame based on multiple values of column

I have adjusted my question to be a bit more specific

I have searched for a specific answer to my question but without success.

First of all, I have a data frame consisting of 48 variables, which looks something like this:

> df

    Text                                               Screen_Name   ...  
1   a text where @Sam and @Su and @Jim are addressed   Peter
2   a text where @Eric is addressed                    Margret
3   a text where @Sarah and @Adam are addressed        John

Now I extract all strings that match the pattern "@\\S+" and store them in a new column:

library(stringr)
df$Addressees <- str_extract_all(df$Text, "@\\S+")

This gets me:

    ...   Screen_Name   Addressees               ...  
1         Peter         c("@Sam", "@Su", "@Jim")
2         Margret       @Eric
3         John          c("@Sarah", "@Adam")

Now I want to create a new data frame from these two columns, with one row per addressee (without the leading "@") and the corresponding value of "Screen_Name" repeated:

> df

    Screen_Name  Addressees
 1  Peter        Sam
 2  Peter        Su
 3  Peter        Jim
 4  Margret      Eric
 5  John         Sarah
 6  John         Adam

I have tried solutions to similar questions, but none of them seems to work.

Thank you very much for your help!

OK, with a reproducible example:

# create df
ego <- c("peter","margaret","john")
friends <- list(c("sam","su","jim"),c("eric"),c("sarah","adam"))
df <- data.frame(ego, friends = I(friends), stringsAsFactors = FALSE)

# use repeat function to repeat rows
times <- sapply(df$friends,length)
df <- df[rep(seq_len(nrow(df)), times),]
# assign back unlisted friends
df$friends <- unlist(friends)
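
The same idea can be applied directly to data shaped like the question's. The following is only a minimal sketch under a few assumptions: the data frame name `tweets` and the object names `matches` and `result` are made up for illustration, the matches are extracted with base R's regmatches() and gregexpr() instead of stringr, and lengths() stands in for sapply(..., length).

```r
# Example data mirroring the question (only the two relevant columns);
# the name `tweets` is hypothetical
tweets <- data.frame(
  Text = c("a text where @Sam and @Su and @Jim are addressed",
           "a text where @Eric is addressed",
           "a text where @Sarah and @Adam are addressed"),
  Screen_Name = c("Peter", "Margret", "John"),
  stringsAsFactors = FALSE
)

# One character vector of "@..." matches per row
matches <- regmatches(tweets$Text, gregexpr("@\\S+", tweets$Text))

# Repeat each Screen_Name once per match, flatten the matches,
# and strip the leading "@"
result <- data.frame(
  Screen_Name = rep(tweets$Screen_Name, lengths(matches)),
  Addressees  = sub("^@", "", unlist(matches)),
  stringsAsFactors = FALSE
)
```

Note that a row whose text contains no "@" match is repeated zero times and therefore drops out of `result`.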

You may also try data.table using the df created by @raistlin:

library(data.table)
setDT(df)[, .(friends = unlist(friends)), by = "ego"]

        ego friends
1:    peter     sam
2:    peter      su
3:    peter     jim
4: margaret    eric
5:     john   sarah
6:     john    adam

Edit

Now, with the additional context supplied by the OP, the data.table solution can be streamlined to solve the underlying problem in a one-liner.

To remove the leading @ in the Addressees column as requested by the OP, the regular expression needs to be modified to use a positive lookbehind.

library(data.table)

# read data (to make it a reproducible example)
dt <- fread("Text;                                  Screen_Name 
a text where @Sam and @Su and @Jim are addressed;   Peter
a text where @Eric is addressed;                    Margret
a text where @Sarah and @Adam are addressed;        John")

# use str_extract_all with modified regex
dt[, .(Addressees = unlist(stringr::str_extract_all(Text, "(?<=@)\\S+"))), 
   by = .(Screen_Name)]

#   Screen_Name Addressees
#1:       Peter        Sam
#2:       Peter         Su
#3:       Peter        Jim
#4:     Margret       Eric
#5:        John      Sarah
#6:        John       Adam
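
To see what the lookbehind changes, the two patterns can be compared on a single string. This is a small sketch in base R, not part of the answer above; the variable names are made up for illustration. Base R's regex functions need perl = TRUE for lookbehind, whereas stringr handles it without extra arguments.

```r
x <- "a text where @Sam and @Su and @Jim are addressed"

# Plain pattern: the "@" is part of each match
with_at <- regmatches(x, gregexpr("@\\S+", x))[[1]]

# Positive lookbehind: "@" must precede the match but is not included in it
without_at <- regmatches(x, gregexpr("(?<=@)\\S+", x, perl = TRUE))[[1]]
```

Because \\S+ matches any run of non-whitespace characters, punctuation directly after a name (as in "@Sam,") would be captured as well; a narrower character class such as \\w+ may be preferable for real tweets.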

Does this help?

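Input (the addressees are stored as a list so that each element stays paired with its Screen_Name):

Screen_Name <- c("Peter", "Margaret", "John")
Addressees <- list(c("@Sam", "@Su", "@Jim"), "@Eric", c("@Sarah", "@Adam"))

The tidyverse way, using tidyr::unnest() to create one row per addressee:

library(magrittr)  # provides %>%
df <- tibble::tibble(Screen_Name, Addressees) %>% tidyr::unnest(Addressees)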
