
Create new data frame rows based on a column from another data frame

I have two data frames. The first column of one (df A) is a list of items. The first column of the other (df B) contains items from that list, but some rows hold several items. I want to create a new row for each item from df A that occurs in the first column of df B.

DF A

dfA
  Index  X
1  1    alpha
2  2    beta
3  3    gamma
4  4    delta

DF B

dfB
  list    X  
1  1 4    alpha
2  3 2 1  beta
3  4 1 2  gamma
4  3      delta

Desired

dfC
  Index   X
1  1     alpha
2  4     alpha
3  3     beta
4  2     beta
5  1     beta
6  4     gamma
7  1     gamma
8  2     gamma
9  3     delta

The actual data I am using: DF A

dput(head(allwines))
structure(list(Wine = c("Albariño", "Aligoté", "Amarone", "Arneis", 
"Asti Spumante", "Auslese"), Description = c("Spanish white wine grape that makes crisp, refreshing, and light-bodied wines.", 
"White wine grape grown in Burgundy making medium-bodied, crisp, dry wines with spicy character.", 
"From Italy’s Veneto Region a strong, dry, long- lived red, made from a blend of partially dried red grapes.", 
"A light-bodied dry wine the Piedmont Region of Italy", "From the Piedmont Region of Italy, A semidry sparkling wine produced from the Moscato di Canelli grape in the village of Asti", 
"German white wine from grapes that are very ripe and thus high in sugar"
)), .Names = c("Wine", "Description"), row.names = c(NA, 6L), class = "data.frame")

DF B

dput(head(cheesePairing))
structure(list(Wine = c("Cabernet Sauvignon\r\n                                \r\n                            \r\n                        \r\n                            \r\n                                \r\n                                    Pinot Noir\r\n                                \r\n                            \r\n                        \r\n                            \r\n                                \r\n                                    Sauvignon Blanc\r\n                                \r\n                            \r\n                        \r\n                            \r\n                                \r\n                                    Zinfandel", 
"Chianti\r\n                                \r\n                            \r\n                        \r\n                            \r\n                                \r\n                                    Pinot Noir\r\n                                \r\n                            \r\n                        \r\n                            \r\n                                \r\n                                    Sangiovese", 
"Chardonnay", "Bardolino\r\n                                \r\n                            \r\n                        \r\n                            \r\n                                \r\n                                    Malbec\r\n                                \r\n                            \r\n                        \r\n                            \r\n                                \r\n                                    Riesling\r\n                                \r\n                            \r\n                        \r\n                            \r\n                                \r\n                                    Rioja\r\n                                \r\n                            \r\n                        \r\n                            \r\n                                \r\n                                    Sauvignon Blanc", 
"Tempranillo", "Asti Spumante"), Cheese = c("Abbaye De Belloc Cheese", 
"Ardrahan cheese", "Asadero cheese", "Asiago cheese", "Azeitao", 
"Baby Swiss Cheese"), Suggestions = c("Pair with apples,  sliced pears OR  a sampling of olives and thin sliced salami.  Pass around slices of baguette.", 
"Serve with a substantial wheat cracker and apples or grapes.", 
"Rajas (blistered chile strips) fresh corn tortillas", "Table water crackers, raw nuts (almond, walnuts)", 
"Nutty brown bread, grapes", "Server with dried fruits, whole grain, nutty breads, nuts"
)), .Names = c("Wine", "Cheese", "Suggestions"), row.names = c(NA, 
6L), class = "data.frame")

Building on Curt's answer, I think I found a more efficient solution, assuming I interpreted your objective correctly.

My commented code is below. You should be able to run it as-is and get the desired dfC. Note that I assumed (based on your actual data) that the delimiter separating the items in dfB$Index is "\r\n".

# set up fake data (stringsAsFactors = FALSE keeps every column as character)
dfA <- data.frame(Index = c('1', '2', '3', '4'),
                  X = c('alpha', 'beta', 'gamma', 'delta'),
                  stringsAsFactors = FALSE)
dfB <- data.frame(Index = c('1 \r\n 4', '3 \r\n 2 \r\n 1', '4 \r\n 1 \r\n 2', '3'),
                  X = c('alpha', 'beta', 'gamma', 'delta'),
                  stringsAsFactors = FALSE)

dfB_index_parsed <- strsplit(dfB$Index, "\r\n", fixed = TRUE) # split Index of dfB by delimiter "\r\n" and store in a list
names(dfB_index_parsed) <- dfB$X
dfB_split_num_vec <- lengths(dfB_index_parsed) # number of pieces per row of dfB, as a named vector

g <- unlist(dfB_index_parsed, use.names = FALSE) # store all split values in a single vector
g <- trimws(g) # remove leading/trailing whitespace left over from the split (internal spaces are kept)
names(g) <- rep(names(dfB_split_num_vec), dfB_split_num_vec) # associate each split Index from dfB with X from dfB
g <- g[g %in% dfA$Index] # keep only the dfB$Index values that are in dfA$Index

dfC <- data.frame(Index = unname(g), X = names(g), stringsAsFactors = FALSE) # construct data.frame
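The same split, trim, and filter steps can be tried on data shaped like the question's cheesePairing. This is only a sketch: the Wine strings are shortened copies of the ones in the question, and known_wines is a hypothetical stand-in for allwines$Wine (the head() shown in the question does not include these wines).

```r
# Shortened copy of three rows from the question's cheesePairing
cheesePairing <- data.frame(
  Wine = c("Chianti\r\n      \r\n      Pinot Noir\r\n    \r\n      Sangiovese",
           "Chardonnay",
           "Asti Spumante"),
  Cheese = c("Ardrahan cheese", "Asadero cheese", "Baby Swiss Cheese"),
  stringsAsFactors = FALSE
)

# Hypothetical reference list, standing in for allwines$Wine
known_wines <- c("Asti Spumante", "Chardonnay", "Pinot Noir")

# Split each Wine cell on "\r\n"; whitespace-only pieces become "" after trimming
pieces <- strsplit(cheesePairing$Wine, "\r\n", fixed = TRUE)
wine   <- trimws(unlist(pieces))

# Repeat each Cheese once per piece so it stays aligned with its wines
cheese <- rep(cheesePairing$Cheese, lengths(pieces))

# Keep only the wines found in the reference list (this also drops the "" pieces)
keep  <- wine %in% known_wines
pairs <- data.frame(Wine = wine[keep], Cheese = cheese[keep],
                    stringsAsFactors = FALSE)
pairs
```

Because trimws() only strips leading and trailing whitespace, multi-word names such as "Pinot Noir" survive intact and can still be matched against the reference list.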

First, let me provide a functional answer to your question. I doubt my answer is very efficient, but it works.

# construct toy data
dfA <- data.frame(index = 1:4, X = letters[1:4])

dfB <- data.frame(X = letters[1:4])
dfB$list_elements <- list(c(1, 4), c(3, 2, 1), c(4, 1, 2), c(3))

# define function that provides solution

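unlist_merge_df <- function(listed_df, reference_df){
    # reference_df assumed to have columns "X" and "index"
    # listed_df assumed to have columns "X" and "list_elements"
    df_out <- data.frame(index = c(), X = c())
    my_list <- listed_df$list_elements
    for(idx in seq_along(my_list)){
        # keep only the elements that also appear in reference_df$index
        matched <- my_list[[idx]][my_list[[idx]] %in% reference_df$index]
        # one output row per matched element, each paired with this row's X
        df_out <- rbind(df_out, 
                        data.frame(index = matched, 
                                   X = rep(listed_df[idx, 'X'], length(matched)))
                        )
    }
    return(df_out)
}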

# call the function
dfC <- unlist_merge_df(dfB, dfA)

# show output in human and R-parseable formats
dfC

dput(dfC)

The output is:

index   X
1   1   a
2   4   a
3   3   b
4   2   b
5   1   b
6   4   c
7   1   c
8   2   c
9   3   d

structure(list(index = c(1, 4, 3, 2, 1, 4, 1, 2, 3), X = structure(c(1L, 
1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L), .Label = c("a", "b", "c", "d"
), class = "factor")), .Names = c("index", "X"), row.names = c(NA, 
9L), class = "data.frame")

Second, let me say that the situation you're in isn't very desirable. If you can avoid it, you probably should. Either don't use data frames at all and only use lists, or (if you can) avoid constructing the listed data frame entirely and construct the desired output directly.
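As a rough illustration of the list-only route, the toy data can be held as a plain named list and flattened in one step with base R's stack(). The name b_list is hypothetical; it simply holds the same values as the toy dfB, keyed by X.

```r
# Hypothetical list-only form of the toy data: names are X, values are the indices
b_list <- list(a = c(1, 4), b = c(3, 2, 1), c = c(4, 1, 2), d = 3)

# stack() turns a named list of vectors into a data frame with
# columns "values" (the elements) and "ind" (the name each element came from)
long <- stack(b_list)

# Rename to match the column names used above; ind is a factor, so convert it
dfC <- data.frame(index = long$values,
                  X = as.character(long$ind),
                  stringsAsFactors = FALSE)
dfC
```

This avoids both the list column and the repeated rbind() calls, at the cost of keeping X only as list names rather than as a column.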
