I have two data frames. The first column of one (df A) is a list of items. The first column of the other (df B) contains items from that list, but some rows hold multiple items. What I want to do is create a new row for each item from df A that occurs in the first column of df B.
DF A
dfA
Index X
1 1 alpha
2 2 beta
3 3 gamma
4 4 delta
DF B
dfB
  list   X
1 1 4    alpha
2 3 2 1  beta
3 4 1 2  gamma
4 3      delta
Desired
dfC
Index X
1 1 alpha
2 4 alpha
3 3 beta
4 2 beta
5 1 beta
6 4 gamma
7 1 gamma
8 2 gamma
9 3 delta
The actual data I am using: DF A
dput(head(allwines))
structure(list(Wine = c("Albariño", "Aligoté", "Amarone", "Arneis",
"Asti Spumante", "Auslese"), Description = c("Spanish white wine grape that makes crisp, refreshing, and light-bodied wines.",
"White wine grape grown in Burgundy making medium-bodied, crisp, dry wines with spicy character.",
"From Italy’s Veneto Region a strong, dry, long- lived red, made from a blend of partially dried red grapes.",
"A light-bodied dry wine the Piedmont Region of Italy", "From the Piedmont Region of Italy, A semidry sparkling wine produced from the Moscato di Canelli grape in the village of Asti",
"German white wine from grapes that are very ripe and thus high in sugar"
)), .Names = c("Wine", "Description"), row.names = c(NA, 6L), class = "data.frame")
DF B
dput(head(cheesePairing))
structure(list(Wine = c("Cabernet Sauvignon\r\n \r\n \r\n \r\n \r\n \r\n Pinot Noir\r\n \r\n \r\n \r\n \r\n \r\n Sauvignon Blanc\r\n \r\n \r\n \r\n \r\n \r\n Zinfandel",
"Chianti\r\n \r\n \r\n \r\n \r\n \r\n Pinot Noir\r\n \r\n \r\n \r\n \r\n \r\n Sangiovese",
"Chardonnay", "Bardolino\r\n \r\n \r\n \r\n \r\n \r\n Malbec\r\n \r\n \r\n \r\n \r\n \r\n Riesling\r\n \r\n \r\n \r\n \r\n \r\n Rioja\r\n \r\n \r\n \r\n \r\n \r\n Sauvignon Blanc",
"Tempranillo", "Asti Spumante"), Cheese = c("Abbaye De Belloc Cheese",
"Ardrahan cheese", "Asadero cheese", "Asiago cheese", "Azeitao",
"Baby Swiss Cheese"), Suggestions = c("Pair with apples, sliced pears OR a sampling of olives and thin sliced salami. Pass around slices of baguette.",
"Serve with a substantial wheat cracker and apples or grapes.",
"Rajas (blistered chile strips) fresh corn tortillas", "Table water crackers, raw nuts (almond, walnuts)",
"Nutty brown bread, grapes", "Server with dried fruits, whole grain, nutty breads, nuts"
)), .Names = c("Wine", "Cheese", "Suggestions"), row.names = c(NA,
6L), class = "data.frame")
Building on Curt's answer, I think I found a more efficient solution, assuming I interpreted your objective correctly.
My commented code is below. You should be able to run it as-is and get the desired dfC. Note that I assumed (based on your actual data) that the delimiter separating the values in dfB$Index is "\r\n".
# set up fake data
dfA<-data.frame(Index=c('1','2','3','4'), X=c('alpha','beta','gamma','delta'))
dfB<-data.frame(Index=c('1 \r\n 4','3 \r\n 2 \r\n 1','4 \r\n 1 \r\n 2','3'), X=c('alpha','beta','gamma','delta'))
dfA$Index<-as.character(dfA$Index)
dfA$X<-as.character(dfA$X)
dfB$Index<-as.character(dfB$Index)
dfB$X<-as.character(dfB$X)
dfB_index_parsed<-strsplit(x=dfB$Index,"\r\n") # split Index of dfB by delimiter "\r\n" and store in a list
names(dfB_index_parsed)<-dfB$X
dfB_split_num<-lapply(dfB_index_parsed, length) # find the number of splits per row of dfB and store in a list
dfB_split_num_vec<-do.call('c', dfB_split_num) # convert number of splits above from list to vector
g<-do.call('c',dfB_index_parsed) # store all split values in a single vector
g<-trimws(g) # remove leading/trailing whitespace left over after the split (keeps spaces inside values such as "Pinot Noir")
names(g)<-rep(names(dfB_split_num_vec), dfB_split_num_vec ) # associate each split Index from dfB with X from dfB
g<-g[g %in% dfA$Index] # check which dfB$Index are in dfA$Index
dfC<-data.frame(Index=g, X=names(g)) # construct data.frame
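The same steps can be condensed a little. What follows is only a sketch on the toy data, assuming R 3.2.0 or later (for `lengths()` and `trimws()`) and assuming the delimiter really is a literal "\r\n". The name `parts` is just an illustrative helper, not something from the question.

```r
# Toy data mirroring the question; stringsAsFactors = FALSE keeps columns as character
dfA <- data.frame(Index = c("1", "2", "3", "4"),
                  X = c("alpha", "beta", "gamma", "delta"),
                  stringsAsFactors = FALSE)
dfB <- data.frame(Index = c("1 \r\n 4", "3 \r\n 2 \r\n 1", "4 \r\n 1 \r\n 2", "3"),
                  X = c("alpha", "beta", "gamma", "delta"),
                  stringsAsFactors = FALSE)

# Split each cell on the literal delimiter, then trim whitespace around each piece
parts <- lapply(strsplit(dfB$Index, "\r\n", fixed = TRUE), trimws)

# One row per piece: repeat each X as many times as its cell had pieces
dfC <- data.frame(Index = unlist(parts),
                  X = rep(dfB$X, times = lengths(parts)),
                  stringsAsFactors = FALSE)

# Keep only values that appear in dfA (this also drops empty pieces)
dfC <- dfC[dfC$Index %in% dfA$Index, ]
rownames(dfC) <- NULL
dfC
```

On data shaped like cheesePairing, where the values are separated by several whitespace-only lines, the split produces empty pieces. The final `%in%` filter removes them, as long as the reference column contains no empty strings.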
First, let me provide a functional answer to your question. I doubt my answer is very efficient, but it works.
# construct toy data
dfA <- data.frame(index = 1:4, X = letters[1:4])
dfB <- data.frame(X = letters[1:4])
dfB$list_elements <- list(c(1, 4), c(3, 2, 1), c(4, 1, 2), c(3))
# define function that provides solution
unlist_merge_df <- function(listed_df, reference_df){
  # reference_df assumed to have columns "X" and "index" (it is not used below)
  # listed_df assumed to have columns "X" and "list_elements"
  df_out <- data.frame(index = c(), X = c())
  my_list <- listed_df$list_elements
  # seq_along() is safe when my_list is empty, unlike 1:length(my_list)
  for(idx in seq_along(my_list)){
    # append one row per element of the idx-th list entry
    df_out <- rbind(df_out,
                    data.frame(index = my_list[[idx]],
                               X = listed_df[idx, 'X'])
    )
  }
  return(df_out)
}
# call the function
dfC <- unlist_merge_df(dfB, dfA)
# show output in human and R-parseable formats
dfC
dput(dfC)
The output is:
index X
1 1 a
2 4 a
3 3 b
4 2 b
5 1 b
6 4 c
7 1 c
8 2 c
9 3 d
structure(list(index = c(1, 4, 3, 2, 1, 4, 1, 2, 3), X = structure(c(1L,
1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L), .Label = c("a", "b", "c", "d"
), class = "factor")), .Names = c("index", "X"), row.names = c(NA,
9L), class = "data.frame")
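If the repeated rbind() in the loop turns out to be slow on larger data, the same result can probably be built in one step. This is only a sketch under the same toy-data assumptions, using `lengths()` (R 3.2.0 or later). The names `n_per_row` and `dfC_vec` are illustrative, and the filter against dfA$index is an addition that the function above does not perform.

```r
# Same toy data as above, with character columns kept as character
dfA <- data.frame(index = 1:4, X = letters[1:4], stringsAsFactors = FALSE)
dfB <- data.frame(X = letters[1:4], stringsAsFactors = FALSE)
dfB$list_elements <- list(c(1, 4), c(3, 2, 1), c(4, 1, 2), c(3))

# Number of elements held in each row's list entry
n_per_row <- lengths(dfB$list_elements)

# Flatten the list column and repeat each X once per element
dfC_vec <- data.frame(index = unlist(dfB$list_elements),
                      X = rep(dfB$X, times = n_per_row),
                      stringsAsFactors = FALSE)

# Optional: keep only indices that actually occur in the reference data frame
dfC_vec <- dfC_vec[dfC_vec$index %in% dfA$index, ]
rownames(dfC_vec) <- NULL
dfC_vec
```

This avoids growing a data frame inside a loop, which copies the accumulated rows on every iteration.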
Second, let me say that the situation you're in isn't very desirable, and you should probably avoid it if you can. Either don't use data frames at all and work only with lists, or avoid constructing the listed data frame in the first place and construct the desired output directly.