
Map one data frame to another

I have a data frame with the quantity of each food eaten by each individual:

set.seed(1)
quantity <- data.frame(apple = sample(0:10, 5, replace = TRUE),
                       egg = sample(0:10, 5, replace = TRUE),
                       beer = sample(0:10, 5, replace = TRUE))

E.g. the first person ate 8 apples and 6 eggs and drank 0 beers; there are 5 people in total.

I also have a reference table with market weights and nutrient values (a food can appear on several rows):

reference <- data.frame(name = c("apple", "apple", "egg", "beer", "beer", "beer"),
                        market_weight = c(0.4, 0.6, 1, 0.2, 0.7, 0.1),
                        nutr1 = sample(1:999, 6, replace = TRUE),
                        nutr2 = sample(1:999, 6, replace = TRUE),
                        nutr3 = sample(1:999, 6, replace = TRUE))

For each person, I need the total intake of each nutrient (nutr1, nutr2, nutr3) given the quantities of food they ate.
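Written out, with $q_{p,f}$ the quantity of food $f$ eaten by person $p$, $w_r$ the market weight of reference row $r$ and $n_{r,k}$ its value of nutrient $k$ (symbols introduced here only for illustration), the target is:

```latex
I_{p,k} = \sum_{r} q_{p,\,\mathrm{name}_r}\, w_r\, n_{r,k}
        = \sum_{f} q_{p,f}\, N_{f,k},
\qquad
N_{f,k} = \sum_{r \,:\, \mathrm{name}_r = f} w_r\, n_{r,k}
```

So the whole result is the matrix product $I = Q N$, where $Q$ is the participants x foods quantity matrix and $N$ is the foods x nutrients matrix of weighted nutrient totals.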

Expected result (5 rows, one per participant):

nutr1    nutr2    nutr3
7814.8  4996.4    9053.6  
  W        T        K  
.....    ....     .....

My (inefficient) solution:

First, I join the quantities to the nutrient table:

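library(dplyr)
# transpose so each food is a row with one quantity column per person, then attach its reference rows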
merged <- quantity %>%
  t %>%
  as.data.frame %>%
  tibble::rownames_to_column() %>%
  `colnames<-`(c("name","id1","id2", "id3", "id4", "id5")) %>%
  right_join(., reference, by= "name") %>%
  na.omit

Then I multiply quantity * market_weight * nutrient (nutrients 1 to 3) and sum each nutrient; as written this covers only the first person (id1):

# person 1 only (column id1): quantity * market_weight * nutrient, summed over foods
out <- merged %>%
  mutate(nutr1_final = id1 * market_weight * nutr1,
         nutr2_final = id1 * market_weight * nutr2,
         nutr3_final = id1 * market_weight * nutr3) %>%
  summarise_at(vars(nutr1_final, nutr2_final, nutr3_final), sum)
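A minimal base-R sketch of the same join, multiply and sum idea applied to all participants at once instead of only id1. It assumes only the quantity and reference data frames from the question and that nutrient columns are named nutr*; the names long, joined and contrib are illustrative. quantity is reshaped to long format (one row per participant and food), joined to reference, multiplied, and summed per participant with rowsum().

```r
set.seed(1)
quantity <- data.frame(apple = sample(0:10, 5, replace = TRUE),
                       egg   = sample(0:10, 5, replace = TRUE),
                       beer  = sample(0:10, 5, replace = TRUE))
reference <- data.frame(name = c("apple", "apple", "egg", "beer", "beer", "beer"),
                        market_weight = c(0.4, 0.6, 1, 0.2, 0.7, 0.1),
                        nutr1 = sample(1:999, 6, replace = TRUE),
                        nutr2 = sample(1:999, 6, replace = TRUE),
                        nutr3 = sample(1:999, 6, replace = TRUE))

# Long format: one row per (participant id, food) pair
long <- data.frame(id   = rep(seq_len(nrow(quantity)), times = ncol(quantity)),
                   name = rep(names(quantity), each = nrow(quantity)),
                   qty  = unlist(quantity, use.names = FALSE))

# Attach every reference row of the matching food (apple -> 2 rows, beer -> 3)
joined <- merge(long, reference, by = "name")

# quantity * market_weight * nutrient, for all nutrient columns at once
nutr_cols <- grep("^nutr", names(reference), value = TRUE)
contrib <- as.matrix(joined[nutr_cols]) * (joined$qty * joined$market_weight)

# Sum contributions per participant: participants x nutrients matrix
out <- rowsum(contrib, joined$id)
out
```

The long table has one row per participant, food and matching reference row, so with 40k participants it grows quickly; that intermediate table is the main cost of this approach.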

With the real data, quantity has about 40k rows (one per participant) and there are roughly 80 nutrients. What is an efficient way to do this? Thanks

Solution:

library(dplyr)
set.seed(1)
quantity <- data.frame(apple = sample(0:10, 5, replace = TRUE),
                       egg = sample(0:10, 5, replace = TRUE),
                       beer = sample(0:10, 5, replace = TRUE))

reference <- data.frame(name = c("apple", "apple", "egg", "beer", "beer", "beer"),
                        market_weight = c(0.4, 0.6, 1, 0.2, 0.7, 0.1),
                        nutr1 = sample(1:999, 6, replace = TRUE),
                        nutr2 = sample(1:999, 6, replace = TRUE),
                        nutr3 = sample(1:999, 6, replace = TRUE)) %>% 
  # multiply each nutrient by the row's market_weight
  mutate(nutr1 = market_weight*nutr1,
         nutr2 = market_weight*nutr2,
         nutr3 = market_weight*nutr3) %>%
  # sum within food name (one row per food)
  group_by(name) %>% 
  summarise_all(sum) %>%
  as.data.frame() %>%
  select(-market_weight)

# join quantity and reference: one row per food, with each person's quantity followed by the food's weighted nutrient totals
merged <- t(quantity) %>%
  as.data.frame %>%
  tibble::rownames_to_column(., "name") %>%
  right_join(., reference, by="name")

# out[p, k] = sum over foods of (quantity eaten by person p) * (weighted nutrient k)
out <- matrix(data=NA, nrow=5, ncol=3) %>%
  as.data.frame %>%
  `colnames<-`(colnames(reference)[2:4])
for(i in 2:6) {
  for(j in 7:9){
    out[i-1,j-6] = sum(merged[, i] * merged[, j])
  }
}
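The double loop computes, for each person p and nutrient k, the sum over food rows of merged[, p] * merged[, k], which is exactly the cross product t(A) %*% B of the quantity columns A and the nutrient columns B. A sketch on a small hypothetical stand-in for merged (the numbers are made up for illustration), showing that base R crossprod() gives the same result as the loop:

```r
# Hypothetical toy stand-in for `merged`: one row per food,
# columns 2:6 = quantities of 5 people, columns 7:9 = weighted nutrient totals
merged <- data.frame(name  = c("apple", "beer", "egg"),
                     V1 = c(8, 0, 6), V2 = c(2, 3, 1), V3 = c(0, 5, 4),
                     V4 = c(7, 1, 0), V5 = c(3, 3, 3),
                     nutr1 = c(500.2, 310.5, 120),
                     nutr2 = c(210, 88.1, 640),
                     nutr3 = c(99.9, 45, 700))

# Original double loop
loop_out <- matrix(NA_real_, nrow = 5, ncol = 3)
for (i in 2:6) {
  for (j in 7:9) {
    loop_out[i - 1, j - 6] <- sum(merged[, i] * merged[, j])
  }
}

# Same thing in one call: crossprod(A, B) is t(A) %*% B (people x nutrients)
fast_out <- crossprod(as.matrix(merged[, 2:6]), as.matrix(merged[, 7:9]))
fast_out
```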

More efficient solutions are appreciated!
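One possibly faster route skips the transpose and join entirely. This is a sketch under the assumptions that every food column in quantity has at least one matching row in reference and that nutrient columns are named nutr*: collapse reference to a foods x nutrients matrix with base R rowsum(), then do a single matrix product with the quantity matrix. nutrient_intake() is a hypothetical helper name, not an existing function.

```r
set.seed(1)
quantity <- data.frame(apple = sample(0:10, 5, replace = TRUE),
                       egg   = sample(0:10, 5, replace = TRUE),
                       beer  = sample(0:10, 5, replace = TRUE))
reference <- data.frame(name = c("apple", "apple", "egg", "beer", "beer", "beer"),
                        market_weight = c(0.4, 0.6, 1, 0.2, 0.7, 0.1),
                        nutr1 = sample(1:999, 6, replace = TRUE),
                        nutr2 = sample(1:999, 6, replace = TRUE),
                        nutr3 = sample(1:999, 6, replace = TRUE))

# Hypothetical helper: returns a participants x nutrients intake matrix
nutrient_intake <- function(quantity, reference) {
  nutr_cols <- grep("^nutr", names(reference), value = TRUE)
  # foods x nutrients: weight each reference row, then add rows of the same food
  per_food <- rowsum(as.matrix(reference[nutr_cols]) * reference$market_weight,
                     group = as.character(reference$name))
  # participants x foods; reorder per_food rows to match the quantity columns
  # (a food missing from reference makes this indexing fail with an error)
  q <- as.matrix(quantity)
  q %*% per_food[colnames(q), , drop = FALSE]
}

out <- nutrient_intake(quantity, reference)
out
```

Because the work reduces to one rowsum() over the reference rows and one matrix product, it should scale to the 40k-participant, ~80-nutrient case without any per-cell loop; no timings are claimed here.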

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM