R multiply each row value of one dataframe with each row value of another, create new dataframe

I'm pretty new to R and programming. I have the following problem: I have multiple data frames like:

Chicago             New York            Miami
county   percent    county percent   county  percent
a          2%        d       4%        g      30%
b          3%        e       6%        h       2%

and one like this:

routes
origin      destination       travellers
Chicago     Miami             100
Chicago     New York          200

Now, for each route, I want to multiply every row of the origin data frame with every row of the destination data frame (and with the number of travellers) and get something like:

result
origin      destination    travellers
Chicago$a   Miami$g        2% * 30% *100
Chicago$a   Miami$h        2% * 2% *100
Chicago$b   Miami$g        3% * 30% *100
Chicago$b   Miami$h        3% * 2% *100
Chicago$a   New York$d     2% * 4% *200
Chicago$a   New York$e     2% * 6% *200
Chicago$b   New York$d     3% * 4% *200
Chicago$b   New York$e     3% * 6% *200
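Each row of the desired result is the number of travellers on a route scaled by the share of the origin county and the share of the destination county. Written out in notation introduced here only for illustration (T for travellers, p for a county's percentage as a proportion), with the first row of the table as a worked example:

```latex
% notation introduced for illustration: o, d = origin/destination city; c, c' = counties
T_{(o,c),(d,c')} = T_{o,d} \cdot p_{o,c} \cdot p_{d,c'}
\qquad\text{e.g.}\qquad
T_{(\text{Chicago},a),(\text{Miami},g)} = 100 \cdot 0.02 \cdot 0.30 = 0.6
```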

My idea is to loop over the routes data frame, grab the right origin and destination data frames for each row, multiply every value of the origin data frame with every value of the destination data frame, and assign the results to a new data frame. To get the right data frames, a switch() statement could be a solution, like:

i <- 1  # index of the current row of routes
df_origin <- switch(as.character(routes$origin[i]),
   "Chicago"  = Chicago,
   "Miami"    = Miami,
   "New York" = New_york
)
df_destination <- switch(as.character(routes$destination[i]),
   "Chicago"  = Chicago,
   "Miami"    = Miami,
   "New York" = New_york
)

But I don't know how to do this in a loop and assign the results of the multiplication to a new data frame. The situation is quite difficult to explain, but I hope my problem is clear. Many thanks in advance for any tips!
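A minimal sketch of the loop idea described above, assuming the data from the dput() output below (with the New York data stored as New_york). Instead of switch(), it uses a named list `city_dfs` (a hypothetical helper name) to look up the data frame for each city, and expand.grid() to form every origin-county/destination-county pair of a route:

```r
# Data as in the question's dput() output (plain strings instead of factors)
Chicago  <- data.frame(county = c("a", "b"), percent = c(0.02, 0.03))
Miami    <- data.frame(county = c("g", "h"), percent = c(0.30, 0.02))
New_york <- data.frame(county = c("d", "e"), percent = c(0.04, 0.06))
routes   <- data.frame(origin      = c("Chicago", "Chicago"),
                       destination = c("Miami", "New York"),
                       travellers  = c(100, 200))

# city_dfs (hypothetical name): lookup list that replaces switch()
city_dfs <- list("Chicago" = Chicago, "Miami" = Miami, "New York" = New_york)

pieces <- vector("list", nrow(routes))
for (i in seq_len(nrow(routes))) {
  # as.character() so factor columns index the list by name, not by code
  df_origin      <- city_dfs[[as.character(routes$origin[i])]]
  df_destination <- city_dfs[[as.character(routes$destination[i])]]

  # every (origin county, destination county) pair; d varies fastest
  grid <- expand.grid(d = seq_len(nrow(df_destination)),
                      o = seq_len(nrow(df_origin)))

  pieces[[i]] <- data.frame(
    origin      = paste(routes$origin[i], df_origin$county[grid$o], sep = "$"),
    destination = paste(routes$destination[i], df_destination$county[grid$d], sep = "$"),
    travellers  = routes$travellers[i] *
                  df_origin$percent[grid$o] * df_destination$percent[grid$d]
  )
}

# bind all per-route pieces into the new data frame
result <- do.call(rbind, pieces)
result
```

Collecting the per-route pieces in a list and binding them once with do.call(rbind, ...) avoids growing a data frame inside the loop.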

Here are the results from dput(head(df)):

    dput(head(Chicago))
structure(list(county = structure(1:2, .Label = c("a", "b"), class = "factor"), 
    percent = c(0.02, 0.03)), row.names = 1:2, class = "data.frame")

    dput(head(Miami))
structure(list(county = structure(1:2, .Label = c("g", "h"), class = "factor"), 
    percent = c(0.3, 0.02)), row.names = 1:2, class = "data.frame")

    dput(head(New_york))
structure(list(county = structure(1:2, .Label = c("d", "e"), class = "factor"), 
    percent = c(0.04, 0.06)), row.names = 1:2, class = "data.frame")

    dput(head(routes))
structure(list(origin = structure(c(1L, 1L), .Label = "Chicago", class = "factor"), 
    destination = structure(1:2, .Label = c("Miami", "New York"
    ), class = "factor"), travellers = c(100, 200)), row.names = 1:2, class = "data.frame")

Here is a set of functions that does what the question asks for.

  1. Function perc2num is an auxiliary function that converts percentages written with the % symbol into the corresponding real numbers.
  2. Function makeCityList accepts a vector of data frame names, adds a city column to each data frame, and binds them all together.

The last function, mergeRoutesCities, does all the real work.

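perc2num <- function(x) {
  # values that are already numeric (as in the dput() data) are returned unchanged
  if (is.numeric(x)) return(x)
  # strip only the % sign so decimals such as "2.5%" survive, then scale to a proportion
  y <- as.numeric(sub('%', '', as.character(x), fixed = TRUE))/100
  y
}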

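makeCityList <- function(x, sep = '_', envir = .GlobalEnv){
  # fetch the data frames named in x, e.g. c("Chicago", "New_York")
  lst <- mget(x, envir = envir)
  # tag every data frame with the name of its city
  lst <- lapply(seq_along(lst), function(i){
    DF <- lst[[i]]
    Name <- names(lst)[i]
    DF[['city']] <- Name
    DF
  })
  # stack them into one long table: county, percent, city
  out_lst <- do.call(rbind, lst)
  # "New_York" -> "New York" so the city names match those in routes
  out_lst[['city']] <- gsub(sep, ' ', out_lst[['city']])
  out_lst[['percent']] <- perc2num(out_lst[['percent']])
  row.names(out_lst) <- NULL
  out_lst
}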
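
mergeRoutesCities <- function(x = routes, y){
  # row-wise product of the selected columns
  f <- function(x) apply(x, 1, prod)
  i <- grep('county', names(y))
  # one row per origin county (its percentage only; the county label is dropped)
  mrg <- merge(x, y[-i], by.x = 'origin', by.y = 'city')
  # ... times every county of the destination city
  mrg <- merge(mrg, y, by.x = 'destination', by.y = 'city')
  # travellers * origin percent * destination percent
  mrg[['travellers']] <- f(mrg[c(3, 4, 6)])
  # origin, destination, county, travellers
  mrg[c(2, 1, 5, 3)]
}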

cities_names <- c("Chicago", "New_York", "Miami")
tmp <- makeCityList(cities_names)

mergeRoutesCities(routes, tmp)

#   origin destination county travellers
#1 Chicago       Miami      g       0.60
#2 Chicago       Miami      h       0.04
#3 Chicago       Miami      g       0.90
#4 Chicago       Miami      h       0.06
#5 Chicago    New York      d       0.16
#6 Chicago    New York      e       0.24
#7 Chicago    New York      d       0.24
#8 Chicago    New York      e       0.36
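mergeRoutesCities() selects columns by position (c(3, 4, 6) and c(2, 1, 5, 3)). That works because merge() puts the key column first, then the remaining columns of x, then those of y, and adds .x/.y suffixes to the duplicated percent column. A sketch of the same two joins selecting columns by name instead, assuming those default suffixes and a toy `cities` table (hypothetical name) standing in for the output of makeCityList():

```r
# Toy inputs shaped like `routes` and the combined table from makeCityList()
routes <- data.frame(origin      = c("Chicago", "Chicago"),
                     destination = c("Miami", "New York"),
                     travellers  = c(100, 200))
cities <- data.frame(county  = c("a", "b", "d", "e", "g", "h"),
                     percent = c(0.02, 0.03, 0.04, 0.06, 0.30, 0.02),
                     city    = rep(c("Chicago", "New York", "Miami"), each = 2))

# Same two joins as mergeRoutesCities()
mrg <- merge(routes, cities[c("percent", "city")], by.x = "origin", by.y = "city")
mrg <- merge(mrg, cities, by.x = "destination", by.y = "city")

# key first, then the rest of x, then the rest of y (clashing names suffixed):
# destination, origin, travellers, percent.x, county, percent.y
print(names(mrg))

# Select by name rather than by position
mrg$travellers <- mrg$travellers * mrg$percent.x * mrg$percent.y
out <- mrg[c("origin", "destination", "county", "travellers")]
out
```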

In base R, try:

Chicago <- structure(list(county = structure(1:2, .Label = c("a", "b"), class = "factor"), 
               percent = c(0.02, 0.03)), row.names = 1:2, class = "data.frame")

Miami <- structure(list(county = structure(1:2, .Label = c("g", "h"), class = "factor"), 
               percent = c(0.3, 0.02)), row.names = 1:2, class = "data.frame")

New_york <- structure(list(county = structure(1:2, .Label = c("d", "e"), class = "factor"), 
               percent = c(0.04, 0.06)), row.names = 1:2, class = "data.frame")

routes <- structure(list(origin = structure(c(1L, 1L), .Label = "Chicago", class = "factor"), 
               destination = structure(1:2, .Label = c("Miami", "New_york"
           ), class = "factor"), travellers = c(100, 200)), row.names = 1:2, class = "data.frame")

# solution:
cities <- c("Chicago", "New_york", "Miami") # create vector or list with data frame names

d_orig <- do.call(rbind, lapply(cities, function(x) cbind(get(x), origin = x)))
names(d_orig) <- c("county_orig", "percent_orig", "origin")

d_dest <- do.call(rbind, lapply(cities, function(x) cbind(get(x), destination = x)))
names(d_dest) <- c("county_dest", "percent_dest", "destination")

want <- merge(d_orig, routes, by = "origin")
want <- merge(want, d_dest, by = "destination")
want$travellers_Want <- want$travellers * want$percent_orig * want$percent_dest
want$destination_want <- paste(want$destination, want$county_dest, sep = "$") # "city$county" labels as in the question
want$origin_want <- paste(want$origin, want$county_orig, sep = "$")
want


#  destination  origin county_orig percent_orig travellers county_dest percent_dest
#1       Miami Chicago           a         0.02        100           g         0.30
#2       Miami Chicago           a         0.02        100           h         0.02
#3       Miami Chicago           b         0.03        100           g         0.30
#4       Miami Chicago           b         0.03        100           h         0.02
#5    New_york Chicago           a         0.02        200           d         0.04
#6    New_york Chicago           a         0.02        200           e         0.06
#7    New_york Chicago           b         0.03        200           d         0.04
#8    New_york Chicago           b         0.03        200           e         0.06
#  travellers_Want destination_want origin_want
#1            0.60          Miami$g   Chicago$a
#2            0.04          Miami$h   Chicago$a
#3            0.90          Miami$g   Chicago$b
#4            0.06          Miami$h   Chicago$b
#5            0.16       New_york$d   Chicago$a
#6            0.24       New_york$e   Chicago$a
#7            0.24       New_york$d   Chicago$b
#8            0.36       New_york$e   Chicago$b
