R multiply each row value of one dataframe with each row value of another, create new dataframe

I'm pretty new to R and programming. I have the following problem: I have multiple data frames like:

Chicago             New York            Miami
county   percent    county percent   county  percent
a          2%        d       4%        g      30%
b          3%        e       6%        h       2%

and one like:

routes
origin      destination       travellers
Chicago     Miami             100
Chicago     New York          200

Now I want to multiply each row of one data frame with each row of the other and get something like:

result
origin      destination    travellers
Chicago$a   Miami$g        2% * 30% *100
Chicago$a   Miami$h        2% * 2% *100
Chicago$b   Miami$g        3% * 30% *100
Chicago$b   Miami$h        3% * 2% *100
Chicago$a   New York$d     2% * 4% *200
Chicago$a   New York$e     2% * 6% *200
Chicago$b   New York$d     3% * 4% *200
Chicago$b   New York$e     3% * 6% *200

My idea is to loop over the routes data frame, grab the right origin and destination data frame for each row, multiply every value of the origin data frame with every value of the destination data frame, and assign the results to a new data frame. To pick the right data frames, a switch could be a solution, like:

switch(routes$origin,
   "Chicago" = df_origin <-Chicago,
   "Miami" = df_origin<-Miami,
   "New York" = df_origin<-New York,
)
switch(routes$destination,
   "Chicago" = df_destination <-Chicago,
   "Miami" = df_destination<-Miami,
   "New York" = df_destination<-New York,
)

But I don't know how to do this in a loop and assign the results of the multiplication to a new data frame. The situation is quite difficult to explain, but I hope my problem has become clear. Many thanks in advance for any tips!

Here are the results from dput(head(df)):

    dput(head(Chicago))
structure(list(county = structure(1:2, .Label = c("a", "b"), class = "factor"), 
    percent = c(0.02, 0.03)), row.names = 1:2, class = "data.frame")

    dput(head(Miami))
structure(list(county = structure(1:2, .Label = c("g", "h"), class = "factor"), 
    percent = c(0.3, 0.02)), row.names = 1:2, class = "data.frame")

    dput(head(New_york))
structure(list(county = structure(1:2, .Label = c("d", "e"), class = "factor"), 
    percent = c(0.04, 0.06)), row.names = 1:2, class = "data.frame")

    dput(head(routes))
structure(list(origin = structure(c(1L, 1L), .Label = "Chicago", class = "factor"), 
    destination = structure(1:2, .Label = c("Miami", "New York"
    ), class = "factor"), travellers = c(100, 200)), row.names = 1:2, class = "data.frame")

Here is a set of functions that does what the question asks for.

  1. perc2num is an auxiliary function. It converts percentages written with the % symbol into the corresponding numeric values.
  2. makeCityList accepts a vector of data frame names and binds those data frames together, adding a city column.
  3. mergeRoutesCities does the real work: it merges routes with the city data twice (once on origin, once on destination) and multiplies travellers by both percentages.

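perc2num <- function(x) {
  # already numeric (as in the dput output): nothing to convert
  if (is.numeric(x)) return(x)
  # keep only digits and the decimal point, e.g. "30%" -> "30"
  y <- gsub('[^[:digit:].]', '', x)
  as.numeric(y)/100
}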

makeCityList <- function(x, sep = '_', envir = .GlobalEnv){
  lst <- mget(x, envir = envir)
  lst <- lapply(seq_along(lst), function(i){
    DF <- lst[[i]]
    Name <- names(lst)[i]
    DF[['city']] <- Name
    DF
  })
  out_lst <- do.call(rbind, lst)
  out_lst[['city']] <- gsub(sep, ' ', out_lst[['city']])
  out_lst[['percent']] <- perc2num(out_lst[['percent']])
  row.names(out_lst) <- NULL
  out_lst
}
mergeRoutesCities <- function(x = routes, y){
  f <- function(x) apply(x, 1, prod)
  i <- grep('county', names(y))
  mrg <- merge(x, y[-i], by.x = 'origin', by.y = 'city')
  mrg <- merge(mrg, y, by.x = 'destination', by.y = 'city')
  mrg[['travellers']] <- f(mrg[c(3, 4, 6)])
  mrg[c(2, 1, 5, 3)]
}

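# names of the city data frames as they exist in the workspace;
# makeCityList replaces '_' with a space so the city matches the values in routes
cities_names <- c("Chicago", "New_York", "Miami")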
tmp <- makeCityList(cities_names)

mergeRoutesCities(routes, tmp)

#   origin destination county travellers
#1 Chicago       Miami      g       0.60
#2 Chicago       Miami      h       0.04
#3 Chicago       Miami      g       0.90
#4 Chicago       Miami      h       0.06
#5 Chicago    New York      d       0.16
#6 Chicago    New York      e       0.24
#7 Chicago    New York      d       0.24
#8 Chicago    New York      e       0.36
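The output above only shows the destination county, so it may help to check the Chicago to Miami rows by hand. This is just a small sketch using the percentages from the question; the names origin_pct, dest_pct and check are made up for the illustration. outer() builds every origin/destination product in one step:

```r
# Percentages taken from the question (Chicago as origin, Miami as destination)
origin_pct <- c(a = 0.02, b = 0.03)
dest_pct   <- c(g = 0.30, h = 0.02)
travellers <- 100

# rows = origin county, columns = destination county
check <- outer(origin_pct, dest_pct) * travellers
check
```

Row a of the matrix corresponds to output rows 1 and 2, row b to output rows 3 and 4.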

In base R try:

Chicago <- structure(list(county = structure(1:2, .Label = c("a", "b"), class = "factor"), 
               percent = c(0.02, 0.03)), row.names = 1:2, class = "data.frame")

Miami <- structure(list(county = structure(1:2, .Label = c("g", "h"), class = "factor"), 
               percent = c(0.3, 0.02)), row.names = 1:2, class = "data.frame")

New_york <- structure(list(county = structure(1:2, .Label = c("d", "e"), class = "factor"), 
               percent = c(0.04, 0.06)), row.names = 1:2, class = "data.frame")

routes <- structure(list(origin = structure(c(1L, 1L), .Label = "Chicago", class = "factor"), 
               destination = structure(1:2, .Label = c("Miami", "New_york"
           ), class = "factor"), travellers = c(100, 200)), row.names = 1:2, class = "data.frame")

# solution:
cities <- c("Chicago", "New_york", "Miami") # create vector or list with data frame names

d_orig <- do.call(rbind, lapply(cities, function(x) cbind(get(x), origin = x)))
names(d_orig) <- c("county_orig", "percent_orig", "origin")

d_dest <- do.call(rbind, lapply(cities, function(x) cbind(get(x), destination = x)))
names(d_dest) <- c("county_dest", "percent_dest", "destination")

want <- merge(d_orig, routes, by = "origin")
want <- merge(want, d_dest, by = "destination")
want$travellers_Want <- want$travellers * want$percent_orig * want$percent_dest
want$destination_want <- paste(want$destination, want$county_dest, sep = "$") #?
want$origin_want <- paste(want$origin, want$county_orig, sep = "$")
want


#  destination  origin county_orig percent_orig travellers county_dest percent_dest
#1       Miami Chicago           a         0.02        100           g         0.30
#2       Miami Chicago           a         0.02        100           h         0.02
#3       Miami Chicago           b         0.03        100           g         0.30
#4       Miami Chicago           b         0.03        100           h         0.02
#5    New_york Chicago           a         0.02        200           d         0.04
#6    New_york Chicago           a         0.02        200           e         0.06
#7    New_york Chicago           b         0.03        200           d         0.04
#8    New_york Chicago           b         0.03        200           e         0.06
#  travellers_Want destination_want origin_want
#1            0.60          Miami$g   Chicago$a
#2            0.04          Miami$h   Chicago$a
#3            0.90          Miami$g   Chicago$b
#4            0.06          Miami$h   Chicago$b
#5            0.16       New_york$d   Chicago$a
#6            0.24       New_york$e   Chicago$a
#7            0.24       New_york$d   Chicago$b
#8            0.36       New_york$e   Chicago$b
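If you would rather stay close to the loop idea from the question, here is a rough sketch of that approach. It is not taken from either answer. It assumes the city data frames are kept in a named list (city_list) whose names match the values in routes, and that origin and destination are character columns; city_list, routes_chr and expand_route are names invented for this example. A named list replaces the switch(), which expects a single value rather than a whole column.

```r
# Assumption: city data frames stored in a named list, names match routes
city_list <- list(
  Chicago    = data.frame(county = c("a", "b"), percent = c(0.02, 0.03)),
  Miami      = data.frame(county = c("g", "h"), percent = c(0.30, 0.02)),
  `New York` = data.frame(county = c("d", "e"), percent = c(0.04, 0.06))
)
routes_chr <- data.frame(
  origin      = c("Chicago", "Chicago"),
  destination = c("Miami", "New York"),
  travellers  = c(100, 200),
  stringsAsFactors = FALSE
)

# Expand one route into all origin-county x destination-county pairs
expand_route <- function(origin, destination, travellers, cities) {
  o <- cities[[origin]]
  d <- cities[[destination]]
  # j (destination row) varies fastest, so rows come out grouped by origin county
  idx <- expand.grid(j = seq_len(nrow(d)), i = seq_len(nrow(o)))
  data.frame(
    origin      = paste(origin, o$county[idx$i], sep = "$"),
    destination = paste(destination, d$county[idx$j], sep = "$"),
    travellers  = travellers * o$percent[idx$i] * d$percent[idx$j],
    stringsAsFactors = FALSE
  )
}

# One piece per row of routes, then stack the pieces
pieces <- lapply(seq_len(nrow(routes_chr)), function(k) {
  expand_route(routes_chr$origin[k], routes_chr$destination[k],
               routes_chr$travellers[k], city_list)
})
result <- do.call(rbind, pieces)
rownames(result) <- NULL
result
```

For large route tables the merge-based version is likely the better choice; the loop is mainly easier to follow step by step.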
