
Create a new data frame from a function of other data frames

I'm such a newbie to R that I may have a difficult time asking my question. Please bear with me.

I have two data frames. Let's pretend for the sake of explanation:

df1

Columns represent types of grains: corn, oats, wheat, etc. Rows represent the months of the year: jan, feb, etc. Each element is the price per ton for that grain type purchased during that particular month.

df2

Columns representing countries: Spain, Chile, Mexico, etc. The rows of this frame represent additional costs for dealing with that country, maybe: Packaging cost, Shipping cost, Country import tax, Inspection fees, etc. for each country.

Now I want to build a third data frame:

df3

It is to represent the total cost of a combination of grains (for example 10% corn, 50% oats, ...) with the associated costs for shipping, tax, etc. for all countries, for each month. Assume there is an equation (using data from df1 and df2) to compute the total cost per country per month for a given combination of grains and the additional costs for each country.

For the sake of brevity let's say part of that equation for the total cost for March, and Spain is

cost <- .10 * df1["mar", "oats"] + df2["tax", "Spain"] + .....

It's straightforward for me to pick the elements of the second data frame and do the arithmetic with the columns of the first data frame to get the results for a particular country:

cost <- .10 * df1[, "oats"] + df2["tax", "Spain"] + .....

This gives me the cost for each month for Spain

The problem is: I have to repeat the same arithmetic for every country.

Another version:

  cost <- .10 * df1[, "oats"] + df2["tax", ] + .....

Gives me the cost for each country, but only for January

I'd like one set of equations that gives me the total cost per month for all countries. In other words, df3 will have the same number of rows as df1 (months) and the same number of columns as df2 (countries).

Edit ... pasting in example posted in a closed question:

# build df1 - cost of grains (with goofy data so I can track the arithmetic)
  v1 <- c(1:12)
  v2 <- c(13:24)
  v3 <- c(25:36)
  v4 <- c(37:48)
  grain <- data.frame("wheat"=v1,"oats"=v2,"corn"=v3,"rye"=v4)

  grain

# build df2 - additional costs (again, with goofy data to see what is being used where and when)
  w1 <- c(1.3:4.3)
  w2 <- c(5.3:8.3)
  w3 <- c(9.3:12.3)
  w4 <- c(13.3:16.3)
  cost <- data.frame("Spain"=w1,"Peru"=w2,"Mexico"=w3,"Kenya"=w4)
  row.names(cost) <- c("packing","shipping","tax","inspection")

  cost

# assume 10% wheat, 30% oats and 60% rye with some clown-equation for total cost
# now for my feeble attempt at getting a dataframe that has 12 rows (months) and 4 columns (countries)

  total_cost <- data.frame( 0.1*grain[,"wheat"] +
                            0.3*grain[,"oats"] +
                            0.6*grain[,"rye"] +
                            cost["packing","Mexico"] +
                            cost["shipping","Mexico"] +
                            cost["tax","Mexico"]  +
                            cost["inspection","Mexico"] )
  total_cost

You have a couple of choices: one would be to use the outer function supplying inputs of the 'month' vector and the 'country' vector from the colnames of df2 and using a function that would pull the 'cost' components from df1 and df2. (Could not get that approach to work.) You would get a 'month' x 'country' matrix. Another would be to transpose the df2 dataframe and merge using all=TRUE with df1 getting a "long" format dataframe from which you could do column operations with your formulas, and then reshape to a format that is "wide" in 'countries'. Details will depend on the specific data setup and you have not offered an example yet.
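As a rough sketch of the outer idea, here is one way it might be made to work for the example equation in the question. It relies on the assumption that the total cost splits into a part that depends only on the month (the weighted grain price) and a part that depends only on the country (the sum of the additional costs). The names weights, grain_part and country_part are made up for illustration, and the month labels assume the 12 rows run Jan to Dec. If the real equation does not split this way, outer needs a function that is vectorized over both arguments, so this shortcut would not apply directly.

```r
# Rebuild the example data from the question.
grain <- data.frame(wheat = 1:12, oats = 13:24, corn = 25:36, rye = 37:48)
cost <- data.frame(Spain = 1.3:4.3, Peru = 5.3:8.3,
                   Mexico = 9.3:12.3, Kenya = 13.3:16.3,
                   row.names = c("packing", "shipping", "tax", "inspection"))

# Month-only part: weighted grain price, one value per month (length 12).
weights <- c(wheat = 0.1, oats = 0.3, rye = 0.6)   # hypothetical name
grain_part <- as.vector(as.matrix(grain[, names(weights)]) %*% weights)
names(grain_part) <- month.abb   # assumption: rows are Jan..Dec

# Country-only part: sum of all additional costs, one value per country (length 4).
country_part <- colSums(cost)

# outer() adds every month value to every country value: a 12 x 4 matrix,
# converted here to a data frame with months as rows and countries as columns.
df3 <- as.data.frame(outer(grain_part, country_part, "+"))
df3
```

The result has one row per month and one column per country, which is the shape asked for in the question.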

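This will give you a 48-row grid holding every combination of month and country (12 months x 4 countries). It assumes grain has a 'months' column, which the example data does not define, so one is added here as the last column:

 grain$months <- month.abb
 dfrm <- expand.grid(grain$months,  colnames(cost) )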

This will give you a function that takes a single vector x holding a month value (x[1]) and a country value (x[2]) and calculates the expression above:

 costcros <- function(x) { sum(grain[ grain[, 'months'] == x[1], c(1,2,4)]*c(0.1,0.3,0.6) ) + 
                           sum( cost[, x[2]]) }

This adds the calculation to each row of dfrm:

 dfrm$crosscost <- apply(expand.grid(grain$months,  colnames(cost) ), 1,  costcros)
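The steps above produce a "long" result with one row per month/country pair. A minimal sketch of running them end to end on the question's example data and then reshaping to the 12 x 4 layout might look like the following. The months column (built from month.abb), the column names passed to expand.grid, and the name df3 are assumptions added for illustration; they are not part of the original example.

```r
# Rebuild the example data from the question.
grain <- data.frame(wheat = 1:12, oats = 13:24, corn = 25:36, rye = 37:48)
grain$months <- month.abb   # assumption: the example data has no months column
cost <- data.frame(Spain = 1.3:4.3, Peru = 5.3:8.3,
                   Mexico = 9.3:12.3, Kenya = 13.3:16.3,
                   row.names = c("packing", "shipping", "tax", "inspection"))

# x[1] is a month label, x[2] is a country name.
costcros <- function(x) {
  sum(grain[grain[, "months"] == x[1], c(1, 2, 4)] * c(0.1, 0.3, 0.6)) +
    sum(cost[, x[2]])
}

# Long format: 48 rows, one per month/country combination.
dfrm <- expand.grid(months = grain$months, country = colnames(cost),
                    stringsAsFactors = FALSE)
dfrm$crosscost <- apply(dfrm[, c("months", "country")], 1, costcros)

# expand.grid varies its first argument fastest, so the costs are ordered
# month-within-country; filling a 12-row matrix column by column therefore
# gives months as rows and countries as columns.
df3 <- as.data.frame(matrix(dfrm$crosscost, nrow = nrow(grain),
                            dimnames = list(grain$months, colnames(cost))))
df3
```

Filling a matrix is used here only because the grid order is known; a general reshape from long to wide would be needed if the rows of dfrm were reordered first.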

