
R - Creating a new column based on a conditional observation and applying it to the master df

I have a very large dataframe (with ~15 million observations of 10 variables). The df is essentially results for a set of cities under various scenarios (conditions). Here is a simplified view of the df:

State  City        Result  Year  Condition1  Condition2  Condition3
AL     Cottonwood  4.5     2000  p5          a10         d20
....
AL     Cottonwood  2.5     2010  p10         a20         d50

I am trying to create a new column ("base") that holds, for every row, the year-2000 result for the same city under the same scenario. Because there are so many scenarios, I am having a lot of difficulty doing this.

Thank you!

So, for each row, you want the result for the same conditions but from the year 2000?

The way I would go about it is to join the dataframe onto a copy of itself filtered to the year 2000. Assuming your dataframe is called df:

library(dplyr)
df_base <- df %>% left_join(
  df %>%
    filter(Year == 2000) %>%  # keep only the year-2000 results
    select(-Year) %>%         # drop Year so that it is not used as a join key
    rename(base = Result)     # rename Result (capital R, as in the data) to base
)

With no "by" given, left_join joins on every column the two dataframes share. Here that is everything except Year and Result, meaning the same state, city and all your conditions. It returns the full dataframe with a new column called "base" holding the year-2000 result for each state + city + conditions combination. Rows with no year-2000 match get NA. If there are other columns you don't wish to join on, you can either drop them in the select, or name the join columns explicitly with the "by" argument of left_join.
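To make the explicit "by" variant concrete, here is a minimal sketch on toy data. The column names follow the question, but the values and the helper names key_cols and base_2000 are made up for illustration, and it assumes each key combination has at most one year-2000 row.

library(dplyr)

# Toy data: two scenarios for one city, each observed in 2000 and 2010
df <- data.frame(
  State      = c("AL", "AL", "AL", "AL"),
  City       = c("Cottonwood", "Cottonwood", "Cottonwood", "Cottonwood"),
  Result     = c(4.5, 2.5, 3.0, 1.0),
  Year       = c(2000, 2010, 2000, 2010),
  Condition1 = c("p5", "p5", "p10", "p10"),
  Condition2 = c("a10", "a10", "a20", "a20"),
  Condition3 = c("d20", "d20", "d50", "d50"),
  stringsAsFactors = FALSE
)

# Columns that identify a city + scenario (assumed; adjust to your data)
key_cols <- c("State", "City", "Condition1", "Condition2", "Condition3")

# Lookup table: one row per key, with the year-2000 Result renamed to base
base_2000 <- df %>%
  filter(Year == 2000) %>%
  select(all_of(key_cols), base = Result)

# Join only on the key columns, whatever other columns df may contain
df_base <- df %>% left_join(base_2000, by = key_cols)

Naming the keys also guards against accidentally joining on one of the other variables in the real data. If a key combination had more than one year-2000 row, the join would duplicate the matching rows, so it may be worth checking nrow(df_base) against nrow(df) afterwards.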

Consider base R's ave(), which applies a function within groups defined by several columns. With FUN=identity it simply returns Result unchanged within each group.

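# YEAR 2000 CALCULATION: keep Result for year-2000 rows, NA elsewhere
df$Base <- with(df, ifelse(Year == 2000,
                           ave(Result, State, City, Condition1, Condition2, Condition3, FUN=identity),
                           NA)
               )

# ASSIGN 2000 RESULT TO ALL OTHER YEARS (NA if a group has no year-2000 row)
df$Base <- with(df, ave(Base, State, City, Condition1, Condition2, Condition3,
                        FUN=function(x) if (all(is.na(x))) NA_real_ else max(x, na.rm=TRUE)))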

I am not sure how this performs on ~15 million observations.
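If the grouped approach turns out to be slow, one alternative worth trying is a plain key lookup with match(). This is a different technique from ave(), shown only as a sketch: it has not been benchmarked on data of that size, the toy values are invented, and key_cols is an assumed set of grouping columns.

# Toy data; the last row has no year-2000 counterpart
df <- data.frame(
  State      = c("AL", "AL", "AL", "AL", "AL"),
  City       = c("Cottonwood", "Cottonwood", "Cottonwood", "Cottonwood", "Dothan"),
  Result     = c(4.5, 2.5, 3.0, 1.0, 7.0),
  Year       = c(2000, 2010, 2000, 2010, 2010),
  Condition1 = c("p5", "p5", "p10", "p10", "p5"),
  Condition2 = c("a10", "a10", "a20", "a20", "a10"),
  Condition3 = c("d20", "d20", "d50", "d50", "d20"),
  stringsAsFactors = FALSE
)

# Columns that identify a city + scenario (assumed; adjust to your data)
key_cols <- c("State", "City", "Condition1", "Condition2", "Condition3")

# One string key per row, built by pasting the grouping columns together
key <- do.call(paste, c(df[key_cols], sep = "\r"))

is_2000 <- df$Year == 2000

# For every row, position of the first year-2000 row with the same key
idx <- match(key, key[is_2000])

# Look up the year-2000 Result; rows without a match get NA
df$Base <- df$Result[is_2000][idx]

match() returns the first matching position only, so if a key has several year-2000 rows the first one wins. The separator just needs to be a character that cannot occur in the key columns.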
