How can I simplify my R-code to make it shorter?

I'm analyzing data from a vacation home, and I ended up with this very long piece of code while trying to calculate the price for each apartment for each year and season. It works, but it is awfully long. I'm still a beginner in R and would love to know how to shorten it so I can learn from this. This is my code:

library(dplyr)

ue <- ue %>%
  mutate(price_night = case_when(
    ost > 0 & hochsaison > 0 & year == 2018 ~ as.numeric('180'),
    ost > 0 & mittelsaison > 0 & year == 2018 ~ as.numeric('160'),
    ost > 0 & nebensaison > 0 & year == 2018 ~ as.numeric('140'),
    west > 0 & hochsaison > 0 & year == 2018 ~ as.numeric('180'),
    west > 0 & mittelsaison > 0 & year == 2018 ~ as.numeric('160'),
    west > 0 & nebensaison > 0 & year == 2018 ~ as.numeric('140'),
    sued > 0 & hochsaison > 0 & year == 2018 ~ as.numeric('100'),
    sued > 0 & mittelsaison > 0 & year == 2018 ~ as.numeric('80'),
    sued > 0 & nebensaison > 0 & year == 2018 ~ as.numeric('60'),
    ost.west > 0 & hochsaison > 0 & year == 2018 ~ as.numeric('360'),
    ost.west > 0 & mittelsaison > 0 & year == 2018 ~ as.numeric('320'),
    ost.west > 0 & nebensaison > 0 & year == 2018 ~ as.numeric('280'),
    sued.ost > 0 & hochsaison > 0 & year == 2018 ~ as.numeric('280'),
    sued.ost > 0 & mittelsaison > 0 & year == 2018 ~ as.numeric('240'),
    sued.ost > 0 & nebensaison > 0 & year == 2018 ~ as.numeric('200'),
    sued.west > 0 & hochsaison > 0 & year == 2018 ~ as.numeric('280'),
    sued.west > 0 & mittelsaison > 0 & year == 2018 ~ as.numeric('240'),
    sued.west > 0 & nebensaison > 0 & year == 2018 ~ as.numeric('200'),
    gesamtes_haus > 0 & hochsaison > 0 & year == 2018 ~ as.numeric('460'),
    gesamtes_haus > 0 & mittelsaison > 0 & year == 2018 ~ as.numeric('400'),
    gesamtes_haus > 0 & nebensaison > 0 & year == 2018 ~ as.numeric('340'),
    ost > 0 & hochsaison > 0 & year == 2022 ~ as.numeric('240'),
    ost > 0 & mittelsaison > 0 & year == 2022 ~ as.numeric('210'),
    ost > 0 & nebensaison > 0 & year == 2022 ~ as.numeric('170'),
    west > 0 & hochsaison > 0 & year == 2022 ~ as.numeric('240'),
    west > 0 & mittelsaison > 0 & year == 2022 ~ as.numeric('210'),
    west > 0 & nebensaison > 0 & year == 2022 ~ as.numeric('170'),
    sued > 0 & hochsaison > 0 & year == 2022 ~ as.numeric('120'),
    sued > 0 & mittelsaison > 0 & year == 2022 ~ as.numeric('100'),
    sued > 0 & nebensaison > 0 & year == 2022 ~ as.numeric('80'),
    ost.west > 0 & hochsaison > 0 & year == 2022 ~ as.numeric('480'),
    ost.west > 0 & mittelsaison > 0 & year == 2022 ~ as.numeric('420'),
    ost.west > 0 & nebensaison > 0 & year == 2022 ~ as.numeric('340'),
    sued.ost > 0 & hochsaison > 0 & year == 2022 ~ as.numeric('360'),
    sued.ost > 0 & mittelsaison > 0 & year == 2022 ~ as.numeric('310'),
    sued.ost > 0 & nebensaison > 0 & year == 2022 ~ as.numeric('250'),
    sued.west > 0 & hochsaison > 0 & year == 2022 ~ as.numeric('360'),
    sued.west > 0 & mittelsaison > 0 & year == 2022 ~ as.numeric('310'),
    sued.west > 0 & nebensaison > 0 & year == 2022 ~ as.numeric('250'),
    gesamtes_haus > 0 & hochsaison > 0 & year == 2022 ~ as.numeric('600'),
    gesamtes_haus > 0 & mittelsaison > 0 & year == 2022 ~ as.numeric('520'),
    gesamtes_haus > 0 & nebensaison > 0 & year == 2022 ~ as.numeric('420'),
    TRUE ~ as.numeric('0')))

I would switch strategy: store the definitions of price_night in a file instead of in code. There are enough conditions here to warrant a different approach.

It's generally a good idea to store data in files instead of code. A file is easier to edit than code, and it is easier to show collaborators a table stored in a file than to show them the code. These are just two reasons.

The way I'd accomplish it in your case:

  • Store the various conditions as new columns in your data frame. For example, a column west_gt_0 = west > 0.
  • Create a .csv file in Excel with those same column names and the value of price_night that they define. The columns will be like west_gt_0, nebensaison_gt_0, ..., year, price_night.
  • Read this table into R and join it to your data frame.
  • Share the table of definitions with your collaborators and/or display it in your notebook so people can see it and possibly spot any mistakes.

Reading and joining the table:

library(readr)  # read_csv()
library(dplyr)  # left_join(), %>%

# Read the price definitions, then join on every column except price_night
price_def <- read_csv("price_night.csv")
ue <- left_join(
  ue,
  price_def,
  by = setdiff(names(price_def), "price_night")
)
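
To make the steps above concrete, here is a minimal sketch on toy data. It assumes a reduced version of ue with only two apartments (ost, sued) and two seasons (hochsaison, nebensaison), and it builds price_def inline with tribble() instead of reading price_night.csv, so the example runs on its own. The toy rows and the flag_cols helper are made up for illustration; the prices are taken from the question.

```r
library(dplyr)
library(tibble)

# Toy bookings (hypothetical rows): a reduced `ue` with two apartments and two seasons.
ue <- tribble(
  ~ost, ~sued, ~hochsaison, ~nebensaison, ~year,
     3,     0,           3,            0,  2018,
     0,     2,           0,            2,  2018,
     4,     0,           0,            4,  2022,
     0,     0,           0,            0,  2022
)

# Step 1: add one logical flag column per condition, e.g. ost_gt_0 = ost > 0.
flag_cols <- c("ost", "sued", "hochsaison", "nebensaison")
ue <- ue %>%
  mutate(across(all_of(flag_cols), ~ .x > 0, .names = "{.col}_gt_0"))

# Step 2: the price table. In practice this would come from read_csv("price_night.csv").
price_def <- tribble(
  ~ost_gt_0, ~sued_gt_0, ~hochsaison_gt_0, ~nebensaison_gt_0, ~year, ~price_night,
  TRUE,  FALSE, TRUE,  FALSE, 2018, 180,
  TRUE,  FALSE, FALSE, TRUE,  2018, 140,
  FALSE, TRUE,  TRUE,  FALSE, 2018, 100,
  FALSE, TRUE,  FALSE, TRUE,  2018,  60,
  TRUE,  FALSE, TRUE,  FALSE, 2022, 240,
  TRUE,  FALSE, FALSE, TRUE,  2022, 170,
  FALSE, TRUE,  TRUE,  FALSE, 2022, 120,
  FALSE, TRUE,  FALSE, TRUE,  2022,  80
)

# Step 3: join on every column except price_night.
# Rows without a match get NA, so replace NA with 0 to mirror the TRUE ~ 0 fallback.
ue <- ue %>%
  left_join(price_def, by = setdiff(names(price_def), "price_night")) %>%
  mutate(price_night = coalesce(price_night, 0))

print(ue %>% select(ost, sued, hochsaison, nebensaison, year, price_night))
```

One difference from the original case_when() is worth keeping in mind: case_when() takes the first branch whose condition is true, while the join only matches exact combinations of flags. A booking with more than one apartment or season flag set would need its own row in the price table, otherwise it falls through to 0.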

