How do I correctly analyze and plot my hotel booking data in R?
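I'm new to R. My data comes from a hotel with 3 different apartments and covers its bookings from 2018 to 2022. For each booking I have an arrival date and a departure date (already in date format), the total number of nights booked, and which apartment the guests stayed in. I now want to analyze and plot the data to see how the number of bookings changed over time, both overall and for each apartment. I would really like a plot that looks like a ggplot geom_line chart, but I don't know the best way to get there, or how to turn "number of bookings" into a variable.
Thanks!
Here is the head of my data: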
arr_date dep_date total_nights hochsaison mittelsaison nebensaison ost west sued ost.west sued.west sued.ost gesamtes_haus year month wday
1 2018-01-12 2018-01-14 2 0 0 2 0 0 0 0 0 2 0 2018 Jan Fr
2 2018-01-17 2018-01-21 4 0 0 4 0 4 0 0 0 0 0 2018 Jan Mi
3 2018-01-21 2018-01-24 3 0 0 3 0 0 3 0 0 0 0 2018 Jan So
4 2018-02-09 2018-02-11 2 0 2 0 0 0 0 2 0 0 0 2018 Feb Fr
5 2018-02-09 2018-02-13 4 0 4 0 0 0 4 0 0 0 0 2018 Feb Fr
6 2018-02-16 2018-02-18 2 0 0 2 0 0 0 0 0 0 2 2018 Feb Fr
Rows: 323
Columns: 16
$ arr_date <date> 2018-01-12, 2018-01-17, 2018-01-21, 2018-02-09, 2018-02-09, 2018-02-16, 2018-02-23, 2018-02-22, 2018-03-19, 2018-03-27, 2018-03-29, 2018-04-19, 2018-04-27, 201…
$ dep_date <date> 2018-01-14, 2018-01-21, 2018-01-24, 2018-02-11, 2018-02-13, 2018-02-18, 2018-02-25, 2018-02-25, 2018-03-24, 2018-04-03, 2018-04-04, 2018-04-24, 2018-05-01, 201…
$ total_nights <dbl> 2, 4, 3, 2, 4, 2, 2, 3, 5, 7, 5, 5, 4, 3, 6, 5, 17, 7, 7, 15, 13, 9, 19, 7, 7, 14, 8, 13, 10, 11, 6, 7, 14, 6, 5, 5, 13, 7, 7, 10, 3, 6, 5, 7, 12, 12, 12, 2, 2,…
$ hochsaison <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 5, 0, 3, 3, 6, 5, 17, 7, 7, 15, 13, 9, 19, 7, 7, 14, 8, 13, 10, 11, 6, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ mittelsaison <dbl> 0, 0, 0, 2, 4, 0, 0, 0, 0, 0, 0, 5, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 14, 6, 5, 5, 13, 7, 7, 10, 3, 6, 5, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ nebensaison <dbl> 2, 4, 3, 0, 0, 2, 2, 3, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 12, 12, 2, 2, 4, 2, 4, 1…
$ ost <dbl> 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 17, 0, 0, 15, 0, 0, 19, 0, 0, 0, 0, 13, 0, 0, 0, 0, 0, 0, 5, 5, 0, 7, 0, 0, 0, 6, 0, 7, 12, 0, 0, 0, 0, 0, 0, 0,…
$ west <dbl> 0, 4, 0, 0, 0, 0, 0, 0, 5, 0, 5, 5, 0, 0, 6, 0, 0, 0, 7, 0, 13, 0, 0, 7, 0, 14, 0, 0, 0, 11, 0, 7, 14, 0, 0, 0, 0, 0, 0, 10, 0, 0, 5, 0, 0, 12, 0, 0, 0, 0, 2, 0…
$ sued <dbl> 0, 0, 3, 0, 4, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 7, 0, 8, 0, 10, 0, 6, 0, 0, 6, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 12, 0, 0, 4, 0, 0, 0,…
$ ost.west <dbl> 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ sued.west <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 13, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ sued.ost <dbl> 2, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0…
$ gesamtes_haus <dbl> 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 4, 3, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0…
$ year <dbl> 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018…
$ month <chr> "Jan", "Jan", "Jan", "Feb", "Feb", "Feb", "Feb", "Feb", "Mär", "Mär", "Mär", "Apr", "Apr", "Mai", "Mai", "Jun", "Jun", "Jun", "Jun", "Jul", "Jul", "Jul", "Jul",…
$ wday <chr> "Fr", "Mi", "So", "Fr", "Fr", "Fr", "Fr", "Do", "Mo", "Di", "Do", "Do", "Fr", "Fr", "Sa", "Fr", "Do", "Sa", "Sa", "Sa", "So", "Do", "So", "Sa", "Sa", "Sa", "Sa"…
structure(list(arr_date = structure(c(17543, 17548, 17552, 17571,
17571, 17578), class = "Date"), dep_date = structure(c(17545,
17552, 17555, 17573, 17575, 17580), class = "Date"), total_nights = c(2,
4, 3, 2, 4, 2), hochsaison = c(0, 0, 0, 0, 0, 0), mittelsaison = c(0,
0, 0, 2, 4, 0), nebensaison = c(2, 4, 3, 0, 0, 2), ost = c(0,
0, 0, 0, 0, 0), west = c(0, 4, 0, 0, 0, 0), sued = c(0, 0, 3,
0, 4, 0), ost.west = c(0, 0, 0, 2, 0, 0), sued.west = c(0, 0,
0, 0, 0, 0), sued.ost = c(2, 0, 0, 0, 0, 0), gesamtes_haus = c(0,
0, 0, 0, 0, 2), year = c(2018, 2018, 2018, 2018, 2018, 2018),
month = c("Jan", "Jan", "Jan", "Feb", "Feb", "Feb"), wday = c("Fr",
"Mi", "So", "Fr", "Fr", "Fr"), wohnung = c("kombi", "west",
"sued", "kombi", "sued", "gesamtes haus"), saison = c("nebensaison",
"nebensaison", "nebensaison", "mittelsaison", "mittelsaison",
"nebensaison")), row.names = c(NA, 6L), class = "data.frame")
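If "number of bookings" simply means how many bookings start in each month, no day-by-day expansion is needed. The dput() sample above has a wohnung column naming the apartment of each booking, so a minimal sketch, assuming that column also exists in the full data set, is to round each arrival date down to its month with lubridate's floor_date() and count rows per month and apartment. Note that a booking spanning two months is counted only in its arrival month.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Six sample rows from the question's dput(): arrival date and apartment (wohnung)
df <- data.frame(
  arr_date = as.Date(c("2018-01-12", "2018-01-17", "2018-01-21",
                       "2018-02-09", "2018-02-09", "2018-02-16")),
  wohnung = c("kombi", "west", "sued", "kombi", "sued", "gesamtes haus")
)

# Number of bookings per arrival month and apartment
monthly <- df %>%
  mutate(month = floor_date(arr_date, unit = "month")) %>%
  count(month, wohnung, name = "bookings")

# Number of bookings per arrival month, all apartments together
monthly_total <- count(monthly, month, wt = bookings, name = "total")

# One line per apartment; points keep single-month apartments visible
p <- ggplot(monthly, aes(x = month, y = bookings, colour = wohnung)) +
  geom_line() +
  geom_point() +
  labs(x = "month", y = "bookings", colour = "apartment") +
  expand_limits(y = 0)
print(p)
```

Plotting monthly_total with aes(x = month, y = total) gives the overall trend instead of the per-apartment one.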
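Here is something to get you started. I suggest you play around with it, refine your ideas, and ask a new question once you have a clearer picture of what you want.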
library(dplyr)
library(tidyr)
library(lubridate)
library(purrr)
library(ggplot2)

## df is the booking data frame from the question
apt = c("ost", "west", "sued", "ost.west", "sued.west", "sued.ost")

df %>%
  ## get the booked nights: arrival day up to the day before departure
  ## (assumes every booking has at least one night)
  mutate(days = map2(arr_date, dep_date - 1, ~ seq(.x, .y, by = "day"))) %>%
  ## get rid of unneeded columns
  select(all_of(apt), days) %>%
  ## convert the night counts in each apartment to 1 or 0
  mutate(across(all_of(apt), ~ pmin(.x, 1))) %>%
  ## pivot to long format
  pivot_longer(cols = all_of(apt), names_to = "apt") %>%
  ## discard apartments with no booking
  filter(value != 0) %>%
  select(-value) %>%
  ## give each booked day its own row
  unnest(days) -> df_long

## count bookings per day for plotting
count(df_long, days) %>%
  ggplot(aes(x = days, y = n)) +
  geom_line() +
  labs(
    x = "date",
    y = "separate bookings"
  ) +
  expand_limits(y = 0)
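To get one line per apartment, the same long data can be counted by day and apartment and mapped to colour. count() only returns days that have at least one booking, so geom_line() would draw straight segments across unbooked periods; tidyr's complete() adds those days back with n = 0. This is a sketch on the six sample rows from the dput(), counting nights from arrival up to the day before departure.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(ggplot2)

# Sample rows from the question's dput() output (only the columns used here)
df <- data.frame(
  arr_date = as.Date(c("2018-01-12", "2018-01-17", "2018-01-21",
                       "2018-02-09", "2018-02-09", "2018-02-16")),
  dep_date = as.Date(c("2018-01-14", "2018-01-21", "2018-01-24",
                       "2018-02-11", "2018-02-13", "2018-02-18")),
  ost = c(0, 0, 0, 0, 0, 0), west = c(0, 4, 0, 0, 0, 0),
  sued = c(0, 0, 3, 0, 4, 0), ost.west = c(0, 0, 0, 2, 0, 0),
  sued.west = c(0, 0, 0, 0, 0, 0), sued.ost = c(2, 0, 0, 0, 0, 0)
)
apt <- c("ost", "west", "sued", "ost.west", "sued.west", "sued.ost")

# Long format: one row per booked night and apartment
df_long <- df %>%
  mutate(days = map2(arr_date, dep_date - 1, ~ seq(.x, .y, by = "day"))) %>%
  select(all_of(apt), days) %>%
  pivot_longer(cols = all_of(apt), names_to = "apt") %>%
  filter(value != 0) %>%
  select(-value) %>%
  unnest(days)

# Bookings per day and apartment; complete() fills unbooked days with n = 0
daily <- df_long %>%
  count(days, apt) %>%
  complete(apt, days = seq(min(days), max(days), by = "day"),
           fill = list(n = 0L))

# One coloured line per apartment
p <- ggplot(daily, aes(x = days, y = n, colour = apt)) +
  geom_line() +
  labs(x = "date", y = "bookings", colour = "apartment") +
  expand_limits(y = 0)
print(p)
```

On this sample, df_long has 15 rows, which matches the sum of total_nights for the five bookings in the listed apartments.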
To see what the code does, run it one step at a time and look at what changes after each step: first df %>% mutate(...), then df %>% mutate(...) %>% select(...), then df %>% mutate(...) %>% select(...) %>% mutate(...), and so on.
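For example, running only the first step on the first two sample bookings shows what the days list-column holds. Comparing its lengths with total_nights is a quick sanity check: seq() keeps both endpoints, so when the departure date is passed unchanged, each booking gets one more date than it has nights.

```r
library(dplyr)
library(purrr)

# First two bookings from the question's dput() output
df <- data.frame(
  arr_date = as.Date(c("2018-01-12", "2018-01-17")),
  dep_date = as.Date(c("2018-01-14", "2018-01-21")),
  total_nights = c(2, 4)
)

# Step 1 only: build a list-column of dates for each booking
step1 <- df %>%
  mutate(days = map2(arr_date, dep_date, ~ seq(.x, .y, by = "day")))

str(step1)           # days is a list-column holding one Date vector per row
step1$days[[1]]      # the dates generated for the first booking
lengths(step1$days)  # 3 5: arrival and departure days are both included
step1$total_nights   # 2 4
```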