简体   繁体   中英

Create a table from another table information in R

I have a dataframe CLIENTS with: Hotel number Nationality Date of arrive Date of departure Something like:

Client   Nationality   Hotel   Dateofarrive   DateofDeparture
Cl1          es         h1      21/07/2019    24/07/2019
Cl2          es         h1      23/07/2019    24/07/2019
Cl3          es         h1      06/07/2019    10/07/2019
Cl4          es         h2      05/07/2019    06/07/2019
Cl5          fr         h3      01/07/2019    02/07/2019
Cl6          pt         h1      07/07/2019    09/07/2019

And i want a dataframe Hotel and for each hotel of the dataframe i want all the clients that are of spanish nationality and all the nights they have sleept in the hotel, the same for french nationality and portuguese. Something like:

Hotel    CliEspan  Nights  CliFrench Night CliPortug Night 
H1           3       8       0         0          1      2
H2           1       5       1         1          0      0

An option using dplyr and tidyr . We convert Dateofarrive and DateofDeparture to actual Date object then calculate the difference in by subtracting two dates, group_by Hotel and Nationality and sum total number of days and total number of entries in each group. Convert to long format, combine columns and spread it to wide format by filling missing values to 0.

library(dplyr)
library(tidyr)

df %>%
  mutate_at(vars(Dateofarrive, DateofDeparture), as.Date, "%d/%m/%Y") %>%
  mutate(days = as.integer(DateofDeparture - Dateofarrive)) %>%
  group_by(Hotel, Nationality) %>%
  summarise(total = sum(days), 
             n = n()) %>%
  gather(key, value, total, n) %>%
  unite(col, Nationality, key, sep = "_") %>%
  spread(col, value, fill = 0)

Not sure why we have 5 nights for hotel h2 but another dplyr possibility(can then spread :

df %>% 
   group_by(Client,Hotel,Nationality) %>%
  mutate_at(vars(contains("Date")),list(~lubridate::dmy(.))) %>% 
   summarise(Time = DateofDeparture- Dateofarrive) %>% 
   ungroup() %>% 
   group_by(Hotel, Nationality) %>% 
   mutate(Nights= as.numeric(sum(Time))) %>% 
   tidyr::spread(Hotel, Nights, fill=0) %>% 
   rename_at(vars(contains("h")),list(~paste0(.,"_nights")))
# A tibble: 6 x 6
# Groups:   Nationality [3]
  Client Nationality Time   h1_nights h2_nights h3_nights
  <chr>  <chr>       <drtn>     <dbl>     <dbl>     <dbl>
1 Cl1    es          3 days         8         0         0
2 Cl2    es          1 days         8         0         0
3 Cl3    es          4 days         8         0         0
4 Cl4    es          1 days         0         1         0
5 Cl5    fr          1 days         0         0         1
6 Cl6    pt          2 days         2         0         0

Here is a data.table option using dcast to cast the data to wide format.

library(data.table)
setDT(df)
# convert to date and calculate nights
df[, Dateofarrive := as.Date(Dateofarrive, format = '%d/%m/%Y')]
df[, DateofDeparture := as.Date(DateofDeparture, format = '%d/%m/%Y')]
df[, nights := as.numeric(DateofDeparture - Dateofarrive)]

# dcast to wide format
new <- dcast(df, Hotel ~ Nationality, value.var = 'nights',
             fun.aggregate = list(sum, length))
names(new) <- gsub('nights_length', 'clients', names(new))

new
#    Hotel nights_sum_es nights_sum_fr nights_sum_pt clients_es clients_fr clients_pt
# 1:    h1             8             0             2          3          0          1
# 2:    h2             1             0             0          1          0          0
# 3:    h3             0             1             0          0          1          0

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM