How to combine dataframes in R based on similar timelines for multiple attributes, and then transform the data so that these columns become row headers?
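I am trying to merge my sales data and patient data (and some other attributes) in R. All of them are rolled up at the country level over the same timeframe. After merging, I want the result in long format instead of wide format, unique at the country-month level.
This is what my input data looks like -
1) Sales data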
Coutry_ID Country_Name 1/28/2018 2/28/2018 3/28/2018 4/28/2018 5/28/2018
A0001 USA 44 72 85 25 72
A0002 Germany 98 70 69 48 41
A0003 Russia 82 42 32 29 43
A0004 UK 79 83 51 48 47
A0005 France 45 75 10 13 23
A0006 India 92 85 28 13 18
2) Patients data
Coutry_ID Country_Name 1/28/2018 2/28/2018 3/28/2018 4/28/2018 5/28/2018
A0001 USA 7 13 22 23 13
A0002 Germany 9 10 17 25 25
A0003 Russia 24 19 6 8 5
A0004 UK 6 8 20 1 11
A0005 France 4 9 8 10 25
A0006 India 18 21 2 13 17
This is what I intend the output to look like -
Coutry_ID Country_Name Month Sales Patients
A0001 USA 1/28/2018 44 7
A0001 USA 2/28/2018 72 13
A0001 USA 3/28/2018 85 22
A0001 USA 4/28/2018 25 23
A0001 USA 5/28/2018 72 13
A0002 Germany 1/28/2018 98 9
A0002 Germany 2/28/2018 70 10
A0002 Germany 3/28/2018 69 17
A0002 Germany 4/28/2018 48 25
A0002 Germany 5/28/2018 41 25
A0003 Russia 1/28/2018 82 24
A0003 Russia 2/28/2018 42 19
A0003 Russia 3/28/2018 32 6
A0003 Russia 4/28/2018 29 8
A0003 Russia 5/28/2018 43 5
A0004 UK 1/28/2018 79 6
A0004 UK 2/28/2018 83 8
A0004 UK 3/28/2018 51 20
A0004 UK 4/28/2018 48 1
A0004 UK 5/28/2018 47 11
A0005 France 1/28/2018 45 4
A0005 France 2/28/2018 75 9
A0005 France 3/28/2018 10 8
A0005 France 4/28/2018 13 10
A0005 France 5/28/2018 23 25
A0006 India 1/28/2018 92 18
A0006 India 2/28/2018 85 21
A0006 India 3/28/2018 28 2
A0006 India 4/28/2018 13 13
A0006 India 5/28/2018 18 17
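I need some guidance on these two things -
1 - How do I convert the data from wide to long?
2 - To merge the data, I am thinking of using dplyr's left_join on all these datasets together with my country master list, which has the ID and name. My doubt is whether I should convert the datasets from wide to long first, or do it after merging?
You can get both data frames into long format and then join them: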
library(dplyr)
library(tidyr)
inner_join(
  sales %>% pivot_longer(cols = -c(Coutry_ID, Country_Name), values_to = 'Sales'),
  patients %>% pivot_longer(cols = -c(Coutry_ID, Country_Name),
                            values_to = 'Patients'),
  by = c("Coutry_ID", "Country_Name", "name"))
# A tibble: 30 x 5
# Coutry_ID Country_Name name Sales Patients
# <fct> <fct> <chr> <int> <int>
# 1 A0001 USA 1/28/2018 44 7
# 2 A0001 USA 2/28/2018 72 13
# 3 A0001 USA 3/28/2018 85 22
# 4 A0001 USA 4/28/2018 25 23
# 5 A0001 USA 5/28/2018 72 13
# 6 A0002 Germany 1/28/2018 98 9
# 7 A0002 Germany 2/28/2018 70 10
# 8 A0002 Germany 3/28/2018 69 17
# 9 A0002 Germany 4/28/2018 48 25
#10 A0002 Germany 5/28/2018 41 25
# … with 20 more rows
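The output above keeps the default column name `name` and leaves the dates as character strings, whereas the question asks for a `Month` column. A minimal sketch of one way to get there, assuming the dates are always in month/day/year form: `to_long` is a hypothetical helper name, and the two small data frames are a cut-down copy of the question's tables so that the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Toy data: two countries and two months taken from the question's tables
sales <- data.frame(
  Coutry_ID = c("A0001", "A0002"),
  Country_Name = c("USA", "Germany"),
  `1/28/2018` = c(44L, 98L),
  `2/28/2018` = c(72L, 70L),
  check.names = FALSE
)
patients <- data.frame(
  Coutry_ID = c("A0001", "A0002"),
  Country_Name = c("USA", "Germany"),
  `1/28/2018` = c(7L, 9L),
  `2/28/2018` = c(13L, 10L),
  check.names = FALSE
)

# Hypothetical helper: pivot one wide table to long and parse the month
to_long <- function(df, value_name) {
  df %>%
    pivot_longer(cols = -c(Coutry_ID, Country_Name),
                 names_to = "Month", values_to = value_name) %>%
    mutate(Month = as.Date(Month, format = "%m/%d/%Y"))
}

# Join the two long tables on country and month
result <- inner_join(
  to_long(sales, "Sales"),
  to_long(patients, "Patients"),
  by = c("Coutry_ID", "Country_Name", "Month")
)
```

Parsing `Month` into a `Date` is optional, but it makes later sorting and filtering by month behave chronologically rather than alphabetically.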
Data
sales <- structure(list(Coutry_ID = structure(1:6, .Label = c("A0001",
"A0002", "A0003", "A0004", "A0005", "A0006"), class = "factor"),
Country_Name = structure(c(6L, 2L, 4L, 5L, 1L, 3L), .Label = c("France",
"Germany", "India", "Russia", "UK", "USA"), class = "factor"),
`1/28/2018` = c(44L, 98L, 82L, 79L, 45L, 92L), `2/28/2018` = c(72L,
70L, 42L, 83L, 75L, 85L), `3/28/2018` = c(85L, 69L, 32L,
51L, 10L, 28L), `4/28/2018` = c(25L, 48L, 29L, 48L, 13L,
13L), `5/28/2018` = c(72L, 41L, 43L, 47L, 23L, 18L)), class =
"data.frame", row.names = c(NA, -6L))
patients <- structure(list(Coutry_ID = structure(1:6, .Label = c("A0001",
"A0002", "A0003", "A0004", "A0005", "A0006"), class = "factor"),
Country_Name = structure(c(6L, 2L, 4L, 5L, 1L, 3L), .Label = c("France",
"Germany", "India", "Russia", "UK", "USA"), class = "factor"),
`1/28/2018` = c(7L, 9L, 24L, 6L, 4L, 18L), `2/28/2018` = c(13L,
10L, 19L, 8L, 9L, 21L), `3/28/2018` = c(22L, 17L, 6L, 20L,
8L, 2L), `4/28/2018` = c(23L, 25L, 8L, 1L, 10L, 13L), `5/28/2018` = c(13L,
25L, 5L, 11L, 25L, 17L)), class = "data.frame", row.names = c(NA, -6L))
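On the question of whether to reshape before or after merging: joining first and pivoting afterwards also appears workable. The sketch below is one possible way, not the approach from the answer above. It assumes both tables have exactly the same month columns, so that `left_join` adds the suffix to every one of them, and it relies on the `.value` sentinel of `pivot_longer` to split names such as `1/28/2018_Sales` into a month and a measure. The toy data is a cut-down copy of the tables above.

```r
library(dplyr)
library(tidyr)

# Toy data: two countries and two months taken from the tables above
sales <- data.frame(
  Coutry_ID = c("A0001", "A0002"),
  Country_Name = c("USA", "Germany"),
  `1/28/2018` = c(44L, 98L),
  `2/28/2018` = c(72L, 70L),
  check.names = FALSE
)
patients <- data.frame(
  Coutry_ID = c("A0001", "A0002"),
  Country_Name = c("USA", "Germany"),
  `1/28/2018` = c(7L, 9L),
  `2/28/2018` = c(13L, 10L),
  check.names = FALSE
)

# Step 1: join the wide tables; clashing month columns get a suffix
wide <- left_join(sales, patients,
                  by = c("Coutry_ID", "Country_Name"),
                  suffix = c("_Sales", "_Patients"))

# Step 2: split "<month>_<measure>" into a Month column and one
# value column per measure (that is what ".value" requests)
long <- pivot_longer(wide,
                     cols = -c(Coutry_ID, Country_Name),
                     names_to = c("Month", ".value"),
                     names_sep = "_")
```

If a month is present in only one of the tables, it gets no suffix and the `names_sep` split no longer works cleanly, which is one reason pivoting each table first and then joining tends to be the more robust order.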
Base R (not as elegant as the above):
# Create a named list of dataframes:
df_list <- list(patients = patients, sales = sales)
# Create a vector in each with the name of the dataframe:
df_list <- mapply(cbind, df_list, "desc" = as.character(names(df_list)),
SIMPLIFY = FALSE)
# Define a function to reshape the data:
reshape_ps <- function(x){
  tmp <- setNames(reshape(x,
                          direction = "long",
                          varying = which(names(x) %in% names(x[,sapply(x, is.numeric)])),
                          idvar = c(!(names(x) %in% names(x[,sapply(x, is.numeric)]))),
                          v.names = "month",
                          times = as.Date(names(x[,sapply(x, is.numeric)]), "%m/%d/%Y"),
                          new.row.names = 1:(nrow(x)*length(which(names(x) %in% names(x[,sapply(x, is.numeric)]))))),
                  c(names(x[!(names(x) %in% names(x[,sapply(x, is.numeric)]))]), "month", as.character(unique(x$desc))))
  # Drop the dataframe name vector:
  clean <- tmp[,names(tmp) != "desc"]
  # Specify the return object:
  return(clean)
}
# Merge the result of the function applied on both dataframes:
Reduce(function(y, z){merge(y, z, by = intersect(colnames(y), colnames(z)), all = TRUE)},
Map(function(x){reshape_ps(x)}, df_list))
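The same idea of handling a named list of tables (sales, patients, and any further attributes) can also be sketched with dplyr and tidyr: stack the wide tables with an identifier column, pivot to long once, then spread the identifier back out into one column per measure. This is an alternative sketch rather than a translation of the base R code above. The column names `Measure` and `value` are illustrative choices, and the toy data is a cut-down copy of the question's tables.

```r
library(dplyr)
library(tidyr)

# Toy data: two countries and two months taken from the question's tables
sales <- data.frame(
  Coutry_ID = c("A0001", "A0002"),
  Country_Name = c("USA", "Germany"),
  `1/28/2018` = c(44L, 98L),
  `2/28/2018` = c(72L, 70L),
  check.names = FALSE
)
patients <- data.frame(
  Coutry_ID = c("A0001", "A0002"),
  Country_Name = c("USA", "Germany"),
  `1/28/2018` = c(7L, 9L),
  `2/28/2018` = c(13L, 10L),
  check.names = FALSE
)

# The list names become the output column names; add more tables here
tables <- list(Sales = sales, Patients = patients)

combined <- bind_rows(tables, .id = "Measure") %>%
  # One row per measure, country and month
  pivot_longer(cols = -c(Measure, Coutry_ID, Country_Name),
               names_to = "Month", values_to = "value") %>%
  # One column per measure, one row per country and month
  pivot_wider(names_from = Measure, values_from = value)
```

A country-month that is missing from one of the tables ends up as `NA` in that measure's column, which matches the `all = TRUE` behaviour of the `merge` call above.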