[英]How to combine dataframes in R based-on similar timelines for multiple attributes and then transforming the data to make these columns as row headers?
I am trying to merge my sales data and patients data in R (and some other attributes) which are rolled-up at the country level for the same time-frame.我正在尝试将我的销售数据和患者数据合并到 R(以及一些其他属性)中,这些数据是在同一时间范围内在国家/地区级别汇总的。 After merging, I want to consolidate it to a long format instead of wide format and keep it unique at the Country-Month level.
合并后,我想将其合并为长格式而不是宽格式,并在国家月级别保持其唯一性。
This is how my input data looks like -这是我的输入数据的样子 -
1) Sales Data 1) 销售数据
Coutry_ID Country_Name 1/28/2018 2/28/2018 3/28/2018 4/28/2018 5/28/2018
A0001 USA 44 72 85 25 72
A0002 Germany 98 70 69 48 41
A0003 Russia 82 42 32 29 43
A0004 UK 79 83 51 48 47
A0005 France 45 75 10 13 23
A0006 India 92 85 28 13 18
2) Patients Data 2) 患者数据
Coutry_ID Country_Name 1/28/2018 2/28/2018 3/28/2018 4/28/2018 5/28/2018
A0001 USA 7 13 22 23 13
A0002 Germany 9 10 17 25 25
A0003 Russia 24 19 6 8 5
A0004 UK 6 8 20 1 11
A0005 France 4 9 8 10 25
A0006 India 18 21 2 13 17
AND this is how I intend output to look like -这就是我打算输出的样子 -
Coutry_ID Country_Name Month Sales Patients
A0001 USA 1/28/2018 44 7
A0001 USA 2/28/2018 72 13
A0001 USA 3/28/2018 85 22
A0001 USA 4/28/2018 25 23
A0001 USA 5/28/2018 72 13
A0002 Germany 1/28/2018 98 9
A0002 Germany 2/28/2018 70 10
A0002 Germany 3/28/2018 69 17
A0002 Germany 4/28/2018 48 25
A0002 Germany 5/28/2018 41 25
A0003 Russia 1/28/2018 82 24
A0003 Russia 2/28/2018 42 19
A0003 Russia 3/28/2018 32 6
A0003 Russia 4/28/2018 29 8
A0003 Russia 5/28/2018 43 5
A0004 UK 1/28/2018 79 6
A0004 UK 2/28/2018 83 8
A0004 UK 3/28/2018 51 20
A0004 UK 4/28/2018 48 1
A0004 UK 5/28/2018 47 11
A0005 France 1/28/2018 45 4
A0005 France 2/28/2018 75 9
A0005 France 3/28/2018 10 8
A0005 France 4/28/2018 13 10
A0005 France 5/28/2018 23 25
A0006 India 1/28/2018 92 18
A0006 India 2/28/2018 85 21
A0006 India 3/28/2018 28 2
A0006 India 4/28/2018 13 13
A0006 India 5/28/2018 18 17
I need a little guidance on these 2 things -我需要关于这两件事的一些指导-
1 - How to convert the data from wide to long? 1 - 如何将数据从宽转换为长?
2 - For merging data, I am thinking about using DPLYR left_join on all these data-sets with my master list of countries with ID and Name. 2 - 为了合并数据,我正在考虑在所有这些数据集上使用 DPLYR left_join 以及我的带有 ID 和名称的国家/地区主列表。 My doubt is whether I should first convert the data sets into The long format from wide or do that after merging?
我的疑问是我是否应该先将数据集从宽格式转换为长格式,还是合并后再做?
You can get both the dataframes in long format and then join :您可以获得长格式的两个数据帧,然后加入:
library(dplyr)
library(tidyr)
inner_join(
sales %>% pivot_longer(cols = -c(Coutry_ID, Country_Name), values_to = 'Sales'),
patients %>% pivot_longer(cols = -c(Coutry_ID, Country_Name),
values_to = 'Patients'),
by = c("Coutry_ID", "Country_Name", "name"))
# A tibble: 30 x 5
# Coutry_ID Country_Name name Sales Patients
# <fct> <fct> <chr> <int> <int>
# 1 A0001 USA 1/28/2018 44 7
# 2 A0001 USA 2/28/2018 72 13
# 3 A0001 USA 3/28/2018 85 22
# 4 A0001 USA 4/28/2018 25 23
# 5 A0001 USA 5/28/2018 72 13
# 6 A0002 Germany 1/28/2018 98 9
# 7 A0002 Germany 2/28/2018 70 10
# 8 A0002 Germany 3/28/2018 69 17
# 9 A0002 Germany 4/28/2018 48 25
#10 A0002 Germany 5/28/2018 41 25
# … with 20 more rows
data数据
sales <- structure(list(Coutry_ID = structure(1:6, .Label = c("A0001",
"A0002", "A0003", "A0004", "A0005", "A0006"), class = "factor"),
Country_Name = structure(c(6L, 2L, 4L, 5L, 1L, 3L), .Label = c("France",
"Germany", "India", "Russia", "UK", "USA"), class = "factor"),
`1/28/2018` = c(44L, 98L, 82L, 79L, 45L, 92L), `2/28/2018` = c(72L,
70L, 42L, 83L, 75L, 85L), `3/28/2018` = c(85L, 69L, 32L,
51L, 10L, 28L), `4/28/2018` = c(25L, 48L, 29L, 48L, 13L,
13L), `5/28/2018` = c(72L, 41L, 43L, 47L, 23L, 18L)), class =
"data.frame", row.names = c(NA, -6L))
patients <- structure(list(Coutry_ID = structure(1:6, .Label = c("A0001",
"A0002", "A0003", "A0004", "A0005", "A0006"), class = "factor"),
Country_Name = structure(c(6L, 2L, 4L, 5L, 1L, 3L), .Label = c("France",
"Germany", "India", "Russia", "UK", "USA"), class = "factor"),
`1/28/2018` = c(7L, 9L, 24L, 6L, 4L, 18L), `2/28/2018` = c(13L,
10L, 19L, 8L, 9L, 21L), `3/28/2018` = c(22L, 17L, 6L, 20L,
8L, 2L), `4/28/2018` = c(23L, 25L, 8L, 1L, 10L, 13L), `5/28/2018` = c(13L,
25L, 5L, 11L, 25L, 17L)), class = "data.frame", row.names = c(NA, -6L))
Base R (not as eloquent as above):基础R(不像上面那样雄辩):
# Create a named list of dataframes:
df_list <- list(patients = patients, sales = sales)
# Create a vector in each with the name of the dataframe:
df_list <- mapply(cbind, df_list, "desc" = as.character(names(df_list)),
SIMPLIFY = FALSE)
# Define a function to reshape the data:
reshape_ps <- function(x){
tmp <- setNames(reshape(x,
direction = "long",
varying = which(names(x) %in% names(x[,sapply(x, is.numeric)])),
idvar = c(!(names(x) %in% names(x[,sapply(x, is.numeric)]))),
v.names = "month",
times = as.Date(names(x[,sapply(x, is.numeric)]), "%m/%d/%Y"),
new.row.names = 1:(nrow(x)*length(which(names(x) %in% names(x[,sapply(x, is.numeric)]))))),
c(names(x[!(names(x) %in% names(x[,sapply(x, is.numeric)]))]), "month", as.character(unique(x$desc))))
# Drop the dataframe name vector:
clean <- tmp[,names(tmp) != "desc"]
# Specify the return object:
return(clean)
}
# Merge the result of the function applied on both dataframes:
Reduce(function(y, z){merge(y, z, by = intersect(colnames(y), colnames(z)), all = TRUE)},
Map(function(x){reshape_ps(x)}, df_list))
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.