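How to combine data frames in R based on a similar timeline for multiple attributes, and then transform the data so that these columns become row headers?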

Question

I am trying to merge my sales data and patients data (and some other attributes) in R. All of them are rolled up at the country level for the same time frame. After merging, I want to convert the result from wide format to long format and keep it unique at the Country-Month level.

This is what my input data looks like:

1) Sales Data

Coutry_ID   Country_Name    1/28/2018   2/28/2018   3/28/2018   4/28/2018   5/28/2018
A0001       USA               44           72         85          25          72
A0002       Germany           98           70         69          48          41
A0003       Russia            82           42         32          29          43
A0004       UK                79           83         51          48          47
A0005       France            45           75         10          13          23
A0006       India             92           85         28          13          18

2) Patients Data

Coutry_ID   Country_Name    1/28/2018   2/28/2018   3/28/2018   4/28/2018   5/28/2018
A0001       USA                7          13          22          23          13
A0002       Germany            9          10          17          25          25
A0003       Russia            24          19           6           8           5
A0004       UK                 6           8          20           1          11
A0005       France             4           9           8          10          25
A0006       India             18          21           2          13          17

And this is how I intend the output to look:

Coutry_ID   Country_Name    Month       Sales   Patients
A0001       USA         1/28/2018       44      7
A0001       USA         2/28/2018       72      13
A0001       USA         3/28/2018       85      22
A0001       USA         4/28/2018       25      23
A0001       USA         5/28/2018       72      13
A0002       Germany     1/28/2018       98      9
A0002       Germany     2/28/2018       70      10
A0002       Germany     3/28/2018       69      17
A0002       Germany     4/28/2018       48      25
A0002       Germany     5/28/2018       41      25
A0003       Russia      1/28/2018       82      24
A0003       Russia      2/28/2018       42      19
A0003       Russia      3/28/2018       32      6
A0003       Russia      4/28/2018       29      8
A0003       Russia      5/28/2018       43      5
A0004       UK          1/28/2018       79      6
A0004       UK          2/28/2018       83      8
A0004       UK          3/28/2018       51      20
A0004       UK          4/28/2018       48      1
A0004       UK          5/28/2018       47      11
A0005       France      1/28/2018       45      4
A0005       France      2/28/2018       75      9
A0005       France      3/28/2018       10      8
A0005       France      4/28/2018       13      10
A0005       France      5/28/2018       23      25
A0006       India       1/28/2018       92      18
A0006       India       2/28/2018       85      21
A0006       India       3/28/2018       28      2
A0006       India       4/28/2018       13      13
A0006       India       5/28/2018       18      17

I need a little guidance on these two things:

1 - How do I convert the data from wide to long?

2 - For merging the data, I am thinking about using dplyr's left_join on all these data sets with my master list of countries (ID and Name). My doubt is whether I should first convert the data sets from wide to long format, or do that after merging.

Answer 1

You can get both data frames into long format and then join them:

library(dplyr)
library(tidyr)

inner_join(
  sales %>% pivot_longer(cols = -c(Coutry_ID, Country_Name), values_to = 'Sales'),
  patients %>% pivot_longer(cols = -c(Coutry_ID, Country_Name), values_to = 'Patients'),
  by = c("Coutry_ID", "Country_Name", "name")
)

# A tibble: 30 x 5
#   Coutry_ID Country_Name name      Sales Patients
#   <fct>     <fct>        <chr>     <int>    <int>
# 1 A0001     USA          1/28/2018    44        7
# 2 A0001     USA          2/28/2018    72       13
# 3 A0001     USA          3/28/2018    85       22
# 4 A0001     USA          4/28/2018    25       23
# 5 A0001     USA          5/28/2018    72       13
# 6 A0002     Germany      1/28/2018    98        9
# 7 A0002     Germany      2/28/2018    70       10
# 8 A0002     Germany      3/28/2018    69       17
# 9 A0002     Germany      4/28/2018    48       25
#10 A0002     Germany      5/28/2018    41       25
# … with 20 more rows
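
The output above keeps the default column name `name` and stores the months as text, whereas the question asks for a `Month` column. A minimal sketch of one way to get there, assuming `pivot_longer()`'s `names_to` argument and a conversion with `as.Date()`; the helper `to_long()` and the two-country subset of the question's tables are illustrative choices, not part of the original answer:

```r
library(dplyr)
library(tidyr)

# Two countries and two months from the question's tables, enough to show the shape
sales <- data.frame(
  Coutry_ID = c("A0001", "A0002"),
  Country_Name = c("USA", "Germany"),
  `1/28/2018` = c(44L, 98L),
  `2/28/2018` = c(72L, 70L),
  check.names = FALSE, stringsAsFactors = FALSE
)
patients <- data.frame(
  Coutry_ID = c("A0001", "A0002"),
  Country_Name = c("USA", "Germany"),
  `1/28/2018` = c(7L, 9L),
  `2/28/2018` = c(13L, 10L),
  check.names = FALSE, stringsAsFactors = FALSE
)

id_cols <- c("Coutry_ID", "Country_Name")

# Hypothetical helper: reshape one wide table and name its value column
to_long <- function(df, value_name) {
  df %>%
    pivot_longer(cols = -all_of(id_cols),
                 names_to = "Month",
                 values_to = value_name) %>%
    # Column headers such as "1/28/2018" become real dates
    mutate(Month = as.Date(Month, format = "%m/%d/%Y"))
}

result <- inner_join(
  to_long(sales, "Sales"),
  to_long(patients, "Patients"),
  by = c(id_cols, "Month")
)
result
```

Storing `Month` as a `Date` means later sorting and filtering follow calendar order rather than alphabetical order of the text labels.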

Data:

sales <- structure(list(Coutry_ID = structure(1:6, .Label = c("A0001", 
"A0002", "A0003", "A0004", "A0005", "A0006"), class = "factor"), 
Country_Name = structure(c(6L, 2L, 4L, 5L, 1L, 3L), .Label = c("France", 
"Germany", "India", "Russia", "UK", "USA"), class = "factor"), 
`1/28/2018` = c(44L, 98L, 82L, 79L, 45L, 92L), `2/28/2018` = c(72L, 
70L, 42L, 83L, 75L, 85L), `3/28/2018` = c(85L, 69L, 32L, 
51L, 10L, 28L), `4/28/2018` = c(25L, 48L, 29L, 48L, 13L, 
13L), `5/28/2018` = c(72L, 41L, 43L, 47L, 23L, 18L)), class = 
"data.frame", row.names = c(NA, -6L))

patients <- structure(list(Coutry_ID = structure(1:6, .Label = c("A0001", 
"A0002", "A0003", "A0004", "A0005", "A0006"), class = "factor"), 
Country_Name = structure(c(6L, 2L, 4L, 5L, 1L, 3L), .Label = c("France", 
"Germany", "India", "Russia", "UK", "USA"), class = "factor"), 
`1/28/2018` = c(7L, 9L, 24L, 6L, 4L, 18L), `2/28/2018` = c(13L, 
10L, 19L, 8L, 9L, 21L), `3/28/2018` = c(22L, 17L, 6L, 20L, 
8L, 2L), `4/28/2018` = c(23L, 25L, 8L, 1L, 10L, 13L), `5/28/2018` = c(13L, 
25L, 5L, 11L, 25L, 17L)), class = "data.frame", row.names = c(NA, -6L))
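
On the second part of the question (reshape first or merge first): the wide tables share the same date column names, so joining them while still wide would leave dplyr to disambiguate those columns with `.x`/`.y` suffixes. Reshaping each table first avoids that. The sketch below extends the idea to a named list of data sets and then attaches the result to a master country list with `left_join`. The names `countries`, `datasets`, `long_list`, and `combined` are assumptions, and the toy inputs are deliberately trimmed so that not every country appears in every table:

```r
library(dplyr)
library(tidyr)

id_cols <- c("Coutry_ID", "Country_Name")

# Hypothetical master list of countries (IDs and names taken from the question)
countries <- data.frame(
  Coutry_ID = c("A0001", "A0002", "A0003"),
  Country_Name = c("USA", "Germany", "Russia"),
  stringsAsFactors = FALSE
)

# Toy inputs, trimmed on purpose so the tables do not cover the same countries
datasets <- list(
  Sales = data.frame(
    Coutry_ID = c("A0001", "A0002"),
    Country_Name = c("USA", "Germany"),
    `1/28/2018` = c(44L, 98L),
    `2/28/2018` = c(72L, 70L),
    check.names = FALSE, stringsAsFactors = FALSE
  ),
  Patients = data.frame(
    Coutry_ID = "A0001",
    Country_Name = "USA",
    `1/28/2018` = 7L,
    `2/28/2018` = 13L,
    check.names = FALSE, stringsAsFactors = FALSE
  )
)

# Reshape every table to long form; the list name becomes the value column
long_list <- Map(
  function(df, value_name) {
    pivot_longer(df, cols = -all_of(id_cols),
                 names_to = "Month", values_to = value_name)
  },
  datasets, names(datasets)
)

# full_join keeps a Country-Month row even if only one table has it
combined <- Reduce(
  function(x, y) full_join(x, y, by = c(id_cols, "Month")),
  long_list
)

# Attach to the master list; a country with no data at all gets NA
result <- left_join(countries, combined, by = id_cols)
result
```

Here `full_join` is used between the data sets so that no Country-Month row is lost when one attribute is missing; `inner_join`, as in the answer above, would keep only rows present in every table.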

Answer 2

Base R (not as eloquent as above):

# Create a named list of dataframes:
df_list <- list(patients = patients, sales = sales)

# Create a vector in each with the name of the dataframe:
df_list <- mapply(cbind,  df_list, "desc" = as.character(names(df_list)),
                  SIMPLIFY = FALSE)

# Define a function to reshape the data:
reshape_ps <- function(x){

  # The month columns are the only numeric ones; the rest identify a row:
  is_month <- sapply(x, is.numeric)
  month_cols <- names(x)[is_month]
  id_cols <- names(x)[!is_month]

  # Stack the month columns, then name the value column after the dataframe:
  tmp <- setNames(reshape(x,
          direction = "long",
          varying = month_cols,
          idvar = id_cols,
          v.names = "value",
          timevar = "month",
          times = as.Date(month_cols, "%m/%d/%Y"),
          new.row.names = seq_len(nrow(x) * length(month_cols))),
          c(id_cols, "month", as.character(unique(x$desc))))

  # Drop the dataframe name vector:
  clean <- tmp[, names(tmp) != "desc"]

  # Specify the return object:
  return(clean)
}

# Merge the result of the function applied on both dataframes:
Reduce(function(y, z){merge(y, z, by = intersect(colnames(y), colnames(z)), all = TRUE)},
       lapply(df_list, reshape_ps))
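
The question also asks that the result stay unique at the Country-Month level. A quick base R check can be sketched as follows, assuming the merged result is stored in a variable; the name `merged` and the small hand-built data frame (values copied from the question's tables) are illustrative only:

```r
# Small stand-in shaped like the merged output above
merged <- data.frame(
  Coutry_ID = c("A0001", "A0001", "A0002", "A0002"),
  month = as.Date(c("2018-01-28", "2018-02-28", "2018-01-28", "2018-02-28")),
  patients = c(7L, 13L, 9L, 10L),
  sales = c(44L, 72L, 98L, 70L),
  stringsAsFactors = FALSE
)

# anyDuplicated() returns 0 when no Country-Month pair is repeated,
# otherwise the index of the first repeated row
n_dup <- anyDuplicated(merged[c("Coutry_ID", "month")])
stopifnot(n_dup == 0)
```

A non-zero value would point to the first duplicated row, which usually means one of the input tables had a repeated country or month.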

Solution 1
1 · Accepted · 2020-03-04 10:09:42

Solution 2
1 · 2020-03-04 12:24:21
