How to combine data frames in R based on similar timelines for multiple attributes, and then reshape the data so that the date columns become rows?

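I am trying to merge my sales data and patient data (along with a few other attributes) in R. All of them are rolled up at the country level for the same time frame. After merging, I want the result in long format instead of wide format, and I want it to stay unique at the country-month level.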

This is what my input data looks like:

1) Sales data

Coutry_ID   Country_Name    1/28/2018   2/28/2018   3/28/2018   4/28/2018   5/28/2018
A0001       USA               44           72         85          25          72
A0002       Germany           98           70         69          48          41
A0003       Russia            82           42         32          29          43
A0004       UK                79           83         51          48          47
A0005       France            45           75         10          13          23
A0006       India             92           85         28          13          18

2) Patient data

Coutry_ID   Country_Name    1/28/2018   2/28/2018   3/28/2018   4/28/2018   5/28/2018
A0001       USA                7          13          22          23          13
A0002       Germany            9          10          17          25          25
A0003       Russia            24          19           6           8           5
A0004       UK                 6           8          20           1          11
A0005       France             4           9           8          10          25
A0006       India             18          21           2          13          17

This is what I want the output to look like:

Coutry_ID   Country_Name    Month       Sales   Patients
A0001       USA         1/28/2018       44      7
A0001       USA         2/28/2018       72      13
A0001       USA         3/28/2018       85      22
A0001       USA         4/28/2018       25      23
A0001       USA         5/28/2018       72      13
A0002       Germany     1/28/2018       98      9
A0002       Germany     2/28/2018       70      10
A0002       Germany     3/28/2018       69      17
A0002       Germany     4/28/2018       48      25
A0002       Germany     5/28/2018       41      25
A0003       Russia      1/28/2018       82      24
A0003       Russia      2/28/2018       42      19
A0003       Russia      3/28/2018       32      6
A0003       Russia      4/28/2018       29      8
A0003       Russia      5/28/2018       43      5
A0004       UK          1/28/2018       79      6
A0004       UK          2/28/2018       83      8
A0004       UK          3/28/2018       51      20
A0004       UK          4/28/2018       48      1
A0004       UK          5/28/2018       47      11
A0005       France      1/28/2018       45      4
A0005       France      2/28/2018       75      9
A0005       France      3/28/2018       10      8
A0005       France      4/28/2018       13      10
A0005       France      5/28/2018       23      25
A0006       India       1/28/2018       92      18
A0006       India       2/28/2018       85      21
A0006       India       3/28/2018       28      2
A0006       India       4/28/2018       13      13
A0006       India       5/28/2018       18      17

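I need some guidance on these two things:

1 - How do I convert the data from wide to long?

2 - To merge the data, I am thinking of using dplyr's left_join on all of these data sets, together with my country master list that holds the IDs and names. My doubt is whether I should convert the data sets from wide to long first, or do it after merging.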

You can get both data frames into long format and then join them:

library(dplyr)
library(tidyr)

inner_join(
  sales %>% pivot_longer(cols = -c(Coutry_ID, Country_Name), values_to = 'Sales'),
  patients %>% pivot_longer(cols = -c(Coutry_ID, Country_Name), values_to = 'Patients'),
  by = c("Coutry_ID", "Country_Name", "name"))

# A tibble: 30 x 5
#   Coutry_ID Country_Name name      Sales Patients
#   <fct>     <fct>        <chr>     <int>    <int>
# 1 A0001     USA          1/28/2018    44        7
# 2 A0001     USA          2/28/2018    72       13
# 3 A0001     USA          3/28/2018    85       22
# 4 A0001     USA          4/28/2018    25       23
# 5 A0001     USA          5/28/2018    72       13
# 6 A0002     Germany      1/28/2018    98        9
# 7 A0002     Germany      2/28/2018    70       10
# 8 A0002     Germany      3/28/2018    69       17
# 9 A0002     Germany      4/28/2018    48       25
#10 A0002     Germany      5/28/2018    41       25
# … with 20 more rows
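
The result above keeps the date in a character column with the default name `name`, while the desired output calls it Month. A minimal sketch of one way to close that gap, assuming every date header is in month/day/year form: pass `names_to = "Month"` to `pivot_longer` and convert the column with `as.Date`. The `to_long` helper and the two-country sample are illustrative only and are not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Small illustrative sample with the same layout as the question's data
sales <- data.frame(
  Coutry_ID = c("A0001", "A0002"),
  Country_Name = c("USA", "Germany"),
  `1/28/2018` = c(44L, 98L),
  `2/28/2018` = c(72L, 70L),
  check.names = FALSE
)
patients <- data.frame(
  Coutry_ID = c("A0001", "A0002"),
  Country_Name = c("USA", "Germany"),
  `1/28/2018` = c(7L, 9L),
  `2/28/2018` = c(13L, 10L),
  check.names = FALSE
)

# Hypothetical helper: pivot one wide table to long form,
# naming the date column "Month" and the value column as requested
to_long <- function(df, value_name) {
  pivot_longer(df,
               cols = -c(Coutry_ID, Country_Name),
               names_to = "Month",
               values_to = value_name)
}

result <- inner_join(to_long(sales, "Sales"),
                     to_long(patients, "Patients"),
                     by = c("Coutry_ID", "Country_Name", "Month")) %>%
  # Assumes the headers parse as month/day/year
  mutate(Month = as.Date(Month, format = "%m/%d/%Y"))

result
```

Storing Month as a Date rather than text means later sorting and filtering follow calendar order instead of alphabetical order.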

Data

sales <- structure(list(Coutry_ID = structure(1:6, .Label = c("A0001", 
"A0002", "A0003", "A0004", "A0005", "A0006"), class = "factor"), 
Country_Name = structure(c(6L, 2L, 4L, 5L, 1L, 3L), .Label = c("France", 
"Germany", "India", "Russia", "UK", "USA"), class = "factor"), 
`1/28/2018` = c(44L, 98L, 82L, 79L, 45L, 92L), `2/28/2018` = c(72L, 
70L, 42L, 83L, 75L, 85L), `3/28/2018` = c(85L, 69L, 32L, 
51L, 10L, 28L), `4/28/2018` = c(25L, 48L, 29L, 48L, 13L, 
13L), `5/28/2018` = c(72L, 41L, 43L, 47L, 23L, 18L)), class = 
"data.frame", row.names = c(NA, -6L))

patients <- structure(list(Coutry_ID = structure(1:6, .Label = c("A0001", 
"A0002", "A0003", "A0004", "A0005", "A0006"), class = "factor"), 
Country_Name = structure(c(6L, 2L, 4L, 5L, 1L, 3L), .Label = c("France", 
"Germany", "India", "Russia", "UK", "USA"), class = "factor"), 
`1/28/2018` = c(7L, 9L, 24L, 6L, 4L, 18L), `2/28/2018` = c(13L, 
10L, 19L, 8L, 9L, 21L), `3/28/2018` = c(22L, 17L, 6L, 20L, 
8L, 2L), `4/28/2018` = c(23L, 25L, 8L, 1L, 10L, 13L), `5/28/2018` = c(13L, 
25L, 5L, 11L, 25L, 17L)), class = "data.frame", row.names = c(NA, -6L))
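
The answer above reshapes first and joins second. For comparison, here is a hedged sketch of the opposite order, which the question also asks about: join the wide tables first, then pivot once. It assumes the suffixes `_Sales` and `_Patients` (chosen here for illustration) can be split from the date at the underscore, which holds only because the date headers themselves contain no underscore. The sample data is a cut-down, illustrative version of the question's tables.

```r
library(dplyr)
library(tidyr)

# Cut-down illustrative sample in the question's wide layout
sales <- data.frame(
  Coutry_ID = c("A0001", "A0002"),
  Country_Name = c("USA", "Germany"),
  `1/28/2018` = c(44L, 98L),
  `2/28/2018` = c(72L, 70L),
  check.names = FALSE
)
patients <- data.frame(
  Coutry_ID = c("A0001", "A0002"),
  Country_Name = c("USA", "Germany"),
  `1/28/2018` = c(7L, 9L),
  `2/28/2018` = c(13L, 10L),
  check.names = FALSE
)

# Join the wide tables; the clashing date columns get a suffix each
wide <- inner_join(sales, patients,
                   by = c("Coutry_ID", "Country_Name"),
                   suffix = c("_Sales", "_Patients"))

# One pivot: the part before "_" becomes Month, the part after "_"
# names the output column (the special ".value" marker in tidyr)
long_both <- pivot_longer(wide,
                          cols = -c(Coutry_ID, Country_Name),
                          names_to = c("Month", ".value"),
                          names_sep = "_")

long_both
```

Both orders should give the same long table for data shaped like this. Joining first needs only one pivot call, but it depends on the suffix being separable from the date, so pivoting first is the safer choice when the column names are less regular.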

Base R (not as elegant as the above):

# Create a named list of data frames; each name becomes a value column name:
df_list <- list(patients = patients, sales = sales)

# Define a function to reshape one wide data frame to long format:
reshape_ps <- function(x, value_name) {
  # The date columns are the numeric ones; everything else identifies a row:
  date_cols <- names(x)[sapply(x, is.numeric)]
  id_cols <- setdiff(names(x), date_cols)

  reshape(x,
          direction = "long",
          varying = date_cols,
          v.names = value_name,
          timevar = "month",
          times = as.Date(date_cols, "%m/%d/%Y"),
          idvar = id_cols,
          new.row.names = seq_len(nrow(x) * length(date_cols)))
}

# Reshape both data frames, then merge the results on their shared columns:
Reduce(function(y, z) merge(y, z, by = intersect(colnames(y), colnames(z)), all = TRUE),
       Map(reshape_ps, df_list, names(df_list)))
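
The question also mentions joining against a country master list and keeping the result unique at the country-month level. The following is a rough base R sketch of both steps, not part of the original answers. The `countries` master list (including the extra "Japan" row) and the small `long` sample are hypothetical stand-ins for the real tables.

```r
# Hypothetical country master list; "Japan" has no sales or patient rows
countries <- data.frame(
  Coutry_ID = c("A0001", "A0002", "A0007"),
  Country_Name = c("USA", "Germany", "Japan")
)

# Small illustrative long-format sample in the shape produced above
long <- data.frame(
  Coutry_ID = c("A0001", "A0001", "A0002", "A0002"),
  Country_Name = c("USA", "USA", "Germany", "Germany"),
  month = as.Date(c("2018-01-28", "2018-02-28", "2018-01-28", "2018-02-28")),
  patients = c(7L, 13L, 9L, 10L),
  sales = c(44L, 72L, 98L, 70L)
)

# Left join: all.x = TRUE keeps every country in the master list,
# filling NA where a country has no matching rows
full <- merge(countries, long,
              by = c("Coutry_ID", "Country_Name"),
              all.x = TRUE)

# anyDuplicated() returns 0 when no country-month pair occurs twice,
# otherwise the index of the first repeated row
n_dupes <- anyDuplicated(full[, c("Coutry_ID", "month")])
n_dupes
```

Running the uniqueness check after every join is a cheap way to catch an accidental many-to-many match before it inflates the totals.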
