
Summarizing the data in a data table

Hi, I have the following input:

+------+----+----+
| CODE | R1 | R2 |
+------+----+----+
| A    |  0 |  1 |
| B    |  1 |  1 |
| C    |  0 |  0 |
+------+----+----+

I need output like the following:

+------+------+-------+
| CODE | CODE | VALUE |
+------+------+-------+
| A    | R1   |     0 |
| A    | R2   |     1 |
| B    | R1   |     1 |
| B    | R2   |     1 |
| C    | R1   |     0 |
| C    | R2   |     0 |
+------+------+-------+

Please note that R1 and R2 are regions, and the actual data has many more of them (R3, R4, R5, and so on). For simplicity, I included only R1 and R2.

Thanks in advance for your help!

This is a classic scenario: transforming data from wide to long format. It is straightforward with pivot_longer() from tidyr:

library(tidyr)  # provides pivot_longer() and re-exports the %>% pipe

df <- read.table(stringsAsFactors = FALSE, header = TRUE, text = "
           CODE R1  R2
           A    0   1
           B    1   1
           C    0   0")

dfTarget <- read.table(stringsAsFactors = FALSE, header = TRUE, text = "
 CODE  CODE  VALUE 
 A     R1        0 
 A     R2        1 
 B     R1        1 
 B     R2        1 
 C     R1        0 
 C     R2        0")

Now, the code that you want:

df %>% pivot_longer(cols = c("R1","R2"), values_to = "VALUE")
# A tibble: 6 x 3
  CODE  name  VALUE
  <chr> <chr> <int>
1 A     R1        0
2 A     R2        1
3 B     R1        1
4 B     R2        1
5 C     R1        0
6 C     R2        0

# Rename the target's columns: a data frame with duplicate column names is not a good idea.
colnames(dfTarget)[1:2] <- c("CODE1", "CODE2")
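Since the real data has many more regions (R3, R4, R5, and so on), listing every column in cols does not scale. Here is a minimal sketch, assuming every column other than CODE is a region column, that selects them all with -CODE. The key column is named REGION here purely for illustration, which also avoids the duplicate CODE header:

```r
library(tidyr)

df <- read.table(stringsAsFactors = FALSE, header = TRUE, text = "
           CODE R1  R2
           A    0   1
           B    1   1
           C    0   0")

# Assumption: every column except CODE holds a region (R1, R2, R3, ...),
# so -CODE selects all of them no matter how many there are.
long <- pivot_longer(df, cols = -CODE, names_to = "REGION", values_to = "VALUE")
long
```

If the data also holds columns that are not regions, cols = starts_with("R") is a narrower selection, assuming the region columns share that prefix.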

A comprehensive overview of pivoting is available at https://tidyr.tidyverse.org/articles/pivot.html

With data.table, we can use melt():

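library(data.table)
# setDT() converts df to a data.table in place; melt() stacks R1, R2, ... into long format
melt(setDT(df), id.vars = "CODE", value.name = "VALUE")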
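melt() stacks one measure column at a time, so its rows come out as all R1 rows followed by all R2 rows, not grouped by CODE as in the requested output. The minimal sketch below builds a small data.table inline so it is self-contained. It names the key column REGION (an illustrative name, not from the question) and reorders the rows:

```r
library(data.table)

dt <- data.table(CODE = c("A", "B", "C"), R1 = c(0L, 1L, 0L), R2 = c(1L, 1L, 0L))

# With measure.vars omitted, every non-id column (R1, R2, R3, ...) is melted.
long <- melt(dt, id.vars = "CODE", variable.name = "REGION",
             value.name = "VALUE", variable.factor = FALSE)

# melt() output is ordered R1 block, then R2 block. A stable sort on CODE
# groups rows by CODE while keeping regions in their original column order.
long <- long[order(CODE)]
long
```

Sorting on CODE alone, instead of CODE and REGION, avoids character ordering placing R10 before R2 once there are ten or more regions.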
