
Melting an R data.table with a factor column

I have the following R data.table (the same approach should carry over to a data.frame). The goal is to plot it as a scatterplot in ggplot2, so I need to reshape it to have a single "factor" column that can be used to color the points:

> library(data.table)
> dt
     ID   x_A   y_A   x_B   y_B
1: 05AC  0.81     3  0.92  2.05
2: 01BA  0.41     5  0.63   1.8
3: Z1AC  0.41     5  0.58   1.8
4: B2BA  0.21   6.5  1.00   1.8
....

I believe the correct output needs to be of the form:

ID     type   x      y
05AC   A      0.81   3       
05AC   B      0.92   2.05
01BA   A      0.41   5 
01BA   B      0.63   1.8
Z1AC   A      0.41   5 
Z1AC   B      0.58   1.8
B2BA   A      0.21   6.5 
B2BA   B      1.00   1.8

Is there a standard way to "unfold" data.tables in this fashion? I'm happy to see how to do this with dplyr, but I suspect there should be a data.table method.

melt() would work if I could figure out how to create the type column, e.g.

melt(dt, id.vars=c("ID"))

only keeps ID as the identifier and stacks all four remaining columns into a single variable/value pair, so x and y no longer sit side by side.

I'm especially confused about how to "scrape" the A and B type from the names of columns 2-3 and columns 4-5 respectively...

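Staying within data.table, and following your suggested approach of using melt, you can then use tstrsplit to split the variable column on the "_" character.

## melt first, as in the question, so that the variable and value columns exist
dt <- melt(dt, id.vars = "ID")
## use tstrsplit to split the variable column on the literal "_"
dt[, c("xy", "type") := tstrsplit(variable, "_", fixed = TRUE)]
dt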
#       ID variable value xy type
#  1: 05AC      x_A  0.81  x    A
#  2: 01BA      x_A  0.41  x    A
#  3: Z1AC      x_A  0.41  x    A
#  4: B2BA      x_A  0.21  x    A
#  5: 05AC      y_A  3.00  y    A
#  6: 01BA      y_A  5.00  y    A
#  7: Z1AC      y_A  5.00  y    A
#  8: B2BA      y_A  6.50  y    A
#  9: 05AC      x_B  0.92  x    B
# 10: 01BA      x_B  0.63  x    B
# 11: Z1AC      x_B  0.58  x    B
# 12: B2BA      x_B  1.00  x    B
# 13: 05AC      y_B  2.05  y    B
# 14: 01BA      y_B  1.80  y    B
# 15: Z1AC      y_B  1.80  y    B
# 16: B2BA      y_B  1.80  y    B

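This gives you the long form of your required solution. You can then use dcast to widen it. Note that dcast sorts the result by ID and type, so the row order differs from the input:

dcast(dt, formula = ID + type ~ xy, value.var = "value")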

#      ID type    x    y
# 1: 01BA    A 0.41 5.00
# 2: 01BA    B 0.63 1.80
# 3: 05AC    A 0.81 3.00
# 4: 05AC    B 0.92 2.05
# 5: B2BA    A 0.21 6.50
# 6: B2BA    B 1.00 1.80
# 7: Z1AC    A 0.41 5.00
# 8: Z1AC    B 0.58 1.80
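
For anyone who wants to run the steps above end to end, here is a minimal sketch. The sample data is typed in from the four rows shown in the question; the names long and res are only illustrative, and variable.factor = FALSE and fixed = TRUE are my own additions (the latter so that "_" is matched literally rather than as a regular expression).

```r
library(data.table)

# Sample data copied from the four rows shown in the question
dt <- data.table(
  ID  = c("05AC", "01BA", "Z1AC", "B2BA"),
  x_A = c(0.81, 0.41, 0.41, 0.21),
  y_A = c(3, 5, 5, 6.5),
  x_B = c(0.92, 0.63, 0.58, 1.00),
  y_B = c(2.05, 1.8, 1.8, 1.8)
)

# Step 1: melt every non-ID column into variable/value pairs
long <- melt(dt, id.vars = "ID", variable.factor = FALSE)

# Step 2: split names such as "x_A" into "x" and "A"
long[, c("xy", "type") := tstrsplit(variable, "_", fixed = TRUE)]

# Step 3: widen again so that x and y become separate columns
res <- dcast(long, ID + type ~ xy, value.var = "value")
print(res)
```

res should then be usable directly in ggplot2, for example with aes(x, y, colour = type).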

The logic of this answer is the same as the suggested dplyr approach of gather %>% separate %>% spread , but using data.table .
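
As a possible alternative (not what the answer above does), melt can also produce several value columns in one call when measure.vars is given as patterns(). The sketch below assumes the x_ and y_ columns appear in the same suffix order, because melt only returns a positional index (1, 2, ...) in the type column, which then has to be mapped back to the suffixes by hand. The sample data and the helper name suffixes are illustrative.

```r
library(data.table)

# Sample data copied from the four rows shown in the question
dt <- data.table(
  ID  = c("05AC", "01BA", "Z1AC", "B2BA"),
  x_A = c(0.81, 0.41, 0.41, 0.21),
  y_A = c(3, 5, 5, 6.5),
  x_B = c(0.92, 0.63, 0.58, 1.00),
  y_B = c(2.05, 1.8, 1.8, 1.8)
)

# Each pattern selects one group of columns; value.name names the
# resulting value columns in the same order as the patterns
res <- melt(
  dt,
  id.vars       = "ID",
  measure.vars  = patterns("^x_", "^y_"),
  variable.name = "type",
  value.name    = c("x", "y")
)

# 'type' holds the position within each group (1, 2, ...), not "A"/"B".
# Assumption: the y_ columns follow the same suffix order as the x_ columns.
suffixes <- sub("^x_", "", grep("^x_", names(dt), value = TRUE))
res[, type := suffixes[as.integer(type)]]
print(res)
```

This avoids the round trip through a fully long table, at the cost of relying on column order to recover the A/B labels.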

A combination of dplyr and tidyr can produce your desired result. This is untested, due to the lack of a reproducible example.

library(tidyr)
library(dplyr)

dt %>%
  gather(variable, value, -ID, na.rm = TRUE) %>%
  separate(variable, c("group", "type"), sep = "_") %>%
  spread(group, value)

What this does:

  1. gathers all columns except ID into key-value rows, variable and value, dropping any rows whose value is NA (na.rm is an argument of gather, not of spread).
  2. separates the variable column into group and type, using _ as a separator.
  3. spreads the contents of the group column into new columns and populates them with the value column.
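
In tidyr 1.0.0 and later, gather and spread are superseded by pivot_longer and pivot_wider, and the three steps listed above can be collapsed into one call. This is only a sketch of that newer route, not the code from the answer: the sample data is typed in from the question, and df and res are illustrative names. The special ".value" entry in names_to tells pivot_longer that the part of each column name before "_" (x or y) should become an output column, while the part after it goes into type.

```r
library(tidyr)

# Sample data copied from the four rows shown in the question
df <- data.frame(
  ID  = c("05AC", "01BA", "Z1AC", "B2BA"),
  x_A = c(0.81, 0.41, 0.41, 0.21),
  y_A = c(3, 5, 5, 6.5),
  x_B = c(0.92, 0.63, 0.58, 1.00),
  y_B = c(2.05, 1.8, 1.8, 1.8),
  stringsAsFactors = FALSE
)

# Split each column name on "_": the first piece names the value column
# (".value"), the second piece is stored in the new 'type' column
res <- pivot_longer(
  df,
  cols      = -ID,
  names_to  = c(".value", "type"),
  names_sep = "_"
)
print(res)
```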
