
How to combine multiple columns into one column and attach their unique code in R?

I have a data frame that looks like this:

+---------+------------+-------------+--------+
|   code  |chem_1      | chem_2      | chem_3 |
+---------+------------+-------------+--------+
|    1    |PCB001      |PCB047       |PCB047  |
|    2    |chlorpyrifos|chlorpyriphos|        | 
|    3    |TOC         |             |        |
+---------+------------+-------------+--------+

I want to combine all the chemicals into one column and attach their code:

+-------------+--------+
| chem        | code   |
+-------------+--------+
|PCB001       | 1      |
|PCB047       | 1      | 
|PCB047       | 1      |
|chlorpyrifos | 2      |
|chlorpyriphos| 2      |
|    TOC      | 3      |
+-------------+--------+

I'm wondering whether there is a simple way to do this in a single function call. Many thanks!

There are many ways to do this. Here is one using reshape2::melt:

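library(reshape2)

# Treat empty strings as missing so melt() can drop them
df[df == ""] <- NA

# Stack chem_1..chem_3 into one column, drop the NAs, and discard the "variable" column
melt(df, id.vars = "code", na.rm = TRUE, value.name = "chem")[, -2]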
#  code          chem
#1    1        PCB001
#2    2  chlorpyrifos
#3    3           TOC
#4    1        PCB047
#5    2 chlorpyriphos
#7    1        PCB047

We first replace all empty strings with NA, then use melt with na.rm = TRUE to reshape from wide to long while dropping the NA entries.
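To make those two steps concrete, here is a minimal base-R sketch of what the melt(..., na.rm = TRUE) call does with this data: blank strings become NA, the chem_* columns are stacked column by column with code repeated alongside, and the NA rows are dropped. It assumes the chemical columns all start with "chem_" and hold character (not factor) values; long and chem_cols are illustrative names only. It should produce the same rows, in the same order, as the melt output above.

```r
# Sample data from the question, with character (not factor) columns
df <- data.frame(
  code   = 1:3,
  chem_1 = c("PCB001", "chlorpyrifos", "TOC"),
  chem_2 = c("PCB047", "chlorpyriphos", ""),
  chem_3 = c("PCB047", "", ""),
  stringsAsFactors = FALSE
)

# Step 1: treat empty strings as missing
df[df == ""] <- NA

# Step 2: stack the chem_* columns one after another,
# repeating the code column once per stacked column
chem_cols <- grep("^chem_", names(df), value = TRUE)
long <- data.frame(
  code = rep(df$code, times = length(chem_cols)),
  chem = unlist(df[chem_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Step 3: drop the NA rows, which is what na.rm = TRUE does in melt()
long <- long[!is.na(long$chem), ]
print(long)
```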


Sample data

df <- read.table(text =
    "code  chem_1        chem_2         chem_3
     1     PCB001        PCB047         PCB047
     2     chlorpyrifos  chlorpyriphos  ''
     3     TOC           ''             ''",
    header = TRUE, stringsAsFactors = FALSE)

A tidyverse solution.

# Required package
library(tidyverse)

# Dummy data
df <- data.frame(code = 1:5, foo = letters[1:5], bar = LETTERS[6:10])

#    code foo bar
# 1    1   a   F
# 2    2   b   G
# 3    3   c   H
# 4    4   d   I
# 5    5   e   J

# Reformat
df %>% gather(key, chem, -code) %>% select(-key)

#    code  chem
# 1     1     a
# 2     2     b
# 3     3     c
# 4     4     d
# 5     5     e
# 6     1     F
# 7     2     G
# 8     3     H
# 9     4     I
# 10    5     J
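In current tidyr, gather() is superseded by pivot_longer(). A sketch of the same reshape applied to the question's data, assuming tidyr >= 1.0 and dplyr >= 1.0 (for across()); blank strings are turned into NA first so that values_drop_na = TRUE can discard them. The intermediate "source" column name is arbitrary.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  code   = 1:3,
  chem_1 = c("PCB001", "chlorpyrifos", "TOC"),
  chem_2 = c("PCB047", "chlorpyriphos", ""),
  chem_3 = c("PCB047", "", ""),
  stringsAsFactors = FALSE
)

long <- df %>%
  # Turn blank strings into NA so they can be dropped during the pivot
  mutate(across(starts_with("chem_"), ~ na_if(.x, ""))) %>%
  # One row per (code, chem_* column); NA values are discarded
  pivot_longer(
    cols = starts_with("chem_"),
    names_to = "source",
    values_to = "chem",
    values_drop_na = TRUE
  ) %>%
  # Keep only the two columns, in the order shown in the question
  select(chem, code)

print(long)
```

Because pivot_longer() walks the input row by row, the result already comes out grouped by code, so no separate sort is needed for this data.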

Using melt from data.table:

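library(data.table)
library(dplyr)
library(tidyr)  # drop_na() comes from tidyr, not dplyr

# data.table's melt() only has a method for data.tables, so convert first
melt(as.data.table(df), id.vars = "code", measure.vars = c("chem_1", "chem_2", "chem_3")) %>%
  arrange(code) %>%
  drop_na() %>%
  select(-variable)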

 # code         value
 #1    1        PCB001
 #2    1        PCB047
 #3    1        PCB047
 #4    2  chlorpyrifos
 #5    2 chlorpyriphos
 #7    3           TOC
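If data.table is already in use, the dplyr/tidyr steps are optional: melt() on a data.table accepts na.rm = TRUE directly, and the sort and column selection can be done inside [ ]. A sketch, assuming the blank cells are already NA (as the na.strings data step in this answer arranges); dt and long are illustrative names.

```r
library(data.table)

# Question's data with the blank cells already stored as NA
dt <- data.table(
  code   = 1:3,
  chem_1 = c("PCB001", "chlorpyrifos", "TOC"),
  chem_2 = c("PCB047", "chlorpyriphos", NA),
  chem_3 = c("PCB047", NA, NA)
)

# Melt every chem_* column into one "chem" column, dropping NAs,
# then sort by code (stable, so chem_1 stays before chem_2) and keep chem, code
long <- melt(
  dt,
  id.vars = "code",
  measure.vars = patterns("^chem_"),
  value.name = "chem",
  na.rm = TRUE
)[order(code), .(chem, code)]

print(long)
```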

Data: na.strings is used to read the ' ' (single-space) entries as NA.

df <- read.table(text =
    "code  chem_1        chem_2         chem_3
     1     PCB001        PCB047         PCB047
     2     chlorpyrifos  chlorpyriphos  ' '
     3     TOC           ' '            ' '",
    na.strings = " ", header = TRUE, stringsAsFactors = FALSE)

Consider reshape in base R:

data <- data.frame(code = 1:3,
                   chem_1 = c("PCB001", "chlorpyrifos", "TOC"),
                   chem_2 = c("PCB047", "chlorpyriphos", NA),
                   chem_3 = c("PCB047", NA, NA))

rdf <- reshape(data, varying = names(data)[-1], v.names = "chem",
               times = names(data)[-1], timevar = "type", idvar = "code",
               new.row.names = 1:1000, direction = "long")
rdf

#   code   type          chem
# 1    1 chem_1        PCB001
# 2    2 chem_1  chlorpyrifos
# 3    3 chem_1           TOC
# 4    1 chem_2        PCB047
# 5    2 chem_2 chlorpyriphos
# 6    3 chem_2          <NA>
# 7    1 chem_3        PCB047
# 8    2 chem_3          <NA>
# 9    3 chem_3          <NA>
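reshape() keeps the NA slots and the type column. A minimal follow-up that drops the missing entries and sorts by code to match the layout requested in the question; the data and reshape() call are repeated so the snippet runs on its own, and out is an illustrative name.

```r
data <- data.frame(code = 1:3,
                   chem_1 = c("PCB001", "chlorpyrifos", "TOC"),
                   chem_2 = c("PCB047", "chlorpyriphos", NA),
                   chem_3 = c("PCB047", NA, NA),
                   stringsAsFactors = FALSE)

rdf <- reshape(data, varying = names(data)[-1], v.names = "chem",
               times = names(data)[-1], timevar = "type", idvar = "code",
               new.row.names = 1:1000, direction = "long")

# Drop the NA placeholders and keep only chem and code
out <- rdf[!is.na(rdf$chem), c("chem", "code")]
# order() is stable, so chem_1 entries stay ahead of chem_2/chem_3 within a code
out <- out[order(out$code), ]
rownames(out) <- NULL
print(out)
```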

You can do this in base R by taking advantage of data.frame's recycling:

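# df[1] (the 3-row code column) is recycled to match the 9 stacked chem values;
# subset() then drops the empty entries
df1 <- subset(data.frame(df[1], chem = unlist(df[-1])), chem != "")
df1[order(df1$code), ]  # if you need it sorted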
#         code          chem
# chem_11    1        PCB001
# chem_21    1        PCB047
# chem_31    1        PCB047
# chem_12    2  chlorpyrifos
# chem_22    2 chlorpyriphos
# chem_13    3           TOC
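Since the question asks for a single call, the same stacking idea can be wrapped in a small helper. stack_with_id() is a hypothetical name, not an existing function; it assumes the id column is given by name, the remaining columns hold character values, and both blank strings and NA mean "no chemical". It uses rep() to make the recycling of the id column explicit and use.names = FALSE to avoid the chem_11-style row names.

```r
# Hypothetical helper: stack every column except `id` into one `value_name` column
stack_with_id <- function(df, id = "code", value_name = "chem") {
  other <- setdiff(names(df), id)
  out <- data.frame(
    # the id column is repeated once per stacked column
    id    = rep(df[[id]], times = length(other)),
    value = unlist(df[other], use.names = FALSE),
    stringsAsFactors = FALSE
  )
  names(out) <- c(id, value_name)
  # drop NAs and blanks, then sort by id (order() keeps ties in original order)
  out <- out[!is.na(out[[value_name]]) & out[[value_name]] != "", ]
  out <- out[order(out[[id]]), ]
  rownames(out) <- NULL
  out
}

df <- data.frame(
  code   = 1:3,
  chem_1 = c("PCB001", "chlorpyrifos", "TOC"),
  chem_2 = c("PCB047", "chlorpyriphos", ""),
  chem_3 = c("PCB047", "", ""),
  stringsAsFactors = FALSE
)

res <- stack_with_id(df)
print(res)
```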
