How do I combine multiple columns into one column and attach their unique code in R?

Question

I have a data frame that looks like this:

+---------+------------+-------------+--------+
|   code  |chem_1      | chem_2      | chem_3 |
+---------+------------+-------------+--------+
|    1    |PCB001      |PCB047       |PCB047  |
|    2    |chlorpyrifos|chlorpyriphos|        | 
|    3    |TOC         |             |        |
+---------+------------+-------------+--------+

I want to combine all the chemicals into one column, with each chemical's code attached to it:

+-------------+--------+
| chem        | code   |
+-------------+--------+
|PCB001       | 1      |
|PCB047       | 1      | 
|PCB047       | 1      |
|chlorpyrifos | 2      |
|chlorpyriphos| 2      |
|TOC          | 3      |
+-------------+--------+

Is there an easy way to do this in a single function call? Thanks so much!

Answer 1

There are many ways to do this; here is one using reshape2::melt:

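library(reshape2)
df[df == ""] <- NA  # treat empty strings as missing
# Reshape from wide to long, drop NA rows, then drop the "variable" column
melt(df, id = "code", na.rm = TRUE, value.name = "chem")[, -2]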
#  code          chem
#1    1        PCB001
#2    2  chlorpyrifos
#3    3           TOC
#4    1        PCB047
#5    2 chlorpyriphos
#7    1        PCB047

We first replace all empty values with NAs, and then use melt with na.rm = TRUE to reshape from wide to long whilst removing the NA entries.

Sample data

df <- read.table(text =
    " code  chem_1       chem_2       chem_3
    1    PCB001      PCB047       PCB047
    2    chlorpyrifos   chlorpyriphos     ''
    3    TOC           ''  ''                ", header = TRUE)

Answer 2

A tidyverse solution.

# Required package
library(tidyverse)

# Dummy data
df <- data.frame(code = 1:5, foo = letters[1:5], bar = LETTERS[6:10])

#    code foo bar
# 1    1   a   F
# 2    2   b   G
# 3    3   c   H
# 4    4   d   I
# 5    5   e   J

# Reformat
df %>% gather(key, chem, -code) %>% select(-key)

#    code  chem
# 1     1     a
# 2     2     b
# 3     3     c
# 4     4     d
# 5     5     e
# 6     1     F
# 7     2     G
# 8     3     H
# 9     4     I
# 10    5     J
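
The same idea can be sketched on the data from the question with tidyr::pivot_longer, which the tidyr documentation recommends over gather for new code. This is only an illustration, not part of the original answer: it assumes the empty cells have already been turned into NA, and the names `long` and `key` are placeholders chosen here.

```r
library(tidyr)

# Data from the question, with empty cells stored as NA (an assumption)
df <- data.frame(code = 1:3,
                 chem_1 = c("PCB001", "chlorpyrifos", "TOC"),
                 chem_2 = c("PCB047", "chlorpyriphos", NA),
                 chem_3 = c("PCB047", NA, NA),
                 stringsAsFactors = FALSE)

# Stack every column except `code`; values_drop_na removes the NA rows
long <- pivot_longer(df, cols = -code, names_to = "key",
                     values_to = "chem", values_drop_na = TRUE)

# Keep only the two requested columns, chemical first
long <- long[, c("chem", "code")]
long
```

pivot_longer works row by row, so the result comes out already grouped by code and no separate sorting step is needed.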

Answer 3

Use melt from data.table:

library(data.table)
library(dplyr)
library(tidyr)  # drop_na() comes from tidyr, not dplyr
melt(df, id.vars = "code", measure.vars = c("chem_1", "chem_2", "chem_3")) %>%
  arrange(code) %>%
  drop_na() %>%
  select(-variable)

 # code         value
 #1    1        PCB001
 #2    1        PCB047
 #3    1        PCB047
 #4    2  chlorpyrifos
 #5    2 chlorpyriphos
 #7    3           TOC

Data: replace the ' ' (single-space) entries with NA using na.strings

df <- read.table(text =
   " code  chem_1       chem_2       chem_3
    1    PCB001      PCB047       PCB047
    2    chlorpyrifos   chlorpyriphos     ' '
    3    TOC           ' '  ' '                ", na.strings = " ", header = TRUE)
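
As a variation, the whole pipeline can also be written with data.table alone. The sketch below is an illustration rather than the answer's own code: it builds the table directly instead of reading it with read.table, and the name `res` is a placeholder. It relies on melt's na.rm, value.name and patterns() arguments for data.table objects.

```r
library(data.table)

# Same data as above, built directly as a data.table (an assumption)
dt <- data.table(code = 1:3,
                 chem_1 = c("PCB001", "chlorpyrifos", "TOC"),
                 chem_2 = c("PCB047", "chlorpyriphos", NA),
                 chem_3 = c("PCB047", NA, NA))

# Melt all chem_* columns, dropping NA values while reshaping
res <- melt(dt, id.vars = "code", measure.vars = patterns("^chem_"),
            value.name = "chem", na.rm = TRUE)

# Sort by code and keep only the two requested columns
res <- res[order(code), .(chem, code)]
res
```

Because `dt` is a data.table, melt dispatches to data.table's own method, so neither dplyr nor tidyr is needed here.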

Answer 4

Consider reshape in base R:

data <- data.frame(code = c(1:3),
                   chem_1 = c("PCB001", "chlorpyrifos", "TOC"),
                   chem_2 = c("PCB047", "chlorpyriphos", NA),
                   chem_3 = c("PCB047", NA, NA))

rdf <- reshape(data, varying = names(data)[-1], v.names = "chem",
               times = names(data)[-1], timevar = "type", idvar = "code",
               new.row.names = 1:1000, direction = "long")
rdf

#   code   type          chem
# 1    1 chem_1        PCB001
# 2    2 chem_1  chlorpyrifos
# 3    3 chem_1           TOC
# 4    1 chem_2        PCB047
# 5    2 chem_2 chlorpyriphos
# 6    3 chem_2          <NA>
# 7    1 chem_3        PCB047
# 8    2 chem_3          <NA>
# 9    3 chem_3          <NA>
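
The reshaped result above still contains the NA rows and the type column. One possible way to get from there to the two-column layout asked for in the question is sketched below. It is an illustration only: the name `res` is a placeholder, and new.row.names is omitted because the row names are reset at the end anyway.

```r
data <- data.frame(code = 1:3,
                   chem_1 = c("PCB001", "chlorpyrifos", "TOC"),
                   chem_2 = c("PCB047", "chlorpyriphos", NA),
                   chem_3 = c("PCB047", NA, NA),
                   stringsAsFactors = FALSE)

rdf <- reshape(data, varying = names(data)[-1], v.names = "chem",
               times = names(data)[-1], timevar = "type", idvar = "code",
               direction = "long")

# Drop the NA rows and keep only the chemical and its code
res <- rdf[!is.na(rdf$chem), c("chem", "code")]

# Sort by code and reset the row names
res <- res[order(res$code), ]
rownames(res) <- NULL
res
```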

Answer 5

You could do this in base R, leveraging the recycling feature of data.frame: the single code column is recycled to match the length of the unlisted chemical columns.

df1 <- subset(data.frame(df[1],chem = unlist(df[-1])),chem!="")
df1[order(df1$code),] # if you need it sorted
#         code          chem
# chem_11    1        PCB001
# chem_21    1        PCB047
# chem_31    1        PCB047
# chem_12    2  chlorpyrifos
# chem_22    2 chlorpyriphos
# chem_13    3           TOC
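
To make the recycling visible, here is a sketch of the same approach with the repetition written out through rep(). It is an illustration under assumptions: the data frame is built by hand with empty strings for the blank cells, and `stacked` and `long` are placeholder names.

```r
df <- data.frame(code = 1:3,
                 chem_1 = c("PCB001", "chlorpyrifos", "TOC"),
                 chem_2 = c("PCB047", "chlorpyriphos", ""),
                 chem_3 = c("PCB047", "", ""),
                 stringsAsFactors = FALSE)

# unlist() stacks the chemical columns one after another (column by column)
stacked <- unlist(df[-1], use.names = FALSE)

# Repeat the codes once per chemical column so they line up with `stacked`
long <- data.frame(code = rep(df$code, times = ncol(df) - 1),
                   chem = stacked,
                   stringsAsFactors = FALSE)

# Drop the empty entries, sort by code, and reset the row names
long <- long[long$chem != "", ]
long <- long[order(long$code), ]
rownames(long) <- NULL
long
```

Passing df[1] to data.frame() as in the answer performs the same repetition implicitly, since nine values is a whole multiple of three codes.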

5 answers

Answer 1 0 2018-06-22 12:21:53

Answer 2 0 2018-06-22 12:26:35

Answer 3 0 2018-06-22 13:42:22

Answer 4 0 2018-06-22 14:15:35

Answer 5 0 2018-06-22 19:31:22
