How to combine multiple columns into one column and attach their unique code in R?

I have a dataframe that looks like this:

+---------+------------+-------------+--------+
|   code  |chem_1      | chem_2      | chem_3 |
+---------+------------+-------------+--------+
|    1    |PCB001      |PCB047       |PCB047  |
|    2    |chlorpyrifos|chlorpyriphos|        | 
|    3    |TOC         |             |        |
+---------+------------+-------------+--------+

I want to combine all the chemicals into one column, with each chemical's code attached:

+-------------+--------+
| chem        | code   |
+-------------+--------+
|PCB001       | 1      |
|PCB047       | 1      | 
|PCB047       | 1      |
|chlorpyrifos | 2      |
|chlorpyriphos| 2      |
|TOC          | 3      |
+-------------+--------+

Is there an easy way to do this in a single function call? Thanks so much!

There are many ways to do this; here is one using reshape2::melt:

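library(reshape2)
# Turn empty strings into NA so melt() can drop them
df[df == ""] <- NA
# Reshape wide to long, drop NA entries, then drop the "variable" column
melt(df, id = "code", na.rm = TRUE, value.name = "chem")[, -2]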
#  code          chem
#1    1        PCB001
#2    2  chlorpyrifos
#3    3           TOC
#4    1        PCB047
#5    2 chlorpyriphos
#7    1        PCB047

We first replace all empty values with NAs, and then use melt with na.rm = TRUE to reshape from wide to long whilst removing the NA entries. The [, -2] at the end drops the variable column that melt adds to record each value's source column.


Sample data

df <- read.table(text =
    " code  chem_1       chem_2       chem_3
    1    PCB001      PCB047       PCB047
    2    chlorpyrifos   chlorpyriphos     ''
    3    TOC           ''  ''                ", header = TRUE)

A tidyverse solution.

# Required package
library(tidyverse)

# Dummy data
df <- data.frame(code = 1:5, foo = letters[1:5], bar = LETTERS[6:10])

#    code foo bar
# 1    1   a   F
# 2    2   b   G
# 3    3   c   H
# 4    4   d   I
# 5    5   e   J

# Reformat
df %>% gather(key, chem, -code) %>% select(-key)

#    code  chem
# 1     1     a
# 2     2     b
# 3     3     c
# 4     4     d
# 5     5     e
# 6     1     F
# 7     2     G
# 8     3     H
# 9     4     I
# 10    5     J
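For comparison, here is a minimal sketch of the same idea applied to the question's data with tidyr::pivot_longer, which tidyr documents as the successor to gather. The data frame is retyped by hand from the question, and the intermediate column name "source" is only an illustrative choice.

```r
library(dplyr)
library(tidyr)

# Question's data, retyped by hand; empty strings mark missing chemicals
df <- data.frame(
  code   = 1:3,
  chem_1 = c("PCB001", "chlorpyrifos", "TOC"),
  chem_2 = c("PCB047", "chlorpyriphos", ""),
  chem_3 = c("PCB047", "", ""),
  stringsAsFactors = FALSE
)

long <- df %>%
  # Convert "" to NA so the reshape step can drop those cells
  mutate(across(starts_with("chem_"), ~ na_if(.x, ""))) %>%
  # One row per (code, chemical) pair; NA cells are dropped
  pivot_longer(starts_with("chem_"),
               names_to = "source",      # hypothetical helper column
               values_to = "chem",
               values_drop_na = TRUE) %>%
  select(chem, code)

long
```

pivot_longer walks the input row by row, so the chemicals come out grouped by code without an extra sorting step.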

Use melt from data.table:

library(data.table)
library(dplyr)
library(tidyr)  # drop_na() comes from tidyr, not dplyr
melt(df, id.vars = "code", measure.vars = c("chem_1", "chem_2", "chem_3")) %>%
  arrange(code) %>%
  drop_na() %>%
  select(-variable)

 # code         value
 #1    1        PCB001
 #2    1        PCB047
 #3    1        PCB047
 #4    2  chlorpyrifos
 #5    2 chlorpyriphos
 #7    3           TOC
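The same result can probably be reached with data.table alone. The following is a sketch, assuming the input is built directly as a data.table (retyped by hand from the question, with NA for the missing chemicals) and that every measure column starts with "chem_".

```r
library(data.table)

# Question's data as a data.table; NA marks missing chemicals
dt <- data.table(
  code   = 1:3,
  chem_1 = c("PCB001", "chlorpyrifos", "TOC"),
  chem_2 = c("PCB047", "chlorpyriphos", NA),
  chem_3 = c("PCB047", NA, NA)
)

# Melt every chem_* column, name the value column "chem", drop NA rows
long <- melt(dt,
             id.vars = "code",
             measure.vars = patterns("^chem_"),
             value.name = "chem",
             na.rm = TRUE)

# Sort by code and keep only the two columns the question asks for
long <- long[order(code), .(chem, code)]
long
```

Using na.rm = TRUE inside melt removes the need for a separate drop_na step, and patterns() avoids listing the measure columns by hand.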

Data: read the ' ' (single-space) entries as NA using na.strings

df <- read.table(text =
   " code  chem_1       chem_2       chem_3
    1    PCB001      PCB047       PCB047
    2    chlorpyrifos   chlorpyriphos     ' '
    3    TOC           ' '  ' '                ", na.strings=" ", header = TRUE)

Consider reshape in base R:

data <- data.frame(code = c(1:3),
                   chem_1 = c("PCB001", "chlorpyrifos", "TOC"),
                   chem_2 = c("PCB047", "chlorpyriphos", NA),
                   chem_3 = c("PCB047", NA, NA))

rdf <- reshape(data, varying = names(data)[-1], v.names = "chem", 
               times = names(data)[-1], timevar = "type", idvar = "code",
               new.row.names = 1:1000, direction = "long")    
rdf

#   code   type          chem
# 1    1 chem_1        PCB001
# 2    2 chem_1  chlorpyrifos
# 3    3 chem_1           TOC
# 4    1 chem_2        PCB047
# 5    2 chem_2 chlorpyriphos
# 6    3 chem_2          <NA>
# 7    1 chem_3        PCB047
# 8    2 chem_3          <NA>
# 9    3 chem_3          <NA>
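The reshape output above still contains the NA rows and the type column. A minimal follow-up sketch, rebuilding the same data so it runs on its own, filters and reorders it into the layout the question asks for (the name res is just illustrative):

```r
# Same data as in the reshape example, retyped so the block is self-contained
data <- data.frame(code = 1:3,
                   chem_1 = c("PCB001", "chlorpyrifos", "TOC"),
                   chem_2 = c("PCB047", "chlorpyriphos", NA),
                   chem_3 = c("PCB047", NA, NA))

rdf <- reshape(data, varying = names(data)[-1], v.names = "chem",
               times = names(data)[-1], timevar = "type", idvar = "code",
               direction = "long")

# Keep rows with a chemical, and only the chem and code columns
res <- rdf[!is.na(rdf$chem), c("chem", "code")]

# Group the chemicals by code and reset the row names
res <- res[order(res$code), ]
rownames(res) <- NULL
res
```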

You could do this in base R, leveraging the recycling feature of data.frame (the code column is repeated to match the length of the unlisted chemical values):

df1 <- subset(data.frame(df[1],chem = unlist(df[-1])),chem!="")
df1[order(df1$code),] # if you need it sorted
#         code          chem
# chem_11    1        PCB001
# chem_21    1        PCB047
# chem_31    1        PCB047
# chem_12    2  chlorpyrifos
# chem_22    2 chlorpyriphos
# chem_13    3           TOC
