How to combine multiple columns into one column and attach their unique code in R?

I have a data frame that looks like this:

+---------+------------+-------------+--------+
|   code  |chem_1      | chem_2      | chem_3 |
+---------+------------+-------------+--------+
|    1    |PCB001      |PCB047       |PCB047  |
|    2    |chlorpyrifos|chlorpyriphos|        | 
|    3    |TOC         |             |        |
+---------+------------+-------------+--------+

I want to combine all the chemicals into a single column, with the corresponding code attached to each one.

+-------------+--------+
| chem        | code   |
+-------------+--------+
|PCB001       | 1      |
|PCB047       | 1      | 
|PCB047       | 1      |
|chlorpyrifos | 2      |
|chlorpyriphos| 2      |
|TOC          | 3      |
+-------------+--------+

I was wondering whether there is an easy way to do this in a single function call. Many thanks!

There are many ways to do this. Here is one using reshape2::melt:

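library(reshape2)
# Turn empty strings into NA so that melt() can drop them
df[df == ""] <- NA
# Reshape wide to long, drop NA entries, then drop the "variable" column (column 2)
melt(df, id.vars = "code", na.rm = TRUE, value.name = "chem")[, -2]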
#  code          chem
#1    1        PCB001
#2    2  chlorpyrifos
#3    3           TOC
#4    1        PCB047
#5    2 chlorpyriphos
#7    1        PCB047

We first replace all empty values with NA, then use melt with na.rm = TRUE to reshape the data from wide to long while dropping the NA entries.


Sample data

df <- read.table(text =
    " code  chem_1       chem_2       chem_3
    1    PCB001      PCB047       PCB047
    2    chlorpyrifos   chlorpyriphos     ''
    3    TOC           ''  ''                ", header = TRUE)
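
To make the two steps above easier to follow, here is a small base R sketch of the first one only. It rebuilds the sample data by hand (with stringsAsFactors = FALSE, which is my assumption and not part of the answer) and counts the cells that become NA. Those are the entries that na.rm = TRUE later removes from the long result.

```r
# Sample data from the question, typed in directly as character columns
df <- data.frame(
  code   = 1:3,
  chem_1 = c("PCB001", "chlorpyrifos", "TOC"),
  chem_2 = c("PCB047", "chlorpyriphos", ""),
  chem_3 = c("PCB047", "", ""),
  stringsAsFactors = FALSE
)

# Step 1: replace empty strings with NA
df[df == ""] <- NA

# Count NA cells per column; these cells are what na.rm = TRUE would drop
na_per_column <- colSums(is.na(df))
na_per_column

# Number of rows to expect in the long result: all chem cells minus the NA ones
expected_rows <- nrow(df) * (ncol(df) - 1) - sum(na_per_column)
expected_rows
```

Three of the nine chemical cells are empty, so six rows should remain, which matches the desired output in the question.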

A tidyverse solution.

# Required package
library(tidyverse)

# Dummy data
df <- data.frame(code = 1:5, foo = letters[1:5], bar = LETTERS[6:10])

#    code foo bar
# 1    1   a   F
# 2    2   b   G
# 3    3   c   H
# 4    4   d   I
# 5    5   e   J

# Reformat
df %>% gather(key, chem, -code) %>% select(-key)

#    code  chem
# 1     1     a
# 2     2     b
# 3     3     c
# 4     4     d
# 5     5     e
# 6     1     F
# 7     2     G
# 8     3     H
# 9     4     I
# 10    5     J
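
The example above uses dummy data without empty cells. As a hedged sketch of how the same idea might look on the question's data, the block below uses pivot_longer(), which supersedes gather() in tidyr 1.0.0 and later. The column name "key" and the hand-built data frame are illustrative choices, not part of the original answer.

```r
library(tidyr)
library(dplyr)

# The question's data, with empty strings for the missing chemicals
df <- data.frame(
  code   = 1:3,
  chem_1 = c("PCB001", "chlorpyrifos", "TOC"),
  chem_2 = c("PCB047", "chlorpyriphos", ""),
  chem_3 = c("PCB047", "", ""),
  stringsAsFactors = FALSE
)

# Empty strings must become NA before values_drop_na can remove them
df[df == ""] <- NA

long <- df %>%
  pivot_longer(cols = -code, names_to = "key", values_to = "chem",
               values_drop_na = TRUE) %>%
  select(chem, code)   # drop the helper column holding the old column names

long
```

pivot_longer() walks the data row by row, so the result comes out already grouped by code and no extra sorting step should be needed.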

Using melt from data.table:

library(data.table)
library(dplyr)
library(tidyr)  # drop_na() comes from tidyr, not dplyr
melt(df, id.vars = "code", measure.vars = c("chem_1", "chem_2", "chem_3")) %>%
  arrange(code) %>%
  drop_na() %>%
  select(-variable)

 # code         value
 #1    1        PCB001
 #2    1        PCB047
 #3    1        PCB047
 #4    2  chlorpyrifos
 #5    2 chlorpyriphos
 #7    3           TOC

Data: na.strings is used to replace the ' ' blanks with NA

df <- read.table(text =
   " code  chem_1       chem_2       chem_3
    1    PCB001      PCB047       PCB047
    2    chlorpyrifos   chlorpyriphos     ' '
    3    TOC           ' '  ' '                ", na.strings=" ", header = TRUE)
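
For comparison, here is a sketch that stays entirely inside data.table, without the dplyr steps. It assumes the input is a data.table (built by hand here, with NA already in place) and uses patterns() to pick the chem_ columns. The names dt, long and res are only illustrative.

```r
library(data.table)

# The question's data as a data.table, with NA for the missing chemicals
dt <- data.table(
  code   = 1:3,
  chem_1 = c("PCB001", "chlorpyrifos", "TOC"),
  chem_2 = c("PCB047", "chlorpyriphos", NA),
  chem_3 = c("PCB047", NA, NA)
)

# Wide to long: every column matching ^chem_ is melted, NA values are dropped
long <- melt(dt, id.vars = "code", measure.vars = patterns("^chem_"),
             value.name = "chem", na.rm = TRUE)

# Sort by code and keep only the two columns of interest
res <- long[order(code), .(chem, code)]
res
```

Passing na.rm = TRUE to melt() takes over the role of drop_na(), and value.name names the result column chem directly instead of the default value.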

Consider reshape in base R:

data <- data.frame(code = c(1:3),
                   chem_1 = c("PCB001", "chlorpyrifos", "TOC"),
                   chem_2 = c("PCB047", "chlorpyriphos", NA),
                   chem_3 = c("PCB047", NA, NA))

rdf <- reshape(data, varying = names(data)[-1], v.names = "chem",
               times = names(data)[-1], timevar = "type", idvar = "code",
               new.row.names = 1:1000, direction = "long")
rdf

#   code   type          chem
# 1    1 chem_1        PCB001
# 2    2 chem_1  chlorpyrifos
# 3    3 chem_1           TOC
# 4    1 chem_2        PCB047
# 5    2 chem_2 chlorpyriphos
# 6    3 chem_2          <NA>
# 7    1 chem_3        PCB047
# 8    2 chem_3          <NA>
# 9    3 chem_3          <NA>
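
The reshape() result above still contains the NA rows and the helper column type, so it does not yet match the output asked for in the question. A possible way to finish the job in base R is sketched below. The data frame is rebuilt with the question's values so the block runs on its own, and the name res is just illustrative.

```r
data <- data.frame(code = 1:3,
                   chem_1 = c("PCB001", "chlorpyrifos", "TOC"),
                   chem_2 = c("PCB047", "chlorpyriphos", NA),
                   chem_3 = c("PCB047", NA, NA))

# Same wide-to-long reshape as above
rdf <- reshape(data, varying = names(data)[-1], v.names = "chem",
               times = names(data)[-1], timevar = "type", idvar = "code",
               direction = "long")

# Keep only rows that hold a chemical, and only the chem and code columns
res <- rdf[!is.na(rdf$chem), c("chem", "code")]

# Sort by code (order() keeps ties in their original order) and reset row names
res <- res[order(res$code), ]
rownames(res) <- NULL
res
```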

You can do this in base R by taking advantage of data.frame's recycling:

df1 <- subset(data.frame(df[1],chem = unlist(df[-1])),chem!="")
df1[order(df1$code),] # if you need it sorted
#         code          chem
# chem_11    1        PCB001
# chem_21    1        PCB047
# chem_31    1        PCB047
# chem_12    2  chlorpyrifos
# chem_22    2 chlorpyriphos
# chem_13    3           TOC
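
To show what the recycling does, the sketch below writes the same idea out explicitly with rep(). unlist() stacks the chemical columns one after another (all of chem_1, then chem_2, then chem_3), so the code column has to be repeated once per chemical column to line up. The hand-built df and the names chem, code and long are my own illustrative choices.

```r
df <- data.frame(
  code   = 1:3,
  chem_1 = c("PCB001", "chlorpyrifos", "TOC"),
  chem_2 = c("PCB047", "chlorpyriphos", ""),
  chem_3 = c("PCB047", "", ""),
  stringsAsFactors = FALSE
)

# Stack the chemical columns column by column into one vector of length 9
chem <- unlist(df[-1], use.names = FALSE)

# Repeat the codes once per chemical column; data.frame() recycling does this implicitly
code <- rep(df$code, times = ncol(df) - 1)

long <- data.frame(chem = chem, code = code, stringsAsFactors = FALSE)

# Drop the empty entries, sort by code, and reset the row names
long <- long[long$chem != "", ]
long <- long[order(long$code), ]
rownames(long) <- NULL
long
```

data.frame() only recycles a shorter column when the longer length is a whole multiple of it, which holds here because 9 values are spread over 3 codes.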

