My data has two header rows (variable type and grouping factor). How do I split the headers and turn the grouping factor into a column?

Question

My csv file has two header rows, like this:

 Run #1,Run #1,Run #2,Run #2 
 Angle,Light,Angle,Light 
 a,b,c,d    
 e,f,g,h

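The first header row gives the measurement (run) number, and the second gives the measurement type. I would like my data to look like this: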

Run Angle Light
1    a      b
1    e      f
2    c      d
2    g      h

To read the table into R, I had to use scan to read the two header rows separately and then combine them into a single header:

header  <- scan(file, nlines = 1, sep = ",", what = character())
header2 <- scan(file, skip = 1, nlines = 1, sep = ",", what = character())

df <- read.table(file, sep = ",", header = FALSE, skip = 2)
names(df) <- paste(header, header2, sep = "_")
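For anyone who wants to try this without the original file, here is a minimal sketch of the same two-step read using an inline copy of the sample data. The names txt and df_demo are made up for illustration, and the inline text assumes the file has no stray spaces around the values.

```r
# Inline stand-in for the csv file from the question (assumption: clean, no padding)
txt <- "Run #1,Run #1,Run #2,Run #2
Angle,Light,Angle,Light
a,b,c,d
e,f,g,h"

# Read each header row on its own; scan() does not treat '#' as a comment by default
header  <- scan(text = txt, nlines = 1, sep = ",", what = character(), quiet = TRUE)
header2 <- scan(text = txt, skip = 1, nlines = 1, sep = ",", what = character(), quiet = TRUE)

# Read the data rows, skipping both header rows, then attach the combined names
df_demo <- read.table(text = txt, sep = ",", header = FALSE, skip = 2,
                      stringsAsFactors = FALSE)
names(df_demo) <- paste(header, header2, sep = "_")

print(names(df_demo))
```

Joining the two rows with "_" is what makes it possible to split the names apart again later, so it is worth picking a separator that does not occur in either header row.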

So I end up with this:

structure(list(`Run #1_Angle` = c(0, 0.01, 0.02, 0.03), `Run #1_Light` = c(0, 
0, 0, 0), `Run #2_Angle` = c(NA, 0, 0, 0), 
    `Run #2_Light` = c(NA, NA, 0, 0)), row.names = c(NA, 
4L), class = "data.frame")

I assume I have to gather and then spread:

df_fix<-df %>%
  gather()%>%
  separate(key, into = c('run', 'variable'), sep = "_") %>% 
  mutate(variable=as.factor(variable)) %>% 
  mutate(run=as.factor(run)) %>% 
  group_by(run) %>% 
  spread(variable, value)

This gives me the following error:

Error: Each row of output must be identified by a unique combination of keys. Keys are shared for 526500 rows.

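526500 rows is my entire dataset, so I am not sure what this means or how to avoid it. Alternatively, is there a different way to keep one part of the header and turn the other part into a column?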

Answer 1

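Instead of gather/spread, we can use pivot_longer to reshape from "wide" to "long" format. The special ".value" entry in names_to keeps the part of each column name matched by the second capture group (Angle, Light) as output columns, while the first capture group (the run number) goes into the Run column.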

library(dplyr)
library(tidyr)
# df1 is the data frame from the question's structure() output (called df there)
df1 %>% 
   pivot_longer(cols = everything(), names_to = c("Run", ".value"), 
      names_pattern = ".*\\s+#(\\d+)_(\\w+)", values_drop_na = TRUE)
# A tibble: 7 × 3
  Run   Angle Light
  <chr> <dbl> <dbl>
1 1      0        0
2 1      0.01     0
3 2      0       NA
4 1      0.02     0
5 2      0        0
6 1      0.03     0
7 2      0        0
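As for why the original gather/spread attempt fails: after gather(), nothing records which source row a value came from, so every value with the same run and variable shares the same keys and spread() cannot tell which Angle belongs with which Light. A minimal sketch of one way around that, assuming you want to stay with gather/spread, is to add a row identifier first. The row column and the df_fix name are helpers made up for this illustration.

```r
library(dplyr)
library(tidyr)

# The data frame from the question
df <- structure(list(`Run #1_Angle` = c(0, 0.01, 0.02, 0.03),
                     `Run #1_Light` = c(0, 0, 0, 0),
                     `Run #2_Angle` = c(NA, 0, 0, 0),
                     `Run #2_Light` = c(NA, NA, 0, 0)),
                row.names = c(NA, 4L), class = "data.frame")

df_fix <- df %>%
  mutate(row = row_number()) %>%                       # remember the source row
  gather(key, value, -row) %>%                         # long format, one value per line
  separate(key, into = c("run", "variable"), sep = "_") %>%
  spread(variable, value) %>%                          # row + run now identify each output row
  filter(!(is.na(Angle) & is.na(Light))) %>%           # drop runs with no data in that row
  mutate(run = sub("Run #", "", run, fixed = TRUE)) %>%
  select(-row)

print(df_fix)
```

This should give the same seven measurements as the pivot_longer call, though possibly in a different row order. pivot_longer is still the shorter route, since it tracks the source row internally.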

Answer 1 (accepted, 2021-10-21 16:24:34)
