My data has two headers (variable type and grouping factor). How can I split the header and turn the grouping factor into a column?
My CSV file has two header rows, like this:
Run #1,Run #1,Run #2,Run #2
Angle,Light,Angle,Light
a,b,c,d
e,f,g,h
The first header gives the measurement (run) number and the second gives the measurement type. I would like my data to look like this:
Run Angle Light
1 a b
1 e f
2 c d
2 g h
To read the table into R, I have to use scan to read the two header rows separately and then merge them into a single header:
header <- scan(file, nlines = 1, sep = ",", what = character())
header2 <- scan(file, skip = 1, nlines = 1, sep = ",", what = character())
df <- read.table(file, sep = ",", header = FALSE, skip = 2)
names(df) <- paste(header, header2, sep = "_")
So I end up with this:
structure(list(`Run #1_Angle` = c(0, 0.01, 0.02, 0.03), `Run #1_Light` = c(0,
0, 0, 0), `Run #2_Angle` = c(NA, 0, 0, 0),
`Run #2_Light` = c(NA, NA, 0, 0)), row.names = c(NA,
4L), class = "data.frame")
I assume I have to gather, then spread:
df_fix <- df %>%
  gather() %>%
  separate(key, into = c('run', 'variable'), sep = "_") %>%
  mutate(variable = as.factor(variable)) %>%
  mutate(run = as.factor(run)) %>%
  group_by(run) %>%
  spread(variable, value)
This gives me the following error:
Error: Each row of output must be identified by a unique combination of keys. Keys are shared for 526500 rows.
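526500 rows is my entire dataset, so I'm not sure what this means or how to avoid it. Alternatively, is there a different way to keep one part of the header and turn the other part into a column?
We can use pivot_longer, rather than gather/spread, to reshape from 'wide' to 'long' format. In names_to, the special ".value" entry tells pivot_longer that the second captured group of names_pattern (Angle, Light) supplies the output column names, while the first captured group (the digits after "#") goes into Run. Here df1 is the data frame shown in the question's dput output.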
library(dplyr)
library(tidyr)
df1 %>%
  pivot_longer(cols = everything(), names_to = c("Run", ".value"),
               names_pattern = ".*\\s+#(\\d+)_(\\w+)", values_drop_na = TRUE)
# A tibble: 7 × 3
  Run   Angle Light
  <chr> <dbl> <dbl>
1 1      0        0
2 1      0.01     0
3 2      0       NA
4 1      0.02     0
5 2      0        0
6 1      0.03     0
7 2      0        0
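As for the error in the question: gather() with no row identifier stacks every column into key/value pairs, so each run/variable combination appears once per original row, and spread() can no longer tell which Angle value belongs with which Light value. group_by() does not change that. If we want to stay with gather/spread, one possible workaround is to add a row identifier before gathering. The following is only a minimal sketch under that assumption: the data frame is rebuilt from the question's dput output, and row_id is a hypothetical helper column that is not part of the original data.

```r
library(dplyr)
library(tidyr)

# Example data rebuilt from the dput() output in the question
df <- data.frame(
  `Run #1_Angle` = c(0, 0.01, 0.02, 0.03),
  `Run #1_Light` = c(0, 0, 0, 0),
  `Run #2_Angle` = c(NA, 0, 0, 0),
  `Run #2_Light` = c(NA, NA, 0, 0),
  check.names = FALSE  # keep the "Run #1_Angle" style names unchanged
)

df_fix <- df %>%
  # row_id is a hypothetical helper column: it records which original row
  # each value came from, so row_id + run uniquely identifies an output row
  mutate(row_id = row_number()) %>%
  gather(key = "key", value = "value", -row_id) %>%
  separate(key, into = c("run", "variable"), sep = "_") %>%
  spread(variable, value) %>%
  # drop rows where both measurements are missing
  filter(!(is.na(Angle) & is.na(Light))) %>%
  # turn "Run #1" into the integer 1
  mutate(run = as.integer(sub("^Run #", "", run))) %>%
  select(-row_id)

print(df_fix)
```

Because spread() sorts by the identifying columns, the rows here come out ordered by original row and then by run. The pivot_longer call above remains the shorter route; this sketch is mainly meant to show where the "unique combination of keys" message comes from.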