Transpose a data frame when rows contain two values for the same variable in R
I'm dealing with a data frame that contains a variable named "Marker", which has two values for every sample I collected. For instance, the data frame looks like this:
Sample.File Sample.Name Marker value
1 a a_1 xxx 16
2 a a_1 xxx 18
3 a a_1 yyy 16
4 a a_1 yyy 20
5 a a_1 zzz 9
6 a a_1 zzz 13
7 b b_1 xxx 10
8 b b_1 xxx 10
9 b b_1 yyy 6
10 b b_1 yyy 12
11 b b_1 zzz 14
12 b b_1 zzz 14
It can be reproduced with the following code:
data <- data.frame(
Sample.File = as.factor(c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b",
"b", "b")),
Sample.Name = as.factor(c("a_1", "a_1", "a_1", "a_1", "a_1", "a_1", "b_1",
"b_1", "b_1", "b_1", "b_1", "b_1")),
Marker = as.factor(c("xxx", "xxx", "yyy", "yyy", "zzz", "zzz", "xxx",
"xxx", "yyy", "yyy", "zzz", "zzz")),
value = c(16L, 18L, 16L, 20L, 9L, 13L, 10L, 10L, 6L, 12L, 14L, 14L)
)
The new data frame I'd like to work with should be obtained by transposing the current data while keeping the Sample.File and Sample.Name columns for all the collected samples. Furthermore, I'd like the "value" column to be spread into new variables labelled as follows: xxx & xxx.1, yyy & yyy.1, zzz & zzz.1.
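The requested naming pattern (first occurrence unchanged, later ones suffixed with ".1", ".2", ...) is exactly what base R's make.unique() produces for repeated names, so the labels never have to be typed out by hand. A minimal illustration, assuming the markers are already in the order their values should be numbered:

```r
# make.unique() keeps the first occurrence of a name as-is and appends
# ".1", ".2", ... to later repeats of the same name.
markers <- c("xxx", "xxx", "yyy", "yyy", "zzz", "zzz")
new_names <- make.unique(markers)
print(new_names)
# "xxx" "xxx.1" "yyy" "yyy.1" "zzz" "zzz.1"
```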
The table I'd like to obtain looks like this:
Sample.File Sample.Name xxx xxx.1 yyy yyy.1 zzz zzz.1
1 a a_1 16 18 16 20 9 13
2 b b_1 10 10 6 12 14 14
I'd like code that doesn't require writing out the labels from the "Marker" column by hand (since I could have up to 100 different labels). I tried the following code, but it didn't achieve my goal:
library(dplyr)
library(tidyr)
data %>%
gather(Sample.File, Sample.Name) %>%
spread(value)
Error: `var` must evaluate to a single number or a column name, not a double vector
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
attributes are not identical across measure variables;
they will be dropped
I'd be very grateful if anybody could help with this!
Here is one way to do it. We can create an ID for each value within a Marker and combine it with the Marker name into a new column. After that, we can convert the data to wide format.
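To see what the ID step produces on its own, here is a minimal sketch that stops before the pivot. It rebuilds the same data more compactly with rep() and spells out the grouping columns with group_by() instead of group_by_at(); both are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, built with rep()
data <- data.frame(
  Sample.File = factor(rep(c("a", "b"), each = 6)),
  Sample.Name = factor(rep(c("a_1", "b_1"), each = 6)),
  Marker = factor(rep(rep(c("xxx", "yyy", "zzz"), each = 2), times = 2)),
  value = c(16L, 18L, 16L, 20L, 9L, 13L, 10L, 10L, 6L, 12L, 14L, 14L)
)

long_keys <- data %>%
  group_by(Sample.File, Sample.Name, Marker) %>%
  mutate(N = row_number() - 1) %>%  # 0 for a marker's first value, 1 for its second
  ungroup() %>%
  unite(col = "Marker", Marker, N, sep = ".")  # e.g. "xxx" + 0 -> "xxx.0"

long_keys$Marker
# "xxx.0" "xxx.1" "yyy.0" "yyy.1" "zzz.0" "zzz.1", once per sample
```

Each key is unique within a sample, which is what lets pivot_wider() spread the values without collisions.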
library(dplyr)
library(tidyr)
data2 <- data %>%
group_by_at(vars(-value)) %>%
mutate(N = row_number() - 1) %>%
unite(col = "Marker", Marker, N, sep = ".") %>%
pivot_wider(names_from = "Marker", values_from = "value") %>%
ungroup()
data2
# # A tibble: 2 x 8
# Sample.File Sample.Name xxx.0 xxx.1 yyy.0 yyy.1 zzz.0 zzz.1
# <fct> <fct> <int> <int> <int> <int> <int> <int>
# 1 a a_1 16 18 16 20 9 13
# 2 b b_1 10 10 6 12 14 14
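The result above names the columns xxx.0, xxx.1, ... rather than xxx, xxx.1 as the question asked. A minimal sketch that produces exactly the requested names uses base R's make.unique() inside each sample. It assumes tidyr >= 1.0.0 (for pivot_wider()) and that the rows within each sample are already in the order the values should be numbered.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, built with rep()
data <- data.frame(
  Sample.File = factor(rep(c("a", "b"), each = 6)),
  Sample.Name = factor(rep(c("a_1", "b_1"), each = 6)),
  Marker = factor(rep(rep(c("xxx", "yyy", "zzz"), each = 2), times = 2)),
  value = c(16L, 18L, 16L, 20L, 9L, 13L, 10L, 10L, 6L, 12L, 14L, 14L)
)

data3 <- data %>%
  group_by(Sample.File, Sample.Name) %>%
  # Within each sample, make repeated markers unique: "xxx", "xxx" -> "xxx", "xxx.1"
  mutate(Marker = make.unique(as.character(Marker))) %>%
  ungroup() %>%
  # Remaining columns (Sample.File, Sample.Name) identify each output row
  pivot_wider(names_from = Marker, values_from = value)

data3
```

Because make.unique() only looks at the names already present, no label is hard-coded, and a third value for a marker would simply become, e.g., xxx.2.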