Transpose a dataframe when rows contain two values for the same variable in R

I'm dealing with a dataframe that contains a variable named "Marker", which holds two values for every sample I collected. The dataframe looks, for instance, as follows:

   Sample.File Sample.Name Marker value
1            a         a_1    xxx    16
2            a         a_1    xxx    18
3            a         a_1    yyy    16
4            a         a_1    yyy    20
5            a         a_1    zzz     9
6            a         a_1    zzz    13
7            b         b_1    xxx    10
8            b         b_1    xxx    10
9            b         b_1    yyy     6
10           b         b_1    yyy    12
11           b         b_1    zzz    14
12           b         b_1    zzz    14

which can be reproduced with the following code:

data <- data.frame(
   Sample.File = as.factor(c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b",
                             "b", "b")),
   Sample.Name = as.factor(c("a_1", "a_1", "a_1", "a_1", "a_1", "a_1", "b_1",
                             "b_1", "b_1", "b_1", "b_1", "b_1")),
   Marker = as.factor(c("xxx", "xxx", "yyy", "yyy", "zzz", "zzz", "xxx",
                        "xxx", "yyy", "yyy", "zzz", "zzz")),
   value = c(16L, 18L, 16L, 20L, 9L, 13L, 10L, 10L, 6L, 12L, 14L, 14L)
)

The new dataframe I'd like to work with should be obtained by transposing the current data while keeping the Sample.File and Sample.Name columns for all the collected samples. Furthermore, I'd like the new variables built from the "value" column to be labelled as follows: xxx & xxx.1, yyy & yyy.1, zzz & zzz.1.

The table I'd like to obtain looks like the following:

  Sample.File Sample.Name xxx xxx.1 yyy yyy.1 zzz zzz.1
1           a         a_1  16    18  16    20   9    13
2           b         b_1  10    10   6    12  14    14

I'd like a solution that doesn't require writing out the labels found in the "Marker" column (since I could have up to 100 different labels). I tried the following code, but I couldn't achieve my goal:

library(dplyr)
library(tidyr)
data %>%
  gather(Sample.File, Sample.Name) %>%
  spread(value)

Error: `var` must evaluate to a single number or a column name, not a double vector
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
attributes are not identical across measure variables;
they will be dropped

I'd be very grateful if anybody could help with this!

Here is one way to do it. We can number the repeated rows within each Marker of each sample and append that number to the Marker name, so every value gets its own column label. After that, we can convert the data to wide format.

library(dplyr)
library(tidyr)

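data2 <- data %>%
  # Group by every column except value, i.e. by sample and Marker
  group_by_at(vars(-value)) %>%
  # Number the repeated measurements within each group, starting at 0
  mutate(N = row_number() - 1) %>%
  # Append that index to the Marker name, e.g. xxx.0 and xxx.1
  unite(col = "Marker", Marker, N, sep = ".") %>%
  # Spread the indexed markers into one column each
  pivot_wider(names_from = "Marker", values_from = "value") %>%
  ungroup()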
data2
# # A tibble: 2 x 8
#   Sample.File Sample.Name xxx.0 xxx.1 yyy.0 yyy.1 zzz.0 zzz.1
#   <fct>       <fct>       <int> <int> <int> <int> <int> <int>
# 1 a           a_1            16    18    16    20     9    13
# 2 b           b_1            10    10     6    12    14    14
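
The result above labels the columns xxx.0 and xxx.1, whereas the question asks for xxx and xxx.1. As a possible variation (a sketch, not part of the original answer), base R's make.unique() leaves the first occurrence of a name unchanged and appends .1, .2, ... to later duplicates, so applying it within each sample should give the requested labels. The object name `wide` and the compact rep()-based rebuild of `data` are only for illustration; this assumes each Marker appears consecutively or at least in a stable order within a sample.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question, rebuilt compactly with rep()
data <- data.frame(
  Sample.File = factor(rep(c("a", "b"), each = 6)),
  Sample.Name = factor(rep(c("a_1", "b_1"), each = 6)),
  Marker = factor(rep(rep(c("xxx", "yyy", "zzz"), each = 2), times = 2)),
  value = c(16L, 18L, 16L, 20L, 9L, 13L, 10L, 10L, 6L, 12L, 14L, 14L)
)

wide <- data %>%
  # Work sample by sample so duplicates are only counted within a sample
  group_by(Sample.File, Sample.Name) %>%
  # make.unique() keeps the first "xxx" and renames the second to "xxx.1"
  mutate(Marker = make.unique(as.character(Marker))) %>%
  ungroup() %>%
  # One column per (now unique) marker label
  pivot_wider(names_from = Marker, values_from = value)

names(wide)
# "Sample.File" "Sample.Name" "xxx" "xxx.1" "yyy" "yyy.1" "zzz" "zzz.1"
```

No marker label is typed by hand here, so the same pipeline should carry over to data with many more distinct Marker values.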
