How to aggregate values in one column to create a new column

I have a data frame with many repeating values in certain columns. I would like to create a new column that holds a distinct value for each unique entry in the column of interest. I have looked through aggregation-related questions on Stack Overflow but haven't quite found what I am looking for.

The output of dput(head(example)) is below.

structure(list(avecor = c(-0.929199786400515, -0.729228501795928, 
-0.431983639087243, -0.55088842103792, -0.978422379116014, -0.627856061946295
), miR = structure(c(9L, 5L, 6L, 2L, 8L, 4L), .Label = c("hsa-miR-107", 
"hsa-miR-193a-3p", "hsa-miR-28-5p", "hsa-miR-331-3p", "hsa-miR-362-3p", 
"hsa-miR-362-5p", "hsa-miR-429", "hsa-miR-590-5p", "hsa-miR-630"
), class = "factor"), mRNA = structure(c(1L, 2L, 2L, 3L, 3L, 
4L), .Label = c("IGF1R", "PRKCA", "TESK2", "THBS1", "TLN2", "VAV3"
), class = "factor")), row.names = c("hsa-miR-630:IGF1R", "hsa-miR-362-3p:PRKCA", 
"hsa-miR-362-5p:PRKCA", "hsa-miR-193a-3p:TESK2", "hsa-miR-590-5p:TESK2", 
"hsa-miR-331-3p:THBS1"), class = "data.frame")
                          avecor             miR  mRNA
hsa-miR-630:IGF1R     -0.9291998     hsa-miR-630 IGF1R
hsa-miR-362-3p:PRKCA  -0.7292285  hsa-miR-362-3p PRKCA
hsa-miR-362-5p:PRKCA  -0.4319836  hsa-miR-362-5p PRKCA
hsa-miR-193a-3p:TESK2 -0.5508884 hsa-miR-193a-3p TESK2
hsa-miR-590-5p:TESK2  -0.9784224  hsa-miR-590-5p TESK2
hsa-miR-331-3p:THBS1  -0.6278561  hsa-miR-331-3p THBS1
hsa-miR-28-5p:TLN2    -0.9988643   hsa-miR-28-5p  TLN2
hsa-miR-331-3p:TLN2   -0.8773624  hsa-miR-331-3p  TLN2
hsa-miR-429:TLN2      -0.9901250     hsa-miR-429  TLN2
hsa-miR-107:VAV3      -0.7713383     hsa-miR-107  VAV3

If applied to the mRNA column, the ideal output would be:

                          avecor             miR  mRNA UniquemRNA
hsa-miR-630:IGF1R     -0.9291998     hsa-miR-630 IGF1R 1 
hsa-miR-362-3p:PRKCA  -0.7292285  hsa-miR-362-3p PRKCA 2
hsa-miR-362-5p:PRKCA  -0.4319836  hsa-miR-362-5p PRKCA 2
hsa-miR-193a-3p:TESK2 -0.5508884 hsa-miR-193a-3p TESK2 3
hsa-miR-590-5p:TESK2  -0.9784224  hsa-miR-590-5p TESK2 3
hsa-miR-331-3p:THBS1  -0.6278561  hsa-miR-331-3p THBS1 4
hsa-miR-28-5p:TLN2    -0.9988643   hsa-miR-28-5p  TLN2 5
hsa-miR-331-3p:TLN2   -0.8773624  hsa-miR-331-3p  TLN2 5
hsa-miR-429:TLN2      -0.9901250     hsa-miR-429  TLN2 5
hsa-miR-107:VAV3      -0.7713383     hsa-miR-107  VAV3 6

Any help would be most appreciated.

If I understand you correctly, you have already created that column by storing mRNA as a factor. If that is really what you want, you can simply recode the factor as numeric values, although that merely replicates information that is already there. This is how you could go about it:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
structure(list(avecor = c(-0.929199786400515, -0.729228501795928, 
-0.431983639087243, -0.55088842103792, -0.978422379116014, -0.627856061946295
), miR = structure(c(9L, 5L, 6L, 2L, 8L, 4L), .Label = c("hsa-miR-107", 
"hsa-miR-193a-3p", "hsa-miR-28-5p", "hsa-miR-331-3p", "hsa-miR-362-3p", 
"hsa-miR-362-5p", "hsa-miR-429", "hsa-miR-590-5p", "hsa-miR-630"
), class = "factor"), mRNA = structure(c(1L, 2L, 2L, 3L, 3L, 
4L), .Label = c("IGF1R", "PRKCA", "TESK2", "THBS1", "TLN2", "VAV3"
), class = "factor")), row.names = c("hsa-miR-630:IGF1R", "hsa-miR-362-3p:PRKCA", 
"hsa-miR-362-5p:PRKCA", "hsa-miR-193a-3p:TESK2", "hsa-miR-590-5p:TESK2", 
"hsa-miR-331-3p:THBS1"), class = "data.frame") %>% 
  # as.numeric() on a factor returns its underlying integer level codes
  mutate(UniquemRNA = as.numeric(mRNA))
#>       avecor             miR  mRNA UniquemRNA
#> 1 -0.9291998     hsa-miR-630 IGF1R          1
#> 2 -0.7292285  hsa-miR-362-3p PRKCA          2
#> 3 -0.4319836  hsa-miR-362-5p PRKCA          2
#> 4 -0.5508884 hsa-miR-193a-3p TESK2          3
#> 5 -0.9784224  hsa-miR-590-5p TESK2          3
#> 6 -0.6278561  hsa-miR-331-3p THBS1          4
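
The same recoding can be done without dplyr, and it also works when the column is not yet a factor. A minimal sketch, assuming a small made-up data frame `toy` (not the asker's data) whose `mRNA` column is a plain character vector: wrapping the column in `factor()` first builds the levels from the values actually present, and `as.integer()` then returns the level code for each row.

```r
# Hypothetical toy data; the mRNA column is deliberately a character vector.
toy <- data.frame(
  avecor = c(-0.93, -0.73, -0.43, -0.55, -0.98, -0.63),
  mRNA   = c("IGF1R", "PRKCA", "PRKCA", "TESK2", "TESK2", "THBS1"),
  stringsAsFactors = FALSE
)

# factor() sorts the distinct values to build the levels;
# as.integer() then returns the level code of each row.
toy$UniquemRNA <- as.integer(factor(toy$mRNA))
toy$UniquemRNA
```

Because `factor()` is re-applied, levels that no longer occur in the data (such as TLN2 and VAV3 in the six-row `head()` sample) do not consume an ID.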

Here is an approach that uses only base R.

 df<-structure(list(avecor = c(-0.929199786400515, -0.729228501795928, 
    -0.431983639087243, -0.55088842103792, -0.978422379116014, -0.627856061946295
    ), miR = structure(c(9L, 5L, 6L, 2L, 8L, 4L), .Label = c("hsa-miR-107", 
    "hsa-miR-193a-3p", "hsa-miR-28-5p", "hsa-miR-331-3p", "hsa-miR-362-3p", 
    "hsa-miR-362-5p", "hsa-miR-429", "hsa-miR-590-5p", "hsa-miR-630"
    ), class = "factor"), mRNA = structure(c(1L, 2L, 2L, 3L, 3L, 
    4L), .Label = c("IGF1R", "PRKCA", "TESK2", "THBS1", "TLN2", "VAV3"
    ), class = "factor")), row.names = c("hsa-miR-630:IGF1R", "hsa-miR-362-3p:PRKCA", 
    "hsa-miR-362-5p:PRKCA", "hsa-miR-193a-3p:TESK2", "hsa-miR-590-5p:TESK2", 
    "hsa-miR-331-3p:THBS1"), class = "data.frame")



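    # Count how often each mRNA level occurs (unused levels get a count of 0).
    counts <- table(df$mRNA)

    # Repeat each group number as many times as the group occurs.
    # Note: this lines up with the rows only if df is already sorted by mRNA level.
    UniquemRNA <- c()
    for (i in seq_along(counts)) {
      UniquemRNA <- c(UniquemRNA, rep(i, counts[[i]]))
    }
    UniquemRNA
    df$UniquemRNA <- UniquemRNA
    df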
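
The loop above appends the IDs group by group, so the result matches the rows only when the data frame is already ordered by mRNA. A possible alternative that does not depend on row order is to look up each value's position among the distinct values. This is only a sketch on a made-up data frame `toy` (a hypothetical name, not part of the question), with rows deliberately shuffled:

```r
# Hypothetical toy data with rows deliberately NOT sorted by mRNA.
toy <- data.frame(
  mRNA = c("TESK2", "IGF1R", "TESK2", "PRKCA", "IGF1R"),
  stringsAsFactors = FALSE
)

# unique() keeps values in order of first appearance; match() returns,
# for every row, the position of its value in that list.
toy$UniquemRNA <- match(toy$mRNA, unique(toy$mRNA))
toy$UniquemRNA
```

Here the IDs follow order of first appearance rather than alphabetical order, so TESK2 gets 1 because it occurs first.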
