Add a new column to a dataframe based on results from other columns

I'm very new to R, so I hope my question is interesting. What I want to do is quite straightforward. Here's a sample of my dataset:

> head(belongliness)
   ACTIVITY_X ACTIVITY_Y ACTIVITY_Z   Event  cluster1    cluster2     cluster3    cluster4
1:         40         47         62 Head-up 0.1900989 0.768225365 0.0160654667 0.025610279
2:         60         74         95 Head-up 0.5392218 0.038558310 0.0064671635 0.415752686
3:         62         63         88 Head-up 0.7953673 0.044981152 0.0067121719 0.152939414
4:         60         56         82 Head-up 0.9941016 0.002608879 0.0003007537 0.002988748
5:         66         61         90 Head-up 0.7027407 0.048318016 0.0079239680 0.241017291
6:         60         53         80 Head-up 0.9541378 0.023338896 0.0024442116 0.020079071

I would like to add a new column "winning cluster" to the right of column cluster4. For each row, it should find the highest value among columns cluster1 to cluster4 and hold the name of the column that contains it.

For row 1 that will be cluster2, for row 2 cluster1, for row 3 cluster1, etc.

Any help is appreciated!

If the dataset is a data.table, specify the columns of interest in .SDcols, get the column index of the highest value in each row with max.col, use that index to select the column name, and assign the result by reference ( := ) as 'winning_cluster':

library(data.table)
belongliness[, winning_cluster := names(.SD)[max.col(.SD)], 
           .SDcols = cluster1:cluster4]
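
For anyone who wants to try this without the original data, here is a minimal sketch that rebuilds the first three rows of the sample (cluster values rounded to three decimals, which is my simplification) and applies the same idea. The helper name cluster_cols is hypothetical, and ties.method = "first" is an addition of mine so that the result is deterministic; it is not part of the answer above.

```r
library(data.table)

# First three rows of the sample from the question (cluster values rounded)
belongliness <- data.table(
  ACTIVITY_X = c(40, 60, 62),
  ACTIVITY_Y = c(47, 74, 63),
  ACTIVITY_Z = c(62, 95, 88),
  Event      = "Head-up",
  cluster1   = c(0.190, 0.539, 0.795),
  cluster2   = c(0.768, 0.039, 0.045),
  cluster3   = c(0.016, 0.006, 0.007),
  cluster4   = c(0.026, 0.416, 0.153)
)

# Hypothetical helper: the columns to compare, given by name
cluster_cols <- paste0("cluster", 1:4)

# For each row, find the position of the largest value among the cluster
# columns and look up the matching column name; := adds the column in place
belongliness[, winning_cluster := names(.SD)[max.col(.SD, ties.method = "first")],
             .SDcols = cluster_cols]

print(belongliness$winning_cluster)
```

With these three rows the new column should read cluster2, cluster1, cluster1, which matches what the question expects.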

In base R, getting the highest value in each row is easy:

belongliness$`winning cluster` = apply(belongliness[,5:8], 1, max)

where belongliness[,5:8] corresponds to columns cluster1 through cluster4.

Or, if you want the index of the winning column instead of its value:

belongliness$`winning cluster` = apply(belongliness[,5:8], 1, which.max)
belongliness$`winning cluster` = paste0('cluster', belongliness$`winning cluster`)

Edit: the right-hand side of the first line above is essentially what max.col computes (note that max.col breaks ties at random by default; pass ties.method = "first" to match which.max):

belongliness$`winning cluster` = max.col(belongliness[,5:8])
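
The index can also be mapped straight to the column names instead of pasting 'cluster' in front of it, which keeps working if the columns are not numbered consecutively. A rough sketch on a plain data.frame, assuming only the four cluster columns (values rounded from the question's sample; df, cols and idx are names I made up for illustration):

```r
# Cluster columns for the first three sample rows (values rounded)
df <- data.frame(
  cluster1 = c(0.190, 0.539, 0.795),
  cluster2 = c(0.768, 0.039, 0.045),
  cluster3 = c(0.016, 0.006, 0.007),
  cluster4 = c(0.026, 0.416, 0.153)
)

cols <- c("cluster1", "cluster2", "cluster3", "cluster4")

# Position of the row-wise maximum; "first" makes ties deterministic
idx <- max.col(df[, cols], ties.method = "first")

# Look the names up directly rather than building them with paste0()
df$`winning cluster` <- cols[idx]

# Cross-check against the apply()/which.max version from the answer
idx_apply <- apply(df[, cols], 1, which.max)
stopifnot(all(idx == idx_apply))

print(df$`winning cluster`)
```

Selecting the columns by name rather than by position (5:8) is a design choice on my part: it does not break if columns are later added or reordered.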
