Replace values in multiple columns using different thresholds

Question

I have a data set with multiple columns of quantitative data that I want to convert to binary. To do this, I would like to use a different threshold for each column.

Example

Input:

  antigen1 antigen2 antigen3 antigen4
1      215      421        2       12
2     1524       33      112      443
3      944      836      343       32
4       53      321      563        4

Code to generate the data set:

input <- data.frame(
  antigen1 = c(215,1524,944,53),
  antigen2 = c(421, 33, 836,321),
  antigen3 = c(2,112,343,563),
  antigen4 = c(12,443,32,4))

Thresholds for each column, for antigen1 to antigen4 respectively: 100, 50, 400, 100. A value at or above its column's threshold becomes 1; anything below becomes 0.

Output:

  antigen1 antigen2 antigen3 antigen4
1        1        1        0        0
2        1        0        0        1
3        1        1        0        0
4        0        1        1        0

This is what I've tried, using R:

# Define lists
cut_offs <- c(100,50,400,100)
antigens <- names(input[1:ncol(input)])

# Loop through both lists
for (anti in antigens) {
  for (co in cut_offs) {
    input[[anti]][input[[anti]]]<cut_offs[co] <- 0 
    input[[anti]][input[[anti]]]>=cut_offs[co] <- 1
  }
}

How can I make both "anti" and "co" advance together by one on each iteration of the loop?

Answer 1

We can do this in a vectorized manner, without any loops:

+(input >= cut_offs[col(input)])
#      antigen1 antigen2 antigen3 antigen4
#[1,]        1        1        0        0
#[2,]        1        0        0        1
#[3,]        1        1        0        0
#[4,]        0        1        1        0
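
To see why the one-liner works, it can help to break it into steps. The following is only a sketch on the question's data; the intermediate names idx, thr and result are introduced here for illustration and are not part of the original answer.

```r
input <- data.frame(
  antigen1 = c(215, 1524, 944, 53),
  antigen2 = c(421, 33, 836, 321),
  antigen3 = c(2, 112, 343, 563),
  antigen4 = c(12, 443, 32, 4))
cut_offs <- c(100, 50, 400, 100)

# col() gives, for every cell, the index of the column it sits in
idx <- col(input)

# Indexing cut_offs by idx repeats each threshold once per row,
# in column-major order; reshape it to the dimensions of input
thr <- matrix(cut_offs[idx], nrow = nrow(input))

# Compare cell by cell, then turn TRUE/FALSE into 1/0
result <- (as.matrix(input) >= thr) + 0L
```

Each cell of input is therefore compared against the threshold of its own column.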

Answer 2

We could use mapply:

+(mapply(`>=`, input, cut_offs))

#     antigen1 antigen2 antigen3 antigen4
#[1,]        1        1        0        0
#[2,]        1        0        0        1
#[3,]        1        1        0        0
#[4,]        0        1        1        0

We can wrap it in data.frame if you need a data frame as the final output:

data.frame(+(mapply(`>=`, input, cut_offs)))
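
If the result should stay a data frame from the start, one possible variant is to assign the per-column results back with Map. This is a sketch rather than part of the original answer; the name binary and the argument names column and cut_off are chosen here for illustration.

```r
input <- data.frame(
  antigen1 = c(215, 1524, 944, 53),
  antigen2 = c(421, 33, 836, 321),
  antigen3 = c(2, 112, 343, 563),
  antigen4 = c(12, 443, 32, 4))
cut_offs <- c(100, 50, 400, 100)

# Work on a copy so the original values are kept
binary <- input

# Map() pairs each column with its threshold; assigning into binary[]
# replaces the columns while keeping the data frame structure and names
binary[] <- Map(function(column, cut_off) as.integer(column >= cut_off),
                input, cut_offs)
```

Assigning into binary[] rather than binary is what preserves the data frame; a plain assignment would leave a list.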

Or with sapply (note that the result has no column names, since we iterate over an index):

sapply(seq_along(cut_offs), function(x) +(input[, x] >= cut_offs[x]))

As for your for loop, you need only one loop: length(cut_offs) equals the number of columns in input, so the same index can be used for both.

temp <- replace(input, TRUE, 0) #Initialise with all values as 0

for (x in seq_along(cut_offs)) {
    temp[input[, x] >= cut_offs[x], x] <- 1 
}

temp
#  antigen1 antigen2 antigen3 antigen4
#1        1        1        0        0
#2        1        0        0        1
#3        1        1        0        0
#4        0        1        1        0
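
If you would rather loop over the column names, as in the question, one option is to give the thresholds names so that each one is looked up by the column it belongs to. This is a sketch under the assumption that every column has a matching named threshold; the names on cut_offs and the variable result are introduced here for illustration.

```r
input <- data.frame(
  antigen1 = c(215, 1524, 944, 53),
  antigen2 = c(421, 33, 836, 321),
  antigen3 = c(2, 112, 343, 563),
  antigen4 = c(12, 443, 32, 4))

# Named thresholds (assumption: one entry per column of input)
cut_offs <- c(antigen1 = 100, antigen2 = 50, antigen3 = 400, antigen4 = 100)

result <- input
for (anti in names(cut_offs)) {
  # Look up the column and its threshold by the same name
  result[[anti]] <- as.integer(input[[anti]] >= cut_offs[[anti]])
}
```

Because the lookup is by name, the result does not depend on the order in which the thresholds are listed.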
