
How to create a new variable from (9) repeated values in R? Do I need loops?

First, I apologize for the vagueness of the title. I have a dataset with a dichotomous variable X coded 0 and 1. v001 is the subject identifier, and v1pc10le8 to v9pc10le8 hold the value of X at each of the nine visits. In addition, firstpc10 and lastpc10 are the first (baseline) and last measurements of X, respectively.

      v001 firstpc10 lastpc10 v1pc10le8 v2pc10le8 v3pc10le8 v4pc10le8 v5pc10le8 v6pc10le8 v7pc10le8 v8pc10le8 v9pc10le8
1473 28084         0        0         0      <NA>         0      <NA>      <NA>         0         0      <NA>      <NA>
1474 28089         0        0      <NA>      <NA>      <NA>         0      <NA>         0      <NA>      <NA>      <NA>
1475 28102         0        1      <NA>      <NA>         0         0         0         0         1      <NA>      <NA>
1476 28103         0        1      <NA>      <NA>      <NA>         0         0         0         0         1         1
1477 28119         0        0      <NA>      <NA>      <NA>         0      <NA>         0         0         0      <NA>
1478 28184         0        1      <NA>      <NA>         0      <NA>      <NA>         0      <NA>      <NA>         1
1479 28202         1        1      <NA>      <NA>         1      <NA>         0         0         0         1         1
1480 28211         0        0         0      <NA>         0         0      <NA>      <NA>      <NA>      <NA>      <NA>
1481 28212         0        1         0      <NA>      <NA>         1      <NA>      <NA>      <NA>      <NA>      <NA>
1482 28213         0        0      <NA>      <NA>         0      <NA>      <NA>         0      <NA>      <NA>      <NA>
1483 28214         0        0      <NA>      <NA>      <NA>         0         0         0      <NA>         1         0
1484 28215         0        0      <NA>      <NA>      <NA>         0      <NA>         0         0         0         0
1485 28232         0        1      <NA>      <NA>         0      <NA>         0         1      <NA>      <NA>      <NA>
1486 28244         1        1         1      <NA>      <NA>      <NA>         0         0         0         0         1
1487 28258         0        1      <NA>      <NA>      <NA>         0      <NA>         0         1      <NA>         1
1488 28281         0        1      <NA>      <NA>      <NA>         0         0         0         1      <NA>      <NA>
1489 28303         0        0         0      <NA>      <NA>      <NA>      <NA>         0         0         0      <NA>
1490 28337         0        1      <NA>      <NA>         0      <NA>      <NA>         0      <NA>         1      <NA>
1491 28355         1        1      <NA>      <NA>         1      <NA>         0      <NA>         0         1      <NA>
1492 29983         0        0      <NA>      <NA>      <NA>         0         0      <NA>         0         0         0

I want to ignore all the NAs and compute a new variable called "change" which has the following values:

1 - if subjects were 0 at baseline and remained 0 throughout

2 - if subjects were 1 at baseline and remained 1 throughout

3 - if subjects were 1 at baseline and changed to 0 (and remained 0 throughout)

4 - if subjects were 0 at baseline and changed to 1 (and remained 1 throughout)

5 - if subjects fluctuated between values of 0 and 1 without a trend (e.g. subject #28214); these are subjects who don't fit any of the above 4 categories

This is the output I expect to see:

      v001   change
1473 28084      1
1474 28089      1 
1475 28102      4
1476 28103      4
1477 28119      1
1478 28184      4    
1479 28202      5
1480 28211      1
1481 28212      4
1482 28213      1
1483 28214      5
1484 28215      1
1485 28232      4
1486 28244      5
1487 28258      4
1488 28281      4
1489 28303      1
1490 28337      4
1491 28355      5
1492 29983      1 

I tried to do this with SPSS and R, but I am having huge difficulties and would greatly appreciate any help. (I have included the dput output from R below.)

Thank you!

structure(list(v001 = c(28084, 28089, 28102, 28103, 28119, 28184, 
28202, 28211, 28212, 28213, 28214, 28215, 28232, 28244, 28258, 
28281, 28303, 28337, 28355, 29983), firstpc10 = c(0, 0, 0, 0, 
0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0), lastpc10 = c(0, 
0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0), v1pc10le8 = c(0, 
NA, NA, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA, 1, NA, NA, 0, NA, 
NA, NA), v2pc10le8 = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_), v3pc10le8 = c(0, NA, 0, NA, NA, 0, 1, 0, 
NA, 0, NA, NA, 0, NA, NA, NA, NA, 0, 1, NA), v4pc10le8 = c(NA, 
0, 0, 0, 0, NA, NA, 0, 1, NA, 0, 0, NA, NA, 0, 0, NA, NA, NA, 
0), v5pc10le8 = c(NA, NA, 0, 0, NA, NA, 0, NA, NA, NA, 0, NA, 
0, 0, NA, 0, NA, NA, 0, 0), v6pc10le8 = c(0, 0, 0, 0, 0, 0, 0, 
NA, NA, 0, 0, 0, 1, 0, 0, 0, 0, 0, NA, NA), v7pc10le8 = c(0, 
NA, 1, 0, 0, NA, 0, NA, NA, NA, NA, 0, NA, 0, 1, 1, 0, NA, 0, 
0), v8pc10le8 = c(NA, NA, NA, 1, 0, NA, 1, NA, NA, NA, 1, 0, 
NA, 0, NA, NA, 0, 1, 1, 0), v9pc10le8 = c(NA, NA, NA, 1, NA, 
1, 1, NA, NA, NA, 0, 0, NA, 1, 1, NA, NA, NA, NA, 0)), .Names = c("v001", 
"firstpc10", "lastpc10", "v1pc10le8", "v2pc10le8", "v3pc10le8", 
"v4pc10le8", "v5pc10le8", "v6pc10le8", "v7pc10le8", "v8pc10le8", 
"v9pc10le8"), row.names = 1473:1492, class = "data.frame")

I defined a function that returns 1-5 depending on the starting condition and the number of times the status changes between 0 and 1, in either direction. I used the rowwise() function from the dplyr package to apply that function to each row of the data frame. I called the input data frame dat. The function uses diff() to count the number of times the status "flips" between consecutive non-missing visits, tests whether that happens never or exactly once, and, depending on the baseline status, returns 1, 2, 3, 4, or 5.

classify_change <- function(x) {
  baseline <- x$firstpc10
  visits <- na.omit(as.numeric(x[grepl('le8', names(x))]))

  # Count the number of times the status flips (0 to 1 or 1 to 0)
  # between consecutive non-missing visits
  n_flips <- sum(diff(visits) != 0)

  # Anything that matches none of the four patterns below is category 5
  answer <- 5

  if (baseline == 0 && n_flips == 0) answer <- 1
  if (baseline == 1 && n_flips == 0) answer <- 2
  if (baseline == 1 && n_flips == 1) answer <- 3
  if (baseline == 0 && n_flips == 1) answer <- 4

  return(data.frame(change = answer))

}

library(dplyr)

dat %>%
  rowwise %>%
  do(classify_change(.))
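
To make the flip count in classify_change concrete, here is a small worked example for one subject. The visit values are copied from row 28202 of the question's data; the variable names (visits_raw, steps) are only illustrative and not part of the answer's code.

```r
# Visits v1pc10le8 .. v9pc10le8 for subject 28202, copied from the question
visits_raw <- c(NA, NA, 1, NA, 0, 0, 0, 1, 1)

# Drop missing visits first, so that diff() compares consecutive observed values
visits <- as.numeric(na.omit(visits_raw))

# diff() gives -1 for a 1 -> 0 step, +1 for a 0 -> 1 step, 0 for no change
steps <- diff(visits)
n_flips <- sum(steps != 0)

print(steps)    # -1  0  0  1  0
print(n_flips)  # 2
```

Two flips with a baseline of 1 match none of the four "stable" patterns, so this subject ends up in category 5, as in the expected output.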

I notice your expected output contains zeroes, but the description of the categories only has 1-5 as possible outcomes. This function returns 1 for those rows.

@qdread's solution is great in terms of compactness and neatness. Building on that approach, I would like to post a solution that demonstrates how one can approach such problems in a functional way.

The first step is identifying the columns that should be used as the baseline and as the visits, which is straightforward:

library(magrittr)

# Define the columns to be used 
col.visits = colnames(df)[4:ncol(df)] # Visits are represented from column 4 on
col.baseline = "firstpc10"
col.final = "lastpc10"
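
As a possible variation, the visit columns could be selected by name rather than by position, which would keep working if columns were reordered or new ones were added. This is only a sketch: the regular expression is my assumption about the naming scheme, based on the nine column names shown in the question, and df_names is a stand-in for colnames(df).

```r
# Stand-in for colnames(df): the column names shown in the question
df_names <- c("v001", "firstpc10", "lastpc10", paste0("v", 1:9, "pc10le8"))

# Keep only names of the form v<number>pc10le8 (assumed naming scheme)
col.visits <- grep("^v[0-9]+pc10le8$", df_names, value = TRUE)

print(col.visits)
```

With the real data frame, the same call would take colnames(df) in place of df_names.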


A second step is thinking about how you would define "remained 0/1 throughout":

# Define unit functions 
single_change_to_1 = function(numeric_array){
  positive_change = (diff(numeric_array) == 1)     # TRUE where a 0 -> 1 change occurred
  return(sum(positive_change, na.rm = TRUE) == 1)  # TRUE if exactly one such change occurred
}

single_change_to_0 = function(numeric_array){
  negative_change = (diff(numeric_array) == -1)    # TRUE where a 1 -> 0 change occurred
  return(sum(negative_change, na.rm = TRUE) == 1)  # TRUE if exactly one such change occurred
}
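
One detail worth illustrating about the unit functions above: diff() propagates NA, so a change that spans a missing visit is only seen if the NAs are dropped before diff() is called. The helper below, count_changes, is a hypothetical name used only for this sketch; it counts changes in both directions after removing missing values.

```r
# Hypothetical helper: count 0 -> 1 and 1 -> 0 changes among observed visits
count_changes <- function(x) {
  x <- x[!is.na(x)]   # drop missing visits before differencing
  d <- diff(x)
  c(to_1 = sum(d == 1), to_0 = sum(d == -1))
}

# Without dropping NAs, the 0 -> 1 change across the gap is not detected
print(diff(c(0, NA, 1)))               # NA NA

# After dropping NAs, the change is counted
print(count_changes(c(0, NA, 1)))      # to_1 = 1, to_0 = 0
print(count_changes(c(1, 0, NA, 0, 1)))  # to_1 = 1, to_0 = 1
```

This is why it matters that the NAs are removed before the visits are passed to the single-change functions; the na.rm argument inside them acts only as a safeguard.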


A third step is putting your conditions together in a function:

calculate_change = function(patientInfo){
  # Extract data 
  patient.base = patientInfo[[col.baseline]]
  patient.visits = patientInfo[col.visits] %>% as.numeric %>% .[!is.na(.)] # Turn to vector, and Discard NAs 

  # Apply if-else
  if(patient.base == 0 && all(patient.visits == 0)) return(1)
  if(patient.base == 1 && all(patient.visits == 1)) return(2)                                         

  if(patient.base == 1 && single_change_to_0(patient.visits) && !single_change_to_1(patient.visits)) return(3)                                         
  if(patient.base == 0 && single_change_to_1(patient.visits) && !single_change_to_0(patient.visits)) return(4)   

  # If the entry didn't match any of the previous conditions, return 5
  return(5)
}


And finally, apply the change function to each row:

df[["change"]] = apply(df, 1, calculate_change)
df[["change"]]
# [1] 1 1 4 4 1 4 5 1 4 1 5 1 4 5 4 4 1 4 5 1
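
For comparison, the five categories can also be expressed as patterns over each subject's observed sequence. The sketch below is an alternative formulation, not the method used in either answer: it collapses the non-missing visits into a string and matches it against regular expressions. It assumes the first observed visit equals the baseline (firstpc10), which appears to hold in the question's data, and it uses four rows copied from that data; pattern_of and classify are hypothetical names.

```r
# Four subjects copied from the question (columns v1pc10le8 .. v9pc10le8)
visits <- rbind(
  "28084" = c(0, NA, 0, NA, NA, 0, 0, NA, NA),
  "28102" = c(NA, NA, 0, 0, 0, 0, 1, NA, NA),
  "28202" = c(NA, NA, 1, NA, 0, 0, 0, 1, 1),
  "28214" = c(NA, NA, NA, 0, 0, 0, NA, 1, 0)
)

# Collapse the observed visits of one subject into a string such as "00001"
pattern_of <- function(x) paste(x[!is.na(x)], collapse = "")

# Map a pattern string to the categories 1-5 described in the question
classify <- function(p) {
  if (grepl("^0+$", p))   return(1)  # stayed 0
  if (grepl("^1+$", p))   return(2)  # stayed 1
  if (grepl("^1+0+$", p)) return(3)  # 1, then 0 for the rest
  if (grepl("^0+1+$", p)) return(4)  # 0, then 1 for the rest
  5                                  # anything else (including no observed visits)
}

patterns <- apply(visits, 1, pattern_of)
change <- vapply(patterns, classify, numeric(1))
print(change)
```

For these four subjects the result is 1, 4, 5 and 5, which agrees with the expected output in the question.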
