
Need some help writing a function

I'm trying to write a function that wraps a few lines of code and lets me supply a single variable. The code below creates an object using the Surv function (survival package). The second line takes the variable in question, in this case a column named Variable_X, and produces a fit that can then be visualized using ggsurvplot. The output is a Kaplan-Meier survival curve. What I'd like is a function such that I can type f(Variable_X) and have the KM curve drawn for whichever column I choose from the data. I want f(y) to output the KM curve as if I had put y where ~Variable_X currently is. I'm new to R and very new to how functions work. I've tried the code below, but it obviously doesn't work. I'm working through DataCamp and reading posts but I'm having a hard time with it, so I appreciate any help.

surv_object <- Surv(time = KMeier_DF$Followup_Duration, event = KMeier_DF$Death_Indicator)

fitX <- survfit(surv_object ~ Variable_X, data = KMeier_DF)

ggsurvplot(fitX, data = KMeier_DF, pval = TRUE)

 f<- function(x) {
 dat<-read.csv("T:/datafile.csv")
 KMeier_DF < - dat
 surv_object <- Surv(time = KMeier_DF$Followup_Duration, event = 
 KMeier_DF$Death_Indicator)
 fitX<-survfit(surv_object ~ x, data = KMeier_DF)
 PlotX<- ggsurvplot(fitX, data = KMeier_DF, pval = TRUE)
 return(PlotX)
}

The crux of the problem you have is actually a tough stumbling block to figure out initially: how to pass variable or data frame column names into a function. I created some example data. In the example below I supply a function with four arguments, one of which is your data. You can see two ways I refer to the columns, [[ ]] and [ , ], which you can think of as being equivalent to using $. With a literal column name they are, but $ does not work when the column name is stored in a variable, which is the situation inside a function. The print calls are there just to show you the data along the way. If objects with these names already exist in your global environment, remove them one by one, e.g. rm(surv_object), or clear them all with rm(list = ls()).

library(survival)  # provides Surv() and survfit()

duration <- c(1, 3, 4, 3, 3, 4, 2)
di <- c(1, 1, 0, 0, 0, 0, 1)
color <- c(1, 1, 2, 2, 3, 3, 4)
KMdf <- data.frame(duration, di, color)

testfun <- function(df, varb1, varb2, varb3) {
  surv_object <- Surv(time = df[[varb1]], event = df[ , varb2])
  print(surv_object)
  fitX <- survfit(surv_object ~ df[[varb3]], data = df)
  print(fitX)
#  plotx <- ggsurvplot(fitX, data = df, pval = TRUE) # this gives an error that surv_object is not found
#  return(plotx)
}

testfun(KMdf, "duration", "di", "color") # notice the use of quotes here, if not you'll get an error about object not found.
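To make the difference between $ and [[ ]] concrete, here is a minimal sketch. The data frame toy and the variable varb1 are hypothetical names made up for this illustration; they are not part of the original data.

```r
# Hypothetical toy data, not from the original post
toy <- data.frame(duration = c(1, 3, 4), di = c(1, 1, 0))
varb1 <- "duration"          # a column name stored in a variable

by_double <- toy[[varb1]]    # uses the value of varb1: the duration column
by_single <- toy[, varb1]    # same vector for a plain data.frame
by_dollar <- toy$varb1       # looks for a column literally named "varb1"

print(by_double)
print(by_dollar)             # NULL, because toy has no column called "varb1"
```

This is why the function above takes the column names as quoted strings and looks them up with [[ ]] or [ , ] rather than with $.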

And even better, you have an even tougher stumbling block: how R handles variables and where it looks for them. From what I can tell, you're running into that because there is possibly a bug in ggsurvplot: it looks in the global environment for variables, not inside the function. They closed the issue, but as far as I can tell, the problem is still there. When you try to run the ggsurvplot line, you'll get the same error you would get if you hadn't supplied a variable:

Error in eval(inp, data, env) : object 'surv_object' not found.

Hopefully that helps. I'd submit a bug report if I were you.

Edit

I was hoping this solution would help, but it doesn't.

testfun <- function(df, varb1, varb2, varb3) {
  surv_object <- Surv(time = df[[varb1]], event = df[,varb2])
  print(surv_object)
  fitX <- survfit(surv_object ~ df[[varb3]], data = df)
  print(fitX)
  attr(fitX[['strata']], "names") <- c("color = 1", "color = 2", "color = 3", "color = 4")
  plotx <- ggsurvplot(fitX, data = df, pval = TRUE) # this gives an error that surv_object is not found
  return(plotx)
}

Error in eval(inp, data, env) : object 'surv_object' not found
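One direction that may sidestep the lookup problem is to avoid the local surv_object entirely and build a formula that refers only to columns of the data. The sketch below is an assumption-laden illustration, not a confirmed fix: the helper names km_formula and km_fit are hypothetical, and only the survival part is exercised here. do.call() is used so that the stored call carries the actual formula and data rather than local variable names. The survminer package also documents a surv_fit() wrapper intended for calling survfit inside functions and loops, which may be worth trying instead.

```r
library(survival)

# Build the model formula from column names given as strings, so that it
# refers to columns of the data and not to objects local to a function.
km_formula <- function(time_col, event_col, group_col) {
  as.formula(sprintf("Surv(%s, %s) ~ %s", time_col, event_col, group_col))
}

# Hypothetical helper: fit the Kaplan-Meier curves for one grouping column.
km_fit <- function(df, time_col, event_col, group_col) {
  fml <- km_formula(time_col, event_col, group_col)
  # do.call() passes the evaluated formula and data frame, so fit$call does
  # not depend on the local names `fml` and `df`
  do.call(survfit, list(formula = fml, data = df))
}

KMdf <- data.frame(
  duration = c(1, 3, 4, 3, 3, 4, 2),
  di       = c(1, 1, 0, 0, 0, 0, 1),
  color    = c(1, 1, 2, 2, 3, 3, 4)
)

fit <- km_fit(KMdf, "duration", "di", "color")
print(fit)

# With survminer installed, the plotting step would be (not run here):
# survminer::ggsurvplot(fit, data = KMdf, pval = TRUE)
```

Whether ggsurvplot accepts this fit without the error above depends on the survminer version, so treat the commented plotting line as something to test rather than a guarantee.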

This is homework, right?

First, you need to try to run the code before you provide it as an example. Your example has several fatal errors. ggsurvplot() needs either a library call to survminer or to be called with its namespace: survminer::ggsurvplot().

You have defined a function f, but you never call it. In the function definition, you have a stray space in the assignment operator: < - instead of <-. It never would have worked.

I suggest you start by defining a function that calculates the sum of two numbers, or concatenates two strings. Start here or here. Then, you can return to the Kaplan-Meier stuff.
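As a rough idea of what such starter functions could look like, here is a minimal sketch. The names add_two and join_two are made up for this illustration.

```r
# Return the sum of two numbers
add_two <- function(a, b) {
  a + b
}

# Concatenate two strings with no separator
join_two <- function(a, b) {
  paste0(a, b)
}

print(add_two(2, 3))
print(join_two("Kaplan-", "Meier"))
```

The value of the last expression in the body is what the function returns, so an explicit return() is optional in both cases.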

Second, in another class or two, you will need to know the three parts of a function. You will need to understand the scope of a function. You might as well dig into the basics before you start copy-and-pasting.
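For reference, the three parts are usually described as the arguments (formals), the body, and the environment. Base R lets you inspect each one, as in this small sketch; the function scale_by and its argument k are hypothetical names for this illustration.

```r
scale_by <- function(x, k = 2) {
  result <- x * k   # `result` is local: it exists only while the function runs
  result
}

print(formals(scale_by))      # the arguments: x (no default) and k (default 2)
print(body(scale_by))         # the code between the braces
print(environment(scale_by))  # the environment where the function was defined

print(scale_by(10))
```

After scale_by(10) returns, the local variable result is gone. Scope also works in the other direction: a name that is not found inside the function is looked up in the environment where the function was defined.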

Third, before you post another question, please read How to make a great R reproducible example?

Best of luck.

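Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.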