简体   繁体   English

在闪亮的应用程序上向DataFrame添加新列

[英]Add new columns to a DataFrame on shiny apps

I'm working on a simple ML model on R using shiny apps, the structure of the app will be: 我正在使用有光泽的应用程序在R上构建一个简单的ML模型,该应用程序的结构为:

1) Load the data from a local file 2) Train a model with the loaded data 3) Plot the results 1)从本地文件加载数据2)使用加载的数据训练模型3)绘制结果

The my problem is at stage 2 , I'm able to plot the input data with this code: 我的问题是在第2阶段 ,我可以使用以下代码绘制输入数据:

  output$plot1 <- renderPlot({
    ggplot(mydata(), aes(x=LotArea, y=SalePrice)) + geom_point()
  })

But since the predicted values are not in the original DF, I need to add them first. 但是由于预测值不在原始DF中,因此我需要先添加它们。

The code I'm using for that is: 我为此使用的代码是:

  obsB <- reactive({


    set.seed(0) 
    xgb_model = train(
      mydata()["LotArea"], as.vector(t(mydata()["SalePrice"])),  
      trControl = xgb_trcontrol,
      tuneGrid = xgbGrid,
      method = "xgbTree"
    )

    predicted = predict(xgb_model, mydata()["LotArea"])
    mydata()["predicted"] = predicted


  })

This is the error I'm getting: 这是我得到的错误:

Warning: Error in FUN: object 'predicted' not found

This happends when I change "LotArea" for "predicted" 当我将“ LotArea”更改为“ predicted”时,就会发生这种情况

  output$plot1 <- renderPlot({
    ggplot(mydata(), aes(x=predicted, y=SalePrice)) + geom_point()
  })

This is the complete code I have: 这是我的完整代码:

library(shiny)
library(readxl)
library(tidyverse)
library(xgboost)
library(caret)
library(iml)


#### UI


ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(
      fileInput("file1", "Choose CSV File",
                accept = c(
                  "text/csv",
                  "text/comma-separated-values,text/plain",
                  ".csv")
      ),
      tags$hr(),
      checkboxInput("header", "Header", TRUE)
    ),
    mainPanel(
      #tableOutput("contents"),
      plotOutput("plot1", click = "plot_brush")
    )
  )
)

server <- function(input, output) {
  mydata <- reactive({
    req(input$file1, input$header, file.exists(input$file1$datapath))
    read.csv(input$file1$datapath, header = input$header)
  })


  output$contents <- renderTable({
    req(mydata())
    #mydata()
  })


  ### test
  xgb_trcontrol = trainControl(
    method = "cv",
    number = 5,  
    allowParallel = TRUE,
    verboseIter = FALSE,
    returnData = FALSE
  )


  #I am specifing the same parameters with the same values as I did for Python above. The hyperparameters to optimize are found in the website.
  xgbGrid <- expand.grid(nrounds = c(10,14),  # this is n_estimators in the python code above
                         max_depth = c(10, 15, 20, 25),
                         colsample_bytree = seq(0.5, 0.9, length.out = 5),
                         ## The values below are default values in the sklearn-api. 
                         eta = 0.1,
                         gamma=0,
                         min_child_weight = 1,
                         subsample = 1
  )




  obsB <- reactive({



    set.seed(0) 
    xgb_model = train(
      mydata()["LotArea"], as.vector(t(mydata()["SalePrice"])),  
      trControl = xgb_trcontrol,
      tuneGrid = xgbGrid,
      method = "xgbTree"
    )

    predicted = predict(xgb_model, mydata()["LotArea"])
    mydata()["predicted"] = predicted


  })

  output$plot1 <- renderPlot({
    ggplot(mydata(), aes(x=predicted, y=SalePrice)) + geom_point()
  })




  }

shinyApp(ui, server)

EDIT: 编辑:

I changed: 我变了:

mydata()["predicted"] = predicted

for: 对于:

data =  mydata()
data["predicted"] = predicted

But Know I get a different error: 但是知道我得到了另一个错误:

Warning: Error in : You're passing a function as global data.
Have you misspelled the `data` argument in `ggplot()

EDIT 2: This is a sample of the data I'm using: 编辑2:这是我正在使用的数据的示例:

https://drive.google.com/file/d/1R8GA0fW0pOgG8Cpykc8mAThvKOCRCVl0/view?usp=sharing https://drive.google.com/file/d/1R8GA0fW0pOgG8Cpykc8mAThvKOCRCVl0/view?usp=sharing

You cannot update a reactive value using that syntax. 您不能使用该语法更新反应式值。

Your problems are: 您的问题是:

  • If you create a reactive value using the value = reactive({...}) you cannot change its value outside that code block 如果使用value = reactive({...})创建无功值,则无法在该代码块之外更改其值
  • If you want to be able to change the value of a reactive element in more than one place within the code you need to use reactiveVal or reactiveValues functions to create the variables. 如果要在代码中的多个位置更改反应性元素的值,则需要使用reactiveValreactiveValues函数来创建变量。 Variables created in this manner can be changed using the variableName(newValue) syntax. 可以使用variableName(newValue)语法更改以此方式创建的variableName(newValue)

For instance 例如

# we are somewhere inside the server code
mydata = reactiveVal()

observe({
    req(input$file1, input$header, file.exists(input$file1$datapath))
    data = read.csv(input$file1$datapath, header = input$header)
    mydata(data)
})

obsB <- reactive({


    set.seed(0) 
    xgb_model = train(
      mydata()["LotArea"], as.vector(t(mydata()["SalePrice"])),  
      trControl = xgb_trcontrol,
      tuneGrid = xgbGrid,
      method = "xgbTree"
    )

    predicted = predict(xgb_model, mydata()["LotArea"])
    newData = mydata()
    newData['predicted'] = predicted
    mydata(newData)
  })
  • Otherwise you need to combine everything that changes mydata in that single code block. 否则,您需要在单个代码块中组合所有更改mydata

Note that I suspect in the code above there may be a loop due to you updating mydata in a code block that depends on mydata . 请注意,我怀疑上面的代码中可能存在循环,这是由于您在依赖于mydata的代码块中更新了mydata所致。 Couldn't test it since I don't have sample data but you may have to play around with isolate or have another trigger that is not mydata to make it work (eg. triggers of the mydata ) 因为我没有样本数据,所以无法测试它,但是您可能不得不进行isolate或者使用另一个不是mydata触发器来使其正常工作(例如, mydata触发器)

As a side note, if your app requires data to function, it is better for us if you provide a sample data. 附带说明一下,如果您的应用程序需要数据才能运行,那么提供示例数据对我们更好。 I can't test this as I can't easily guess what the input should look like. 我无法对此进行测试,因为无法轻易猜测输入的外观。 Also it is often better to use new code specifically written for the question that isolates the issue rather than paste your actual project here since you can get rid of the data and package dependencies and there won't be distractions that are not related to the problem 另外,最好使用专门为该问题编写的新代码来隔离问题,而不是将实际项目粘贴到此处,因为这样可以消除数据和程序包的依赖性,并且不会产生与问题无关的干扰

Since your code other issues that is not related to the question, here's an annotated and fixed version 由于您的代码存在其他与问题无关的问题,因此这里有一个带注释的固定版本

library(shiny)
library(readxl)
library(tidyverse)
library(xgboost)
library(caret)
library(iml)


#### UI


ui <- fluidPage(
    sidebarLayout(
        sidebarPanel(
            fileInput("file1", "Choose CSV File",
                      accept = c(
                          "text/csv",
                          "text/comma-separated-values,text/plain",
                          ".csv")
            ),
            tags$hr(),
            checkboxInput("header", "Header", TRUE)
        ),
        mainPanel(
            #tableOutput("contents"),
            plotOutput("plot1", click = "plot_brush")
        )
    )
)

server <- function(input, output) {
    # create mydata as a reactiveVal so that it can be edited everywhere
    mydata = reactiveVal()

    # reactive block is changed with an observe that allows mydata to be updated
    # on change of data
    observe({
        req(input$file1, input$header, file.exists(input$file1$datapath))
        data = read.csv(input$file1$datapath, header = input$header)
        mydata(data)
    })


    output$contents <- renderTable({
        req(mydata())
        #mydata()
    })


    ### test
    xgb_trcontrol = trainControl(
        method = "cv",
        number = 5,
        allowParallel = TRUE,
        verboseIter = FALSE,
        returnData = FALSE
    )


    #I am specifing the same parameters with the same values as I did for Python above. The hyperparameters to optimize are found in the website.
    xgbGrid <- expand.grid(nrounds = c(10,14),  # this is n_estimators in the python code above
                           max_depth = c(10, 15, 20, 25),
                           colsample_bytree = seq(0.5, 0.9, length.out = 5),
                           ## The values below are default values in the sklearn-api.
                           eta = 0.1,
                           gamma=0,
                           min_child_weight = 1,
                           subsample = 1
    )



    # note that obsB reactive variable is gone. if you don't use a 
    # reactive variable, the code block will not be executed.
    # unlike observe blocks, reactive blocks are lazy and should
    # not be relied on for their side effects
    observe({
        # this if ensures you don't run this block before mydata isn't a data frame
        # also prevents it running after it updates mydata. otherwise this will
        # be executed twice. its an invisible problem that'll make it run half
        # as fast unless you debug.
        if ('data.frame' %in% class(mydata()) & !'predicted' %in% names(mydata())){
            set.seed(0)
            xgb_model = train(
                mydata()["LotArea"], as.vector(t(mydata()["SalePrice"])),
                trControl = xgb_trcontrol,
                tuneGrid = xgbGrid,
                method = "xgbTree"
            )

            predicted = predict(xgb_model, mydata()["LotArea"])
            data = mydata()
            data["predicted"] = predicted
            mydata(data)
        }





    })

    output$plot1 <- renderPlot({
        data = mydata()
        # this is here to prevent premature triggering of this ggplot.
        # otherwise you'll get the "object not found" error
        if('predicted' %in% names(data)){
            ggplot(mydata(), aes(x=predicted, y=SalePrice)) + geom_point()
        }
    })




}

shinyApp(ui, server)

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM