[英]R Shiny: Creating factor variables and defining levels
我試圖用 Shiny 創建一個機器學習應用程序。
在這個應用程序中,用戶可以選擇輸入變量的規格(通過輸入小部件),這些參數將用於給出響應變量的估計。 為此,我從選定的輸入創建了一個數據框,並將其保存在本地作為數據表。
當我再次加載數據表時出現問題,因為這將導致所有分類變量成為字符變量。 但是,我可以使用factor()
函數手動更改這些並使用levels=
規范。
問題是我不想每次使用新數據集時都手動執行此操作,因為這很可能會更改數據集中分類變量的位置。 新數據集中的分類變量也很可能不會相同。
數據框"DATA"
是主數據集,其中包含第 1 列中的響應變量。
數據框"test"
是根據所選輸入構建的數據框,它將用作預測的測試集,並且僅包含 1 個指定的觀察。 由於數據幀的構造方式,此數據幀將始終將響應變量作為數據幀中的最后一列。 因此DATA[ ,5]
的因子變量將始終對應於測試數據幀中的前一列: test[ ,4]
。
這是需要指定因子水平的測試數據框,因為當它只包含 1 個觀察時,它不會自動知道類別的數量。
test[4] <- factor(test[4], levels = unique(DATA[,5]))
我正在嘗試編寫一個代碼,該代碼將對數據集中的所有字符變量應用factor
函數並指定levels
而不管字符變量在數據集中的位置。
這是我到目前為止編寫的代碼:
library(shiny)
library(tidyverse)
library(shinythemes)
library(data.table)
library(RCurl)
library(randomForest)
library(mlbench)
library(janitor)
# Read data
DATA <- BostonHousing
# Rearrange data so the response variable is located in column 1
DATA <- DATA[,c(names(BostonHousing)[14],names(BostonHousing)[-14])]
# Creating a simple RF model
model <- randomForest(medv ~ ., data = DATA, ntree = 500, mtry = 4, importance = TRUE)
# UI -------------------------------------------------------------------------
ui <- fluidPage(
sidebarPanel(
h3("Parameters Selected"),
br(),
tableOutput('show_inputs'),
hr(),
actionButton("submitbutton", label = "calculate", class = "btn btn-primary", icon("calculator")),
hr(),
tableOutput("tabledata")
), # End sidebarPanel
mainPanel(
h3("Input widgets"),
uiOutput("select")
) # End mainPanel
) # End UI bracket
# Server -------------------------------------------------------------------------
server <- function(input, output, session) {
# Create input widgets from dataset
output$select <- renderUI({
df <- req(DATA)
tagList(map(
names(df[-1]),
~ ifelse(is.numeric(df[[.]]),
yes = tagList(sliderInput(
inputId = paste0(.),
label = .,
value = mean(df[[.]], na.rm = TRUE),
min = round(min(df[[.]], na.rm = TRUE),2),
max = round(max(df[[.]], na.rm = TRUE),2)
)),
no = tagList(selectInput(
inputId = paste0(.),
label = .,
choices = sort(unique(df[[.]])),
selected = sort(unique(df[[.]]))[1],
))
)
))
})
# creating dataframe of selected values to be displayed
AllInputs <- reactive({
id_exclude <- c("savebutton","submitbutton")
id_include <- setdiff(names(input), id_exclude)
if (length(id_include) > 0) {
myvalues <- NULL
for(i in id_include) {
myvalues <- as.data.frame(rbind(myvalues, cbind(i, input[[i]])))
}
names(myvalues) <- c("Variable", "Selected Value")
myvalues %>%
slice(match(names(DATA[,-1]), Variable))
}
})
# render table of selected values to be displayed
output$show_inputs <- renderTable({
AllInputs()
})
# Creating a dataframe for calculating a prediction
datasetInput <- reactive({
df1 <- data.frame(AllInputs(), stringsAsFactors = FALSE)
input <- transpose(rbind(df1, names(DATA[1])))
write.table(input,"input.csv", sep=",", quote = FALSE, row.names = FALSE, col.names = FALSE)
test <- read.csv(paste("input.csv", sep=""), header = TRUE)
# defining factor levels for factor variables
test[4] <- factor(test[4], levels = unique(DATA[,5])) # <- This line will cause problems if multiple factors in dataset or if different column location
# Making the actual prediction and store it in a data.frame
Prediction <- predict(model,test)
Output <- data.frame("Prediction"=Prediction)
print(format(Output, nsmall=2, big.mark=","))
})
# display the prediction when the submit button is pressed
output$tabledata <- renderTable({
if (input$submitbutton>0) {
isolate(datasetInput())
}
})
} # End server bracket
# ShinyApp -------------------------------------------------------------------------
shinyApp(ui, server)
要概括因子變量,您可以使用以下代碼:
# defining factor levels for factor variables
#test[4] <- factor(test[4], levels = unique(DATA[,5])) # <- This line will cause problems if multiple factors in dataset or if different column location
cnames <- colnames(DATA[sapply(DATA,class)=="factor"])
if (length(cnames)>0){
lapply(cnames, function(par) {
test[par] <<- factor(test[par], levels = unique(DATA[,par]))
})
}
您可以將其應用於 BostonHousing2 數據,如下所示
# Read data
BH <- BostonHousing2
DATA <- BH
# Rearrange data so the response variable is located in column 1
#DATA <- DATA[,c(names(BH)[14],names(BH)[-14])]
DATA <- DATA[,c(names(BH)[5],names(BH)[-5])] ## for BostonHousing2
# Creating a simple RF model
model <- randomForest(medv ~ ., data = DATA[,-2], ntree = 500, mtry = 4, importance = TRUE)
# UI -------------------------------------------------------------------------
ui <- fluidPage(
sidebarPanel(
h3("Parameters Selected"),
br(),
tableOutput('show_inputs'),
hr(),
actionButton("submitbutton", label = "calculate", class = "btn btn-primary", icon("calculator")),
hr(),
tableOutput("tabledata")
), # End sidebarPanel
mainPanel(
h3("Input widgets"),
uiOutput("select")
) # End mainPanel
) # End UI bracket
# Server -------------------------------------------------------------------------
server <- function(input, output, session) {
# Create input widgets from dataset
output$select <- renderUI({
df <- req(DATA)
tagList(map(
names(df[-1]),
~ ifelse(is.numeric(df[[.]]),
yes = tagList(sliderInput(
inputId = paste0(.),
label = .,
value = mean(df[[.]], na.rm = TRUE),
min = round(min(df[[.]], na.rm = TRUE),2),
max = round(max(df[[.]], na.rm = TRUE),2)
)),
no = tagList(selectInput(
inputId = paste0(.),
label = .,
choices = sort(unique(df[[.]])),
selected = sort(unique(df[[.]]))[1],
))
)
))
})
# creating dataframe of selected values to be displayed
AllInputs <- reactive({
id_exclude <- c("savebutton","submitbutton")
id_include <- setdiff(names(input), id_exclude)
if (length(id_include) > 0) {
myvalues <- NULL
for(i in id_include) {
myvalues <- as.data.frame(rbind(myvalues, cbind(i, input[[i]])))
}
names(myvalues) <- c("Variable", "Selected Value")
myvalues %>%
slice(match(names(DATA[,-1]), Variable))
}
})
# render table of selected values to be displayed
output$show_inputs <- renderTable({
AllInputs()
})
# Creating a dataframe for calculating a prediction
datasetInput <- reactive({
df1 <- data.frame(AllInputs(), stringsAsFactors = FALSE)
input <- transpose(rbind(df1, names(DATA[1])))
write.table(input,"input.csv", sep=",", quote = FALSE, row.names = FALSE, col.names = FALSE)
test <- read.csv(paste("input.csv", sep=""), header = TRUE)
# defining factor levels for factor variables
#test[4] <- factor(test[4], levels = unique(DATA[,5])) # <- This line will cause problems if multiple factors in dataset or if different column location
cnames <- colnames(DATA[sapply(DATA,class)=="factor"])
if (length(cnames)>0){
lapply(cnames, function(par) {
test[par] <<- factor(test[par], levels = unique(DATA[,par]))
})
}
# Making the actual prediction and store it in a data.frame
Prediction <- predict(model,test)
Output <- data.frame("Prediction"=Prediction)
print(format(Output, nsmall=2, big.mark=","))
})
# display the prediction when the submit button is pressed
output$tabledata <- renderTable({
if (input$submitbutton>0) {
isolate(datasetInput())
}
})
} # End server bracket
# ShinyApp -------------------------------------------------------------------------
shinyApp(ui, server)
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.