R (Unexpected output from CrossTable)

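I am using CrossTable for a kNN model, but the output is not displayed as expected. It shows me a bunch of numbers rather than the predicted classes. (I would add a picture, but I need 10 reputation points.) I would like a table with clear output.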

#I'm setting working directory folder
setwd("F:/Level 5/CT5018 - Data Analytics/My project/Official Dataset - Adult")

#start calculating the time to run the code
k <-Sys.time()

#here I'm assigning adults to read the csv file
adults <- read.csv("Adults.csv", stringsAsFactors = FALSE)

#examine the structure of the adults data frame
str(adults)

#drop the fnlwgt feature
adults <- adults[-3]

#table of sex
table(adults$Sex)

#recode Sex as a factor
adults$Sex <- factor(adults$Sex, levels = c("Female","Male"),
                       labels = c("Women", "Men"))

#table of proportions with more informative labels
round(prop.table(table(adults$Sex)) * 100, digits = 1)

#summarize all numeric features
summary(adults[c("Age", "Education.num", "Capital.gain", "Capital.loss", "Hours.per.week")])

#---------------------------------------------- Min-Max normalisation ----------------------------------------------

#create normalization function
normalize <- function(x) {
  return ((x - min (x)) / (max(x) - min(x)))
}

#test normalization function - result should be identical
normalize(c(1, 2, 3, 4, 5))
normalize(c(10, 20, 30, 40, 50))

#normalize the numeric columns of the adults data
adultsN <- as.data.frame(lapply(adults[c("Age", "Education.num", "Capital.gain", "Capital.loss", "Hours.per.week")], normalize))

#confirm that normalization worked
summary(adultsN$Age)

# create training and test data
adultsTrain <- adultsN[1:14999, ]
adultsTest <- adultsN[15000:19999, ]

# create labels for training and test data
adultsTrainLabels <- adults[1:14999, 1]
adultsTestLabels <- adults[15000:19999, 1]

#installing package class
#install.packages("class")
library(class)

adultsTestPred <- knn(train = adultsTrain, test = adultsTest,
                      cl = adultsTrainLabels, k=122)

#installing package for cross tables
#install.packages("gmodels")
library(gmodels)

# Create the cross tabulation of predicted vs. actual
CrossTable(x = adultsTestLabels, y = adultsTestPred,
           prop.chisq=FALSE)

This is what it shows me:

Cell Contents
|-------------------------|
|                       N |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|


Total Observations in Table:  5000 


| adultsTestPred 
adultsTestLabels |        17 |        18 |        19 |        20 |        21 |        22 |        23 |        24 |        25 |        26 |        27 |        28 |        29 |        30 |        31 |        32 |        33 |        34 |        35 |        36 |        37 |        38 |        39 |        40 |        41 |        42 |        43 |        44 |        45 |        46 |        47 |        48 |        49 |        50 |        51 |        52 |        53 |        54 |        55 |        56 |        57 |        58 |        59 |        60 |        61 |        62 |        63 |        64 |        65 |        66 |        67 |        68 |        69 |        71 |        72 |        73 |        76 |        77 |        90 | Row Total | 
-----------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
              17 |        65 |         6 |         1 |         1 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         1 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |        74 | 
                 |     0.878 |     0.081 |     0.014 |     0.014 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.014 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.015 | 
                 |     0.556 |     0.200 |     0.012 |     0.005 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.004 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 |     0.000 | 

--------------------------------------------
Whereas what I actually want is this:
-----------------------------------------------
   Cell Contents
|-------------------------|
|                       N |
|         N / Table Total |
|-------------------------|


Total Observations in Table:  2000 


             | predicted Sex 
  actual Sex |    Female |      Male | Row Total | 
-------------|-----------|-----------|-----------|
      Female |       514 |       161 |       675 | 
             |     0.257 |     0.080 |           | 
-------------|-----------|-----------|-----------|
        Male |       162 |      1163 |      1325 | 
             |     0.081 |     0.582 |           | 
-------------|-----------|-----------|-----------|
Column Total |       676 |      1324 |      2000 | 
-------------|-----------|-----------|-----------|

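I had the same problem. The mistake I made was mixing up the id and label columns.

My data frame was like x = [Id, label, Feature 1, Feature 2, ...] and I assigned the labels as x[1] instead of x[2]. Try extracting the labels before normalizing.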
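
Applied to the code in the question, the same kind of mix-up seems likely: the labels are taken with `adults[1:14999, 1]`, and the values 17 to 90 in the printed table look like ages, which suggests column 1 is Age rather than Sex. The following is a minimal sketch of selecting the label column by name instead of by position. The `toy` data frame and its values are hypothetical stand-ins for Adults.csv, and the labels are random, so the predictions themselves carry no meaning.

```r
# Toy stand-in for the adults data frame: hypothetical values, not the real Adults.csv.
set.seed(1)
n <- 200
toy <- data.frame(
  Age            = sample(17:90, n, replace = TRUE),
  Hours.per.week = sample(20:60, n, replace = TRUE),
  Sex            = factor(sample(c("Women", "Men"), n, replace = TRUE))
)

# Same min-max normalisation as in the question.
normalize <- function(x) (x - min(x)) / (max(x) - min(x))
toyN <- as.data.frame(lapply(toy[c("Age", "Hours.per.week")], normalize))

train_idx <- 1:150
test_idx  <- 151:200

# Selecting by position picks whatever happens to be in column 1 (here Age).
wrong_labels <- toy[train_idx, 1]

# Selecting by name picks the intended class labels.
train_labels <- toy$Sex[train_idx]
test_labels  <- toy$Sex[test_idx]

# Guarded so the sketch still runs if the packages are not installed.
if (requireNamespace("class", quietly = TRUE)) {
  pred <- class::knn(train = toyN[train_idx, ], test = toyN[test_idx, ],
                     cl = train_labels, k = 5)
  if (requireNamespace("gmodels", quietly = TRUE)) {
    # prop.r/prop.c = FALSE leaves only N and N / Table Total in each cell.
    gmodels::CrossTable(x = test_labels, y = pred,
                        prop.chisq = FALSE, prop.r = FALSE, prop.c = FALSE,
                        dnn = c("actual Sex", "predicted Sex"))
  } else {
    print(table(actual = test_labels, predicted = pred))
  }
}
```

In the question's code the analogous change would be `adultsTrainLabels <- adults$Sex[1:14999]` and `adultsTestLabels <- adults$Sex[15000:19999]`, assuming Sex is the intended target, as the desired table suggests.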
