I have numbers stored as text (i.e. nominal values) in MS Access. These numbers are short codes standing in for what would otherwise be long sentences used as categories.
I have tried exporting the file as csv in three ways (one of them using a function from MS Access).
The problem is in R: when I run summary(data), these number-nominal values are still interpreted as numeric, even though the values are enclosed in double or single quotation marks. I am sure of this because the summary function reports a mean, median, and so on for these variables, whereas the character variables are shown with frequencies.
In the example below, both var1 and var2 are nominal, and the latter is represented by numbers (note that the values in the var2 results have been changed for security).
      var1            var2
 Cat    : 111   Min.   : 1
 Dog    : 222   1st Qu.: 1
 Bee    : 333   Median : 8
 Yog    : 555   Mean   :10
 Fig    : 999   3rd Qu.: 1
 Kol    : 444   Max.   :15
 (Other):2250
I've thought of appending a character to these number-nominal values (instead of 1, 2, 3, 4, 5, I'd have 1a, 2a, 3a, 4a, 5a) to ensure that they are interpreted as nominal, but I am hoping for a better solution before resorting to that arduous task.
read.table and family have a colClasses argument.
The following examples show how the results differ when using different colClasses:
text <- c("A,B,C", "1,2,3", "2,1,4")
read.csv
A <- read.csv(text = text)
str(A)
# 'data.frame': 2 obs. of 3 variables:
# $ A: int 1 2
# $ B: int 2 1
# $ C: int 3 4
summary(A)
# A B C
# Min. :1.00 Min. :1.00 Min. :3.00
# 1st Qu.:1.25 1st Qu.:1.25 1st Qu.:3.25
# Median :1.50 Median :1.50 Median :3.50
# Mean :1.50 Mean :1.50 Mean :3.50
# 3rd Qu.:1.75 3rd Qu.:1.75 3rd Qu.:3.75
# Max. :2.00 Max. :2.00 Max. :4.00
character
B <- read.csv(text = text, colClasses = "character")
str(B)
# 'data.frame': 2 obs. of 3 variables:
# $ A: chr "1" "2"
# $ B: chr "2" "1"
# $ C: chr "3" "4"
summary(B)
# A B C
# Length:2 Length:2 Length:2
# Class :character Class :character Class :character
# Mode :character Mode :character Mode :character
factor
C <- read.csv(text = text, colClasses = "factor")
str(C)
# 'data.frame': 2 obs. of 3 variables:
# $ A: Factor w/ 2 levels "1","2": 1 2
# $ B: Factor w/ 2 levels "1","2": 2 1
# $ C: Factor w/ 2 levels "3","4": 1 2
summary(C)
# A B C
# 1:1 1:1 3:1
# 2:1 2:1 4:1
The colClasses argument accepts a vector, so you can specify on a column-by-column basis what the values should be:
D <- read.csv(text = text, colClasses = c("integer", "character", "factor"))
str(D)
# 'data.frame': 2 obs. of 3 variables:
# $ A: int 1 2
# $ B: chr "2" "1"
# $ C: Factor w/ 2 levels "3","4": 1 2
summary(D)
# A B C
# Min. :1.00 Length:2 3:1
# 1st Qu.:1.25 Class :character 4:1
# Median :1.50 Mode :character
# Mean :1.50
# 3rd Qu.:1.75
# Max. :2.00
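Applied to the situation in the question, a minimal sketch could look like the one below. Quoting the numbers in the csv does not help on its own, because unless colClasses is given, read.table reads every column as character and then converts it with type.convert. The data in csv_lines is a made-up stand-in for the exported file, and the column names var1 and var2 are taken from the summary shown in the question. colClasses can also be a named vector, in which case the names are matched to the column names.

```r
# Hypothetical stand-in for the exported file; var2 holds quoted numeric codes.
csv_lines <- c(
  'var1,var2',
  'Cat,"1"',
  'Dog,"8"',
  'Bee,"15"',
  'Cat,"1"'
)

# Named colClasses: each name is matched to the column of the same name.
dat <- read.csv(text = csv_lines,
                colClasses = c(var1 = "factor", var2 = "factor"))

str(dat)
summary(dat)  # both columns are now reported as frequency counts

# Same idea for a data frame that was already read with the default settings:
# convert the numeric column to a factor after the fact.
dat2 <- read.csv(text = csv_lines)
dat2$var2 <- factor(dat2$var2)
```

With a real file you would pass the file path as the first argument instead of text = csv_lines. Either route avoids having to append a letter to every value.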