I have matrices given in the following way:
m <- as.matrix(rbind(c("State", "Murder", "Assault", "UrbanPop", "Rape", "Group"),
c("Alabama", 13.2, 236, 58, 21.2, "A"),
c("Alaska", 10.0, 263, 48, 44.5, "A"),
c("Arizona", 8.1, 294, 80, 31.0, "A"),
c("Arkansas", 8.8, 190, 50, 19.5, "A"),
c("California", 9.0, 276, 91, 40.6, "A"),
c("Colorado", 7.9, 204, 78, 38.7, "A"),
c("Connecticut", 3.3, 110, 77, 11.1, "A"),
c("Delaware", 5.9, 238, 72, 15.8, "A"),
c("Florida", 15.4, 335, 80, 31.9, "A"),
c("Georgia", 17.4, 211, 60, 25.8, "A"),
c("Hawaii", 5.3, 46, 83, 20.2, "A"),
c("Idaho", 2.6, 120, 54, 14.2, "A"),
c("Illinois", 10.4, 249, 83, 24.0, "A"),
c("Indiana", 7.2, 113, 65, 21.0, "A"),
c("Iowa", 2.2, 56, 57, 11.3, "A"),
c("Kansas", 6.0, 115, 66, 18.0, "A"),
c("Kentucky", 9.7, 109, 52, 16.3, "A"),
c("Louisiana", 15.4, 249, 66, 22.2, "A"),
c("Maine", 2.1, 83, 51, 7.8, "B"),
c("Maryland", 11.3, 300, 67, 27.8, "B"),
c("Massachusetts", 4.4, 149, 85, 16.3, "B"),
c("Michigan", 12.1, 255, 74, 35.1, "B"),
c("Minnesota", 2.7, 72, 66, 14.9, "B"),
c("Mississippi", 16.1, 259, 44, 17.1, "B"),
c("Missouri", 9.0, 178, 70, 28.2, "B"),
c("Montana", 6.0, 109, 53, 16.4, "B"),
c("Nebraska", 4.3, 102, 62, 16.5, "C"),
c("Nevada", 12.2, 252, 81, 46.0, "C"),
c("New_Hampshire", 2.1, 57, 56, 9.5, "C"),
c("New_Jersey", 7.4, 159, 89, 18.8, "C"),
c("New_Mexico", 11.4, 285, 70, 32.1, "C"),
c("New_York", 11.1, 254, 86, 26.1, "C"),
c("North_Carolina", 13.0, 337, 45, 16.1, "C"),
c("North_Dakota", 0.8, 45, 44, 7.3, "C"),
c("Ohio", 7.3, 120, 75, 21.4, "D"),
c("Oklahoma", 6.6, 151, 68, 20.0, "D"),
c("Oregon", 4.9, 159, 67, 29.3, "D"),
c("Pennsylvania", 6.3, 106, 72, 14.9, "D"),
c("Rhode_Island", 3.4, 174, 87, 8.3, "D"),
c("South_Carolina", 14.4, 279, 48, 22.5, "D"),
c("South_Dakota", 3.8, 86, 45, 12.8, "D"),
c("Tennessee", 13.2, 188, 59, 26.9, "D"),
c("Texas", 12.7, 201, 80, 25.5, "D"),
c("Utah", 3.2, 120, 80, 22.9, "D"),
c("Vermont", 2.2, 48, 32, 11.2, "D"),
c("Virginia", 8.5, 156, 63, 20.7, "D"),
c("Washington", 4.0, 145, 73, 26.2, "D"),
c("West_Virginia", 5.7, 81, 39, 9.3, "D"),
c("Wisconsin", 2.6, 53, 66, 10.8, "D"),
c("Wyoming", 6.8, 161, 60, 15.6, "D")))
I need to convert this into a data.frame (or table) while preserving the column and row names and the numeric type of the numbers, and converting anything else (in this example the column 'Group') into factors. (The data aren't always in this format, so the code has to be general.)
(An optional step is then to remove one column by a given name; that's the reason for using a data.frame, as it makes this very easy.)
The resulting data.frame (or table, or matrix) is then passed into the 'scale' function.
My solution consists of several steps:
data <- m[-1,-1]
colnames(data) <- m[1,-1]
rownames(data) <- m[-1,1][m[-1,1]!='']
data <- as.data.frame(data)
Now I have a data.frame, but it cannot be passed into the scale() function ("Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric"). If I use the data.matrix(data) function, the factors are converted to integers fine, but all doubles are converted into integers too. I have been stuck on this for hours.
Thank you in advance.
I'll move this to an answer, as it doesn't seem to be working via comments. You can do the following
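# as.is = FALSE asks type.convert for factors explicitly; recent R versions no longer return them by default
data <- data.frame(lapply(data.frame(m[-1,-1], stringsAsFactors = FALSE), type.convert, as.is = FALSE))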
This converts all the columns of the matrix to the correct types:
str(data)
# 'data.frame': 50 obs. of 5 variables:
# $ X1: num 13.2 10 8.1 8.8 9 7.9 3.3 5.9 15.4 17.4 ...
# $ X2: int 236 263 294 190 276 204 110 238 335 211 ...
# $ X3: int 58 48 80 50 91 78 77 72 80 60 ...
# $ X4: num 21.2 44.5 31 19.5 40.6 38.7 11.1 15.8 31.9 25.8 ...
# $ X5: Factor w/ 4 levels "A","B","C","D": 1 1 1 1 1 1 1 1 1 1 ...
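To make the per-column behaviour easier to see, here is a minimal sketch of what type.convert returns for each kind of character vector. The input vectors are made up for illustration; the as.is argument is spelled out because its default differs between R versions.

```r
# Illustrative inputs only; they mimic single columns of the character matrix.
murder  <- type.convert(c("13.2", "10"), as.is = FALSE)  # decimals become double
assault <- type.convert(c("236", "263"), as.is = FALSE)  # whole numbers become integer
group   <- type.convert(c("A", "B"), as.is = FALSE)      # non-numeric text becomes a factor
raw     <- type.convert(c("A", "B"), as.is = TRUE)       # as.is = TRUE keeps it as character

print(sapply(list(murder = murder, assault = assault, group = group, raw = raw), class))
```

Both integer and double columns count as numeric, so either is accepted by scale.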
Then, you can set your column/row names as you wish
colnames(data) <- m[1,-1]
rownames(data) <- m[-1,1][m[-1,1]!='']
For scale, you can do
scale(data[-5])
Edit per OP's comment.
As I already said several times, using data.matrix on factors is simply wrong and will completely mess up your data. Consider the following example
data.matrix(data.frame(A = factor(c("A", "B")),
B = factor(10:11),
C = factor(c("22-11-2014", "23-11-2014"))))
# A B C
# [1,] 1 1 1
# [2,] 2 2 2
data.matrix returned identical results for these completely different values.
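The same effect explains why the doubles in the question turned into integers: once a numeric column has become a factor, converting it directly returns the internal level codes. A small sketch with made-up values, showing the difference between converting the codes and converting the labels:

```r
# A numeric column that was accidentally stored as a factor (illustrative values).
f <- factor(c("13.2", "10", "8.1"))

codes  <- as.numeric(f)                # internal level codes, not the data
values <- as.numeric(as.character(f))  # go through the labels to recover the numbers

print(codes)
print(values)
```

If a column has already become a factor, going through as.character first is the usual way to get the original numbers back.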
Now back to your real data. If you want to avoid running scale on factors and you don't know a priori which columns are factors, you can simply create an index that identifies the numeric columns and then run scale only on them, for example
indx <- sapply(data, is.numeric)
scale(data[indx])
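Putting the pieces together, the whole pipeline (convert, name, optionally drop a column by name, scale only the numeric columns) could be wrapped roughly as follows. This is only a sketch: the helper name matrix_to_scaled, its drop argument, and the small matrix m_small are hypothetical and not part of the original post; it assumes the first row holds column names and the first column holds row names.

```r
# Hypothetical helper, assuming a header row and a row-name column as in m.
matrix_to_scaled <- function(mat, drop = NULL) {
  df <- data.frame(mat[-1, -1, drop = FALSE], stringsAsFactors = FALSE)
  df[] <- lapply(df, type.convert, as.is = FALSE)  # numbers stay numeric, text becomes factor
  names(df) <- mat[1, -1]
  rownames(df) <- mat[-1, 1]
  if (!is.null(drop)) df <- df[setdiff(names(df), drop)]  # optional removal by name
  scale(df[sapply(df, is.numeric)])                       # factors are left out
}

# Small made-up matrix in the same layout as m.
m_small <- rbind(c("State", "Murder", "Assault", "Group"),
                 c("Alabama", 13.2, 236, "A"),
                 c("Alaska", 10.0, 263, "A"),
                 c("Maine", 2.1, 83, "B"))

res <- matrix_to_scaled(m_small, drop = "Assault")
print(res)
```

Selecting columns with is.numeric rather than by position means the factor columns can sit anywhere in the input.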
Read it as a data.frame and do this later
m = data.frame(rbind.... your data here as above)
rownames(m) = m$X1
colnames(m) = c(t(m[1,]))
req.df = m[-1,-1]
Below is a quick trial that can preserve numeric and factor types.
# convert into data frame
df <- as.data.frame(m[2:nrow(m), 2:ncol(m)], stringsAsFactors = FALSE)
# set names
names(df) <- m[1, 2:ncol(m)]
rownames(df) <- m[2:nrow(m), 1]
# convert types into numeric or factor
df[] <- lapply(df, function(x) if(is.na(as.numeric(x[1]))) as.factor(x) else as.numeric(x))
str(df)
'data.frame': 50 obs. of 5 variables:
$ Murder : num 13.2 10 8.1 8.8 9 7.9 3.3 5.9 15.4 17.4 ...
$ Assault : num 236 263 294 190 276 204 110 238 335 211 ...
$ UrbanPop: num 58 48 80 50 91 78 77 72 80 60 ...
$ Rape : num 21.2 44.5 31 19.5 40.6 38.7 11.1 15.8 31.9 25.8 ...
$ Group : Factor w/ 4 levels "A","B","C","D": 1 1 1 1 1 1 1 1 1 1 ...
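The conversion above decides the type from the first element of each column only, so a column such as c("1", "A") would be coerced to numeric with an NA. A slightly more defensive variant, sketched here under the assumption that a column should count as numeric only when every non-missing entry parses as a number (the name to_num_or_factor is made up for illustration):

```r
# Hypothetical helper: decide per column using all entries, not just the first.
to_num_or_factor <- function(x) {
  num <- suppressWarnings(as.numeric(x))  # silence the "NAs introduced by coercion" warning
  # Numeric only if every entry that was not already NA parsed successfully.
  if (all(!is.na(num) | is.na(x))) num else as.factor(x)
}

a <- to_num_or_factor(c("1", "2.5", NA))  # all parseable: numeric
b <- to_num_or_factor(c("1", "A"))        # mixed: kept as factor
print(class(a))
print(class(b))
```

It can be used in place of the anonymous function, as in df[] <- lapply(df, to_num_or_factor). Note that empty strings do not parse as numbers, so a column containing "" would be treated as a factor under this rule.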