R version 3.2.2 on Ubuntu 14.04
I am trying to read .csv data into R (two columns, "id" and "variable1") that uses "," as the thousands separator. So far no problem. I am using read.csv2, and the data looks like this:
> data <- read.csv2("data.csv", sep = ";", stringsAsFactors = FALSE, dec = ".")
> data[1000:1010, ]
id variable1
1 2,001
1,001 2,002
1,002 2,001
1,003 2,002
1,004 2,001
1,005 2,002
1,006 2,001
1,007 2,002
1,008 2,001
1,009 2,002
1,01 2,001
After that, I first tried to use gsub() to remove the commas:
data[, c("id", "variable1")] <- sapply(data[, c("id", "variable1")],
function(x) {as.numeric(gsub("\\,","", as.character(x)))})
> data[1000:1010, ]
id variable1
1 2001
1001 2002
1002 2001
1003 2002
1004 2001
1005 2002
1006 2001
1007 2002
1008 2001
1009 2002
101 2001
I think my problem is already obvious in the first output: there is a thousands separator, but the trailing zeros are missing. For example, the number "1000" is displayed as just "1" and "1010" as "1,01" in the "id" column (this is also the case in the .csv file itself). Of course, R can't detect this.
So my question is: Is there a way to tell R that every number must have three digits after the thousands separator when reading in the data (or afterwards), so that I get the correct numbers? The data should look like this:
> data[1000:1010, ]
id variable1
1000 2001
1001 2002
1002 2001
1003 2002
1004 2001
1005 2002
1006 2001
1007 2002
1008 2001
1009 2002
1010 2001
Edit: Thank you all for your answers. Unfortunately, the suggestions work for this example but not for my data, because I think I chose bad example rows. Other rows in the data can look like this:
id1 variable1
1 1 2,001
999 999 1,102
1000 1 2,001
1001 1,001 2,002
1002 1,002 2,001
Of course, the number "1" appears twice. The first really is a "1", but the second should be "1000". I now think I can't solve my problem with R. Maybe I need a better export of the original data, because the problem also appears in the .csv data.
After removing the commas, you can do the following:
data$id <- data$id*(10^(4-nchar(data$id)))
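Note that the one-liner above assumes every id should end up with exactly four digits. A string-based variant that avoids this is sketched below: it right-pads the group after the comma with zeros to three digits before converting. The helper name restore_thousands and its bare_as_thousands argument are made up for illustration, and the sketch assumes at most one comma per value and that the column was read as character (stringsAsFactors = FALSE).

```r
# Hypothetical helper (not from the original answer): rebuild numbers whose
# group after the thousands separator "," lost its trailing zeros.
# Assumption: each value contains at most one comma.
restore_thousands <- function(x, bare_as_thousands = FALSE) {
  x <- trimws(as.character(x))
  has_sep <- grepl(",", x, fixed = TRUE)

  lead <- sub(",.*$", "", x)                          # digits before the comma
  rest <- ifelse(has_sep, sub("^[^,]*,", "", x), "")  # digits after the comma

  # Right-pad the trailing group to three digits: "01" -> "010", "3" -> "300".
  # Values without a comma are only padded if bare_as_thousands is TRUE.
  pad  <- has_sep | bare_as_thousands
  rest <- ifelse(pad, substr(paste0(rest, "000"), 1, 3), rest)

  as.numeric(paste0(lead, rest))
}

ids <- c("1", "1,001", "1,01", "1,3")
restore_thousands(ids)                            # bare "1" stays 1
restore_thousands(ids, bare_as_thousands = TRUE)  # bare "1" becomes 1000
```

Whether a value without a comma means 1 or 1000 cannot be decided from the value alone, so the sketch leaves that choice to the caller.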
If "," is the only separator, i.e. all of the numbers are integers, you can set the dec argument of read.csv2 (or read.csv) to "," and multiply by 1000:
data <- read.csv2(
text = "id ; variable1
1 ; 2,001
1,008 ; 2,001
1,009 ; 2,002
1,01 ; 2,001
1,3 ; 2,0",
sep = ";",
stringsAsFactors = FALSE,
header = TRUE,
dec = "," )
> 1000*data
id variable1
1 1000 2001
2 1008 2001
3 1009 2002
4 1010 2001
5 1300 2000
>
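For the rows from the question's edit, multiplying everything by 1000 would also turn the genuine id 1 into 1000. If, and only if, the ids are known to be strictly increasing down the file (an assumption, the question does not state it), the ambiguous values could be resolved from their position. The function name resolve_ids below is hypothetical, and this is only a rough sketch of that idea, not a general fix:

```r
# Hypothetical sketch: resolve ambiguous ids by position.
# ASSUMPTION: ids are strictly increasing from one row to the next.
resolve_ids <- function(id_raw) {
  id_raw  <- trimws(as.character(id_raw))
  has_sep <- grepl(",", id_raw, fixed = TRUE)
  # Read "," as a decimal mark, as in the read.csv2(dec = ",") approach.
  value <- as.numeric(sub(",", ".", id_raw, fixed = TRUE))

  out  <- numeric(length(value))
  prev <- -Inf
  for (i in seq_along(value)) {
    if (has_sep[i] || value[i] <= prev) {
      # Either a comma was present, or the bare value would break the
      # increasing order, so interpret it as thousands.
      # round() guards against floating-point noise such as 1000.9999.
      out[i] <- round(value[i] * 1000)
    } else {
      out[i] <- value[i]
    }
    prev <- out[i]
  }
  out
}

resolve_ids(c("1", "999", "1", "1,001", "1,002"))
# 1  999 1000 1001 1002
```

If the ids are not ordered, there is no information left in the file to tell the two cases apart, and a cleaner export of the original data is the safer route.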