How to read .csv data containing thousands separators, with special handling of zeros (in R)?

Question

R version 3.2.2 on Ubuntu 14.04

I am trying to read .csv data into R (two columns, "id" and "variable1") that uses "," as the thousands separator. So far no problem. I am using read.csv2, and the data looks like this:

> data <- read.csv2("data.csv", sep = ";", stringsAsFactors = FALSE, dec = ".")
> data[1000:1010, ]
     id        variable1
         1     2,001
     1,001     2,002
     1,002     2,001
     1,003     2,002
     1,004     2,001
     1,005     2,002
     1,006     2,001
     1,007     2,002
     1,008     2,001
     1,009     2,002
      1,01     2,001

I then tried to use gsub() to remove the commas:

data[, c("id", "variable1")] <- sapply(data[, c("id", "variable1")],
          function(x) {as.numeric(gsub("\\,","", as.character(x)))})
> data[1000:1010, ]
     id      variable1
        1      2001
     1001      2002
     1002      2001
     1003      2002
     1004      2001
     1005      2002
     1006      2001
     1007      2002
     1008      2001
     1009      2002
      101      2001

I think my problem is already obvious in the first output: there is a thousands separator, but the trailing zeros are missing. For the "id" variable, the number "1000" is displayed as just "1", and "1010" as "1,01" (this is also the case in the .csv file itself). Of course, R can't identify this.

So my question is: is there a way to tell R, when reading in the data (or afterwards), that every number must have three digits after the thousands separator, so that I get the correct numbers? The data should look like this:

> data[1000:1010, ]
     id      variable1
     1000      2001
     1001      2002
     1002      2001
     1003      2002
     1004      2001
     1005      2002
     1006      2001
     1007      2002
     1008      2001
     1009      2002
     1010      2001

Edit: Thank you all for your answers. Unfortunately, the suggestions work for this example but not for my data, because I think I chose bad example rows. Other rows in the data can look like this:

       id1 variable1
1        1     2,001
999    999     1,102
1000     1     2,001
1001 1,001     2,002
1002 1,002     2,001

Of course, the number "1" now appears twice. The first really is a "1", but the second should be "1000". I now think I can't solve my problem with R. Maybe I need a better export of the original data, because the problem also appears in the .csv data.
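To make the ambiguity concrete, here is a minimal sketch that works on the values as character strings. The helper name fix_thousands is hypothetical, and it assumes at most one "," per value. Values that contain a comma can be repaired by right-padding the digits after the comma to three places, but a bare "1" carries no information about whether it means 1 or 1000, so it is left unchanged.

```r
# Hypothetical helper: repair values such as "1,01" (meant as 1010).
# Assumes at most one thousands separator per value.
fix_thousands <- function(x) {
  x <- trimws(as.character(x))
  has_sep <- grepl(",", x, fixed = TRUE)
  whole <- sub(",.*$", "", x)                 # digits before the comma
  rest  <- sub("^[^,]*,?", "", x)             # digits after the comma ("" if none)
  rest  <- substr(paste0(rest, "000"), 1, 3)  # right-pad with zeros to 3 digits
  # Values without a comma are ambiguous (1 or 1000?) and are kept as they are.
  as.numeric(ifelse(has_sep, paste0(whole, rest), x))
}

fix_thousands(c("1", "999", "1,001", "1,01", "1,3"))
```

The bare "1" comes back as 1, which is right for row 1 but wrong for row 1000, so the remaining cases would still need a cleaner export or some other source of information.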

Answer 1

After removing the commas, you can do the following:

data$id <- data$id*(10^(4-nchar(data$id)))
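As a small illustration of what this rescaling does, the sketch below applies the same expression to a few ids taken from the question (the vector name ids is only for illustration). It assumes every value is meant to be a four-digit number, which is why a genuine 999 gets rescaled as well.

```r
# ids as they look after the commas have been removed
ids <- c(1, 1001, 101, 999)

# Scale each value up to four digits: multiply by 10^(4 - number of digits)
rescaled <- ids * 10^(4 - nchar(ids))

rescaled
# 1 -> 1000, 1001 -> 1001, 101 -> 1010, but a genuine 999 -> 9990
```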

Answer 2

If "," is the only separator, i.e. all of the numbers are integers, you can set the dec argument of read.csv2 (or read.csv) to "," and multiply by 1000:

data <- read.csv2(
  text = "id    ; variable1
          1     ; 2,001
          1,008 ; 2,001
          1,009 ; 2,002
          1,01  ; 2,001
          1,3   ; 2,0",
  sep = ";",
  stringsAsFactors = FALSE,
  header = TRUE,
  dec = "," )


> 1000*data
    id variable1
1 1000      2001
2 1008      2001
3 1009      2002
4 1010      2001
5 1300      2000
>
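One possible refinement, sketched here under the same assumption that every value contains the separator or should be scaled by 1000: because the values are read as decimals, the product is a floating-point number and may not be exactly integral, so it can be safer to round before converting to integer. The names raw and fixed are illustrative only.

```r
# Read "," as the decimal mark (the read.csv2 default), so "1,01" becomes 1.01
raw <- read.csv2(
  text = "id;variable1\n1;2,001\n1,008;2,001\n1,01;2,001\n1,3;2,0",
  stringsAsFactors = FALSE
)

# Multiply by 1000 and round before converting, to guard against
# floating-point results that fall just below the intended integer
fixed <- as.data.frame(lapply(raw, function(col) as.integer(round(1000 * col))))

fixed
```

As with the original approach, a value that is genuinely below 1000 (such as a real id of 1) would also be multiplied by 1000.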


Answer 1
0 2015-11-02 09:40:15

Answer 2
0 Accepted 2015-11-02 10:33:05
