Importing xlsx data to R when numbers have a comma as decimal separator
How can I import data from a .xlsx file into R so that numbers are represented as numbers when their original decimal separator is a comma, not a dot? The only package I know of for dealing with Excel files is readxl from the tidyverse.
I'm looking for a solution that doesn't require opening and editing the Excel files in any other software, and that can deal with hundreds of columns to import. If that were an option, I'd export all the Excel files to .csv and import them with tools I know of that take the dec= argument.
So far my best working solution is to import the numbers as characters and then transform them:
library(dplyr)
library(stringr)

var1 <- c("2,1", "3,2", "4,5")
var2 <- c("1,2", "3,33", "5,55")
var3 <- c("3,44", "2,2", "8,88")
df <- data.frame(var1, var2, var3)

# Replace the decimal comma with a dot, then convert each column to numeric
df %>%
  mutate(across(contains("var"), ~ as.numeric(str_replace(.x, ",", "."))))
I strongly suspect that there is some other reason these columns are being read as character, most likely that they are the dreaded "Number Stored as Text".
For ordinary numbers (stored as numbers), after switching to comma as the decimal separator, either for an individual file or in the overall system settings, readxl::read_excel reads them in as numeric properly. (This is on my Windows system.) Even when I add a character to one of the cells in that column or set col_types="text", the number is read in with a period as the decimal mark, not a comma, which is further evidence that readxl is using the internally stored data type.
The only way I have gotten R to read in a comma as a decimal is when the data is stored in Excel as text instead of as numeric. (You can enter this by prefacing the number with a single quote, like '1,7.) I then get a little green triangle in the corner of the cell, which gives the popup warning "Number Stored as Text". In my exploration, I was surprised to discover that Excel will do calculations on numbers stored as text, so checking whether a calculation works is not a valid way of detecting this.
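One way to spot such columns from the R side is to look for character columns whose values all look like comma-decimal numbers. The following is only a sketch in base R: the toy data frame stands in for whatever read_excel returns, and the column names and the helper looks_like_comma_number are made up for illustration.

```r
# Toy data standing in for the result of read_excel(); column names are invented
df <- data.frame(
  id    = c("a", "b", "c"),
  price = c("1,7", "2,05", "3"),  # numbers stored as text in Excel
  qty   = c(1, 2, 3),             # ordinary numeric cells
  stringsAsFactors = FALSE
)

# TRUE when the column is character and every non-missing value is
# digits with an optional comma-separated decimal part
looks_like_comma_number <- function(x) {
  is.character(x) && all(grepl("^-?[0-9]+(,[0-9]+)?$", x[!is.na(x)]))
}

# Names of the columns that are probably numbers stored as text
text_numbers <- names(df)[vapply(df, looks_like_comma_number, logical(1))]
text_numbers
```

The pattern is deliberately strict (no thousands separators, no exponents), so it may need loosening for real files.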
It's pretty easy to replace the "," with a "." and recast the column as numeric. Example:
> x <- c('1,00','2,00','3,00')
> df <- data.frame(x)
> df
     x
1 1,00
2 2,00
3 3,00
> df$x <- gsub(',','.',df$x)
> df$x <- as.numeric(df$x)
> df
  x
1 1
2 2
3 3
> class(df$x)
[1] "numeric"
Just using base R and gsub.
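Since the question mentions hundreds of columns, the same idea can be applied to many columns at once with lapply. This is a minimal sketch, assuming the affected columns can be picked out by a name pattern; the "^var" pattern, the sample data and the helper comma_to_numeric are illustrative only.

```r
df <- data.frame(
  var1  = c("2,1", "3,2", "4,5"),
  var2  = c("1,2", "3,33", "5,55"),
  label = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

# Swap the decimal comma for a dot, then convert to numeric
comma_to_numeric <- function(x) as.numeric(sub(",", ".", x, fixed = TRUE))

# Select the columns to convert by name; adjust the pattern to your data
cols <- grep("^var", names(df), value = TRUE)
df[cols] <- lapply(df[cols], comma_to_numeric)

str(df)
```

Using fixed = TRUE treats the comma as a literal string rather than a regular expression, and columns outside the pattern (label here) are left untouched.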
I just had the same problem while dealing with an Excel spreadsheet I had received from a colleague. After I had tried to import the file using readxl (which failed), I converted the file into a csv file, hoping to solve the problem using read_delim and fiddling with the locale and decimal mark options. But the problem was still there, no matter which options I used.
Here is the solution that worked for me: I found out that the characters used in the cells containing missing values ( . in my case) were causing trouble. I went back to the Excel file and replaced . with blanks in all cells with missing values, while keeping the default option for the decimals ( , ). After that, all columns were imported correctly as numeric using readxl.
If you face this problem with your decimals set to . make sure to tick the box saying "Match entire cell contents" in Excel before replacing all instances of the missing-value marker . because otherwise the decimal points inside your numbers would be replaced as well.
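If editing the Excel file is not an option, the missing-value marker can also be handled on the R side. The sketch below works on an in-memory character vector in base R; the sample values are invented. read_excel also has an na argument for declaring such markers at import time, shown here only as a comment with a placeholder file name.

```r
# Values as they might arrive from a text column: "." marks a missing value
x <- c("1,5", ".", "2,25", ".")

# With readxl the marker could instead be declared at import time
# (the file name is a placeholder):
# df <- readxl::read_excel("excel_file.xlsx", na = ".")

# Turn the marker into NA, then convert the comma decimals
x[x == "."] <- NA
x_num <- as.numeric(sub(",", ".", x, fixed = TRUE))
x_num
```

Comparing with == matches only cells that consist of the marker alone, which is the same safeguard as "Match entire cell contents" in Excel.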
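As far as I can tell, read_excel() has no locale argument, so the decimal mark cannot be set at import time. What you can do is import the file as usual and then let readr's type_convert() re-parse the columns that came in as character, using a locale whose decimal_mark is a comma:
library(readxl)
library(readr)

# Import as usual; numbers stored as text arrive as character columns
df <- read_excel("excel_file.xlsx")

# Re-parse the character columns, treating "," as the decimal mark
df <- type_convert(df, locale = locale(decimal_mark = ","))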