Importing xlsx data to R when numbers have a comma as decimal separator

How can I import data from a .xlsx file into R so that numbers are read as numeric when their decimal separator is a comma rather than a dot?

The only package I know of for working with Excel files is readxl from the tidyverse.

I'm looking for a solution that doesn't require opening and editing the Excel files in any other software, and that can deal with hundreds of columns to import. If that were an option, I'd export all the Excel files to .csv and import them with tools I know of that take a dec= argument.

So far my best working solution is to import the numbers as character and then convert them:

library(dplyr)
library(stringr)

# Example data: numbers held as text with a comma as decimal separator
var1 <- c("2,1", "3,2", "4,5")
var2 <- c("1,2", "3,33", "5,55")
var3 <- c("3,44", "2,2", "8,88")
df <- data.frame(var1, var2, var3)

# Replace the decimal comma with a dot, then convert to numeric.
# across() replaces the superseded mutate_at()/funs() (dplyr >= 1.0.0).
df %>%
  mutate(across(contains("var"),
                ~ as.numeric(str_replace(.x, fixed(","), "."))))

I strongly suspect there is some other reason these columns are being read as character, most likely that they are the dreaded "Number Stored as Text".

For ordinary numbers (stored as numbers), after switching to comma as the decimal separator, either for an individual file or in the overall system settings, readxl::read_excel reads the column in as numeric properly. (This is on my Windows system.) Even when I add a character to one of the cells in that column or set col_types="text", the number is read in with a period as the decimal mark, not a comma, which is further evidence that readxl uses the internally stored data type.

The only way I have gotten R to read in a comma as the decimal mark is when the data is stored in Excel as text instead of as a number. (You can enter it this way by prefacing the number with a single quote, like '1,7.) I then get a little green triangle in the corner of the cell, which gives the popup warning "Number Stored as Text". In my exploration, I was surprised to discover that Excel will do calculations on numbers stored as text, so checking whether a calculation works is not a valid way of detecting this.
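Once the sheet has been imported, columns affected by this can be spotted from R. The following is only a sketch: the data frame `imported`, its column names, and the helper `looks_like_comma_number` are made up for illustration, and the regular expression only covers plain numbers with an optional comma decimal part.

```r
# Hypothetical import result: "value" came in as text, "count" as numeric
imported <- data.frame(
  name  = c("x", "y", "z"),
  value = c("1,7", "2,25", NA),
  count = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# TRUE when a column is character and every non-missing value looks like
# a number with an optional comma decimal part, e.g. "1,7" or "-3"
looks_like_comma_number <- function(col) {
  if (!is.character(col)) return(FALSE)
  vals <- col[!is.na(col)]
  length(vals) > 0 && all(grepl("^-?[0-9]+(,[0-9]+)?$", vals))
}

# Names of the columns that are probably numbers stored as text
suspect <- names(imported)[vapply(imported, looks_like_comma_number, logical(1))]
suspect
```

Columns flagged this way are candidates for the comma-to-dot conversion; genuinely textual columns are left out.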

It's pretty easy to replace the "," with a "." and recast the column as numeric. Example:

> x <- c('1,00','2,00','3,00')
> df <- data.frame(x)
> df
     x
1 1,00
2 2,00
3 3,00
> df$x <- gsub(',','.',df$x)
> df$x <- as.numeric(df$x)
> df
  x
1 1
2 2
3 3
> class(df$x)
[1] "numeric"

Just using base R and gsub.
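For a data frame with many such columns, the same conversion can probably be done in a single call rather than column by column. A minimal base-R sketch, assuming every column was imported as text; the data frame `raw` and its contents are hypothetical. `type.convert()` takes a `dec` argument, and columns that do not parse as numbers stay character.

```r
# Hypothetical data as it might arrive from Excel: all columns are text
raw <- data.frame(
  id   = c("a", "b", "c"),
  var1 = c("2,1", "3,2", "4,5"),
  var2 = c("1,2", "3,33", "5,55"),
  stringsAsFactors = FALSE
)

# Convert every column at once, treating "," as the decimal separator;
# as.is = TRUE keeps non-numeric columns as character instead of factor
converted <- type.convert(raw, dec = ",", as.is = TRUE)

str(converted)
```

This avoids naming the columns individually, which matters when there are hundreds of them.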

I just had the same problem while dealing with an Excel spreadsheet I had received from a colleague. After trying to import the file using readxl (which failed), I converted it to a csv file, hoping to solve the problem with read_delim by fiddling with the locale and decimal mark options. But the problem was still there, no matter which options I used.

Here is the solution that worked for me: I found out that the character used in the cells containing missing values ( . in my case) was causing the trouble. I went back to the Excel file and replaced . with blanks in all cells with missing values, while keeping the default option for the decimal separator ( , ). After that, all columns were imported correctly as numeric using readxl.

If you face this problem with your decimals set to . instead, make sure to tick the "Match entire cell contents" box in Excel before replacing all instances of the missing-value marker, so that the decimal points inside numbers are left alone.
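The same idea can also be tried from R without editing the workbook, by declaring the missing-value marker when reading or converting. readxl::read_excel has an `na` argument for this; the file name in the comment below is hypothetical, and the runnable part is a base-R illustration on a column that has already arrived as text.

```r
# With a real workbook the marker can be declared at import time, e.g.
# (hypothetical file name):
#   readxl::read_excel("excel_file.xlsx", na = ".")

# Base-R illustration on a column that arrived as text
x <- c("1,5", ".", "2,5")

# Declaring "." as the missing-value marker lets the rest parse as numbers
y <- type.convert(x, na.strings = ".", dec = ",", as.is = TRUE)
y
```

Here the second element becomes NA and the remaining values are converted to numeric.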

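Note that readxl::read_excel has no locale argument (locale() comes from the readr package), so the decimal mark cannot be set at import time. For columns that arrive as text, one option is to read everything as text and let readr convert it using a locale with decimal_mark. This only helps for numbers stored as text; cells stored as numbers come back with a period as the decimal mark and should be imported normally instead.

library(readxl)
library(readr)

# Read every column as text, then re-guess the types with "," as decimal mark
df <- read_excel("excel_file.xlsx", col_types = "text")
df <- type_convert(df, locale = locale(decimal_mark = ","))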

