R uploads numeric value into boolean column - why?
I want to load an Excel file into R as a data frame. It is a large file with a lot of numbers and some #NV values.
The import works well for the majority of columns (there are 4,000 columns in total). But for some columns, R changes the values to "TRUE" or "FALSE", creating a boolean column.
I don't want that, since all of the columns are supposed to be numeric.
Do you know why R does that?
It would really help if you provided code snippets, because there are many different Excel-to-data-frame libraries, methods, and behaviors.
But assuming that you are using readxl, the read_excel function has a guess_max parameter for this kind of case. guess_max is 1000 by default.
Try:
df <- read_excel(path = filepath, sheet = sheet_name, guess_max = 100000)
Since a data frame column cannot hold different data types, read_excel has to read your Excel file and guess what data type each column should be before actually filling the data frame. If a column happens to contain only empty (NA) cells in the first 1000 rows, read_excel will assume it is a column of booleans, and all values encountered in later rows will be cast accordingly. So if you set guess_max to something huge, you make read_excel slower, but it might avoid the casting of numerics to booleans.
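To see the type guessing in isolation, here is a small sketch that builds a throwaway workbook and reads it back three ways. It assumes the readxl and writexl packages are installed; the data, the column names (id, late_values) and the temporary file are made up for illustration and are not from the question. The last call shows an alternative to raising guess_max: telling read_excel the column types up front.

```r
# Assumption: readxl and writexl are installed.
library(readxl)
library(writexl)

# Hypothetical data: a numeric column whose first 1,200 rows are empty,
# i.e. more rows than the default guess_max of 1000.
n <- 1500
demo <- data.frame(
  id = seq_len(n),
  late_values = c(rep(NA_real_, 1200), seq_len(300) * 1.5)
)

filepath <- tempfile(fileext = ".xlsx")
write_xlsx(demo, filepath)

# 1) Default settings: only empty cells are inspected when guessing the
#    type of late_values. Warnings are suppressed to keep the output short.
df_default <- suppressWarnings(read_excel(filepath))

# 2) Let read_excel inspect every row before it decides on a type.
df_guess <- read_excel(filepath, guess_max = n)

# 3) Skip guessing altogether: a single col_types value is recycled for
#    all columns. Listing "#NV" in na only matters if the markers are
#    stored as literal text in the cells, which is an assumption here.
df_typed <- read_excel(filepath, col_types = "numeric", na = c("", "#NV"))

# Compare the resulting column classes.
sapply(
  list(default = df_default, guess_max = df_guess, col_types = df_typed),
  function(d) class(d$late_values)
)
```

With 4,000 columns that are all meant to be numeric, the col_types route avoids scanning extra rows for guessing, at the cost of warnings for any cell that cannot be read as a number.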