

R imports numeric values into a boolean column: why?

Original post: 2023-01-24 16:08:36 | 3 1 | tags: r, boolean

I want to read an Excel file into R as a data frame. It is a large file with many numbers and some #NV values (#NV is how German-language Excel displays #N/A).

The import works well for most of the columns (there are 4,000 in total). But for some columns, R turns the values into TRUE or FALSE, creating a boolean (logical) column.

I don't want that, since all of the columns are supposed to be numeric.

Do you know why R does that?

1 Answer

It would really help if you provided a code snippet, because there are many different Excel-to-data-frame packages and functions, each with its own behavior.

But assuming that you are using readxl, its read_excel function has a guess_max parameter for exactly this kind of case. guess_max is 1000 by default.

Try:

    library(readxl)
    df <- read_excel(path = filepath, sheet = sheet_name, guess_max = 100000)
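With 4,000 columns it is hard to spot by eye which ones were affected. A small helper like the one below lists every column that came in as logical, so you can check whether raising guess_max fixed all of them. The function name logical_columns is hypothetical (it is not part of readxl), and the toy data frame only stands in for the result of read_excel().

```r
# Hypothetical helper (not part of readxl): names of all logical columns in a data frame
logical_columns <- function(df) {
  names(df)[vapply(df, is.logical, logical(1))]
}

# Toy data frame standing in for the result of read_excel()
df <- data.frame(
  a = c(1.5, 2),      # numeric, as intended
  b = c(NA, NA),      # all missing, so R stores it as logical
  c = c(TRUE, FALSE)  # genuinely logical
)

logical_columns(df)
#> [1] "b" "c"
```

On the real data you would call logical_columns(df) right after read_excel(); a result of character(0) means no column was read as logical.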

Since a data frame column can hold only one data type, read_excel has to look at your Excel file and guess what type each column should be before it actually fills the data frame, and it bases that guess only on the first guess_max rows. If a column happens to contain only empty (NA) cells in the first 1000 rows, read_excel assumes it is a column of booleans, and every value it encounters in later rows is then cast to logical, which is why numbers show up as TRUE or FALSE. So if you set guess_max to something huge, read_excel becomes slower, but it should avoid casting your numerics to booleans.
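The following is a minimal reproducible sketch of that mechanism, assuming the readxl and writexl packages are installed (writexl is used only to build a test file). The column name x and the temporary file are made up for illustration. The third read uses col_types = "numeric", a readxl option not mentioned above that skips guessing entirely; it may suit a sheet where every column is meant to be numeric.

```r
library(readxl)   # read_excel()
library(writexl)  # write_xlsx(), only used to create a test file

# Toy sheet: one column "x" with 1,500 empty cells followed by 10 numbers.
# write_xlsx() leaves NA values as empty cells.
toy <- data.frame(x = c(rep(NA_real_, 1500), 1:10))
path <- tempfile(fileext = ".xlsx")
write_xlsx(toy, path)

# Default guess_max (1000): the rows used for guessing are all empty,
# so "x" should be guessed as logical and the later numbers cast to it.
guessed <- read_excel(path)
print(class(guessed$x))

# A guess_max larger than the number of rows lets read_excel() see the numbers.
wider <- read_excel(path, guess_max = 2000)
print(class(wider$x))

# Alternative: declare the type up front; a single value is recycled to all columns.
forced <- read_excel(path, col_types = "numeric")
print(class(forced$x))
```

Declaring col_types avoids the extra scan that a huge guess_max costs, but it only works if you really know every column is numeric; any text cell in such a column is turned into NA with a warning.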