Is there any way to assign col_types by column name in read_excel (readxl) in R?

My application reads xls and xlsx files using the read_excel function of the readxl package.

The order and the exact number of columns are not known in advance when reading the xls or xlsx file. There are 15 predefined columns, of which 10 are mandatory and the remaining 5 are optional. So the file will always have at least 10 and at most 15 columns.

I need to specify the col_types for the 10 mandatory columns. The only way I can think of is to use the column names to specify the col_types, because I know for a fact that the file contains all 10 mandatory columns, but in an arbitrary order.

I looked for a way to do this but could not find one.

Can anyone help me find a way to assign the col_types by column name?

I solved the problem with the workaround below. It is not the best way to solve this problem, though: I read the Excel file twice, which will hurt performance if the file contains a very large volume of data.

First read: building the column data type vector. Read the file to retrieve the column information (column names, number of columns and their types) and build the column_data_types vector, which holds the data type for every column in the file.

library(readxl)

# Read the .xlsx file once to discover its columns
site_data_columns <- read_excel(paste(File$datapath, ".xlsx", sep = ""))

site_data_column_names <- colnames(site_data_columns)

# Pre-allocate one type per column so the assignments below have a target
column_data_types <- character(length(site_data_column_names))

for (i in seq_along(site_data_column_names)) {
    if (site_data_column_names[i] == "date") {
        # "date" is a column name
        column_data_types[i] <- "date"
    } else if (site_data_column_names[i] == "result") {
        # "result" is a column name
        column_data_types[i] <- "numeric"
    } else {
        # every other column is read as text
        column_data_types[i] <- "text"
    }
}
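
The same name-to-type mapping can also be expressed as a lookup against a named vector instead of an if/else chain. This is only a minimal sketch: `types_for_columns` is a hypothetical helper name (it is not part of readxl), and the two typed columns are the ones used in the loop above.

```r
# Hypothetical helper (not part of readxl): map column names to col_types.
types_for_columns <- function(column_names,
                              known_types = c(date = "date", result = "numeric"),
                              default = "text") {
  # Indexing a named vector by name returns NA for names it does not contain.
  types <- unname(known_types[column_names])
  # Columns without an explicit type fall back to the default.
  types[is.na(types)] <- default
  types
}

types_for_columns(c("site", "result", "date"))
# returns c("text", "numeric", "date")
```

Adding another typed column then only means adding one entry to `known_types`, and the column order in the file does not matter.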

Second read: reading the file content. Read the Excel file, passing the column_data_types vector, which holds the column data types, as the col_types parameter.

#reading .xlsx file
site_data <- read_excel(paste(File$datapath, ".xlsx", sep = ""), col_types = column_data_types)
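
One possible way to reduce the cost of the first read is to read only the header row. The sketch below assumes that passing `n_max = 0` to `read_excel` returns a zero-row result that still carries the column names. `read_excel_typed` is a hypothetical wrapper name, and the `date` and `result` entries are the example columns from this workaround.

```r
library(readxl)

# Hypothetical wrapper: read an Excel file with col_types chosen by column name.
read_excel_typed <- function(path,
                             known_types = c(date = "date", result = "numeric"),
                             default = "text") {
  # First pass: n_max = 0 asks for no data rows, so only the header is needed.
  header <- names(read_excel(path, n_max = 0))

  # Look up each column's type by name; columns not listed get the default.
  types <- unname(known_types[header])
  types[is.na(types)] <- default

  # Second pass: full read with the positional col_types vector.
  read_excel(path, col_types = types)
}
```

The file is still opened twice, but the first pass no longer parses the data rows, which should matter most for large files.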
