Is there any way to assign the col_types by column names in read_excel (readxl) in R?
My application reads xls and xlsx files using the read_excel function of the readxl package.
The order and the exact number of columns are not known in advance when reading the xls or xlsx file. There are 15 predefined columns, of which 10 are mandatory and the remaining 5 are optional, so the file will always have at least 10 and at most 15 columns.
I need to specify the col_types for the 10 mandatory columns. The only way I can think of is to use the column names to specify the col_types, since I know for a fact that the file contains all 10 mandatory columns, but they can appear in any order.
I tried looking for a way to do this but could not find one.
Can anyone help me find a way to assign the col_types by column names?
I solved the problem with the workaround below. It is not the best way to solve this problem, though: the Excel file is read twice, which will hurt performance if the file contains a very large volume of data.
First read: building the column data type vector. Read the file to retrieve the column information (column names, number of columns, and their types) and build the column_data_types vector, which holds the data type for every column in the file.
# reading the .xlsx file (first pass: column information only)
library(readxl)
site_data_columns <- read_excel(paste(File$datapath, ".xlsx", sep = ""))
site_data_column_names <- colnames(site_data_columns)
# pre-allocate one col_types entry per column in the file
column_data_types <- character(length(site_data_column_names))
for (i in seq_along(site_data_column_names)) {
  if (site_data_column_names[i] == "date") {
    # "date" is a column name
    column_data_types[i] <- "date"
  } else if (site_data_column_names[i] == "result") {
    # "result" is a column name
    column_data_types[i] <- "numeric"
  } else {
    # every other column is read as text
    column_data_types[i] <- "text"
  }
}
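The if/else chain above gets long once all 10 mandatory columns are listed. As a possible alternative, here is a minimal sketch that keeps the name-to-type mapping in a named character vector and looks up every column name at once. The names known_types and build_col_types are hypothetical, and only the "date" and "result" entries come from the loop above; the remaining mandatory columns would have to be added by hand.

```r
# Hypothetical lookup table: column name -> readxl col_types value.
# Only "date" and "result" are taken from the loop above.
known_types <- c(date = "date", result = "numeric")

# Build a col_types vector aligned with the header order of the file.
# Columns that are not listed in known_types get the default type.
build_col_types <- function(column_names, known_types, default = "text") {
  # Indexing by name returns NA for names missing from the table.
  types <- unname(known_types[column_names])
  types[is.na(types)] <- default
  types
}

# Example with a made-up header in arbitrary order.
build_col_types(c("site", "result", "date", "comment"), known_types)
# [1] "text"    "numeric" "date"    "text"
```

Because the result follows the order of the names passed in, it does not matter where the mandatory columns sit in a given file, and optional columns that are absent simply never show up in the vector.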
Second read: reading the file content. Read the Excel file again, passing the column_data_types vector, which holds the column data types, as the col_types parameter.
#reading .xlsx file
site_data <- read_excel(paste(File$datapath, ".xlsx", sep = ""), col_types = column_data_types)
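One way to soften the cost of the double read might be to ask read_excel for zero data rows on the first pass (n_max = 0), so that only the header row is returned, and then do the typed read. The sketch below assumes a readxl version that supports n_max; read_excel_by_name and its arguments are hypothetical names introduced for illustration, not part of readxl. The workbook is still opened twice, so this should reduce the overhead rather than remove it.

```r
# Sketch: header-only first pass, then a typed second pass.
# Assumes the readxl package is installed; read_excel_by_name is a
# hypothetical helper, not a readxl function.
read_excel_by_name <- function(path, known_types, default = "text", sheet = NULL) {
  # n_max = 0 requests no data rows; the header row still gives the names.
  header <- names(readxl::read_excel(path, sheet = sheet, n_max = 0))

  # Map each column name to its type; unlisted columns get the default.
  types <- unname(known_types[header])
  types[is.na(types)] <- default

  readxl::read_excel(path, sheet = sheet, col_types = types)
}

# Usage with the path from the answer (File comes from the original code):
# site_data <- read_excel_by_name(
#   paste(File$datapath, ".xlsx", sep = ""),
#   c(date = "date", result = "numeric")
# )
```

The default of "text" mirrors the else branch of the loop in the answer; passing default = "guess" instead would let readxl infer the types of the optional columns.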