Read specific columns starting from certain rows from Excel file using readxl package in R

I'm trying to read an Excel file into R. I need to read column A and column C (not B), starting from row 5. Here is what I did:

library(readxl)

read_excel('./data/temp.xlsx',  skip=5,
            range=cell_cols(c('A', 'C')))

The code above does not work. First, it does not skip 5 rows; it reads from the first row. Second, it also reads column B, which I do not want.

Does anyone know what I did wrong? I know how to specify the cell range, but how should I pick the specific columns I need?

You can use the col_types argument (check ?read_excel) to keep columns from being read. For instance, if columns A and C are numeric:

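readxl::read_excel("/path/to/data.xlsx",
    col_names = FALSE,
    skip = 5,  # skips at least the first five rows; use skip = 4 if row 5 itself should be read
    col_types = c("numeric", "skip", "numeric"))  # keep A, skip B, keep C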

NB: if the column types are unknown initially, you could read them as text and convert them afterwards.
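A minimal sketch of that read-as-text-then-convert idea. It assumes the example workbook that ships with readxl (`readxl_example("datasets.xlsx")`, whose first sheet has five columns) stands in for your file, and that the kept columns turn out to be numeric; adjust `col_types` to one entry per column in your own sheet.

```r
library(readxl)

# Example workbook bundled with readxl, used here only as a stand-in for your file.
path <- readxl_example("datasets.xlsx")

# Read columns A and C as text and skip the others.
# col_types needs one entry per column in the sheet (five here).
raw <- read_excel(path,
                  col_names = FALSE,
                  skip = 4,  # skip rows 1-4, so reading starts at row 5
                  col_types = c("text", "skip", "text", "skip", "skip"))

# Convert the text columns once their real type is known (numeric in this sheet).
raw[] <- lapply(raw, as.numeric)
```

If some cells are not valid numbers, `as.numeric()` turns them into `NA` with a warning, which is usually a handy way to spot them.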

Borrowing from readxl.tidyverse.org: the reason column B is also read is that cell_cols() treats its argument as a range of columns running from the first to the last one given:

## columns only
read_excel(..., range = cell_cols(1:26))
## is equivalent to all of these
read_excel(..., range = cell_cols(c(1, 26)))
read_excel(..., range = cell_cols("A:Z"))
read_excel(..., range = cell_cols(LETTERS))
read_excel(..., range = cell_cols(c("A", "Z")))

Hence, cell_cols("A:C") is equivalent to cell_cols(c("A", "C")).
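You can check this without touching any file, since cell_cols() just builds a range specification. A small sketch (the variable names are only for illustration):

```r
library(readxl)

# Three ways of writing "columns A through C".
by_string  <- cell_cols("A:C")
by_letters <- cell_cols(c("A", "C"))
by_numbers <- cell_cols(c(1, 3))

# All of them describe the same rectangle: rows unbounded, columns 1 to 3.
same_spec <- identical(by_string, by_letters) && identical(by_string, by_numbers)
print(by_letters)
```

So passing c("A", "C") gives the two ends of the range, not a pick list, which is why column B comes along.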

In one of my earlier projects I did the following. I guess you can adapt it to extract the data column by column and then join the pieces together.

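library(readxl)
library(dplyr)  # %>% and full_join()
library(purrr)  # reduce()

# filepath, cname and ctype are not defined in this snippet; supply your own
ranges <- list("A5:H18", "A28:H39", "A50:H61")

extracted <- lapply(ranges, function(each_range) {
    read_excel(filepath, sheet = 1, range = each_range, na = c("", "-"),
               col_names = cname, col_types = ctype)
}) %>%
    reduce(full_join)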

As for skipping rows, I'm not sure either; I was searching for the same answer when I found your question on Stack Overflow.

[edit] I think I found some reading on this at https://github.com/tidyverse/readxl/issues/577 . In short, if you use range, skip has no effect, because range takes precedence over skip and the other related arguments (n_max, sheet).
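Given that, one possible way to get "A and C from row 5 onward" is to put the starting row into range itself and drop column B with col_types. This is only a sketch: it uses the example workbook bundled with readxl as a stand-in for your file, and assumes both kept columns are numeric.

```r
library(readxl)

# Example workbook bundled with readxl, used here only as a stand-in for your file.
path <- readxl_example("datasets.xlsx")

# Upper-left corner at row 5, column 1 (cell A5); the lower-right corner is
# open in rows (NA) and bounded at column 3 (column C).
from_row_5 <- read_excel(path,
                         range = cell_limits(c(5, 1), c(NA, 3)),
                         col_names = FALSE,
                         # one entry per column inside the range: keep A, skip B, keep C
                         col_types = c("numeric", "skip", "numeric"))
```

Note that skip is not used at all here; the row offset lives entirely in cell_limits().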
