
read_delim() from the tidyverse cannot correct misaligned headers in a text file the way base R's read.table() does

I am trying to use the tidyverse's read_delim() to read a tab-separated text file. Base R's read.table() reads it without any problem, but when I tested read_delim() with delim = "\t", I ran into trouble. For example, I have the file below, "test.txt". As you can see, the header is misaligned with the data, because the first column holds row names and has no header of its own.

T1  T2  T3
A   1   4   7
B   2   5   8
C   3   6   9

I can use base R to read this file successfully:

dat <- read.table("test.txt", header = TRUE, sep = "\t")

dat
   T1 T2 T3
A  1  4  7
B  2  5  8
C  3  6  9

But when I tried the tidyverse's read_delim(), I got parsing problems:

dat1 <- read_delim("test.txt", delim = "\t")
Rows: 3 Columns: 3
── Column specification ──────────────────────────────────────────────────────────
Delimiter: "\t"
chr (2): T1, T3
dbl (1): T2
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Warning message:
One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
dat <- vroom(...)
problems(dat)

I know base R's read.table() corrects this automatically, but could someone tell me whether the tidyverse's read_delim() has a way to resolve this issue? Thank you! -Xiaokuan

The issue isn't exactly that the headers are misaligned; it's that readr doesn't support or recognize row names at all.* readr::read_delim() therefore doesn't account for the fact that row names don't have a column header, and just sees three column names followed by four columns of data.
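The mismatch can be checked directly by counting the fields on each line. This is a minimal sketch, assuming the file is tab-separated exactly as shown in the question; it recreates the file in a temporary location (the `path` and `n_fields` names are illustrative) and uses base R's count.fields():

```r
# Recreate the example file from the question in a temporary location
path <- tempfile(fileext = ".txt")
writeLines(c("T1\tT2\tT3", "A\t1\t4\t7", "B\t2\t5\t8", "C\t3\t6\t9"), path)

# Count the tab-separated fields on each line: header first, then data rows
n_fields <- count.fields(path, sep = "\t")
n_fields
# [1] 3 4 4 4
```

read.table() documents that when the header has one fewer field than the data rows, the first column is used for the row names, which is why it handles this file. readr has no equivalent rule.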

If your goal is to import your data as a tibble, your best bet is probably to use base::read.table(), then tibble::as_tibble(), using the rownames argument to convert the row names to a regular column.

library(tibble)

dat <- read.table("test.txt", header = TRUE, sep = "\t")

as_tibble(dat, rownames = "row")
# A tibble: 3 × 4
  row      T1    T2    T3
  <chr> <dbl> <dbl> <dbl>
1 A         1     4     7
2 B         2     5     8
3 C         3     6     9

Another option would be to manually edit your input file to include a column header above the row names.
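If editing the file by hand is impractical, a similar effect can probably be had inside R by reading the header line separately and passing a full set of column names to read_delim(). This is only a sketch under assumptions: the file is recreated in a temporary location, and `path`, `header`, `dat2`, and the column name "row" are illustrative choices, not anything readr requires.

```r
library(readr)

# Recreate the example file from the question in a temporary location
path <- tempfile(fileext = ".txt")
writeLines(c("T1\tT2\tT3", "A\t1\t4\t7", "B\t2\t5\t8", "C\t3\t6\t9"), path)

# Read only the header line and split it into the three existing column names
header <- strsplit(readLines(path, n = 1), "\t", fixed = TRUE)[[1]]

# Skip the header line and supply all four names; "row" is an arbitrary
# name chosen here for the unnamed first column
dat2 <- read_delim(
  path,
  delim = "\t",
  skip = 1,
  col_names = c("row", header),
  show_col_types = FALSE
)
dat2
```

This keeps the whole import in readr, at the cost of reading the first line twice.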


*This isn't an oversight, by the way. It's an intentional choice by the tidyverse team, as they believe row names to be bad practice. E.g., from the tibble docs: “Generally, it is best to avoid row names, because they are basically a character column with different semantics than every other column.” Also see this interesting discussion from the tibble GitHub.

