
Remove the first column name in a data frame from fread() in R

I am trying to remove the first name from the column names generated by fread(). The first column name only serves as a title for the row names. Later in the workflow, this "title" messes up my data because it is treated as one of the rows, so I need it to be ignored or removed.

A subset of my DGE_file looks like this:

            GENE ATGGCGAACCTACATCCC ATGGCGAGGACTCAAAGT
1: 0610009B22Rik                  1                  0
2: 0610009E02Rik                  0                  0

I tried to remove the first column name like this:

library(Matrix)
library("data.table")

# Read in the dge file
DGE_file<- fread(file="DGE.txt", stringsAsFactors = TRUE)

colnames(DGE_file)<-colnames(DGE_file)[-1]
DGE_file<- as.matrix(DGE_file)

which, understandably, yields the error:

> colnames(DGE_file)<-colnames(DGE_file)[-1]
Error in setnames(x, value) : 
  Can't assign 10000 names to a 10001 column data.table
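The mismatch can be reproduced on a small made-up table that mirrors the subset above (a hypothetical stand-in for DGE.txt, read from an inline string through fread's `text` argument). Indexing with `[-1]` drops an element rather than renaming it, so the new name vector is one shorter than the number of columns and the assignment is refused.

```r
library(data.table)

# Hypothetical three-column stand-in for DGE.txt, read from inline text
dt <- fread(text = "GENE ATGGCGAACCTACATCCC ATGGCGAGGACTCAAAGT
0610009B22Rik 1 0
0610009E02Rik 0 0", header = TRUE)

# colnames(dt)[-1] removes the first name instead of replacing it
new_names <- colnames(dt)[-1]
length(new_names)  # 2 names
ncol(dt)           # 3 columns

# Assigning the shorter vector fails, just like the 10000 vs 10001 case
res <- tryCatch({
  colnames(dt) <- new_names
  "assigned"
}, error = function(e) conditionMessage(e))
print(res)
```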

I have already tried replacing it with NA, but that caused an error in downstream processing that I couldn't work around.

How can I remove the title "GENE" or make it "invisible" in downstream processing?

The following should work:

library(Matrix)
library("data.table")

# Read in the dge file
DGE_file<- fread(file="DGE.txt", stringsAsFactors = TRUE)
# Set the first column name to the empty string.
names(DGE_file)[1] <- ""
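A quick check on a made-up three-column table (a hypothetical stand-in for DGE.txt) shows why this works where the original attempt failed: `names(x)[1] <- ""` replaces the first name instead of dropping it, so the number of names still matches the number of columns.

```r
library(data.table)

# Hypothetical stand-in for DGE.txt, read from inline text
dt <- fread(text = "GENE ATGGCGAACCTACATCCC ATGGCGAGGACTCAAAGT
0610009B22Rik 1 0
0610009E02Rik 0 0", header = TRUE)

# Replace (not drop) the first name; the GENE column itself is kept
names(dt)[1] <- ""

print(names(dt))
print(ncol(dt))
```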

You can also read the file without its header (skipping the first line) and set the column names afterwards. In my opinion, though, a column whose name is empty or NA may cause problems later on.

library(magrittr)   # for the %>% pipe
library(data.table) # for reading with fread()

# Read in the DGE file
# without the header, skipping the first line
DGE_file <- fread(file="DGE.txt",
                  skip = 1,
                  header=FALSE,
                  stringsAsFactors = TRUE)

# Set the column names (empty string as an "invisible" first name)
DGE_file <- DGE_file %>% 
  purrr::set_names(c("", "ATGGCGAACCTACATCCC",
                     "ATGGCGAGGACTCAAAGT"))

OR

#Set the column names (for NA as the first name)
DGE_file <- DGE_file %>% 
  purrr::set_names(c(NA, "ATGGCGAACCTACATCCC",
                     "ATGGCGAGGACTCAAAGT"))

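Typing out every name only works for the three-column subset; the real file has 10001 columns. A sketch that reuses the file's own header instead, assuming the same whitespace-delimited layout: read just the header with `nrows = 0`, read the data rows with `skip = 1, header = FALSE`, then blank out the first name. The inline `txt` string is a hypothetical stand-in for DGE.txt; with the real file you would pass `file = "DGE.txt"` to both `fread()` calls.

```r
library(data.table)

# Hypothetical stand-in for the contents of DGE.txt
txt <- "GENE ATGGCGAACCTACATCCC ATGGCGAGGACTCAAAGT
0610009B22Rik 1 0
0610009E02Rik 0 0"

# nrows = 0 returns only the column names (no data rows)
header_names <- names(fread(text = txt, nrows = 0, header = TRUE))

# Read the data rows only, then reuse the header with the first entry blanked
DGE_file <- fread(text = txt, skip = 1, header = FALSE)
setnames(DGE_file, c("", header_names[-1]))

print(names(DGE_file))
```

This avoids hard-coding thousands of barcode names while keeping the same "empty first name" result as the answer above.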
A base R solution for setting the names could look like this:

#Read the file with header 
DGE_file <- fread(file="DGE.txt",
                  header=TRUE,
                  stringsAsFactors = TRUE)

# Set an "invisible" (empty) name
names(DGE_file)[1] <- ""

# Or set NA as the name
names(DGE_file)[1] <- NA
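Since the question ends by converting the table with `as.matrix()`, another option worth considering is to move the gene column into the matrix row names, so no first column (and no first column name) is left to interfere downstream. This sketch assumes data.table's `as.matrix()` method for data.tables, whose `rownames` argument takes the name or position of the column to use as row names, and uses a made-up three-column table in place of DGE.txt. Because the gene IDs leave the body of the matrix, the counts stay numeric instead of being coerced to character.

```r
library(data.table)

# Hypothetical stand-in for DGE.txt, read from inline text
DGE_file <- fread(text = "GENE ATGGCGAACCTACATCCC ATGGCGAGGACTCAAAGT
0610009B22Rik 1 0
0610009E02Rik 0 0", header = TRUE)

# Use the first column (GENE) as row names; it is dropped from the matrix body
DGE_matrix <- as.matrix(DGE_file, rownames = 1)

print(dimnames(DGE_matrix))
print(is.numeric(DGE_matrix))
```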
