R: Read in .csv file and convert into multiple column data frame

I am new to R and am having a lot of trouble reading in a .csv file and converting it into a data.frame with 7 columns. Here is what I am doing:

gene_symbols_table <- as.data.frame(read.csv(file="/home/nikita/Desktop/CElegans_raw_data/gene_symbols_matching.csv", header=TRUE, sep=","))

After that I get a data.frame with dim = 46761 x 1, but I need it to be 46761 x 7. I tried the following Stack Overflow threads:

  1. How can you read a CSV file in R with different number of columns

  2. read.delim() - errors "more columns than column names" and "header and 'col.names' are of different lengths"

  3. Split a column of a data frame to multiple columns

But somehow nothing is working in my case. Here is how the table looks:

> head(gene_symbols_table, 3)
  input.reason.matches.organism.name.primaryIdentifier.symbol.briefDescription.class.secondaryIdentifier
1 WBGene00008675 MATCH 1 Caenorhabditis elegans WBGene00008675 irld-26  Gene F11A5.7
2 WBGene00008676 MATCH 1 Caenorhabditis elegans WBGene00008676 oac-15  Gene F11A5.8
3 WBGene00008677 MATCH 1 Caenorhabditis elegans WBGene00008677   Gene F11A5.9

The .csv file in Excel looks like this:

input | reason | matches | organism.name | primaryIdentifier | symbol | briefDescription
WBGene00008675 | MATCH | 1 | Caenorhabditis elegans | WBGene00008675 | irld-26 | ...
...

The following code:

gene_symbols_table <- read.table(file="/home/nikita/Desktop/CElegans_raw_data/gene_symbols_matching.csv", header=FALSE, sep=",",
                                 col.names = paste0("V", seq_len(7)), fill = TRUE)

This seems to work, but dim shows right away that the result is wrong: 20124 x 7. Looking at the data:

  V1
1 input;reason;matches;organism.name;primaryIdentifier;symbol;briefDescription;class;secondaryIdentifier
2 WBGene00008675;MATCH;1;Caenorhabditis elegans;WBGene00008675;irld-26;;Gene;F11A5.7
3 WBGene00008676;MATCH;1;Caenorhabditis elegans;WBGene00008676;oac-15;;Gene;F11A5.8
  V2 V3 V4 V5
1
2
3

So this result is wrong as well.

Other attempts with read.table give me the error mentioned in the second Stack Overflow thread.

I have also tried splitting the one-column data.frame into 7 columns, but so far without success.

Judging from how the table looks, the separator seems to be a semicolon (or possibly a space), not a comma. So either specify that with sep, or try fread from the data.table package, which detects the separator automatically.

# fread comes from the data.table package
library(data.table)
gene_symbols_table <- as.data.frame(fread(file="/home/nikita/Desktop/CElegans_raw_data/gene_symbols_matching.csv", header=TRUE))
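
For the first option (specifying the separator yourself), here is a minimal sketch in base R. It does not touch the real file: it writes a small temporary file that copies the header and the first two data lines from the semicolon-separated output shown in the question, so the temporary path and the variable name tmp are assumptions made for illustration only. Note that this sample header has nine fields rather than seven, so the real file may well have nine columns too.

```r
# Hypothetical sample file that mimics the layout shown in the question
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "input;reason;matches;organism.name;primaryIdentifier;symbol;briefDescription;class;secondaryIdentifier",
  "WBGene00008675;MATCH;1;Caenorhabditis elegans;WBGene00008675;irld-26;;Gene;F11A5.7",
  "WBGene00008676;MATCH;1;Caenorhabditis elegans;WBGene00008676;oac-15;;Gene;F11A5.8"
), tmp)

# Look at the raw text first to see which delimiter the file really uses
readLines(tmp, n = 2)

# Pass the semicolon explicitly; read.csv2() defaults to sep = ";" (and dec = ",")
gene_symbols_table <- read.csv(tmp, header = TRUE, sep = ";",
                               stringsAsFactors = FALSE)

# One column per field instead of a single wide column
dim(gene_symbols_table)
names(gene_symbols_table)
```

With the real data, replacing tmp by the path to gene_symbols_matching.csv should be enough, assuming the whole file uses semicolons consistently.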
