Python extract whole content of column from data frame using pandas

Question

I want to extract the whole content of a column from a multi-column data frame using pandas but I am getting only a part of the column.

The code I am using is:

import pandas
import csv
data = pandas.read_csv('data1.csv', usecols = ['dbSNP RS ID'])

import sys  
sys.stdout = open("data2.csv", "w") 
print data

What I get is something like this:

       dbSNP RS ID
0        rs4147951
1        rs2022235
2        rs6425720
3       rs12997193
4        rs9933410
5        rs7142489
...            ...
934963  rs10262938
934964   rs6140985
934965   rs2704067
934966   rs2239441
934967  rs10041689

[934968 rows x 1 columns]

The first 2 lines of the csv file are:

"Probe Set ID","dbSNP RS ID","Chromosome","Physical Position","Strand","ChrX    pseudo-autosomal region 1","Cytoband","Flank","Allele A","Allele B","Associated Gene","Genetic Map","Microsatellite","Fragment Enzyme Type Length Start Stop","Allele Frequencies","Heterozygous Allele Frequencies","Number of individuals","In Hapmap","Strand Versus dbSNP","Copy Number Variation","Probe Count","ChrX pseudo-autosomal region 2","In Final List","Minor Allele","Minor Allele Frequency","% GC","OMIM"

"AFFX-   SNP_10000979","rs4147951","17","66943738","+","0","q24.2","GGATAAGGATGGGCTA[A/G]ATTATCATTGCTGTTA","A","G","ENST00000269080 // intron // 0 // Hs.58351 // ABCA8 // 10351 // ATP-binding cassette, sub-family A (ABC1), member 8 /// ENST00000428549 // intron // 0 // Hs.58351 // ABCA8 // 10351 // ATP-binding cassette, sub-family A (ABC1), member 8 /// ENST00000541225 // intron // 0 // Hs.58351 // ABCA8 // 10351 // ATP-binding cassette, sub-family A (ABC1), member 8 /// ENST00000542396 // intron // 0 // Hs.58351 // ABCA8 // 10351 // ATP-binding cassette, sub-family A (ABC1), member 8 /// NM_007168 // intron // 0 // Hs.58351 // ABCA8 // 10351 // ATP-binding cassette, sub-family A (ABC1), member 8","99.8510 // D17S795 // D17S2182 // --- // --- // deCODE /// 90.7912 // D17S1870 // D17S840 // AFM323TB1 // AFM207VF4 // Marshfield /// 82.3131 // --- // D17S1786 // 147671 // --- // SLM1","D17S795 // downstream // 265562 /// D17S1474E // upstream // 113179","NspI // ACATGT_ACATGT // 536 // 66943408 // 66943943 /// StyI // CCTTGG_CCATGG // 2334 // 66941614 // 66943947","0.3917 // 0.6083 // CEU /// 0.6444 // 0.3556 // CHB /// 0.6000 // 0.4000 // JPT /// 0.5667 // 0.4333 // YRI","0.3833 // CEU /// 0.4889 // CHB /// 0.4444 // JPT /// 0.5667 // YRI","60 // CEU /// 45 // CHB /// 45 // JPT /// 60 // YRI","YES","reverse","---","6","0","YES","A // CEU /// G // CHB /// G // JPT /// G // YRI","0.3917 // CEU /// 0.3556 // CHB /// 0.4000 // JPT /// 0.4333 // YRI","---","---"

Any idea about how to extract the 'dbSNP RS ID' from the 934968 rows??. Thank you very much !

Answer 1

IIUC you should read and write again a .csv file with:

data = pandas.read_csv('data1.csv', usecols = ['dbSNP RS ID'])

data.to_csv('data2.csv')

The problem with your code is that the print function actually writes on file only the part of the file that pandas shows in the terminal prompt. When there are too much rows it splits the output adding ... in the middle.

Python extract whole content of column from data frame using pandas

Question

1 answers

solution1
1 ACCPTED 2015-12-10 14:21:01

Python extract whole content of column from data frame using pandas

Question

1 answers

solution1 1 ACCPTED 2015-12-10 14:21:01

solution1
1 ACCPTED 2015-12-10 14:21:01