
Python extract whole content of column from data frame using pandas

I want to use pandas to extract the whole content of one column from a multi-column data frame, but I only get part of it.

The code I am using is:

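import sys
import pandas

# Read only the 'dbSNP RS ID' column from the CSV file
data = pandas.read_csv('data1.csv', usecols=['dbSNP RS ID'])

# Redirect standard output to data2.csv, then print the DataFrame into it
sys.stdout = open("data2.csv", "w")
print(data)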

This is what I get:

       dbSNP RS ID
0        rs4147951
1        rs2022235
2        rs6425720
3       rs12997193
4        rs9933410
5        rs7142489
...            ...
934963  rs10262938
934964   rs6140985
934965   rs2704067
934966   rs2239441
934967  rs10041689

[934968 rows x 1 columns]

The first two lines of the csv file are:

"Probe Set ID","dbSNP RS ID","Chromosome","Physical Position","Strand","ChrX pseudo-autosomal region 1","Cytoband","Flank","Allele A","Allele B","Associated Gene","Genetic Map","Microsatellite","Fragment Enzyme Type Length Start Stop","Allele Frequencies","Heterozygous Allele Frequencies","Number of individuals","In Hapmap","Strand Versus dbSNP","Copy Number Variation","Probe Count","ChrX pseudo-autosomal region 2","In Final List","Minor Allele","Minor Allele Frequency","% GC","OMIM"
"AFFX-SNP_10000979","rs4147951","17","66943738","+","0","q24.2","GGATAAGGATGGGCTA[A/G]ATTATCATTGCTGTTA","A","G","ENST00000269080 // intron // 0 // Hs.58351 // ABCA8 // 10351 // ATP-binding cassette, sub-family A (ABC1), member 8 /// ENST00000428549 // intron // 0 // Hs.58351 // ABCA8 // 10351 // ATP-binding cassette, sub-family A (ABC1), member 8 /// ENST00000541225 // intron // 0 // Hs.58351 // ABCA8 // 10351 // ATP-binding cassette, sub-family A (ABC1), member 8 /// ENST00000542396 // intron // 0 // Hs.58351 // ABCA8 // 10351 // ATP-binding cassette, sub-family A (ABC1), member 8 /// NM_007168 // intron // 0 // Hs.58351 // ABCA8 // 10351 // ATP-binding cassette, sub-family A (ABC1), member 8","99.8510 // D17S795 // D17S2182 // --- // --- // deCODE /// 90.7912 // D17S1870 // D17S840 // AFM323TB1 // AFM207VF4 // Marshfield /// 82.3131 // --- // D17S1786 // 147671 // --- // SLM1","D17S795 // downstream // 265562 /// D17S1474E // upstream // 113179","NspI // ACATGT_ACATGT // 536 // 66943408 // 66943943 /// StyI // CCTTGG_CCATGG // 2334 // 66941614 // 66943947","0.3917 // 0.6083 // CEU /// 0.6444 // 0.3556 // CHB /// 0.6000 // 0.4000 // JPT /// 0.5667 // 0.4333 // YRI","0.3833 // CEU /// 0.4889 // CHB /// 0.4444 // JPT /// 0.5667 // YRI","60 // CEU /// 45 // CHB /// 45 // JPT /// 60 // YRI","YES","reverse","---","6","0","YES","A // CEU /// G // CHB /// G // JPT /// G // YRI","0.3917 // CEU /// 0.3556 // CHB /// 0.4000 // JPT /// 0.4333 // YRI","---","---"

Any ideas on how to extract the "dbSNP RS ID" column from all 934968 rows? Thank you very much!

IIUC, you should read the file and then write the column out to a new .csv file with to_csv, instead of printing it:

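import pandas

# Read only the 'dbSNP RS ID' column
data = pandas.read_csv('data1.csv', usecols=['dbSNP RS ID'])

# Write every row of that column to a new CSV file
data.to_csv('data2.csv')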
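A minimal, self-contained sketch of the same approach, assuming a tiny in-memory CSV in place of data1.csv (the PROBE_n names, the "gene A, member 8" value, and the io.StringIO buffers are hypothetical stand-ins for the real files). Passing index=False to to_csv is an optional extra, not part of the answer above: it drops pandas' row-number column so the output contains only the IDs.

```python
import io

import pandas as pd

# Hypothetical stand-in for data1.csv: a header plus three rows.
# The quoted "Associated Gene" field contains a comma, which read_csv
# handles correctly because the field is enclosed in double quotes.
source = io.StringIO(
    '"Probe Set ID","dbSNP RS ID","Associated Gene"\n'
    '"PROBE_1","rs4147951","gene A, member 8"\n'
    '"PROBE_2","rs2022235","---"\n'
    '"PROBE_3","rs6425720","---"\n'
)

# Read only the column we need, as in the answer above
data = pd.read_csv(source, usecols=['dbSNP RS ID'])

# Write every row to an in-memory buffer (a file path works the same way).
# index=False omits pandas' 0, 1, 2, ... row labels from the output.
out = io.StringIO()
data.to_csv(out, index=False)
written = out.getvalue()
print(written)
```

With the real file, data.to_csv('data2.csv', index=False) should write all 934968 IDs; leave index=False out if you want to keep the row numbers.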

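The problem with your code is that print writes only the text that pandas would display at the terminal prompt, not the full data. When a DataFrame has more rows than the display.max_rows option allows (60 by default), pandas truncates that display, showing only the first and last few rows with ... in between.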
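To see this truncation in isolation, here is a small sketch that uses a hypothetical 1,000-row column instead of the real file. It compares the display string that print produces with DataFrame.to_string(), which renders every row, and shows pd.option_context('display.max_rows', None) as another way to lift the limit temporarily. For saving data to a file, to_csv as in the answer is still the better choice.

```python
import pandas as pd

# Hypothetical column with more rows than pandas displays by default
data = pd.DataFrame({'dbSNP RS ID': [f'rs{i}' for i in range(1000)]})

# print() uses str(data), the display form, which is truncated once the
# row count exceeds the display.max_rows option (60 by default)
shown = str(data)
print('...' in shown)  # True: the middle rows are replaced by "..."

# to_string() renders every row regardless of display.max_rows
full = data.to_string()
print('...' in full)  # False
print(len(full.splitlines()))  # 1001: one header line plus 1000 rows

# Alternatively, lift the row limit only inside this block
with pd.option_context('display.max_rows', None):
    shown_all = str(data)
print('...' in shown_all)  # False
```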
