How to turn selected headers of a CSV file into a Pandas data frame

I have the following CSV data:

id,gene,celltype,stem,stem,stem,bcell,bcell,tcell
id,gene,organs,bm,bm,fl,pt,pt,bm
id,gene,organs,stem1,stem2,stem3,b1,b2,t1
134,foo,about_foo,20,10,11,23,22,79
222,bar,about_bar,17,13,55,12,13,88

The first three lines are the header. What I want to do is select lines 1 and 3 and turn them into a data frame that looks like this:

Coln1 Coln2
stem  stem1
stem  stem2
stem  stem3
bcell b1
bcell b2
tcell t1

I am stuck with the following:

import pandas as pd
df = pd.read_csv("http://dpaste.com/00AWDBW.txt",header=None,index_col=[1,2]).iloc[:, 1:]

You can use the nrows and skiprows parameters of read_csv:

import pandas as pd
import io

temp=u"""id,gene,celltype,stem,stem,stem,bcell,bcell,tcell
id,gene,organs,bm,bm,fl,pt,pt,bm
id,gene,organs,stem1,stem2,stem3,b1,b2,t1
134,foo,about_foo,20,10,11,23,22,79
222,bar,about_bar,17,13,55,12,13,88"""

# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), header=None, index_col=[1, 2], nrows=2, skiprows=[1])
# drop the leftover id column by position, discard the index, and transpose
df = df.iloc[:, 1:].reset_index(drop=True).T
df.columns = ['Coln1', 'Coln2']
print(df.reset_index(drop=True))

   Coln1  Coln2
0   stem  stem1
1   stem  stem2
2   stem  stem3
3  bcell     b1
4  bcell     b2
5  tcell     t1
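
To see what nrows and skiprows do on their own, here is a minimal sketch that stops before the transposition. The variable names csv_text and raw are only illustrative, and the data is the sample from the question.

```python
import io
import pandas as pd

csv_text = """id,gene,celltype,stem,stem,stem,bcell,bcell,tcell
id,gene,organs,bm,bm,fl,pt,pt,bm
id,gene,organs,stem1,stem2,stem3,b1,b2,t1
134,foo,about_foo,20,10,11,23,22,79
222,bar,about_bar,17,13,55,12,13,88"""

# skiprows=[1] drops the second line of the file (0-based index 1);
# nrows=2 then keeps only the first two remaining lines,
# which are the first and third lines of the file.
raw = pd.read_csv(io.StringIO(csv_text), header=None, skiprows=[1], nrows=2)
print(raw)
```

The result should have two rows and nine columns; the first three columns hold the id/gene/label cells, which is why the answer drops them before transposing.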

To turn all three header rows into columns, do this:

# nrows=3 already limits the read to the three header rows
df = pd.read_csv(io.StringIO(temp), header=None, index_col=[1, 2], nrows=3)
df = df.iloc[:, 1:].reset_index(drop=True).T
df.columns = ['Coln1', 'Coln2', 'Coln3']
print(df.reset_index(drop=True))
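
The two variants above could also be folded into one small function. The sketch below is an alternative, not the approach from the answer: it reads the header lines without index_col and picks rows by position with iloc. The helper name header_rows_to_frame and its parameter n_label_cols (the number of leading id/gene/label columns to drop) are hypothetical.

```python
import io
import pandas as pd

csv_text = """id,gene,celltype,stem,stem,stem,bcell,bcell,tcell
id,gene,organs,bm,bm,fl,pt,pt,bm
id,gene,organs,stem1,stem2,stem3,b1,b2,t1
134,foo,about_foo,20,10,11,23,22,79
222,bar,about_bar,17,13,55,12,13,88"""


def header_rows_to_frame(source, rows, names, n_label_cols=3):
    """Hypothetical helper: turn the selected header rows into columns."""
    # Read only as many lines as needed to reach the last requested row.
    head = pd.read_csv(source, header=None, nrows=max(rows) + 1)
    # Keep the requested rows, drop the leading label columns, then transpose.
    out = head.iloc[list(rows), n_label_cols:].T.reset_index(drop=True)
    out.columns = list(names)
    return out


# Lines 1 and 3 of the file (0-based rows 0 and 2).
two = header_rows_to_frame(io.StringIO(csv_text), [0, 2], ["Coln1", "Coln2"])
# All three header lines.
three = header_rows_to_frame(
    io.StringIO(csv_text), [0, 1, 2], ["Coln1", "Coln2", "Coln3"]
)
print(two)
print(three)
```

Selecting rows by position after a plain read avoids having to work out which skiprows values correspond to the rows you want to keep.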
