Pandas - read data (two-row header, index column)
I have a data file that looks like this (miRNA-seq data from TCGA):
A X X X Y Y Y Z Z Z
B a b c a b c a b c
0
1 regular 5x9
2 data matrix
3
4
A describes the first header row, B describes the first column. I want to use pandas.read_csv to return a DataFrame such that I can access rows by something like df[0] and columns by something like df['X']['a'], and eventually delete selected rows and/or columns.
In my search I found that MultiIndex could help me; however, df = pandas.read_csv("datafile", header=[0, 1]); print(df.index) returns an Index.
Thank you for any suggestions.
EDIT: Some sample data (tab-separated)
Hybridization REF TCGA-2V-A95S-01A-11R-A37G-13 TCGA-2V-A95S-01A-11R-A37G-13 TCGA-2V-A95S-01A-11R-A37G-13 TCGA-2Y-A9GS-01A-12R-A38M-13 TCGA-2Y-A9GS-01A-12R-A38M-13 TCGA-2Y-A9GS-01A-12R-A38M-13 TCGA-2Y-A9GT-01A-11R-A38M-13 TCGA-2Y-A9GT-01A-11R-A38M-13 TCGA-2Y-A9GT-01A-11R-A38M-13
miRNA_ID read_count reads_per_million_miRNA_mapped cross-mapped read_count reads_per_million_miRNA_mapped cross-mapped read_count reads_per_million_miRNA_mapped cross-mapped
hsa-let-7a-1 17377 4045.749542 N 47187 7077.368096 N 31765 8956.551210 N
hsa-let-7a-2 34913 8128.517796 N 94766 14213.530526 Y 64148 18087.355487 N
hsa-let-7a-3 17496 4073.455371 N 47683 7151.760928 N 31782 8961.344580 N
hsa-let-7b 33546 7810.249993 N 46089 6912.683963 N 64948 18312.925799 N
hsa-let-7c 1349 314.077006 N 12185 1827.573913 Y 14075 3968.627681 N
hsa-let-7d 1735 403.946335 N 1763 264.424523 N 1176 331.588359 N
Try this out:
import pandas as pd
df = pd.read_csv('zhoop.csv', header=[0, 1], index_col=0)
Note: to index rows, use df.loc[label] (or df.iloc[rownum] for positional access), not just df[rownum].
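To make the suggestion concrete, here is a minimal sketch of the same call applied to a shortened copy of the sample data. It assumes the file is tab-separated (hence sep="\t", which the one-liner above does not pass), and the sample names S1 and S2 are hypothetical stand-ins for the long TCGA barcodes; an in-memory string replaces the real file.

```python
import io
import pandas as pd

# Shortened, tab-separated stand-in for the TCGA file (S1/S2 are made-up names).
raw = (
    "Hybridization REF\tS1\tS1\tS1\tS2\tS2\tS2\n"
    "miRNA_ID\tread_count\treads_per_million_miRNA_mapped\tcross-mapped"
    "\tread_count\treads_per_million_miRNA_mapped\tcross-mapped\n"
    "hsa-let-7a-1\t17377\t4045.749542\tN\t47187\t7077.368096\tN\n"
    "hsa-let-7a-2\t34913\t8128.517796\tN\t94766\t14213.530526\tY\n"
    "hsa-let-7b\t33546\t7810.249993\tN\t46089\t6912.683963\tN\n"
)

# Two header rows become a two-level column MultiIndex;
# the first column (miRNA IDs) becomes the row index.
df = pd.read_csv(io.StringIO(raw), sep="\t", header=[0, 1], index_col=0)

# Row access: by label with .loc, by position with .iloc.
row_by_label = df.loc["hsa-let-7a-1"]
row_by_position = df.iloc[0]

# Column access: chained lookup, or a single tuple key.
reads_s1 = df["S1"]["read_count"]
same_reads_s1 = df[("S1", "read_count")]

# Deleting rows and columns: drop one miRNA, then every
# 'cross-mapped' column across all samples (second header level).
trimmed = (
    df.drop(index=["hsa-let-7a-2"])
      .drop(columns="cross-mapped", level=1)
)
```

In this setup the MultiIndex lives on df.columns rather than df.index, which is likely why print(df.index) in the question showed an ordinary index.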