Pandas csv/multiindex subsetting

I am trying to read a .csv file whose two header rows should become a MultiIndex, so that I can later access a column given two identifiers. The file looks like this (tab-delimited), and the values that are NA are deliberately that way:

ind Human Human Human Mouse Mouse Mouse ...
(null) Codon Freq minmax Codon Freq minmax ...
0 ATG 12.5 -5.2 --- NA NA ...
1 AAA 8.9 -25.5 --- NA NA ...
2 GGA 16.5 12.4 ATG 11.9 6.5 ...

I can read the file in with two rows of headers, but this results in an object of class 'pandas.core.frame.DataFrame' instead of 'pandas.core.index.MultiIndex':
data = pd.read_csv('alignment.csv', sep="\t", header=[0,1])

When I try specifying index_col=0, as some examples in the documentation do, I get an "IndexError: list index out of range" error. Adding index_col=0 was the solution to several related questions, but for some reason it isn't working for me.

Moving on, I've attempted to subset the data in a variety of ways, all of which have failed. The closest I've gotten (I think) to what I want is:
temp = data.ix[:,[("","ind"),("Human","minmax")]]
...which at least gives me a DataFrame of the right dimensions and labeled correctly, but all of the values have been replaced with NaN. Using .loc gives me an error about being improperly sorted, and I haven't been able to get .xs to work at all.

Essentially I'm looking for a way to subset the data set based on the species and the parameter (e.g. Human and minmax). I've looked through several related questions here but haven't been able to solve the problem yet. How could I achieve this?

Hmm... it seems to work for me... what version of Pandas/Python are you using?

df = pd.read_clipboard(header=[0, 1], index_col=0)

df
Out[389]: 
ind    Human              Mouse               ...
(null) Codon  Freq minmax Codon  Freq minmax  ...
0        ATG  12.5   -5.2   ---   NaN    NaN  ...
1        AAA   8.9  -25.5   ---   NaN    NaN  ...
2        GGA  16.5   12.4   ATG  11.9    6.5  ...


df.Human.minmax
Out[390]: 
0    -5.2
1   -25.5
2    12.4
Name: minmax, dtype: float64
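
For anyone who wants to reproduce this without the clipboard, here is a minimal sketch on a recent pandas, not the exact code from the question. It assumes the second header cell is literally the text "(null)", feeds the sample rows through io.StringIO as a stand-in for alignment.csv, and leaves out the elided "..." columns. The variable names are just illustrative. Note that read_csv always returns a DataFrame; with header=[0, 1] the MultiIndex ends up on df.columns, so getting a DataFrame back is expected rather than a sign that the header was misread.

```python
import io

import pandas as pd

# Stand-in for alignment.csv: the sample rows from the question, tab-delimited.
# The "..." columns are elided in the question and are not reproduced here.
raw = (
    "ind\tHuman\tHuman\tHuman\tMouse\tMouse\tMouse\n"
    "(null)\tCodon\tFreq\tminmax\tCodon\tFreq\tminmax\n"
    "0\tATG\t12.5\t-5.2\t---\tNA\tNA\n"
    "1\tAAA\t8.9\t-25.5\t---\tNA\tNA\n"
    "2\tGGA\t16.5\t12.4\tATG\t11.9\t6.5\n"
)

# Two header rows become a two-level column MultiIndex; column 0 is the row index.
df = pd.read_csv(io.StringIO(raw), sep="\t", header=[0, 1], index_col=0)

# The frame itself is a DataFrame; the MultiIndex lives on its columns.
columns_are_multiindex = isinstance(df.columns, pd.MultiIndex)

# One column, addressed by a (species, parameter) tuple.
human_minmax = df[("Human", "minmax")]

# One parameter across all species: cross-section on the second column level.
all_minmax = df.xs("minmax", axis=1, level=1)

# Sorting the column labels first helps avoid lexsort-related errors from .loc
# when the real file's columns are not already in sorted order.
df_sorted = df.sort_index(axis=1)
minmax_slice = df_sorted.loc[:, pd.IndexSlice[:, "minmax"]]

print(human_minmax)
print(all_minmax)
```

If the file is read without index_col, as in the question, the first column's label would be the tuple ("ind", "(null)") rather than ("", "ind"), which may be why the .ix attempt came back as all NaN.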
