
Pandas slicing data with MultiIndex

I have some features that I want to write to CSV files, and I would like to use pandas for this if possible.
I am following the instructions here and have created some dummy data to try it out. Basically, there are some activities, each with a random number of features belonging to it.

import io
import pandas as pd
data = io.StringIO('''Activity,id,value,value,value,value,value,value,value,value,value
Run,1,1,2,2,5,6,4,3,2,1
Run,1,2,4,4,10,12,8,6,4,2
Stand,2,1.5,3.,3.,7.5,9.,6.,4.5,3.,1.5
Sit,3,0.5,1.,1.,2.5,3.,2.,1.5,1.,0.5
Sit,3,0.6,1.2,1.2,3.,3.6,2.4,1.8,1.2,0.6
Run, 2, 0.8, 1.6, 1.6, 4. , 4.8, 3.2, 2.4, 1.6, 0.8
''')
df_unindexed = pd.read_csv(data)
df = df_unindexed.set_index(['Activity', 'id'])

When I run:

df.xs('Run')

I get:

    value  value.1  value.2  value.3  value.4  value.5  value.6  value.7  \
id                                                                         
1     1.0      2.0      2.0      5.0      6.0      4.0      3.0      2.0   
1     2.0      4.0      4.0     10.0     12.0      8.0      6.0      4.0   
2     0.8      1.6      1.6      4.0      4.8      3.2      2.4      1.6   
    value.8  
id           
1       1.0  
1       2.0  
2       0.8 

which is almost what I want, that is, all Run activities. I want to remove the first row and the first column, i.e. the header and the id column. How do I achieve this?

A second question: when I want only one activity, how do I get it?
Using

idx = pd.IndexSlice
df.loc[idx['Run', 1], :]

gives

             value  value.1  value.2  value.3  value.4  value.5  value.6  \
Activity id                                                                
Run      1     1.0      2.0      2.0      5.0      6.0      4.0      3.0   
         1     2.0      4.0      4.0     10.0     12.0      8.0      6.0   
             value.7  value.8  
Activity id                    
Run      1       2.0      1.0  
         1       4.0      2.0  

but slicing does not work as I would expect. For example, trying

df.loc[idx['Run', 1], 2:11]

instead produces an error:

TypeError: cannot do slice indexing on with these indexers [2] of 'int'>

So, how do I get my features in this case?

PS: In case it is not clear, I am new to pandas, so be gentle. Also, the id column can be edited to be unique per activity or across the whole dataset, if that makes things easier.

You can use a small hack: get the column names by position, because iloc for MultiIndex is not yet supported:

print (df.columns[2:11])
Index(['value.2', 'value.3', 'value.4', 'value.5', 'value.6', 'value.7',
       'value.8'],
      dtype='object')

idx = pd.IndexSlice
print (df.loc[idx['Run', 1], df.columns[2:11]])
             value.2  value.3  value.4  value.5  value.6  value.7  value.8
Activity id                                                               
Run      1       2.0      5.0      6.0      4.0      3.0      2.0      1.0
         1       4.0     10.0     12.0      8.0      6.0      4.0      2.0
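
As an alternative to looking up the column names, the label-based row selection and the positional column slice can be done as two chained steps. The following is a minimal sketch, assuming the question's data (repeated here as csv_text so the block is self-contained, with the index sorted so label lookups on the MultiIndex are unambiguous); the names csv_text, run_1 and features are illustrative only:

```python
import io
import pandas as pd

# Same dummy data as in the question; duplicate "value" headers are
# renamed by read_csv to value, value.1, ..., value.8.
csv_text = """Activity,id,value,value,value,value,value,value,value,value,value
Run,1,1,2,2,5,6,4,3,2,1
Run,1,2,4,4,10,12,8,6,4,2
Stand,2,1.5,3.,3.,7.5,9.,6.,4.5,3.,1.5
Sit,3,0.5,1.,1.,2.5,3.,2.,1.5,1.,0.5
Sit,3,0.6,1.2,1.2,3.,3.6,2.4,1.8,1.2,0.6
Run,2,0.8,1.6,1.6,4.,4.8,3.2,2.4,1.6,0.8
"""
df = pd.read_csv(io.StringIO(csv_text)).set_index(["Activity", "id"]).sort_index()

idx = pd.IndexSlice

# Step 1: select rows by label on the MultiIndex.
run_1 = df.loc[idx["Run", 1], :]

# Step 2: slice columns by integer position on the result.
features = run_1.iloc[:, 2:11]
print(features)
```

Splitting the selection this way keeps .loc purely label-based and .iloc purely positional, which is why the integer slice 2:11 no longer raises a TypeError.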

If you want to save the file to CSV without the index and column headers:

df.xs('Run').to_csv(file, index=False, header=False)
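
To check what ends up in the file, the same call can be pointed at an in-memory buffer instead of a real path. This is only a sketch under the assumption that the question's data is used; the buffer and the names csv_text, buf and lines are not part of the original answer:

```python
import io
import pandas as pd

# Same dummy data as in the question.
csv_text = """Activity,id,value,value,value,value,value,value,value,value,value
Run,1,1,2,2,5,6,4,3,2,1
Run,1,2,4,4,10,12,8,6,4,2
Stand,2,1.5,3.,3.,7.5,9.,6.,4.5,3.,1.5
Sit,3,0.5,1.,1.,2.5,3.,2.,1.5,1.,0.5
Sit,3,0.6,1.2,1.2,3.,3.6,2.4,1.8,1.2,0.6
Run,2,0.8,1.6,1.6,4.,4.8,3.2,2.4,1.6,0.8
"""
df = pd.read_csv(io.StringIO(csv_text)).set_index(["Activity", "id"]).sort_index()

# Write only the feature values of the "Run" activity:
# index=False drops the id column, header=False drops the header row.
buf = io.StringIO()
df.xs("Run").to_csv(buf, index=False, header=False)

lines = buf.getvalue().splitlines()
for line in lines:
    print(line)
```

Each printed line holds the nine feature values of one Run row, with neither the id nor the column names.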

I mostly look at https://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-integer when I'm stuck with this kind of issue.

Without any testing, I think you can remove rows and columns like this:

df = df.drop(['rowindex'], axis=0)
df = df.drop(['colname'], axis=1)
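
With a MultiIndex, drop also accepts a level argument, so rows can be removed by one index level. A small sketch on a made-up frame (the data and the names without_stand and without_first_col are hypothetical, not taken from the question):

```python
import pandas as pd

# Tiny illustrative frame with the same index layout as in the question.
index = pd.MultiIndex.from_tuples(
    [("Run", 1), ("Run", 2), ("Sit", 3), ("Stand", 2)],
    names=["Activity", "id"],
)
df = pd.DataFrame(
    {"value": [1.0, 0.8, 0.5, 1.5], "value.1": [2.0, 1.6, 1.0, 3.0]},
    index=index,
)

# Drop every row whose Activity level equals "Stand".
without_stand = df.drop(index="Stand", level="Activity")

# Drop a column by name.
without_first_col = df.drop(columns=["value"])

print(without_stand)
print(without_first_col)
```

Note that drop returns a new DataFrame here; the original df is left unchanged.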

Avoid the problem by recognizing the index columns at CSV read time:

pd.read_csv(data,
            header=0,          # read the first row as the header row
            index_col=['id'])  # or index_col=0 to pick the index column by position
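
For the two-level index used in the question, both index columns can be named in index_col. A minimal sketch, assuming the question's data (repeated as csv_text); this mirrors the set_index call from the question and is not the answerer's exact code:

```python
import io
import pandas as pd

# Same dummy data as in the question.
csv_text = """Activity,id,value,value,value,value,value,value,value,value,value
Run,1,1,2,2,5,6,4,3,2,1
Run,1,2,4,4,10,12,8,6,4,2
Stand,2,1.5,3.,3.,7.5,9.,6.,4.5,3.,1.5
Sit,3,0.5,1.,1.,2.5,3.,2.,1.5,1.,0.5
Sit,3,0.6,1.2,1.2,3.,3.6,2.4,1.8,1.2,0.6
Run,2,0.8,1.6,1.6,4.,4.8,3.2,2.4,1.6,0.8
"""

# Build the MultiIndex while reading, instead of calling set_index afterwards.
df = pd.read_csv(io.StringIO(csv_text), header=0, index_col=["Activity", "id"])

print(df.index.names)
print(df.shape)
```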

