Pandas slicing data with MultiIndex
I have some features that I want to write to some CSV files. I want to use pandas for this if possible.
I am following the instructions here and have created some dummy data to try it out. Basically, there are some activities, each with a random number of features belonging to it.
import io
import pandas as pd
data = io.StringIO('''Activity,id,value,value,value,value,value,value,value,value,value
Run,1,1,2,2,5,6,4,3,2,1
Run,1,2,4,4,10,12,8,6,4,2
Stand,2,1.5,3.,3.,7.5,9.,6.,4.5,3.,1.5
Sit,3,0.5,1.,1.,2.5,3.,2.,1.5,1.,0.5
Sit,3,0.6,1.2,1.2,3.,3.6,2.4,1.8,1.2,0.6
Run, 2, 0.8, 1.6, 1.6, 4. , 4.8, 3.2, 2.4, 1.6, 0.8
''')
df_unindexed = pd.read_csv(data)
df = df_unindexed.set_index(['Activity', 'id'])
When I run:
df.xs('Run')
I get
value value.1 value.2 value.3 value.4 value.5 value.6 value.7 \
id
1 1.0 2.0 2.0 5.0 6.0 4.0 3.0 2.0
1 2.0 4.0 4.0 10.0 12.0 8.0 6.0 4.0
2 0.8 1.6 1.6 4.0 4.8 3.2 2.4 1.6
value.8
id
1 1.0
1 2.0
2 0.8
which is almost what I want, that is, all the run activities. I want to remove the 1st row and 1st column, i.e. the header and the id column. How do I achieve this?
A second question: when I want only one activity, how do I get it? Using
idx = pd.IndexSlice
df.loc[idx['Run', 1], :]
gives
value value.1 value.2 value.3 value.4 value.5 value.6 \
Activity id
Run 1 1.0 2.0 2.0 5.0 6.0 4.0 3.0
1 2.0 4.0 4.0 10.0 12.0 8.0 6.0
value.7 value.8
Activity id
Run 1 2.0 1.0
1 4.0 2.0
but slicing does not work as I would expect. For example, trying
df.loc[idx['Run', 1], 2:11]
instead produces an error:
TypeError: cannot do slice indexing on with these indexers [2] of 'int'>
So, how do I get my features here?
PS: In case it is not clear, I am new to pandas, so be gentle. Also, the id column can be edited to be unique per activity or across the whole dataset, if that makes things easier.
You can use a little hack: get the column names by position, because iloc for MultiIndex is not yet supported:
print (df.columns[2:11])
Index(['value.2', 'value.3', 'value.4', 'value.5', 'value.6', 'value.7',
'value.8'],
dtype='object')
idx = pd.IndexSlice
print (df.loc[idx['Run', 1], df.columns[2:11]])
value.2 value.3 value.4 value.5 value.6 value.7 value.8
Activity id
Run 1 2.0 5.0 6.0 4.0 3.0 2.0 1.0
1 4.0 10.0 12.0 8.0 6.0 4.0 2.0
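As a minimal sketch of the same idea, the positional column slice can also be applied in a second step with iloc, on the result of the label-based lookup. The shortened dummy data and the variable names below are assumptions for illustration only, and the index is sorted first just to keep the lookup simple.

```python
import io
import pandas as pd

# Shortened dummy data (an assumption; same layout as in the question).
csv_text = '''Activity,id,value,value,value,value
Run,1,1,2,2,5
Run,1,2,4,4,10
Stand,2,1.5,3.,3.,7.5
Run,2,0.8,1.6,1.6,4.
'''
# read_csv renames the duplicate 'value' columns to value, value.1, value.2, ...
df = pd.read_csv(io.StringIO(csv_text)).set_index(['Activity', 'id']).sort_index()

# Option A: translate column positions into labels, then use .loc only.
by_label = df.loc[('Run', 1), df.columns[1:3]]

# Option B: select the rows by label first, then slice columns by position.
chained = df.loc[('Run', 1), :].iloc[:, 1:3]

print(by_label)
print(chained)
```

Both options select the same two columns for the rows with Activity 'Run' and id 1; which one reads better is mostly a matter of taste.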
If you want to save the data to a CSV file without the index and the column header:
df.xs('Run').to_csv(file, index=False, header=False)
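Since the goal in the question is one file per activity, the call above could be wrapped in a loop. This is only a sketch under assumptions: the shortened dummy data is made up, and the text is collected in a dict (`csv_by_activity`, a hypothetical name) instead of being written to disk. Passing a file path as the first argument of to_csv would write a file instead.

```python
import io
import pandas as pd

# Shortened dummy data (an assumption; same layout as in the question).
csv_text = '''Activity,id,value,value,value
Run,1,1,2,2
Stand,2,1.5,3.,3.
Run,2,0.8,1.6,1.6
'''
df = pd.read_csv(io.StringIO(csv_text)).set_index(['Activity', 'id'])

# Collect the CSV text of every activity, without index and header.
# to_csv returns a string when no path or buffer is given.
csv_by_activity = {}
for activity in df.index.get_level_values('Activity').unique():
    csv_by_activity[activity] = df.xs(activity).to_csv(index=False, header=False)

print(csv_by_activity['Run'])
```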
I mostly look at https://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-integer when I'm stuck with this kind of issue.
Without any testing, I think you can remove rows and columns like this:
df = df.drop(['rowindex'], axis=0)
df = df.drop(['colname'], axis=1)
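A small sketch of how drop might look on data shaped like the question's. The dummy data is shortened and the result names are made up for illustration. 'value.1' is the name read_csv gives to the second of the duplicate 'value' columns.

```python
import io
import pandas as pd

# Shortened dummy data (an assumption; same layout as in the question).
csv_text = '''Activity,id,value,value,value
Run,1,1,2,2
Stand,2,1.5,3.,3.
Sit,3,0.5,1.,1.
'''
df = pd.read_csv(io.StringIO(csv_text)).set_index(['Activity', 'id'])

# Drop all rows whose 'Activity' index level is 'Stand'.
without_stand = df.drop('Stand', axis=0, level='Activity')

# Drop a column by its label.
without_col = df.drop(['value.1'], axis=1)

# Remove the 'id' level from the index entirely.
without_id = df.reset_index(level='id', drop=True)

print(without_stand)
print(without_col)
print(without_id)
```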
Avoid the problem by recognizing the index columns at CSV read-time:
pd.read_csv(data,
            header=0,          # read the first row in as the header row
            index_col=['id'])  # or index_col=0 to pick the index column by position
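For the two-level index used in the question, the same approach might look like the sketch below. The shortened dummy data is an assumption, and passing both 'Activity' and 'id' to index_col is one possible choice rather than the only one.

```python
import io
import pandas as pd

# Shortened dummy data (an assumption; same layout as in the question).
csv_text = '''Activity,id,value,value,value
Run,1,1,2,2
Stand,2,1.5,3.,3.
Sit,3,0.5,1.,1.
'''

# Build the MultiIndex directly while reading, instead of calling set_index later.
df = pd.read_csv(io.StringIO(csv_text), header=0, index_col=['Activity', 'id'])

print(df.index.names)
print(df.columns.tolist())
```

Using index_col=[0, 1] should select the same two columns by position instead of by name.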