
Struggling with MultiIndex Slicing (Pandas)

I am organizing my data folders into MultiIndex DataFrames with a structure similar to this:

In: df
Out: 
Sweep  Time       Primary  Secondary     720nm     473nm  PMTShutter                                                      
Sweep1 0.00000 -87.429810  -4.882812  0.000610  0.000305    0.000000
       0.00005 -87.445068  -4.882812  0.000610  0.001221    0.000000
       0.00010 -87.451172  -4.272460  0.000000  0.000916    0.000000
           ...        ...       ...       ...         ...  
Sweep5 0.68655 -87.261963  -4.272461  0.000305  0.000916    0.000305
       0.68660 -87.258911  -4.272461  0.000305  0.000916    0.000305
       0.68665 -87.252808  -5.493164  0.000000  0.000916    0.000305
       0.68670 -87.261963  -4.272461  0.000305  0.000916    0.000305

I am getting nowhere reading through the Pandas documentation to figure out how to slice parts of this based on the two index levels.

For example, I figured df['Sweep1'] would return everything for Sweep1. It does not, though. However, df.loc['Sweep1'] works as I would expect. Why is this the case?

I also seem to be completely unable to index by the Time level. For example, a very typical part of our analysis is to average data points over a specific range of time, or to find a maximum or minimum over a specific period of time. How, then, do I slice out a region of data based on a specific period of the Time index (e.g. Time 0.0 s through 0.5 s)?

I can achieve this if I know the exact number of data points in that range (i.e. range * sampling frequency), but the point of making Time one of the index levels was to avoid having to do that.

Similarly, if I want to plot, let's say, Sweep1 Primary against Time, I can't figure out how to use the Time index as my x-axis.

So I guess my main question is: how do I slice out data points from the different columns based on both the Sweep number and a certain sub-region of Time? That will at least point me in the right direction, I think.

Thanks

Question 1:

df['Sweep1'] is the syntax for returning a column called Sweep1, which is why it fails here: Sweep1 is a row label, not a column. To select rows by index label, use df.loc['Sweep1'] (older code used df.ix['Sweep1'], but .ix was removed in pandas 1.0).
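To make the column-versus-row distinction concrete, here is a minimal sketch on a made-up frame. The data, the sizes, and the variable names are assumptions for illustration only; just the Sweep/Time index names and the Primary/PMTShutter column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the frame in the question: 2 sweeps x 3 time points
index = pd.MultiIndex.from_product(
    [["Sweep1", "Sweep2"], [0.0, 0.25, 0.5]], names=["Sweep", "Time"]
)
df = pd.DataFrame(
    {"Primary": np.arange(6, dtype=float), "PMTShutter": np.zeros(6)},
    index=index,
)

# Plain square brackets look the key up among the column labels
primary = df["Primary"]      # Series carrying the full (Sweep, Time) index

# .loc looks the key up in the row index; one key matches the first level
sweep1 = df.loc["Sweep1"]    # DataFrame indexed by Time only

# "Sweep1" is not a column label, so plain brackets raise KeyError
try:
    df["Sweep1"]
    found_as_column = True
except KeyError:
    found_as_column = False
```

Note that df.loc['Sweep1'] drops the Sweep level from the result, so what comes back is indexed by Time alone.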

Question 2:

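You'll need to create a boolean Series before filtering on Time. I think the easiest way is to turn Time into an ordinary column, like

# Move Time out of the index so it can be compared like any other column
flat = df.reset_index('Time')
# Boolean Series: True where Time falls in [0, 0.5)
mask = (flat['Time'] >= 0) & (flat['Time'] < 0.5)
# Keep the matching rows, then pick Sweep1 by its index label
result = flat[mask].loc[['Sweep1']]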
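If you would rather keep both index levels in the result, a boolean mask can also be built from the index itself with `Index.get_level_values`. The sketch below is one possible variant rather than the method from the answer above, and the data and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 2 sweeps sampled at 5 time points each
times = [0.0, 0.2, 0.4, 0.6, 0.8]
index = pd.MultiIndex.from_product(
    [["Sweep1", "Sweep2"], times], names=["Sweep", "Time"]
)
df = pd.DataFrame({"Primary": np.arange(10, dtype=float)}, index=index)

# Pull each index level out as a flat array of labels, one entry per row
sweep = df.index.get_level_values("Sweep")
time = df.index.get_level_values("Time")

# Combine the conditions into one boolean mask: Sweep1 and 0 <= Time < 0.5
mask = (sweep == "Sweep1") & (time >= 0) & (time < 0.5)
window = df[mask]

# Aggregate over the selected window
window_mean = window["Primary"].mean()
window_max = window["Primary"].max()
```

Because the mask is computed from Time values rather than row counts, the number of samples in the window never has to be known in advance.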

The following should give you the columns Primary and PMTShutter for Sweep1 over the Time range t1 to t2 (label slices include both endpoints), where t1 and t2 are floats.

import pandas

idx = pandas.IndexSlice
df.loc[idx['Sweep1', t1:t2], ['Primary', 'PMTShutter']]
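As a self-contained sketch of the IndexSlice approach, the example below runs it on made-up data. The values, the window bounds, and the variable names are assumptions. The call to `sort_index()` is included because label slicing on a MultiIndex generally expects a sorted index; it may be unnecessary if your frame is already sorted.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 2 sweeps sampled at 5 time points each
times = [0.0, 0.2, 0.4, 0.6, 0.8]
index = pd.MultiIndex.from_product(
    [["Sweep1", "Sweep2"], times], names=["Sweep", "Time"]
)
df = pd.DataFrame(
    {"Primary": np.arange(10, dtype=float), "PMTShutter": np.zeros(10)},
    index=index,
)

# Label slices on a MultiIndex need the index to be sorted
df = df.sort_index()

idx = pd.IndexSlice
t1, t2 = 0.2, 0.6  # hypothetical window bounds; both ends are included

# Rows: Sweep1 with t1 <= Time <= t2. Columns: Primary and PMTShutter.
window = df.loc[idx["Sweep1", t1:t2], ["Primary", "PMTShutter"]]
window_mean = window["Primary"].mean()

# One column of one sweep is a Series indexed by Time, so calling
# .plot() on it would put Time on the x-axis
trace = df["Primary"].loc["Sweep1"]
```

The slice bounds do not have to be existing Time values on a sorted index, so a window such as 0.1 to 0.5 would also work.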

