
Selecting portions of rows in Pandas Dataframe based on two values

I'm trying to select a subset of rows and columns from a pandas DataFrame that I will eventually graph. My data is currently structured like this:

                  0       2        3  ...      177     178  Timestamp
1                                     ...                            
6:54:36   7/26/2019   -35.0   -34.75  ...     8.75     9.0   06:54:36  
 500 a  7/26/2019  3880.0  4068.00  ...  4562.00  4398.0   06:54:36
 500 b  7/26/2019  3462.0  3458.00  ...  3604.00  3718.0   06:54:36
 600 a  7/26/2019     NaN      NaN  ...      NaN     NaN   06:54:36
 600 b  7/26/2019     NaN      NaN  ...      NaN     NaN   06:54:36
 700 a  7/26/2019  3462.0  3684.00  ...  3821.00  3800.0   06:54:36
 700 b  7/26/2019  4290.0  4414.00  ...  4303.00  4336.0   06:54:36
 900 a  7/26/2019  2863.0  3059.00  ...  3075.00  3313.0   06:54:36
 900 b  7/26/2019  4480.0  4632.00  ...  4873.00  4843.0   06:54:36
1000 a  7/26/2019     NaN      NaN  ...  4426.00  4751.0   06:54:36
1000 b  7/26/2019     NaN      NaN  ...  4388.00  4239.0   06:54:36
6:54:40   7/26/2019   -35.0   -34.75  ...     8.75     9.0   06:54:40
 500 a  7/26/2019  3995.0  4056.00  ...  4571.00  4480.0   06:54:40
 500 b  7/26/2019  3837.0  3974.00  ...  3720.00  3619.0   06:54:40
 600 a  7/26/2019     NaN      NaN  ...      NaN     NaN   06:54:40
 600 b  7/26/2019     NaN      NaN  ...      NaN     NaN   06:54:40
 700 a  7/26/2019  3501.0  3468.00  ...  3897.00  3911.0   06:54:40
 700 b  7/26/2019  4422.0  4331.00  ...  4737.00  4505.0   06:54:40
 900 a  7/26/2019  2681.0  2749.00  ...  3375.00  3269.0   06:54:40
 900 b  7/26/2019  4542.0  4602.00  ...  4505.00  4442.0   06:54:40
1000 a  7/26/2019     NaN      NaN  ...      NaN     NaN   06:54:40
1000 b  7/26/2019     NaN      NaN  ...      NaN     NaN   06:54:40

I want to plot the a values and the b values in columns 2-178 on two separate plots (an a plot and a b plot), and I want to do this for each time period. Eventually I want to click through the plots one timestamp at a time to see the changes over time (like a plotting GUI). For each set of timestamps, I need to pull out the selected columns based on time and index name. For example, I want:

a500 = [3880.0  4068.00  ...  4562.00  4398.0]
a600 =  [NaN      NaN  ...      NaN     NaN]
a700 = [3462.0  3684.00  ...  3821.00  3800.0]
a900 = [2863.0  3059.00  ...  3075.00  3313.0]
a1000 = [ NaN      NaN  ...  4426.00  4751.0]

And I want to be able to update on button click to:

a500 = [3995.0  4056.00  ...  4571.00  4480.0]
a600 =  [NaN      NaN  ...      NaN     NaN]
a700 = [ 3501.0  3468.00  ...  3897.00  3911.0]
a900 = [2681.0  2749.00  ...  3375.00  3269.0]
a1000 = [ NaN      NaN  ...      NaN     NaN]

I won't know the timestamps in advance. The row structure should be consistent throughout the entire DataFrame: a row that starts with the time and its associated values, followed by alternating a and b rows, then the same pattern again for the next time value. I would like to keep the NaNs, because they stand for values that are not zero and I do not want them graphed as zeros.
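Since the timestamps are not known ahead of time, one option is to let the Timestamp column define the blocks. The sketch below is only an illustration on a made-up miniature of the data: the string column names "2" and "3", the clean (whitespace-free) row labels, and the names value_cols and a_by_time are all assumptions, not taken from the real file.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real data: two timestamp blocks, fewer rows and columns.
# Row labels are assumed to carry no surrounding whitespace here.
index = ["6:54:36", "500 a", "500 b", "600 a", "600 b",
         "6:54:40", "500 a", "500 b", "600 a", "600 b"]
df = pd.DataFrame(
    {
        "2": [-35.0, 3880.0, 3462.0, np.nan, np.nan,
              -35.0, 3995.0, 3837.0, np.nan, np.nan],
        "3": [-34.75, 4068.0, 3458.0, np.nan, np.nan,
              -34.75, 4056.0, 3974.0, np.nan, np.nan],
        "Timestamp": ["06:54:36"] * 5 + ["06:54:40"] * 5,
    },
    index=index,
)

value_cols = ["2", "3"]  # stands in for columns 2 through 178 (assumed names)

# sort=False keeps the blocks in the order they appear in the data.
a_by_time = {}
for ts, block in df.groupby("Timestamp", sort=False):
    is_a = block.index.str.endswith(" a")          # True for the "a" rows only
    a_by_time[ts] = block.loc[is_a, value_cols]    # NaNs are left untouched

timestamps = list(a_by_time)  # discovered from the data, not known in advance
```

A button callback could then step through timestamps and plot a_by_time[timestamps[i]]; the same pattern with " b" gives the b rows.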

I've tried using .loc to look up rows by the label I want (e.g. a500=data.loc['500 a']), but it raises an error (e.g. KeyError: '500 a').
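One possible cause of that KeyError, judging only from the printed output, is that the row labels carry leading whitespace (" 500 a" rather than "500 a"). That is a guess; printing repr() of a label shows the exact text. The sketch below uses a hypothetical miniature frame (raw, data and the column names are made up) to show the check and a lookup after stripping the labels.

```python
import pandas as pd

# Hypothetical miniature of the data, with the leading spaces that the
# printed row labels suggest (" 500 a" rather than "500 a").
raw = pd.DataFrame(
    {"2": [-35.0, 3880.0, 3462.0, -35.0, 3995.0, 3837.0],
     "178": [9.0, 4398.0, 3718.0, 9.0, 4480.0, 3619.0]},
    index=["6:54:36", " 500 a", " 500 b", "6:54:40", " 500 a", " 500 b"],
)

print(repr(raw.index[1]))  # inspect the exact label text, spaces included

# Work on a copy whose labels have surrounding whitespace removed.
data = raw.copy()
data.index = data.index.str.strip()

# Labels repeat, so .loc returns one row per timestamp block.
a500_all = data.loc["500 a"]
a500_first = a500_all.iloc[0].to_numpy()  # values for the first timestamp
```

If the labels turn out to be clean, the index may simply not be the column you expect, and printing data.index is a quick way to find out.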

Tl;dr: I need help selecting subsets of rows and columns in a pandas DataFrame as a step towards graphing.

It took a lot of playing around, but I did manage to get .iloc to work:

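n = 1        # position of the first a/b row after a timestamp row
m = n + 10   # each timestamp row is followed by 10 a/b rows
subdf = df.iloc[n:m]                           # the a/b rows for one timestamp
newdf = subdf[subdf.columns[1:178].tolist()]   # keep columns at positions 1-177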

This solution works for me because I know this DataFrame has repeating row labels and a fixed number of columns. The n and m values are placeholders for when I eventually iterate over the DataFrame and graph it one portion at a time. As long as the number of rows per timestamp is constant (e.g. I have 10 rows for every new timestamp), this solution will work.
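To turn the n and m placeholders into a loop, the same positional idea can be wrapped in a generator. This is a minimal sketch that relies on the assumption stated above (exactly 10 a/b rows after every timestamp row, alternating a then b); iter_blocks, ROWS_PER_TIME and the demo frame are hypothetical names and data, not part of the original code.

```python
import numpy as np
import pandas as pd

ROWS_PER_TIME = 10           # a/b rows following each timestamp row (assumed constant)
BLOCK = ROWS_PER_TIME + 1    # plus the timestamp row itself


def iter_blocks(df, first_col=1, last_col=178):
    """Yield (time_label, a_rows, b_rows) for each timestamp block, by position."""
    for start in range(0, len(df), BLOCK):
        time_label = df.index[start]
        rows = df.iloc[start + 1:start + BLOCK, first_col:last_col]
        # Rows alternate a, b, a, b, ... so a stride of 2 separates them.
        yield time_label, rows.iloc[0::2], rows.iloc[1::2]


# Made-up demo data with the same row layout but only two value columns.
labels = ["500 a", "500 b", "600 a", "600 b", "700 a", "700 b",
          "900 a", "900 b", "1000 a", "1000 b"]
col2 = np.arange(22, dtype=float)
col2[[3, 14]] = np.nan       # "600 a" is missing in both blocks and stays NaN
demo = pd.DataFrame(
    {
        "0": ["7/26/2019"] * 22,
        "2": col2,
        "3": np.arange(22, dtype=float) + 100,
        "Timestamp": ["06:54:36"] * 11 + ["06:54:40"] * 11,
    },
    index=["6:54:36"] + labels + ["6:54:40"] + labels,
)

blocks = list(iter_blocks(demo, first_col=1, last_col=3))
```

Each element of blocks holds the a rows and b rows for one timestamp, so a "next" button only needs to advance an index into that list. If the row count per timestamp ever varies, this positional approach will silently misalign.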
