Selecting portions of rows in Pandas Dataframe based on two values
I'm trying to select a subset of rows and columns from a pandas dataframe that I'm eventually going to graph. My data is currently structured like this:
0 2 3 ... 177 178 Timestamp
1 ...
6:54:36 7/26/2019 -35.0 -34.75 ... 8.75 9.0 06:54:36
500 a 7/26/2019 3880.0 4068.00 ... 4562.00 4398.0 06:54:36
500 b 7/26/2019 3462.0 3458.00 ... 3604.00 3718.0 06:54:36
600 a 7/26/2019 NaN NaN ... NaN NaN 06:54:36
600 b 7/26/2019 NaN NaN ... NaN NaN 06:54:36
700 a 7/26/2019 3462.0 3684.00 ... 3821.00 3800.0 06:54:36
700 b 7/26/2019 4290.0 4414.00 ... 4303.00 4336.0 06:54:36
900 a 7/26/2019 2863.0 3059.00 ... 3075.00 3313.0 06:54:36
900 b 7/26/2019 4480.0 4632.00 ... 4873.00 4843.0 06:54:36
1000 a 7/26/2019 NaN NaN ... 4426.00 4751.0 06:54:36
1000 b 7/26/2019 NaN NaN ... 4388.00 4239.0 06:54:36
6:54:40 7/26/2019 -35.0 -34.75 ... 8.75 9.0 06:54:40
500 a 7/26/2019 3995.0 4056.00 ... 4571.00 4480.0 06:54:40
500 b 7/26/2019 3837.0 3974.00 ... 3720.00 3619.0 06:54:40
600 a 7/26/2019 NaN NaN ... NaN NaN 06:54:40
600 b 7/26/2019 NaN NaN ... NaN NaN 06:54:40
700 a 7/26/2019 3501.0 3468.00 ... 3897.00 3911.0 06:54:40
700 b 7/26/2019 4422.0 4331.00 ... 4737.00 4505.0 06:54:40
900 a 7/26/2019 2681.0 2749.00 ... 3375.00 3269.0 06:54:40
900 b 7/26/2019 4542.0 4602.00 ... 4505.00 4442.0 06:54:40
1000 a 7/26/2019 NaN NaN ... NaN NaN 06:54:40
1000 b 7/26/2019 NaN NaN ... NaN NaN 06:54:40
I want to plot the a values and b values in columns 2-178 on two separate plots (an a plot and a b plot), and I want to do this for each period of time. I'll eventually want to click through the plots one timestamp at a time to see the changes over time (like a plotting GUI). I need to pull out the selected columns based on time and index name for each set of timestamps. For example, I want:
a500 = [3880.0 4068.00 ... 4562.00 4398.0]
a600 = [NaN NaN ... NaN NaN]
a700 = [3462.0 3684.00 ... 3821.00 3800.0]
a900 = [2863.0 3059.00 ... 3075.00 3313.0]
a1000 = [ NaN NaN ... 4426.00 4751.0]
And I want to be able to update on button click to:
a500 = [3995.0 4056.00 ... 4571.00 4480.0]
a600 = [NaN NaN ... NaN NaN]
a700 = [ 3501.0 3468.00 ... 3897.00 3911.0]
a900 = [2681.0 2749.00 ... 3375.00 3269.0]
a1000 = [ NaN NaN ... NaN NaN]
I won't know the timestamps in advance. The structure of the rows should be consistent throughout the entire dataframe (a row that starts with the time and its associated values, followed by alternating a and b rows, then the same pattern again for the next time value). I would like to keep the NaNs, because they mark missing values that I do not want to graph as zeros.
I've tried using .loc to search for rows that start with the value that I want (e.g. a500=data.loc['500 a']), but it kicks out error messages (e.g. KeyError: '500 a').
Tl;dr: I need help selecting subsets of rows based on columns in a pandas dataframe as a step towards graphing.
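A KeyError from .loc means the label is not in the index exactly as typed. The question does not show how the index was built, so the following is only a sketch of one possible cause: stray whitespace in the labels. The data and the column name v are made up for illustration. If the real frame has a MultiIndex instead, the lookup would need a tuple key rather than a single string.

```python
import pandas as pd

# Hypothetical data: labels carry a trailing space (an assumed cause, not confirmed).
data = pd.DataFrame(
    {"v": [1.0, 2.0, 3.0, 4.0]},
    index=["500 a ", "500 b ", "500 a ", "500 b "],
)

print(data.index.tolist())    # inspect the exact labels, whitespace included
print("500 a" in data.index)  # membership test before calling .loc

# Normalise the labels, then select every row that carries this label.
data.index = data.index.str.strip()
a500 = data.loc["500 a"]      # repeated label: one row per timestamp block
print(a500)
```

Because the label repeats for every timestamp, .loc returns all matching rows at once, so picking a single timestamp still needs a second step (for example a positional slice).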
It took a lot of playing around, but I did manage to get .iloc to work:
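n = 1                  # first data row after the timestamp row
m = n + 10             # ten a/b rows follow each timestamp row
subdf = df.iloc[n:m]   # the a/b rows for one timestamp
newdf = subdf[subdf.columns[1:178].tolist()]  # positional column slice 1:178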
This solution works for me because I know this dataframe has repeating row labels and a defined number of columns. The n and m values are placeholders for when I eventually want to iteratively graph portions of my dataframe. So, as long as the number of associated rows is constant for each timestamp (e.g. I have 10 rows for every new timestamp), this solution will work.
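To make the "iterate over n and m" idea concrete, here is a minimal sketch that steps through the frame one timestamp block at a time and splits each block into its a and b rows. It assumes every block is exactly 11 rows (one timestamp row plus ten a/b rows) and that the row labels end in "a" or "b". The helper split_block, the constant ROWS_PER_BLOCK, and the small random frame are all hypothetical stand-ins, not part of the original data.

```python
import numpy as np
import pandas as pd

ROWS_PER_BLOCK = 11  # assumed: 1 timestamp row + 10 a/b rows per timestamp

# Hypothetical stand-in for the real data: two timestamp blocks, four value columns.
ab_labels = [f"{f} {s}" for f in (500, 600, 700, 900, 1000) for s in "ab"]
labels = ["6:54:36"] + ab_labels + ["6:54:40"] + ab_labels
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.integers(2000, 5000, size=(len(labels), 4)).astype(float),
    index=labels,
    columns=[0, 2, 3, 4],
)
df.iloc[3:5] = np.nan  # NaN rows are kept, like "600 a" / "600 b" in the question


def split_block(frame, block_number, rows_per_block=ROWS_PER_BLOCK):
    """Return (timestamp_label, a_rows, b_rows) for one timestamp block."""
    start = block_number * rows_per_block
    block = frame.iloc[start:start + rows_per_block]
    timestamp = block.index[0]   # first row of the block holds the time
    values = block.iloc[1:]      # remaining rows alternate a, b
    is_a = values.index.str.endswith("a")
    return timestamp, values[is_a], values[~is_a]


n_blocks = len(df) // ROWS_PER_BLOCK
for i in range(n_blocks):
    ts, a_rows, b_rows = split_block(df, i)
    print(ts, a_rows.shape, b_rows.shape)
```

A button callback in a plotting GUI could then just increment the block number and redraw from a_rows and b_rows. NaN entries stay NaN here, so they would show up as gaps rather than zeros.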