Need to compare one Pandas (Python) dataframe with values from another dataframe

So I've pulled data from an SQL server and loaded it into a dataframe. All the data is discrete and increases in steps of 0.1 in one direction (0.0, 0.1, 0.2, ... 9.8, 9.9, 10.0), with multiple power values for each step (e.g. 1000, 1412, 134.5, 657.1 at step 0.1; 14.5, 948.1, 343.8 at step 5.5). Hopefully you see what I'm trying to say.

I've managed to group the data into these individual steps using the following, and have then taken the mean and standard deviation for each group.

group = df.groupby('step').power.mean()
group2 = df.groupby('step').power.std().fillna(0)

This results in two objects (group and group2, strictly pandas Series rather than data frames) holding the mean and standard deviation for each of the 0.1 steps. It's then easy to create an upper and lower limit for each step using the following:

upperlimit = group + 3*group2
lowerlimit = group - 3*group2
lowerlimit[lowerlimit<0] = 0
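For anyone who wants to run the steps so far, here is a minimal sketch. It assumes lowercase column names `power` and `step` (as used in the snippets) and uses only a handful of rows from the question's sample data. The `limits` frame is a hypothetical helper added here purely to show both limits side by side.

```python
import pandas as pd

# A few rows from the question's sample data (lowercase column names assumed)
df = pd.DataFrame({
    "power": [106.0, 135.6, 126.7, 134.3, 201.4, 226.9, 206.8, 211.1, 265.2],
    "step":  [5.0,   5.0,   5.0,   5.0,   5.6,   5.6,   5.6,   5.6,   6.0],
})

# Mean and standard deviation of power for each step
group = df.groupby("step").power.mean()
group2 = df.groupby("step").power.std().fillna(0)  # a single-row group has NaN std

# Three-sigma limits per step, with the lower limit floored at zero
upperlimit = group + 3 * group2
lowerlimit = (group - 3 * group2).clip(lower=0)

# Hypothetical helper frame, indexed by step, just for inspection
limits = pd.DataFrame({"lower": lowerlimit, "upper": upperlimit})
print(limits)
```

Note that step 6.0 has only one row here, so its standard deviation is NaN before `fillna(0)` and its upper and lower limits both collapse to the single observed value.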

Now comes the bit I'm confused about! I need to go back into the original dataframe and remove rows/instances where the power value is outside these calculated limits (note there is a different upper and lower limit for each 0.1 step).

Here are 50 lines of the sample data:

Index    Power              Step
0        106.0              5.0
1        200.4              5.5
2        201.4              5.6
3        226.9              5.6
4        206.8              5.6
5        177.5              5.3
6        124.0              4.9
7        121.0              4.8
8         93.9              4.7
9        135.6              5.0
10       211.1              5.6
11       265.2              6.0
12       281.4              6.2
13       417.9              6.9
14       546.0              7.4
15       619.9              7.9
16       404.4              7.1
17       241.4              5.8
18        44.3              3.9
19        72.1              4.6
20        21.1              3.3
21         6.3              2.3
22         0.0              0.8
23         0.0              0.9
24         0.0              3.2
25         0.0              4.6
26        33.3              4.2
27        97.7              4.7
28        91.0              4.7
29       105.6              4.8
30        97.4              4.6
31       126.7              5.0
32       134.3              5.0
33       133.4              5.1
34       301.8              6.3
35       298.5              6.3
36       312.1              6.5
37       505.3              7.5
38       491.8              7.3
39       404.6              6.8
40       324.3              6.6
41       347.2              6.7
42       365.3              6.8
43       279.7              6.3
44       351.4              6.8
45       350.1              6.7
46       573.5              7.9
47       490.1              7.5
48       520.4              7.6
49       548.2              7.9

To put your goal another way, you want to perform some manipulations on grouped data, and then project the results of those manipulations back to the ungrouped rows so you can use them for filtering those rows. One way to do this is with transform:

The transform method returns an object that is indexed the same (same size) as the one being grouped. Thus, the passed transform function should return a result that is the same size as the group chunk.
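To make the "same size" point concrete, here is a small sketch contrasting a plain aggregation with transform. The five rows are taken from the question's sample data, and the lowercase column names are an assumption carried over from the question's snippets.

```python
import pandas as pd

df = pd.DataFrame({
    "step":  [5.0, 5.0, 5.6, 5.6, 5.6],
    "power": [106.0, 135.6, 201.4, 226.9, 206.8],
})

# Aggregation: one value per group, indexed by the group key (step)
per_group = df.groupby("step").power.mean()

# transform: one value per original row, indexed like df itself
per_row = df.groupby("step").power.transform("mean")

print(len(per_group))                  # number of distinct steps
print(len(per_row))                    # number of rows in df
print(per_row.index.equals(df.index))  # whether per_row lines up with df
```

Because `per_row` shares the index of `df`, it can be assigned as a new column or compared against `df.power` row by row, which is what the filtering step needs.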

You can then create the new columns directly:

# Per-row group mean and standard deviation, aligned to df's index
grouped = df.groupby('step').power
mean = grouped.transform('mean')
std = grouped.transform('std').fillna(0)  # std is NaN for single-row groups
df['upper'] = mean + 3*std
df['lower'] = (mean - 3*std).clip(lower=0)

And filter accordingly:

df = df[(df['power'] <= df['upper']) & (df['power'] >= df['lower'])]
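Putting the pieces together, here is a self-contained sketch of the whole approach. The data is made up for illustration (it is not from the question): twelve readings at step 5.0, one of them an obvious outlier, plus a single reading at step 6.0. It takes the group mean and standard deviation through `transform('mean')` and `transform('std')`, and writes the result to a new name, `filtered`, so the two frames can be compared.

```python
import pandas as pd

# Made-up data: twelve readings at step 5.0 (one outlier) and one at step 6.0
df = pd.DataFrame({
    "step":  [5.0] * 12 + [6.0],
    "power": [100.0] * 11 + [1000.0] + [50.0],
})

grouped = df.groupby("step").power
mean = grouped.transform("mean")
std = grouped.transform("std").fillna(0)  # NaN for the single-row group at 6.0

# Per-row limits: every row carries the limits of its own step
df["upper"] = mean + 3 * std
df["lower"] = (mean - 3 * std).clip(lower=0)

# Keep only rows whose power lies within their step's limits (inclusive)
filtered = df[(df["power"] <= df["upper"]) & (df["power"] >= df["lower"])]
print(len(df), len(filtered))
```

One caveat worth knowing: with the sample standard deviation (the pandas default), no point in a group of n rows can lie more than (n-1)/sqrt(n) standard deviations from the group mean. That bound only exceeds 3 once n reaches 11, so steps with fewer than 11 rows will never lose a row to a three-sigma rule.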
