Need to compare one Pandas (Python) dataframe with values from another dataframe
So I've pulled data from an SQL server and loaded it into a dataframe. All the data is discrete and increases in steps of 0.1 in one direction (0.0, 0.1, 0.2 ... 9.8, 9.9, 10.0), with multiple power values for each step (e.g. 1000, 1412, 134.5, 657.1 at 0.1; 14.5, 948.1, 343.8 at 5.5). Hopefully you see what I'm trying to say.
I've managed to group the data into these individual steps using the following, and have then taken the mean and standard deviation for each group:
group = df.groupby('step').power.mean()
group2 = df.groupby('step').power.std().fillna(0)
This results in two data frames (group and group2) which have the mean and standard deviation for each of the 0.1 steps. It's then easy to create an upper and lower limit for each step using the following:
upperlimit = group + 3*group2
lowerlimit = group - 3*group2
lowerlimit[lowerlimit<0] = 0
Now comes the bit I'm confused about! I need to go back into the original dataframe and remove rows/instances where the power value is outside these calculated limits (note there is a different upper and lower limit for each 0.1 step).
Here are 50 lines of the sample data:
Index Power Step
0 106.0 5.0
1 200.4 5.5
2 201.4 5.6
3 226.9 5.6
4 206.8 5.6
5 177.5 5.3
6 124.0 4.9
7 121.0 4.8
8 93.9 4.7
9 135.6 5.0
10 211.1 5.6
11 265.2 6.0
12 281.4 6.2
13 417.9 6.9
14 546.0 7.4
15 619.9 7.9
16 404.4 7.1
17 241.4 5.8
18 44.3 3.9
19 72.1 4.6
20 21.1 3.3
21 6.3 2.3
22 0.0 0.8
23 0.0 0.9
24 0.0 3.2
25 0.0 4.6
26 33.3 4.2
27 97.7 4.7
28 91.0 4.7
29 105.6 4.8
30 97.4 4.6
31 126.7 5.0
32 134.3 5.0
33 133.4 5.1
34 301.8 6.3
35 298.5 6.3
36 312.1 6.5
37 505.3 7.5
38 491.8 7.3
39 404.6 6.8
40 324.3 6.6
41 347.2 6.7
42 365.3 6.8
43 279.7 6.3
44 351.4 6.8
45 350.1 6.7
46 573.5 7.9
47 490.1 7.5
48 520.4 7.6
49 548.2 7.9
To put your goal another way, you want to perform some manipulations on grouped data, and then project the results of those manipulations back to the ungrouped rows so you can use them for filtering those rows. One way to do this is with transform:
The transform method returns an object that is indexed the same (same size) as the one being grouped. Thus, the passed transform function should return a result that is the same size as the group chunk.
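As a minimal sketch of that difference, compare an aggregation with a transform on the same groups. The toy frame below is invented for illustration and is not the asker's data:

```python
import pandas as pd

# Toy data, made up for illustration only.
df = pd.DataFrame({
    "step": [0.1, 0.1, 0.2, 0.2, 0.2],
    "power": [10.0, 20.0, 1.0, 2.0, 3.0],
})

# Aggregation: one value per group, indexed by the group key.
per_group = df.groupby("step")["power"].mean()

# Transform: one value per original row, indexed like df.
per_row = df.groupby("step")["power"].transform("mean")

print(len(per_group))    # 2
print(per_row.tolist())  # [15.0, 15.0, 2.0, 2.0, 2.0]
```

Because the transformed result shares the index of the original frame, it can be assigned straight to a new column or compared row by row.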
You can then create the new columns directly:
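# Per-row group mean and standard deviation (std is NaN for single-row groups).
# Note: p.std() inside a lambda returns a plain float, which has no fillna(),
# so fillna(0) is applied to the transformed Series instead.
mean = df.groupby('step').power.transform('mean')
std = df.groupby('step').power.transform('std').fillna(0)
df['upper'] = mean + 3*std
df['lower'] = (mean - 3*std).clip(lower=0)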
And filter accordingly:
df = df[(df['power'] <= df['upper']) & (df['power'] >= df['lower'])]
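Putting the steps together, here is one possible end-to-end sketch. The helper name filter_by_group_limits, its parameters, and the demo data are hypothetical and chosen for illustration; the column names 'step' and 'power' follow the question's code.

```python
import pandas as pd

def filter_by_group_limits(df, key="step", value="power", n_sigma=3):
    """Keep rows whose value lies within mean +/- n_sigma*std of its group."""
    grouped = df.groupby(key)[value]
    mean = grouped.transform("mean")
    # std is NaN for single-row groups; treat that as zero spread.
    std = grouped.transform("std").fillna(0)
    lower = (mean - n_sigma * std).clip(lower=0)  # power cannot go below 0
    upper = mean + n_sigma * std
    # between() is inclusive on both ends by default.
    return df[df[value].between(lower, upper)]

# Made-up demo data: one step with an obvious outlier, one single-row step.
df = pd.DataFrame({
    "step": [5.0] * 21 + [0.8],
    "power": [100.0] * 20 + [1000.0] + [0.0],
})

result = filter_by_group_limits(df)
print(len(df), len(result))  # 22 21
```

One thing to keep in mind with this approach: since the sample standard deviation is computed from the same rows being tested, a group needs at least 11 rows before any single value can fall more than 3 standard deviations from the group mean, so small groups will never lose rows at n_sigma=3.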