
Need to compare one Pandas (Python) dataframe with values from another dataframe

I've pulled data from a SQL server and loaded it into a dataframe. All the data is discrete and increases in steps of 0.1 in one direction (0.0, 0.1, 0.2 ... 9.8, 9.9, 10.0), with multiple power values for each step (e.g. 1000, 1412, 134.5, 657.1 at 0.1; 14.5, 948.1, 343.8 at 5.5). Hopefully you see what I'm trying to say.

I've managed to group the data into these individual steps using the following, and have then taken the mean and standard deviation for each group.

group = df.groupby('step').power.mean()
group2 = df.groupby('step').power.std().fillna(0)

This results in two Series (group and group2), indexed by step, which hold the mean and standard deviation for each of the 0.1 steps. It's then easy to create an upper and lower limit for each step using the following:

upperlimit = group + 3*group2
lowerlimit = group - 3*group2
lowerlimit[lowerlimit<0] = 0

Now comes the bit I'm confused about! I need to go back into the original dataframe and remove rows/instances where the power value is outside these calculated limits (note there is a different upper and lower limit for each 0.1 step).

Here's 50 lines of the sample data:

Index    Power              Step
0        106.0              5.0
1        200.4              5.5
2        201.4              5.6
3        226.9              5.6
4        206.8              5.6
5        177.5              5.3
6        124.0              4.9
7        121.0              4.8
8         93.9              4.7
9        135.6              5.0
10       211.1              5.6
11       265.2              6.0
12       281.4              6.2
13       417.9              6.9
14       546.0              7.4
15       619.9              7.9
16       404.4              7.1
17       241.4              5.8
18        44.3              3.9
19        72.1              4.6
20        21.1              3.3
21         6.3              2.3
22         0.0              0.8
23         0.0              0.9
24         0.0              3.2
25         0.0              4.6
26        33.3              4.2
27        97.7              4.7
28        91.0              4.7
29       105.6              4.8
30        97.4              4.6
31       126.7              5.0
32       134.3              5.0
33       133.4              5.1
34       301.8              6.3
35       298.5              6.3
36       312.1              6.5
37       505.3              7.5
38       491.8              7.3
39       404.6              6.8
40       324.3              6.6
41       347.2              6.7
42       365.3              6.8
43       279.7              6.3
44       351.4              6.8
45       350.1              6.7
46       573.5              7.9
47       490.1              7.5
48       520.4              7.6
49       548.2              7.9

To put your goal another way, you want to perform some manipulations on grouped data, and then project the results of those manipulations back onto the ungrouped rows so you can use them to filter those rows. One way to do this is with transform:

The transform method returns an object that is indexed the same (same size) as the one being grouped. Thus, the passed transform function should return a result that is the same size as the group chunk.
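To make the difference concrete, here is a minimal sketch contrasting an aggregation with a transform. The small frame and its values are made up for illustration and are not taken from the question's data.

```python
import pandas as pd

# Tiny made-up frame (illustrative values only).
df = pd.DataFrame({
    "step":  [0.1, 0.1, 0.1, 0.2, 0.2],
    "power": [10.0, 12.0, 14.0, 100.0, 110.0],
})

grouped = df.groupby("step")["power"]

# Aggregation: one value per group, indexed by the group key (step).
per_group_mean = grouped.mean()

# Transform: one value per original row, indexed exactly like df,
# so the result can be compared against df["power"] row by row.
per_row_mean = grouped.transform("mean")
```

Because per_row_mean shares df's index, it can be assigned as a new column or used directly in a boolean mask.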

You can then create the new columns directly:

grouped = df.groupby('step').power
mean = grouped.transform('mean')
std = grouped.transform('std').fillna(0)  # std is NaN for single-row groups
df['upper'] = mean + 3*std
df['lower'] = (mean - 3*std).clip(lower=0)  # a negative lower limit becomes 0

And filter accordingly:

df = df[(df['power'] <= df['upper']) & (df['power'] >= df['lower'])]
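Putting the pieces together, the following is a self-contained sketch of the whole filter, not a definitive implementation. The data is hypothetical: 19 ordinary readings and one spike at step 0.1, plus a single reading at step 0.2 to show why the NaN standard deviation needs filling. It also shows an alternative that maps the per-step limit Series from the question back onto the rows with Series.map, which should select the same rows.

```python
import pandas as pd

# Hypothetical data: one obvious spike at step 0.1, one lone row at step 0.2.
df = pd.DataFrame({
    "step":  [0.1] * 20 + [0.2],
    "power": [100.0] * 19 + [1000.0] + [50.0],
})

grouped = df.groupby("step")["power"]

# Row-aligned statistics via transform.
mean = grouped.transform("mean")
std = grouped.transform("std").fillna(0)   # NaN for single-row groups

upper = mean + 3 * std
lower = (mean - 3 * std).clip(lower=0)     # power cannot be negative

# Keep rows whose power lies within [lower, upper] for their own step.
filtered = df[df["power"].between(lower, upper)]

# Alternative: reuse per-step limits (as computed in the question)
# and look them up for each row by its step value.
upperlimit = grouped.mean() + 3 * grouped.std().fillna(0)
lowerlimit = (grouped.mean() - 3 * grouped.std().fillna(0)).clip(lower=0)
mask = df["power"].between(df["step"].map(lowerlimit),
                           df["step"].map(upperlimit))
filtered_via_map = df[mask]
```

With a sample standard deviation, a single outlier can only exceed 3 sigma when its group is reasonably large, which is why the spike's group here has 20 rows; in very small groups this filter may never remove anything.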
