Could you please help me understand the logic of using a for loop on a dataframe?

Question

I have a dataframe like the one below:

groups.head()

        reconstruction_error  true_class
183484              0.290310           0
255448              0.225183           0
244749              1.047568           0
63919               0.541144           0
11475               0.694220           0
83053               3.471817           1
221041              5.833817           1
6717               13.987760           1
263080              0.809129           1
231978              4.028153           1

The code below plots the index against the reconstruction error and changes the marker color depending on whether true_class is 0 or 1. I really don't understand how the loop variable 'name' is matched to the true_class column. Maybe it represents reconstruction_error? I have the same question about the loop variable 'group'. Could you please explain the structure here?

Graphic here

import matplotlib.pyplot as plt

threshold = 6.0

# error_df is the dataframe holding reconstruction_error and true_class
groups = error_df.groupby('true_class')
fig, ax = plt.subplots(figsize=(12, 8))

for name, group in groups:
    ax.plot(group.index, group.reconstruction_error, marker='o', ms=2.0, linestyle='',
            label="Fraud" if name == 1 else "Normal",
            color="red" if name == 1 else "blue")
ax.hlines(threshold, ax.get_xlim()[0], ax.get_xlim()[1], colors="green", zorder=100, label='Threshold')
ax.legend()
plt.title("Reconstruction error for different classes")
plt.ylabel("Reconstruction error")
plt.xlabel("Data point index")
plt.show()

Answer 1

Note that you performed the grouping on true_class:

groups = error_df.groupby('true_class')

Then, to see how your loop works, run:

for name, group in groups:
    print(name)
    print(group)

The result is:

0
        reconstruction_error  true_class
183484              0.290310           0
255448              0.225183           0
244749              1.047568           0
63919               0.541144           0
11475               0.694220           0
1
        reconstruction_error  true_class
83053               3.471817           1
221041              5.833817           1
6717               13.987760           1
263080              0.809129           1
231978              4.028153           1

You can see that the true_class value of each group is what ends up in name, because for ... in groups: retrieves tuples consisting of the grouping key and the group itself (a dataframe holding only the rows for that key).
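The tuple structure can be checked directly on a small example. This is only a minimal sketch: the four-row error_df below is a stand-in built from some of the sample rows in the question, not the real data.

```python
import pandas as pd

# Stand-in for error_df, using a few of the sample rows from the question.
error_df = pd.DataFrame(
    {
        "reconstruction_error": [0.290310, 0.225183, 3.471817, 5.833817],
        "true_class": [0, 0, 1, 1],
    },
    index=[183484, 255448, 83053, 221041],
)

groups = error_df.groupby("true_class")

# Materialize the iteration: each element is (grouping key, sub-dataframe).
pairs = list(groups)

for name, group in pairs:
    # name is the true_class value; group holds only the rows with that value.
    print(name, list(group.index))
```

So name never refers to reconstruction_error; that column is simply one of the columns inside each group.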

So your loop actually draws 2 plots on the same axes:

one for the group with grouping key (name) == 0,
and the second for the group with grouping key (name) == 1.

In each case:

label is set to Normal (name == 0) or Fraud (name == 1),
color is set to blue (name == 0) or red (name == 1).
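To make the two passes explicit, the sketch below checks that each group matches a plain boolean filter on true_class and records the label and color the loop would choose. It rests on assumptions: the small error_df is a stand-in built from sample rows in the question, and styles is a hypothetical helper dict that is not part of the original code.

```python
import pandas as pd

# Stand-in for error_df, using a few of the sample rows from the question.
error_df = pd.DataFrame(
    {
        "reconstruction_error": [0.290310, 0.225183, 3.471817, 13.987760],
        "true_class": [0, 0, 1, 1],
    },
    index=[183484, 255448, 83053, 6717],
)

styles = {}  # hypothetical: collects what each loop pass would use
for name, group in error_df.groupby("true_class"):
    # The group contains the same rows a boolean filter on true_class selects.
    same_rows = error_df[error_df["true_class"] == name]
    assert group.equals(same_rows)

    styles[int(name)] = {
        "label": "Fraud" if name == 1 else "Normal",
        "color": "red" if name == 1 else "blue",
        "n_points": len(group),
    }

print(styles)
```

In other words, the loop is a compact way of writing one ax.plot call per class, each on the filtered rows for that class.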
