
Python: How to plot outlier values obtained from a scatter plot in a time series graph?

I am doing anomaly detection using the Isolation Forest method.

My data:

My dataframe for the task is named lineplot (the unlabeled first column below is the dataframe index):

ContextID   BacksGas_Flow_sccm  StepID  Time_ms iso_forest
427 7290057 1.7578125   1   09:20:15.273    1
428 7290057 1.7578125   1   09:20:15.513    1
429 7290057 1.953125    2   09:20:15.744    1
430 7290057 1.85546875  2   09:20:16.814    1
431 7290057 1.7578125   2   09:20:17.833    1
432 7290057 1.7578125   2   09:20:18.852    1
433 7290057 1.7578125   2   09:20:19.872    1
434 7290057 1.7578125   2   09:20:20.892    1
435 7290057 1.7578125   2   09:20:22.42     1
436 7290057 16.9921875  5   09:20:23.82    -1
437 7290057 46.19140625 5   09:20:24.102    -1
438 7290057 46.19140625 5   09:20:25.122    -1
439 7290057 46.6796875  5   09:20:26.142    1
440 7290057 46.6796875  5   09:20:27.162    1
441 7290057 46.6796875  5   09:20:28.181    1
442 7290057 46.6796875  5   09:20:29.232    1
443 7290057 46.6796875  5   09:20:30.361    1
444 7290057 46.6796875  5   09:20:31.381    1
445 7290057 46.6796875  5   09:20:32.401    1
446 7290057 46.6796875  5   09:20:33.431    1
447 7290057 46.6796875  5   09:20:34.545    1
448 7290057 46.6796875  5   09:20:34.761    1
449 7290057 46.6796875  5   09:20:34.972    1
450 7290057 46.6796875  5   09:20:36.50     1
451 7290057 46.6796875  5   09:20:37.120    1
452 7290057 46.6796875  7   09:20:38.171    1
453 7290057 46.6796875  7   09:20:39.261    1
454 7290057 46.6796875  7   09:20:40.280    1
455 7290057 46.6796875  12  09:20:41.429    1
456 7290057 46.6796875  12  09:20:42.449    1
457 7290057 46.6796875  12  09:20:43.469    1
458 7290057 46.6796875  12  09:20:44.499    1
459 7290057 46.6796875  12  09:20:45.559    1
460 7290057 46.6796875  12  09:20:45.689    1
461 7290057 47.16796875 12  09:20:46.710    -1
462 7290057 46.6796875  12  09:20:47.749    1
463 7290057 46.6796875  15  09:20:48.868    1
464 7290057 46.6796875  15  09:20:49.889    1
465 7290057 46.6796875  16  09:20:50.910    1
466 7290057 46.6796875  16  09:20:51.938    1
467 7290057 24.21875    19  09:20:52.999    -1
468 7290057 38.76953125 19  09:20:54.27     -1
469 7290057 80.46875    19  09:20:55.68     -1
470 7290057 72.75390625 19  09:20:56.128    1
471 7290057 59.5703125  19  09:20:57.247    -1
472 7290057 63.671875   19  09:20:58.278    -1
473 7290057 70.5078125  19  09:20:59.308    -1
474 7290057 71.875      19  09:21:00.337    1
475 7290057 69.82421875 19  09:21:01.358    -1
476 7290057 69.23828125 19  09:21:02.408    -1
477 7290057 69.23828125 19  09:21:03.548    -1
478 7290057 72.4609375  19  09:21:04.597    1
479 7290057 73.4375     19  09:21:05.615    1
480 7290057 73.4375     19  09:21:06.647    1
481 7290057 73.4375     19  09:21:07.675    1
482 7290057 73.4375     19  09:21:08.697    1
483 7290057 73.4375     19  09:21:09.727    1
484 7290057 74.21875    19  09:21:10.796    1
485 7290057 75.1953125  19  09:21:11.827    1
486 7290057 75.1953125  19  09:21:12.846    1
487 7290057 75.1953125  19  09:21:13.865    1
488 7290057 75.1953125  19  09:21:14.886    1
489 7290057 75.1953125  19  09:21:15.907    1
490 7290057 75.9765625  19  09:21:16.936    1
491 7290057 75.9765625  19  09:21:17.975    1
492 7290057 75.9765625  19  09:21:18.997    1
493 7290057 75.9765625  19  09:21:20.27     1
494 7290057 75.9765625  19  09:21:21.55     1
495 7290057 75.9765625  19  09:21:22.75     1
496 7290057 75.9765625  19  09:21:23.95     1
497 7290057 76.85546875 19  09:21:24.204    1
498 7290057 76.85546875 19  09:21:25.225    1
499 7290057 76.85546875 19  09:21:25.957    1
500 7290057 76.85546875 19  09:21:26.984    1
501 7290057 75.9765625  19  09:21:27.995    1
502 7290057 75.9765625  19  09:21:29.2      1
503 7290057 76.7578125  19  09:21:30.13     1
504 7290057 76.7578125  19  09:21:31.33     1
505 7290057 76.7578125  19  09:21:32.59     1
506 7290057 76.7578125  19  09:21:33.142    1
507 7290057 76.7578125  19  09:21:34.153    1
508 7290057 75.87890625 19  09:21:34.986    1
509 7290057 75.87890625 19  09:21:35.131    1
510 7290057 75.87890625 19  09:21:35.272    1
511 7290057 75.87890625 19  09:21:35.451    1
512 7290057 76.7578125  19  09:21:36.524    1
513 7290057 76.7578125  19  09:21:37.651    1
514 7290057 76.7578125  19  09:21:38.695    1
515 7290057 76.7578125  19  09:21:39.724    1
516 7290057 76.7578125  19  09:21:40.760    1
517 7290057 76.7578125  19  09:21:41.783    1
518 7290057 76.7578125  19  09:21:42.802    1
519 7290057 76.7578125  19  09:21:43.822    1
520 7290057 76.7578125  19  09:21:44.862    1
521 7290057 76.7578125  19  09:21:45.884    1
522 7290057 76.7578125  19  09:21:46.912    1
523 7290057 76.7578125  19  09:21:47.933    1
524 7290057 76.7578125  19  09:21:48.952    1
525 7290057 76.7578125  19  09:21:49.972    1
526 7290057 76.7578125  19  09:21:51.72     1
527 7290057 77.5390625  19  09:21:52.290    1
528 7290057 77.5390625  19  09:21:52.92     1
529 7290057 77.5390625  19  09:21:53.361    1
530 7290057 77.5390625  19  09:21:54.435    1
531 7290057 76.66015625 19  09:21:55.602    1
532 7290057 76.66015625 19  09:21:56.621    1
533 7290057 72.94921875 22  09:21:57.652    1
534 7290057 3.90625     24  09:21:58.749    -1
535 7290057 2.5390625   24  09:21:59.801    -1
536 7290057 2.1484375   24  09:22:00.882    1
537 7290057 2.05078125  24  09:22:01.259    1
538 7290057 2.1484375   24  09:22:01.53     1
539 7290057 1.953125    24  09:22:02.281    1
540 7290057 1.953125    24  09:22:03.311    1
541 7290057 2.1484375   24  09:22:04.331    1
542 7290057 2.1484375   24  09:22:05.351    1
543 7290057 1.953125    24  09:22:06.432    1
544 7290057 1.85546875  24  09:22:07.519    1
545 7290057 1.7578125   24  09:22:08.549    1
546 7290057 1.85546875  24  09:22:09.710    1
547 7290057 1.7578125   24  09:22:10.738    1
548 7290057 1.85546875  24  09:22:11.798    1
549 7290057 1.953125    24  09:22:12.820    1
550 7290057 1.85546875  1   09:22:13.610    1
551 7290057 1.85546875  1   09:22:14.629    1
552 7290057 1.953125    1   09:22:15.649    1
553 7290057 1.85546875  2   09:22:16.679    1
554 7290057 1.85546875  2   09:22:17.709    1
555 7290057 1.85546875  2   09:22:18.729    1
556 7290057 1.953125    2   09:22:19.748    1
557 7290057 1.85546875  2   09:22:20.768    1
558 7290057 1.7578125   3   09:22:21.788    1
559 7290057 1.7578125   3   09:22:22.808    1
560 7290057 1.85546875  3   09:22:23.829    1
561 7290057 1.953125    3   09:22:24.848    1
562 7290057 1.85546875  3   09:22:25.898    1
563 7290057 1.953125    3   09:22:27.39     1
564 7290057 1.953125    3   09:22:28.66     1
565 7290057 1.7578125   3   09:22:29.87     1
566 7290057 1.85546875  3   09:22:30.108    1
567 7290057 1.7578125   3   09:22:31.129    1
568 7290057 1.953125    3   09:22:32.147    1
569 7290057 1.85546875  3   09:22:33.187    1

My code:

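import matplotlib.pyplot as plt

# Column positions in lineplot.values: 1 = BacksGas_Flow_sccm, 3 = Time_ms
x_axis = lineplot.values[:,3]
y_axis = lineplot.values[:,1]
plt.figure(1)
plt.plot(x_axis, y_axis)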

This gives me the following plot: [image]

Then I applied the Isolation Forest:

from sklearn.ensemble import IsolationForest
n_estimators = 50
# behaviour='new' is only accepted by older scikit-learn releases; the parameter was later deprecated and removed, so drop it on current versions
iso_forest = IsolationForest(behaviour='new', n_estimators = n_estimators, max_samples = 'auto')
# fit on the flow column only; fit_predict returns 1 for inliers and -1 for outliers
lineplot['iso_forest'] = iso_forest.fit_predict(lineplot.values[:,[1]])
plt.figure(2)
# x = column 2 (StepID), y = column 1 (BacksGas_Flow_sccm)
plt.scatter(lineplot.values[lineplot['iso_forest'] == 1, 2], lineplot.values[lineplot['iso_forest'] == 1, 1], c = 'green', label = 'Normal')
plt.scatter(lineplot.values[lineplot['iso_forest'] == -1, 2], lineplot.values[lineplot['iso_forest'] == -1, 1], c = 'red', label = 'Outlier')

and I get the following scatter plot:

[image]

What I would like to do now is mark the values that appear as red points in the scatter plot as red dots on the first graph, something like this (this graph is only an example of what I want):

[image]

Is it possible to achieve something like this?

Thanks

You can do it as follows:

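You can combine the two plots by drawing them on the same figure with the same x-axis. Your scatter plot uses column 2 (StepID) for x, while the line plot uses column 3 (Time_ms), so switch the scatter calls to column 3.

Try this: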

plt.figure(2)
plt.plot(x_axis, y_axis)
# column 3 (Time_ms) matches the x-axis of the line plot, so the points land on the line
plt.scatter(lineplot.values[lineplot['iso_forest'] == 1, 3], lineplot.values[lineplot['iso_forest'] == 1, 1], c = 'green', label = 'Normal')
plt.scatter(lineplot.values[lineplot['iso_forest'] == -1, 3], lineplot.values[lineplot['iso_forest'] == -1, 1], c = 'red', label = 'Outlier')
plt.legend()  # show the 'Normal' / 'Outlier' labels

This gives you the line plot with the normal points in green and the outliers in red.
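
For reference, here is a minimal, self-contained sketch of the same overlay, written with column names instead of positional indices. It assumes a small stand-in dataframe built from a few rows of the question's table, a fixed random_state for repeatability, and the Agg backend so it runs without a display. None of these come from the original code, and behaviour='new' is left out because recent scikit-learn releases no longer accept it.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line for on-screen plots
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.ensemble import IsolationForest

# Stand-in for `lineplot`: a handful of rows copied from the question's table.
lineplot = pd.DataFrame({
    "Time_ms": ["09:20:15.273", "09:20:15.744", "09:20:24.102", "09:20:26.142",
                "09:20:27.162", "09:20:52.999", "09:20:56.128", "09:21:05.615",
                "09:21:58.749", "09:22:00.882"],
    "BacksGas_Flow_sccm": [1.7578125, 1.953125, 46.19140625, 46.6796875,
                           46.6796875, 24.21875, 72.75390625, 73.4375,
                           3.90625, 2.1484375],
})

# Fit on the flow column only; fit_predict returns 1 (inlier) or -1 (outlier).
model = IsolationForest(n_estimators=50, random_state=0)
lineplot["iso_forest"] = model.fit_predict(lineplot[["BacksGas_Flow_sccm"]])

# Rows flagged as outliers, selected by column name rather than position.
outliers = lineplot[lineplot["iso_forest"] == -1]

fig, ax = plt.subplots()
# The time series itself.
ax.plot(lineplot["Time_ms"], lineplot["BacksGas_Flow_sccm"], label="Flow")
# Outliers drawn on the same axes with the same x values, so they sit on the line.
ax.scatter(outliers["Time_ms"], outliers["BacksGas_Flow_sccm"],
           c="red", zorder=3, label="Outlier")
ax.set_xlabel("Time_ms")
ax.set_ylabel("BacksGas_Flow_sccm")
ax.tick_params(axis="x", rotation=45)
ax.legend()
# plt.show()  # uncomment when running interactively
```

Which rows get flagged on such a tiny sample depends on the model's random state, so the red points here will not necessarily match the ones in the question.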

I am working on the same thing but with a different algorithm. Thanks, dude!

My data schema is:

CHECK_TIME  DEVICE  METRIC  % Used Space-VALUE  ANOMALY ANOMALY_SCORE

This works for me:

plt.figure(2)
# column 0 = CHECK_TIME, column 3 = % Used Space-VALUE; ANOMALY is compared as a string here
plt.plot(anomaly_data.values[:,0], anomaly_data.values[:,3])
plt.scatter(anomaly_data.values[anomaly_data['ANOMALY'] == 'False', 0],
            anomaly_data.values[anomaly_data['ANOMALY'] == 'False', 3],
            c = 'green', label = 'Normal')
plt.scatter(anomaly_data.values[anomaly_data['ANOMALY'] == 'True', 0],
            anomaly_data.values[anomaly_data['ANOMALY'] == 'True', 3],
            c = 'red', label = 'Outlier')
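
One caveat about the masks above, sketched under assumptions: comparing against 'True' and 'False' only selects rows if ANOMALY holds strings. If the column holds real booleans, the string comparison matches nothing. The helper below, anomaly_mask, is a hypothetical name that is not part of the original answer; it builds a boolean mask that works for either representation.

```python
import pandas as pd

def anomaly_mask(flags: pd.Series) -> pd.Series:
    """Return a boolean mask that is True where the flag marks an anomaly."""
    if pd.api.types.is_bool_dtype(flags):
        # Already boolean: use the column directly as the mask.
        return flags
    # Otherwise compare the text form, ignoring case and surrounding spaces.
    return flags.astype(str).str.strip().str.lower() == "true"

# Two hypothetical versions of the ANOMALY column.
as_text = pd.Series(["False", "True", "False"])
as_bool = pd.Series([False, True, False])

print(anomaly_mask(as_text).tolist())  # [False, True, False]
print(anomaly_mask(as_bool).tolist())  # [False, True, False]
```

The resulting mask can replace anomaly_data['ANOMALY'] == 'True' in the scatter calls, and its negation (~mask) can replace the 'False' comparison.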

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact yoyou2525@163.com.

 