
Removing outliers from lists/XY scatter

I have two lists: heartbeat intervals (Y-axis, in ms; IBIs below) and their absolute timepoints (X-axis, in ms; RR_times below). There are some misreadings, so the first list contains outliers that need to be removed, and the second list contains their corresponding timepoints. Ideally, the outliers in the first list would be replaced with NaN so that the total recording time stays the same.

 RR_times = [411, 827, 1241,  1653,  2066,   2481,   2894,   3308,
     3714,   4126,   4532,   4938,   5343,   5751,   6156,   6552,
     6951,   7346,   7749,   8149,   8546,   8944,   9338,   9735,
    10123,  10511,  10905,  11290,  11675,  12060,  12441,  12825,
    13205,  13581,  13960,  14342,  14717,  15087,  15462,  15829,
    16204,  16531,  16902,  17304,  17670,  18040,  18398,  18762,
    19127,  19465,  19823,  20196,  20554,  20906,  21256,  21609,
    21959,  22264,  22637,  22995,  23308,  23649,  24012,  24352,
    24687,  25026,  25390,  25681,  26014,  26347,  26680,  27330,
    27985,  28628,  28951,  29596,  29915,  30238,  30562,  31191,
    31826,  32141,  32461,  32775,  33095,  33382,  33695,  34029,
    34341,  34654,  34967,  35281,  35595,  36220,  36530,  36844,
    37150,  37462,  37775,  38084,  38395,  38703,  39014,  39324,
    39632,  39937,  40246,  40554,  40862,  41169,  41479,  41787,
    42095,  42406,  42714,  43019,  43330,  43642,  43945,  44254,
    44563,  44871,  45183,  45491,  45796,  46101,  46410,  46713,
    47327,  47632,  47937,  48244,  48555,  48867,  49177,  49488,
    49792,  50094,  50398,  50707,  50993,  51324,  51626,  51931,
    52239,  52550,  52857,  53161,  53773,  54080,  54387,  54693,
    54998,  55311,  55617,  55924,  56235,  56547,  56852,  57159,
    57470,  57781,  58091,  58400,  58709,  59020,  59331,  59644,
    59955,  60265,  60579,  60890,  61206,  61521,  61833,  62149,
    62463,  62772,  63088,  63403,  63716,  64034,  64352,  64665,
    64984,  65624,  65940,  66262,  66578,  66900,  67221,  67543,
    67861,  68179,  68504,  68819,  69145,  69459,  69782,  70111,
    70428,  70747,  71070,  71389,  71710,  72036,  72358,  72680,
    73003,  73326,  73648,  73973,  74296,  74620,  74944,  75269,
    75592,  75916,  76241,  76566,  76889,  77216,  77543,  77869,
    78191,  78518,  78843,  79165,  79496,  79823,  80148,  80479,
    80803,  81128,  81459,  81783,  82110,  82439,  82771,  83095,
    83426,  83757,  84086,  84416,  84741,  85074,  85400,  85729,
    86060,  86390,  86719,  87051,  87380,  87711,  88041,  88373,
    88705,  89029,  89365,  89698,  90023,  90356,  90690,  91019,
    91352,  91684,  92014,  92347,  92681,  93014,  93349,  93678,
    94011,  94344,  94675,  95009,  95339,  95673,  96007,  96341,
    96668,  97002,  97337,  97665,  98003,  98335,  98668,  99003,
    99339,  99673, 100007, 100346, 100684, 101017, 101357, 101693,
   102028, 102368, 102705, 103043, 103380, 103718, 104061, 104403,
   104736, 105077, 105421, 105756, 106096, 106437, 106777, 107118,
   107461, 107800, 108141, 108485, 108822, 109167, 109507, 109848,
   110196, 110538, 110884, 111230, 111571, 111918, 112263, 112606,
   112952, 113639, 113987, 114336, 114680, 115025, 115372, 115722,
   116068, 116418, 116766, 117114, 117464, 117811, 118158, 118511,
   118858, 119208, 119557, 119904, 120257, 120606, 120952, 121303,
   121655, 122003, 122354, 122707, 123057, 123408, 123760, 124114,
   124466, 124815, 125172, 125523, 125879, 126231, 126586, 126946,
   127298, 127653, 128014, 128369, 128724, 129084, 129441, 129794,
   130150, 130504, 130863, 131219, 131576, 131937, 132297, 132653,
   133012, 133375, 133731, 134091, 134455, 134813, 135174, 135534,
   135897, 136258, 136621, 136986, 137349, 137711, 138073, 138439,
   138799, 139164, 139526, 139887, 140253, 140617, 140977, 141344,
   141706, 142071, 142438, 142803, 143170, 143537, 143904, 144274,
   144641, 145011, 145382, 145749, 146124, 146493, 146864, 147235,
   147605, 147977, 148346, 148718, 149085, 149455, 149826, 150195,
   150566, 150936, 151310, 151676, 152048, 152423, 152795, 153167,
   153539, 153916, 154290, 154661, 155036, 155408, 155782, 156159,
   156530, 156905, 157280, 157655, 158029, 158404, 158783, 159157,
   159532, 159910, 160290, 160660, 161037, 161415, 161786, 162161,
   162538, 162913, 163289, 163665, 164040, 164415, 164789, 165164,
   165539, 165911, 166286, 166661, 167040, 167418, 167791, 168169,
   168545, 168922, 169300, 169676, 170053, 170429, 170811, 171195,
   171571, 171952, 172335, 172717, 173098, 173484, 173869, 174254,
   174637, 175020, 175403, 175785, 176167, 176552, 176933, 177316,
   177698, 178080, 178463, 178840, 179224, 179603, 179979, 180360,
   180739, 181114, 181492, 181870, 182248, 182626, 183001, 183378,
   183752, 184128, 184503, 184876, 185252, 185629, 186003, 186384,
   186760, 187134, 187515, 187900, 188281, 188656, 189031, 189415,
   189798, 190176, 190555, 190936, 191313, 191692, 192069, 192448,
   192824, 193203, 193578, 193953, 194330, 194707]

IBIs = [411,416,414,412,413,415, 413, 414, 406, 412, 406, 406, 405,
   408, 405, 396, 399, 395, 403, 400, 397, 398, 394, 397, 388, 388,
   394, 385, 385, 385, 381, 384, 380, 376, 379, 382, 375, 370, 375,
   367, 375, 327, 371, 402, 366, 370, 358, 364, 365, 338, 358, 373,
   358, 352, 350, 353, 350, 305, 373, 358, 313, 341, 363, 340, 335,
   339, 364, 291, 333, 333, 333, 650, 655, 643, 323, 645, 319, 323,
   324, 629, 635, 315, 320, 314, 320, 287, 313, 334, 312, 313, 313,
   314, 314, 625, 310, 314, 306, 312, 313, 309, 311, 308, 311, 310,
   308, 305, 309, 308, 308, 307, 310, 308, 308, 311, 308, 305, 311,
   312, 303, 309, 309, 308, 312, 308, 305, 305, 309, 303, 614, 305,
   305, 307, 311, 312, 310, 311, 304, 302, 304, 309, 286, 331, 302,
   305, 308, 311, 307, 304, 612, 307, 307, 306, 305, 313, 306, 307,
   311, 312, 305, 307, 311, 311, 310, 309, 309, 311, 311, 313, 311,
   310, 314, 311, 316, 315, 312, 316, 314, 309, 316, 315, 313, 318,
   318, 313, 319, 640, 316, 322, 316, 322, 321, 322, 318, 318, 325,
   315, 326, 314, 323, 329, 317, 319, 323, 319, 321, 326, 322, 322,
   323, 323, 322, 325, 323, 324, 324, 325, 323, 324, 325, 325, 323,
   327, 327, 326, 322, 327, 325, 322, 331, 327, 325, 331, 324, 325,
   331, 324, 327, 329, 332, 324, 331, 331, 329, 330, 325, 333, 326,
   329, 331, 330, 329, 332, 329, 331, 330, 332, 332, 324, 336, 333,
   325, 333, 334, 329, 333, 332, 330, 333, 334, 333, 335, 329, 333,
   333, 331, 334, 330, 334, 334, 334, 327, 334, 335, 328, 338, 332,
   333, 335, 336, 334, 334, 339, 338, 333, 340, 336, 335, 340, 337,
   338, 337, 338, 343, 342, 333, 341, 344, 335, 340, 341, 340, 341,
   343, 339, 341, 344, 337, 345, 340, 341, 348, 342, 346, 346, 341,
   347, 345, 343, 346, 687, 348, 349, 344, 345, 347, 350, 346, 350,
   348, 348, 350, 347, 347, 353, 347, 350, 349, 347, 353, 349, 346,
   351, 352, 348, 351, 353, 350, 351, 352, 354, 352, 349, 357, 351,
   356, 352, 355, 360, 352, 355, 361, 355, 355, 360, 357, 353, 356,
   354, 359, 356, 357, 361, 360, 356, 359, 363, 356, 360, 364, 358,
   361, 360, 363, 361, 363, 365, 363, 362, 362, 366, 360, 365, 362,
   361, 366, 364, 360, 367, 362, 365, 367, 365, 367, 367, 367, 370,
   367, 370, 371, 367, 375, 369, 371, 371, 370, 372, 369, 372, 367,
   370, 371, 369, 371, 370, 374, 366, 372, 375, 372, 372, 372, 377,
   374, 371, 375, 372, 374, 377, 371, 375, 375, 375, 374, 375, 379,
   374, 375, 378, 380, 370, 377, 378, 371, 375, 377, 375, 376, 376,
   375, 375, 374, 375, 375, 372, 375, 375, 379, 378, 373, 378, 376,
   377, 378, 376, 377, 376, 382, 384, 376, 381, 383, 382, 381, 386,
   385, 385, 383, 383, 383, 382, 382, 385, 381, 383, 382, 382, 383,
   377, 384, 379, 376, 381, 379, 375, 378, 378, 378, 378, 375, 377,
   374, 376, 375, 373, 376, 377, 374, 381, 376, 374, 381, 385, 381,
   375, 375, 384, 383, 378, 379, 381, 377, 379, 377, 379, 376, 379,
   375, 375, 377, 377]
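
A quick check on the opening values suggests that RR_times is the running sum of IBIs, which is why replacing an outlier with NaN (rather than deleting it) keeps the two lists aligned. A minimal sketch, using only the first eight values of each list; the names rr_head, ibi_head and masked are illustrative and not part of the original data:

```python
import numpy as np

# First eight values of each list from the question.
rr_head = np.array([411, 827, 1241, 1653, 2066, 2481, 2894, 3308])
ibi_head = np.array([411, 416, 414, 412, 413, 415, 413, 414])

# Each timepoint equals the sum of all intervals up to and including that beat.
matches = np.array_equal(np.cumsum(ibi_head), rr_head)

# Masking a value keeps the array length, so index i still maps to rr_head[i].
masked = ibi_head.astype(float)  # NaN needs a float array
masked[2] = np.nan               # index 2 is an arbitrary position, for illustration only
```

Deleting entries instead would require removing the same indices from both lists to keep them in step.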

Plotting the whole dataset gives: [image: heart rate data versus time]

I previously used an above/below threshold filter, but that does not work for longer recordings in which the trace spans a wider range of values (in some recordings the intervals range from 300 ms during training to 1500 ms after a period of rest).

What is the best way to remove the outliers in this case, and how would one go about implementing it? Moving average, exclusion based on standard deviation, median filter...?

Here's an ugly approach that seems to work:

import numpy as np

RR_times = np.array([411, 827, 1241, ...])
IBIs = np.array([411, 416, 414, ...])

# Mean absolute difference between consecutive intervals, computed once
mean_diff = np.mean(np.abs(np.diff(IBIs)))
mean_IBI = np.mean(IBIs)

IBIs_cleaned = np.full(IBIs.shape, np.nan)  # start with an array full of NaNs
IBIs_cleaned[0] = IBIs[0]  # assume the first value isn't an outlier

# Keep a value only if it is close to its predecessor and not far above the overall mean
for i in range(1, len(IBIs)):
    if np.abs(IBIs[i] - IBIs[i-1]) < mean_diff and IBIs[i] < 1.6 * mean_IBI:
        IBIs_cleaned[i] = IBIs[i]
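
One property of the loop above is that a normal beat directly following an outlier differs strongly from its predecessor, so it will typically stay NaN as well. A possible alternative, sketched here under assumptions rather than offered as a tested solution, is to compare each interval with the median of its neighbours and use a relative tolerance, so the same setting can follow a trace that drifts between short and long intervals. The function name nan_outliers and the defaults window=11 and rel_tol=0.3 are illustrative choices, not values from the question, and np.lib.stride_tricks.sliding_window_view requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def nan_outliers(ibis, window=11, rel_tol=0.3):
    """Return a float copy of ibis with local outliers replaced by NaN.

    window  -- odd number of samples used for the rolling median (assumed value)
    rel_tol -- allowed relative deviation from the local median (assumed value)
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    ibis = np.asarray(ibis, dtype=float)
    half = window // 2
    # Repeat the edge values so every sample gets a full-sized window.
    padded = np.pad(ibis, half, mode="edge")
    local_median = np.median(sliding_window_view(padded, window), axis=1)
    # Flag samples that deviate from the local median by more than rel_tol.
    outlier = np.abs(ibis - local_median) > rel_tol * local_median
    cleaned = ibis.copy()
    cleaned[outlier] = np.nan
    return cleaned


# Excerpt from the question's IBIs containing four doubled intervals.
sample = [364, 291, 333, 333, 333, 650, 655, 643, 323, 645, 319, 323, 324]
cleaned = nan_outliers(sample)
print(np.flatnonzero(np.isnan(cleaned)))  # -> [5 6 7 9]
```

Because the output has the same length as the input, RR_times can be used unchanged as the X-axis. The window must be wide enough that outliers stay a minority within it; a run of misreadings longer than half the window would pull the median towards the outliers.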
