Removing outliers from lists/XY scatter

I have two lists: heartbeat intervals (Y-axis, in ms; IBIs below) and their absolute timepoints (X-axis, in ms; RR_times below). Some beats were misread, so the first list contains outliers that need to be removed, and the second list holds the corresponding timepoints. Ideally, the outliers in the first list would be replaced with NaN rather than deleted, so that the total recording time stays the same.

 RR_times = [411, 827, 1241,  1653,  2066,   2481,   2894,   3308,
     3714,   4126,   4532,   4938,   5343,   5751,   6156,   6552,
     6951,   7346,   7749,   8149,   8546,   8944,   9338,   9735,
    10123,  10511,  10905,  11290,  11675,  12060,  12441,  12825,
    13205,  13581,  13960,  14342,  14717,  15087,  15462,  15829,
    16204,  16531,  16902,  17304,  17670,  18040,  18398,  18762,
    19127,  19465,  19823,  20196,  20554,  20906,  21256,  21609,
    21959,  22264,  22637,  22995,  23308,  23649,  24012,  24352,
    24687,  25026,  25390,  25681,  26014,  26347,  26680,  27330,
    27985,  28628,  28951,  29596,  29915,  30238,  30562,  31191,
    31826,  32141,  32461,  32775,  33095,  33382,  33695,  34029,
    34341,  34654,  34967,  35281,  35595,  36220,  36530,  36844,
    37150,  37462,  37775,  38084,  38395,  38703,  39014,  39324,
    39632,  39937,  40246,  40554,  40862,  41169,  41479,  41787,
    42095,  42406,  42714,  43019,  43330,  43642,  43945,  44254,
    44563,  44871,  45183,  45491,  45796,  46101,  46410,  46713,
    47327,  47632,  47937,  48244,  48555,  48867,  49177,  49488,
    49792,  50094,  50398,  50707,  50993,  51324,  51626,  51931,
    52239,  52550,  52857,  53161,  53773,  54080,  54387,  54693,
    54998,  55311,  55617,  55924,  56235,  56547,  56852,  57159,
    57470,  57781,  58091,  58400,  58709,  59020,  59331,  59644,
    59955,  60265,  60579,  60890,  61206,  61521,  61833,  62149,
    62463,  62772,  63088,  63403,  63716,  64034,  64352,  64665,
    64984,  65624,  65940,  66262,  66578,  66900,  67221,  67543,
    67861,  68179,  68504,  68819,  69145,  69459,  69782,  70111,
    70428,  70747,  71070,  71389,  71710,  72036,  72358,  72680,
    73003,  73326,  73648,  73973,  74296,  74620,  74944,  75269,
    75592,  75916,  76241,  76566,  76889,  77216,  77543,  77869,
    78191,  78518,  78843,  79165,  79496,  79823,  80148,  80479,
    80803,  81128,  81459,  81783,  82110,  82439,  82771,  83095,
    83426,  83757,  84086,  84416,  84741,  85074,  85400,  85729,
    86060,  86390,  86719,  87051,  87380,  87711,  88041,  88373,
    88705,  89029,  89365,  89698,  90023,  90356,  90690,  91019,
    91352,  91684,  92014,  92347,  92681,  93014,  93349,  93678,
    94011,  94344,  94675,  95009,  95339,  95673,  96007,  96341,
    96668,  97002,  97337,  97665,  98003,  98335,  98668,  99003,
    99339,  99673, 100007, 100346, 100684, 101017, 101357, 101693,
   102028, 102368, 102705, 103043, 103380, 103718, 104061, 104403,
   104736, 105077, 105421, 105756, 106096, 106437, 106777, 107118,
   107461, 107800, 108141, 108485, 108822, 109167, 109507, 109848,
   110196, 110538, 110884, 111230, 111571, 111918, 112263, 112606,
   112952, 113639, 113987, 114336, 114680, 115025, 115372, 115722,
   116068, 116418, 116766, 117114, 117464, 117811, 118158, 118511,
   118858, 119208, 119557, 119904, 120257, 120606, 120952, 121303,
   121655, 122003, 122354, 122707, 123057, 123408, 123760, 124114,
   124466, 124815, 125172, 125523, 125879, 126231, 126586, 126946,
   127298, 127653, 128014, 128369, 128724, 129084, 129441, 129794,
   130150, 130504, 130863, 131219, 131576, 131937, 132297, 132653,
   133012, 133375, 133731, 134091, 134455, 134813, 135174, 135534,
   135897, 136258, 136621, 136986, 137349, 137711, 138073, 138439,
   138799, 139164, 139526, 139887, 140253, 140617, 140977, 141344,
   141706, 142071, 142438, 142803, 143170, 143537, 143904, 144274,
   144641, 145011, 145382, 145749, 146124, 146493, 146864, 147235,
   147605, 147977, 148346, 148718, 149085, 149455, 149826, 150195,
   150566, 150936, 151310, 151676, 152048, 152423, 152795, 153167,
   153539, 153916, 154290, 154661, 155036, 155408, 155782, 156159,
   156530, 156905, 157280, 157655, 158029, 158404, 158783, 159157,
   159532, 159910, 160290, 160660, 161037, 161415, 161786, 162161,
   162538, 162913, 163289, 163665, 164040, 164415, 164789, 165164,
   165539, 165911, 166286, 166661, 167040, 167418, 167791, 168169,
   168545, 168922, 169300, 169676, 170053, 170429, 170811, 171195,
   171571, 171952, 172335, 172717, 173098, 173484, 173869, 174254,
   174637, 175020, 175403, 175785, 176167, 176552, 176933, 177316,
   177698, 178080, 178463, 178840, 179224, 179603, 179979, 180360,
   180739, 181114, 181492, 181870, 182248, 182626, 183001, 183378,
   183752, 184128, 184503, 184876, 185252, 185629, 186003, 186384,
   186760, 187134, 187515, 187900, 188281, 188656, 189031, 189415,
   189798, 190176, 190555, 190936, 191313, 191692, 192069, 192448,
   192824, 193203, 193578, 193953, 194330, 194707]

IBIs = [411,416,414,412,413,415, 413, 414, 406, 412, 406, 406, 405,
   408, 405, 396, 399, 395, 403, 400, 397, 398, 394, 397, 388, 388,
   394, 385, 385, 385, 381, 384, 380, 376, 379, 382, 375, 370, 375,
   367, 375, 327, 371, 402, 366, 370, 358, 364, 365, 338, 358, 373,
   358, 352, 350, 353, 350, 305, 373, 358, 313, 341, 363, 340, 335,
   339, 364, 291, 333, 333, 333, 650, 655, 643, 323, 645, 319, 323,
   324, 629, 635, 315, 320, 314, 320, 287, 313, 334, 312, 313, 313,
   314, 314, 625, 310, 314, 306, 312, 313, 309, 311, 308, 311, 310,
   308, 305, 309, 308, 308, 307, 310, 308, 308, 311, 308, 305, 311,
   312, 303, 309, 309, 308, 312, 308, 305, 305, 309, 303, 614, 305,
   305, 307, 311, 312, 310, 311, 304, 302, 304, 309, 286, 331, 302,
   305, 308, 311, 307, 304, 612, 307, 307, 306, 305, 313, 306, 307,
   311, 312, 305, 307, 311, 311, 310, 309, 309, 311, 311, 313, 311,
   310, 314, 311, 316, 315, 312, 316, 314, 309, 316, 315, 313, 318,
   318, 313, 319, 640, 316, 322, 316, 322, 321, 322, 318, 318, 325,
   315, 326, 314, 323, 329, 317, 319, 323, 319, 321, 326, 322, 322,
   323, 323, 322, 325, 323, 324, 324, 325, 323, 324, 325, 325, 323,
   327, 327, 326, 322, 327, 325, 322, 331, 327, 325, 331, 324, 325,
   331, 324, 327, 329, 332, 324, 331, 331, 329, 330, 325, 333, 326,
   329, 331, 330, 329, 332, 329, 331, 330, 332, 332, 324, 336, 333,
   325, 333, 334, 329, 333, 332, 330, 333, 334, 333, 335, 329, 333,
   333, 331, 334, 330, 334, 334, 334, 327, 334, 335, 328, 338, 332,
   333, 335, 336, 334, 334, 339, 338, 333, 340, 336, 335, 340, 337,
   338, 337, 338, 343, 342, 333, 341, 344, 335, 340, 341, 340, 341,
   343, 339, 341, 344, 337, 345, 340, 341, 348, 342, 346, 346, 341,
   347, 345, 343, 346, 687, 348, 349, 344, 345, 347, 350, 346, 350,
   348, 348, 350, 347, 347, 353, 347, 350, 349, 347, 353, 349, 346,
   351, 352, 348, 351, 353, 350, 351, 352, 354, 352, 349, 357, 351,
   356, 352, 355, 360, 352, 355, 361, 355, 355, 360, 357, 353, 356,
   354, 359, 356, 357, 361, 360, 356, 359, 363, 356, 360, 364, 358,
   361, 360, 363, 361, 363, 365, 363, 362, 362, 366, 360, 365, 362,
   361, 366, 364, 360, 367, 362, 365, 367, 365, 367, 367, 367, 370,
   367, 370, 371, 367, 375, 369, 371, 371, 370, 372, 369, 372, 367,
   370, 371, 369, 371, 370, 374, 366, 372, 375, 372, 372, 372, 377,
   374, 371, 375, 372, 374, 377, 371, 375, 375, 375, 374, 375, 379,
   374, 375, 378, 380, 370, 377, 378, 371, 375, 377, 375, 376, 376,
   375, 375, 374, 375, 375, 372, 375, 375, 379, 378, 373, 378, 376,
   377, 378, 376, 377, 376, 382, 384, 376, 381, 383, 382, 381, 386,
   385, 385, 383, 383, 383, 382, 382, 385, 381, 383, 382, 382, 383,
   377, 384, 379, 376, 381, 379, 375, 378, 378, 378, 378, 375, 377,
   374, 376, 375, 373, 376, 377, 374, 381, 376, 374, 381, 385, 381,
   375, 375, 384, 383, 378, 379, 381, 377, 379, 377, 379, 376, 379,
   375, 375, 377, 377]

Plotting the whole dataset gives: [figure: heart rate data versus time]

I previously used an above/below threshold filter, but that does not work for longer recordings in which the trace spans a wider range of values (in some recordings the intervals range from 300 ms during training to 1500 ms after a period of rest).

What is the best way to remove the outliers in this case, and how would one go about implementing it? Moving average, exclusion based on stdev, median filter...?

Here's an ugly approach that seems to work:

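import numpy as np

RR_times = np.array([411, 827, 1241, ...])  # timepoints, left untouched
IBIs = np.array([411, 416, 414, ...])

# Absolute difference between each pair of consecutive intervals
diffs = np.abs(np.diff(IBIs))
mean_diff = np.mean(diffs)  # computed once instead of on every iteration
mean_ibi = np.mean(IBIs)

IBIs_cleaned = np.full(IBIs.shape, np.nan)  # create an array full of NaNs
IBIs_cleaned[0] = IBIs[0]  # assume the first value isn't an outlier

# Keep a value only if it is close to its predecessor and not far above the overall mean
for i in range(1, len(IBIs)):
    if np.abs(IBIs[i] - IBIs[i-1]) < mean_diff and IBIs[i] < 1.6 * mean_ibi:
        IBIs_cleaned[i] = IBIs[i]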
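
Since the question also mentions a median filter, here is a minimal sketch of that alternative; it is not the approach above. Each interval is compared with the median of its neighbours and replaced with NaN when it deviates too much. The function name nan_outliers, the window of 11 beats, and the 20% tolerance are assumptions chosen for illustration and would need tuning on real recordings.

```python
import numpy as np

def nan_outliers(ibis, window=11, rel_tol=0.2):
    """Return a float copy of ibis with local outliers replaced by NaN.

    window and rel_tol are illustrative defaults, not validated settings.
    """
    ibis = np.asarray(ibis, dtype=float)
    half = window // 2
    cleaned = ibis.copy()
    for i in range(len(ibis)):
        # Neighbourhood around beat i, truncated at the ends of the recording
        lo = max(0, i - half)
        hi = min(len(ibis), i + half + 1)
        local_median = np.median(ibis[lo:hi])
        # Flag the beat if it is further than rel_tol (relative) from the local median
        if abs(ibis[i] - local_median) > rel_tol * local_median:
            cleaned[i] = np.nan
    return cleaned

# Small made-up series with one doubled interval (a missed beat)
demo = [320, 322, 318, 640, 321, 319, 323]
demo_cleaned = nan_outliers(demo, window=5)
print(demo_cleaned)
```

Because the threshold is relative to a local median rather than a global mean, a slow drift from short to long intervals should not be flagged. The output has the same length as the input, so RR_times can be left untouched. Runs of several consecutive misreadings need a window wide enough that the bad beats stay in the minority.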
