Applying multiple masks to arrays

Question

I'm explaining what I'm actually hoping to do in case there's a higher level suggestion that obviates the question entirely.

I have scientific data that I store in three arrays: wave, flux, error. These stand for wavelength, flux, and error values. The arrays are about 4000 elements long (and the index number of the arrays corresponds to the pixel number of the detector).

There are various tests that I do, but for this example let's just say I do 2 tests where I need to effectively mask out the associated arrays.

masks = []
masks.append(wave > 5500.35)
masks.append(flux / wave > 8.5)

Subquestion: I can easily do the 2-mask case like:

fullmask = [x[0] and x[1] for x in zip(masks[0], masks[1])]

but what's the way to do it for arbitrary numbers of masks?

Real question: Is there a way to apply all masks to each of the arrays (wave, flux, error) and keep the original index numbers? By "keep the original index numbers" I mean that I could, in principle, take the average pixel number of the masked wave array. That is: if pixels 98 and 99 (wave[98:100]) were the only parts not masked, the average pixel would be 98.5.

Meta question: is this the best way to be doing any of this stuff?

EDIT

So here's some sample data to play around with.

wave = array([5000, 5001, 5002, 5003, 5004, 5005, 5006, 5007, 5008, 5009, 5010,
   5011, 5012, 5013, 5014, 5015, 5016, 5017, 5018, 5019, 5020, 5021,
   5022, 5023, 5024, 5025, 5026, 5027, 5028, 5029, 5030, 5031, 5032,
   5033, 5034, 5035, 5036, 5037, 5038, 5039, 5040, 5041, 5042, 5043,
   5044, 5045, 5046, 5047, 5048, 5049, 5050, 5051, 5052, 5053, 5054,
   5055, 5056, 5057, 5058, 5059, 5060, 5061, 5062, 5063, 5064, 5065,
   5066, 5067, 5068, 5069, 5070, 5071, 5072, 5073, 5074, 5075, 5076,
   5077, 5078, 5079, 5080, 5081, 5082, 5083, 5084, 5085, 5086, 5087,
   5088, 5089, 5090, 5091, 5092, 5093, 5094, 5095, 5096, 5097, 5098,
   5099])

flux = array([ 112.65878609,  109.2008992 ,  113.30629929,  117.17002715,
   103.19663878,  110.42131523,  106.00841123,  100.27882741,
   103.89160905,  102.29402469,  105.58894696,  103.21314852,
    96.97242814,  106.70130478,  108.83891225,  110.60598803,
    95.10361887,  109.39734257,  103.08289878,  104.97258911,
    96.46606257,  106.75993458,   99.25386914,  105.91429417,
   105.83752232,  100.53312657,   99.74871394,  107.12735837,
   108.81187473,   96.51418895,   99.71311101,   94.08702553,
    98.81198643,   93.84567201,  103.21444519,   94.7027134 ,
    99.61842203,  103.71336458,  100.8697998 ,   92.1564786 ,
    96.56711985,   94.7728761 ,   82.65194671,   83.52280884,
    86.57960844,   73.6700194 ,   66.11794666,   61.01624627,
    63.19944529,   55.50283247,   62.09172307,   59.55436092,
    75.66399466,   70.69397378,   64.27899192,   73.80248662,
    89.17119606,   78.97024327,   82.3334254 ,  100.82581489,
   102.77937201,   99.37717696,   96.2215563 ,  104.52291339,
    93.7581944 ,   93.32154346,  103.57018896,  108.08682518,
   105.2711359 ,  100.00242988,  100.86934866,  103.20764384,
   104.19274473,  101.3314802 ,  102.75057114,   94.02347591,
    95.48758551,  106.0099397 ,   99.50733501,   97.88110415,
   107.54266965,  107.76126331,   98.14882302,  101.55654606,
   101.02418212,  106.82324958,   95.52086925,  102.65957133,
   104.93806492,  103.22762427,  108.02087993,  106.71911141,
    97.24396195,  103.3450277 ,  113.99870588,  106.4145751 ,
   110.08294674,  109.40908288,  118.61518086,  114.37341062])

error = array([ 11.72799338,  22.33423611,  16.89347382,  12.80063102,
   23.99242356,  25.15863754,  20.44765811,  14.84358628,
   19.16343785,  19.5703491 ,  18.44427035,  19.08648083,
   19.09116433,  12.22098884,  14.81280352,  11.35010222,
   18.59850136,  15.78855734,  21.85877638,  20.12179042,
   22.04894395,  21.986731  ,  13.26738352,  16.10987762,
   24.28528627,  30.11866128,  25.30220842,  25.02100014,
   29.38560916,  16.8192307 ,  29.15097205,  23.56805267,
   15.17285709,  18.27495747,  18.63750452,  18.61618504,
   11.45940025,  21.95805701,  24.22923951,  11.76824052,
   19.75465065,  14.72979889,  15.45936176,  14.73227474,
   28.91683627,  22.90534472,  16.82376093,  21.47830226,
   20.05012214,  16.74393817,  17.79456361,  20.80008233,
   19.32059989,  23.23471888,  13.77434964,  17.56121752,
   15.96716163,  18.5294016 ,  28.31005939,  13.66340359,
   10.38160267,  16.09621015,  18.25125683,  20.95954331,
   21.31996941,  24.51998489,  16.58831953,  15.25427142,
   23.93065281,  30.4552266 ,  16.94527367,  16.92730802,
   17.79659417,  18.85080572,  18.0839428 ,  23.93949481,
   26.60243553,  13.68320208,  16.74669921,  20.30238694,
   12.74773905,  19.20810456,  20.7189417 ,  20.73402554,
   17.12106905,  25.06475175,  13.0947528 ,  28.16437938,
   22.4803386 ,  13.71143627,   6.60617725,  20.41186825,
   23.54924934,  22.25930658,  20.09337438,  24.94705884,
   18.58056249,   5.58653271,  18.71242702,  17.83578444])


# How I created masks, or just jump to next comment if it's too painful to look at...
masks = []
masks.append(flux/error > 4.0) # high error
absorptionMask1 = (wave < 5060)
absorptionMask2 = (wave > 5040)
bob = [all(x) for x in zip(absorptionMask1, absorptionMask2)]
absorptionMask = ~np.array(bob)
masks.append(absorptionMask) 

# The resulting mask
masks = [array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
       True,  True,  True,  True,  True,  True,  True,  True,  True,
       True,  True,  True,  True,  True,  True,  True, False, False,
       True, False,  True, False, False,  True,  True,  True,  True,
       True,  True,  True,  True,  True,  True,  True,  True, False,
      False, False, False, False, False, False, False, False, False,
       True,  True,  True,  True, False,  True,  True,  True,  True,
       True,  True, False,  True,  True,  True, False,  True,  True,
       True,  True,  True, False, False,  True,  True,  True,  True,
       True,  True,  True,  True,  True,  True, False,  True,  True,
       True,  True,  True,  True,  True,  True,  True,  True,  True,  True], dtype=bool),
array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
       True,  True,  True,  True,  True,  True,  True,  True,  True,
       True,  True,  True,  True,  True,  True,  True,  True,  True,
       True,  True,  True,  True,  True,  True,  True,  True,  True,
       True,  True,  True,  True,  True, False, False, False, False,
      False, False, False, False, False, False, False, False, False,
      False, False, False, False, False, False,  True,  True,  True,
       True,  True,  True,  True,  True,  True,  True,  True,  True,
       True,  True,  True,  True,  True,  True,  True,  True,  True,
       True,  True,  True,  True,  True,  True,  True,  True,  True,
       True,  True,  True,  True,  True,  True,  True,  True,  True,  True], dtype=bool)]


# More in a bit, should get you a feel for what I'm looking at.

Answer 1

You can use boolean operators. Let's define an example:

import numpy as np

d = np.arange(10)
masks = [d>5, d % 2 == 0, d<8]

You can use reduce to combine all of them:

from functools import reduce

total_mask = reduce(np.logical_and, masks)

You can also use the boolean operators explicitly if you need to choose the masks manually:

total_mask = masks[0] & masks[1] & masks[2]
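To connect this back to the question, here is a minimal sketch of applying the combined mask to all three arrays while keeping the original pixel numbers. The short arrays and the two test conditions are made-up stand-ins for the real data, and using np.flatnonzero to recover the indices is my assumption about what "keep the original index numbers" should mean:

```python
import numpy as np
from functools import reduce

# Hypothetical stand-ins for the question's arrays (10 pixels instead of ~4000).
wave = np.linspace(5000.0, 5009.0, 10)
flux = np.array([100., 102., 98., 60., 55., 58., 101., 99., 103., 97.])
error = np.array([10., 11., 9., 30., 12., 10., 10., 50., 11., 9.])

# Any number of boolean masks, all with the same length as the data.
masks = [flux / error > 4.0, flux > 90.0]
total_mask = reduce(np.logical_and, masks)

# Indices (pixel numbers) of the elements that survive every mask.
pixels = np.flatnonzero(total_mask)

# The same boolean mask selects matching elements from each array.
wave_kept = wave[total_mask]
flux_kept = flux[total_mask]
error_kept = error[total_mask]

mean_pixel = pixels.mean()
print(pixels, mean_pixel)
```

Because the mask is the same length as the data, the surviving pixel numbers can always be recovered from it, so nothing is lost by filtering the arrays.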

Answer 2

I think you're looking for the star operator:

fullmask = [all(mask) for mask in zip(*masks)]

...although I'm not sure I understand your data structure completely.
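If the masks are NumPy boolean arrays of equal length (an assumption on my part, based on how they are built in the question), the same element-wise "all" can be sketched without a Python-level loop using np.all along the first axis. The array d and the three conditions are just illustrative:

```python
import numpy as np

d = np.arange(10)
masks = [d > 5, d % 2 == 0, d < 8]

# Pure-Python version: one tuple of flags per element, reduced with all().
fullmask_list = [all(flags) for flags in zip(*masks)]

# NumPy version: the list of masks is treated as a 2-D array (mask, element),
# and axis=0 collapses across the masks.
fullmask_array = np.all(masks, axis=0)

print(fullmask_array)
```

The NumPy result is a boolean array, so it can be used directly to index the data arrays.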

Answer 3

How about using numpy record arrays?

import numpy as np

# create some data
pixel = np.arange(4000)
wave = pixel / 4000. + 5500
flux = pixel / 4000. + 9.5 * 5500
data = np.rec.fromarrays((pixel, wave, flux), names='pixel, wave, flux')

mask = data.wave > 5500.25
mask &= data.flux / data.wave > 8.5

print(data[mask].pixel.mean())

Answer 4

If I understand properly, what you want is to filter the arrays.

Here's an example for filtering an array

your_array = [1, 5, 6000]    
list(filter(lambda elem: elem > 5000, your_array))

This returns [6000] (in Python 3, filter returns an iterator, hence the list call).

When you say "keep the original index numbers", I think you mean you want to test your condition on each element and store the result for each element? If so, you may want to use map

your_array = [1, 5, 6000]
list(map(lambda elem: elem > 5000, your_array))

This returns [False, False, True] (map also returns an iterator in Python 3).

You can replace all of the lambdas with functions you define if you have more complex conditions.

PS I think it would help if you give an example input and example output of what you want. The wording of the question is confusing.

EDIT:

With the example data, I think this is what you want, feel free to comment. This method helps you avoid storing lists of True, False and then finding the index of the elements you want afterwards. It will return you a list of the indexes and allow you to use fewer steps to compute the average.

# Given wave, error, and flux the way you defined

# If wave is [21.2, 34.1, 43.423], then this yields (0, 21.2), (1, 34.1), (2, 43.423)
# Each element is now a tuple of (index, elem)
enum_wave = enumerate(wave)

# Returns a list of the indexes that pass the condition
# For example, if only 98, and 99 aren't filtered out, this will return [98, 99]
masked_wave = [index for index, elem in enum_wave if elem > 5060]

# To find the average
sum(masked_wave) / float(len(masked_wave))
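For NumPy arrays, the same idea can be sketched without enumerate by keeping an explicit array of pixel numbers next to the data. The wave array here is rebuilt with np.arange to match the sample values in the question; the pixel array is an illustrative helper, not something from the original post:

```python
import numpy as np

# Sample wavelengths matching the question's EDIT: 5000, 5001, ..., 5099.
wave = np.arange(5000, 5100)

# Explicit pixel numbers, playing the role of enumerate()'s index.
pixel = np.arange(wave.size)

keep = wave > 5060             # same condition as the list comprehension
kept_pixels = pixel[keep]      # original index numbers that pass the condition
mean_pixel = kept_pixels.mean()
print(mean_pixel)
```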

Answer 5

You can also do this instead of using functools.reduce:

# Start from an all-True mask with one entry per array element (not one per mask)
combined_mask = np.full(len(masks[0]), True)
for mask in masks:
    combined_mask &= mask

5 answers

solution1
11 ACCEPTED 2012-07-18 08:08:49

solution2
7 2012-07-18 07:35:06

solution3
2 2012-07-18 08:13:15

solution4
1 2012-07-18 08:26:03

solution5
0 2020-09-20 04:41:24
