
Numpy arrays instead of python lists - using nditer to create a 2d array from two 1d arrays

The following code works, but despite some effort I can't figure out how to do the same with numpy arrays (using nditer) rather than python lists (using enumerate).

It is for a psychology experiment where each trial presents one of four stimuli and the participant's reaction time is recorded. Average reaction times are then calculated for each of the four trial types by creating a 2d array from the two 1d arrays.

trialTypeData = [3, 0, 2, 1, 1, 0, 2, 3]
Rt = [900, 1200, 1300, 1400, 1100, 1200, 1300, 1400]

RtByTrialType = [0, 0, 0, 0]
meanRtByTrialType = [0, 0, 0, 0]

for trialType in range(0, 4):
    RtByTrialType[trialType] = [Rt[i] for i, x in enumerate(trialTypeData) if x == trialType]
    meanRtByTrialType[trialType] = sum(RtByTrialType[trialType])/len(RtByTrialType[trialType])

print("Average latencies by trial type:")
print(meanRtByTrialType)

For this kind of data analysis, I'd recommend using pandas instead of numpy; it makes a lot of things much easier. In this case, you can do it using groupby (to collect items by type) and then mean:

>>> import pandas as pd
>>> trialTypeData = [3, 0, 2, 1, 1, 0, 2, 3]
>>> Rt = [900, 1200, 1300, 1400, 1100, 1200, 1300, 1400]
>>> df = pd.DataFrame({"Rt": Rt, "type": trialTypeData})
>>> df
     Rt  type
0   900     3
1  1200     0
2  1300     2
3  1400     1
4  1100     1
5  1200     0
6  1300     2
7  1400     3

[8 rows x 2 columns]
>>> df.groupby("type").mean()
        Rt
type      
0     1200
1     1250
2     1300
3     1150

[4 rows x 1 columns]
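If the rest of your pipeline expects plain arrays rather than a DataFrame, the grouped means come back out easily; a minimal sketch (the column and variable names here just mirror the example above):

```python
import pandas as pd

trialTypeData = [3, 0, 2, 1, 1, 0, 2, 3]
Rt = [900, 1200, 1300, 1400, 1100, 1200, 1300, 1400]

df = pd.DataFrame({"Rt": Rt, "type": trialTypeData})
# Group on the type column and average just the Rt column; the
# result is a Series indexed by trial type (0, 1, 2, 3).
means = df.groupby("type")["Rt"].mean()
print(means.to_numpy())  # plain NumPy array of the four means
```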

Don't use nditer. There are better ways:

meanTrialTypeRt = [Rt[trialTypeData == trialType].mean()
                   for trialType in range(4)]

For each trial type, this selects the locations where trialTypeData equals trialType, gets those locations from Rt, and computes the mean. There are probably even better ways to do this with NumPy or SciPy statistical routines that I'm unfamiliar with or don't remember at the moment; that list comprehension is a big red flag, and the runtime of this routine still grows unnecessarily with the number of trial types.

(Note that Rt and trialTypeData will need to be NumPy arrays for this to work.)
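Put together with the example data, the masking approach runs like this (this is just the snippet above made self-contained, not a different method):

```python
import numpy as np

trialTypeData = np.array([3, 0, 2, 1, 1, 0, 2, 3])
Rt = np.array([900, 1200, 1300, 1400, 1100, 1200, 1300, 1400])

# trialTypeData == trialType is a boolean mask; indexing Rt with it
# keeps only the reaction times for that trial type.
meanTrialTypeRt = [Rt[trialTypeData == trialType].mean()
                   for trialType in range(4)]
print(meanTrialTypeRt)  # [1200.0, 1250.0, 1300.0, 1150.0]
```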

Here's another way:

trialTypeData = np.array([3, 0, 2, 1, 1, 0, 2, 3])
Rt = np.array([900, 1200, 1300, 1400, 1100, 1200, 1300, 1400])

meanTrialTypeRt = np.bincount(trialTypeData, Rt) / np.bincount(trialTypeData)
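The trick here is that when bincount is given a second (weights) argument, each bin accumulates the sum of the weights instead of a plain count, so dividing the weighted counts by the plain counts yields per-type means. Spelled out with the example data:

```python
import numpy as np

trialTypeData = np.array([3, 0, 2, 1, 1, 0, 2, 3])
Rt = np.array([900, 1200, 1300, 1400, 1100, 1200, 1300, 1400])

sums = np.bincount(trialTypeData, weights=Rt)  # per-type sums of Rt
counts = np.bincount(trialTypeData)            # per-type trial counts
meanTrialTypeRt = sums / counts

print(sums.tolist())             # [2400.0, 2500.0, 2600.0, 2300.0]
print(counts.tolist())           # [2, 2, 2, 2]
print(meanTrialTypeRt.tolist())  # [1200.0, 1250.0, 1300.0, 1150.0]
```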

Or if you know that there are the same number of instances for each trial type:

n_trial_types = 4
order = trialTypeData.argsort()
RtByTrialType = Rt[order].reshape((n_trial_types, -1))
meanTrialTypeRt = RtByTrialType.mean(1)

The second method might be slower (or faster; I haven't timed it), but it produces the RtByTrialType array, which you can reuse if you need it later. The -1 in the reshape tells numpy to infer that dimension so the reshape works out.
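For reference, here is the sort-and-reshape route run end-to-end on the example data, with the intermediate RtByTrialType shown; with 8 trials over 4 trial types, the -1 resolves to 2:

```python
import numpy as np

trialTypeData = np.array([3, 0, 2, 1, 1, 0, 2, 3])
Rt = np.array([900, 1200, 1300, 1400, 1100, 1200, 1300, 1400])

order = trialTypeData.argsort()             # indices that sort the trial types
RtByTrialType = Rt[order].reshape((4, -1))  # one row per trial type; -1 -> 2
meanTrialTypeRt = RtByTrialType.mean(1)     # mean along each row

print(RtByTrialType.shape)          # (4, 2)
print(meanTrialTypeRt.tolist())     # [1200.0, 1250.0, 1300.0, 1150.0]
```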
