
Itertools: selecting in pandas based on previous three rows, or previous elements in a list

Hoping for some help with a problem that has stumped me all day. I have data from an experiment in which subjects are asked via the screen to press one of four buttons on the keyboard ('m', 'x', 'n', 'z') for 1600 trials. On the even trials, the button pressing obeys a randomly selected pattern (e.g. mnzxmnzxmnzx), but on odd trials, the button to press is randomly chosen. The data set I've been given contains only which key the subject pressed on which trial. I need to find out:

(1) What the subject's pattern was. I tried this, since the pattern repeats:

def find_pattern(df):
    '''Find the pattern for this subject.'''
    criterion = df['trial'].isin([1, 3, 5, 7])
    the_pattern = df[criterion].circle_key.tolist()
    return df


df = df.groupby('sid').apply(find_pattern)

(2) What the possible combinations of this subject's pattern are (i.e. if I pressed 'm', the next pattern element will be 'x').

For this I tried a bunch of different itertools functions, but none worked out exactly as I want. I basically want to take the list:

 ['m', 'x', 'z', 'n'] 

for each subject that I got in (1) and form all possible combinations of two IN ORDER. So this would be:

 [('m', 'x'), ('x', 'z'), ('z', 'n'), ('n', 'm')]

And there are no other possibilities. Then, I want to create a column that makes a triplet out of the last three trials (including the current one), as in the column triplet below. I feel like there must be some sort of rolling window, or a simple way to select the last three trials. I've tried various things that are wrong. I can't seem to figure out how to refer to the "current" row in a data frame (without using a for loop)...

I need these values because I need to check whether the first and last elements of the triplet form one of the possible combinations (possible_comb). So for trial 3 the answer would be TRUE, and for trial 4 the answer would be FALSE.

Any help would be greatly appreciated. My current data looks like this:

trial sid key
1     1   'm'
2     1   'm'  
3     1   'x'
4     1   'n'
5     1   'x'
6     1   'x'
7     1   'n'
1     2   'm'
2     2   'm'
...   ... 

I'd like for it to look like this:

trial sid key    pattern               possible_comb                                 triplet
1     1   'm'    ['m', 'x', 'x', 'n']  [('m','x'), ('x','x'), ('x','n'), ('n', 'm')] NaN
2     1   'm'    ['m', 'x', 'x', 'n']  [('m','x'), ('x','x'), ('x','n'), ('n', 'm')] NaN
3     1   'x'    ['m', 'x', 'x', 'n']  [('m','x'), ('x','x'), ('x','n'), ('n', 'm')] ['m', 'm', 'x']
4     1   'n'    ['m', 'x', 'x', 'n']  [('m','x'), ('x','x'), ('x','n'), ('n', 'm')] ['m', 'x', 'n']
5     1   'x'    ['m', 'x', 'x', 'n']  [('m','x'), ('x','x'), ('x','n'), ('n', 'm')] ['x', 'n', 'x']
6     1   'x'    ['m', 'x', 'x', 'n']  [('m','x'), ('x','x'), ('x','n'), ('n', 'm')] ['n', 'x', 'x'] 
7     1   'n'    ['m', 'x', 'x', 'n']  [('m','x'), ('x','x'), ('x','n'), ('n', 'm')] ['x', 'x', 'n']
1     2   'n'    ['n', 'x', 'm', 'n']  [('m','x'), ('x','x'), ('x','n'), ('n', 'm')] NaN
2     2   'm'    ['n', 'x', 'm', 'n']  [('m','x'), ('x','x'), ('x','n'), ('n', 'm')] NaN
...   ... 

To get the "pattern", you just need to group on the subject id and take every other element. That can be done with this:

>>> d.groupby('sid')['key'].apply(lambda c: list(c[::2]))
sid
1      ['m', 'x', 'x', 'n']

(I truncated your example to include just one subject, since you only included partial data from subject 2, which is too short to have a "pattern" as such. So this is the pattern for subject 1.)
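
To try the grouping step in isolation, here is a minimal, self-contained sketch. It assumes the sample rows for subject 1 from the question, held in a DataFrame named d, with the keys stored as plain one-character strings (the printed output above suggests the real data may include the quote characters themselves).

```python
import pandas as pd

# Subject 1 from the question's sample data (assumed: keys stored without quotes).
d = pd.DataFrame({
    "trial": [1, 2, 3, 4, 5, 6, 7],
    "sid": [1] * 7,
    "key": ["m", "m", "x", "n", "x", "x", "n"],
})

# Every other key press within each subject, starting from the first trial.
patterns = d.groupby("sid")["key"].apply(lambda c: list(c[::2]))
print(patterns.loc[1])  # ['m', 'x', 'x', 'n']
```

The slice c[::2] is positional, so it relies on the rows for each subject already being sorted by trial.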

If you want to duplicate that data in every row of the original DataFrame for the corresponding subject, use map to grab the pattern for each subject ID:

>>> d['pattern'] = d.sid.map(d.groupby('sid')['key'].apply(lambda c: list(c[::2])))
>>> d
   trial  sid  key               pattern
0      1    1  'm'  ['m', 'x', 'x', 'n']
1      2    1  'm'  ['m', 'x', 'x', 'n']
2      3    1  'x'  ['m', 'x', 'x', 'n']
3      4    1  'n'  ['m', 'x', 'x', 'n']
4      5    1  'x'  ['m', 'x', 'x', 'n']
5      6    1  'x'  ['m', 'x', 'x', 'n']
6      7    1  'n'  ['m', 'x', 'x', 'n']

To get the sequential combinations, you can just add the first element onto the end (so that the sequence "loops around"), then extract the combos by grabbing two-element sublists, with a function like this:

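def getCombs(pattern):
    # Append the first element so the last pair wraps around to the start.
    pattern = pattern + [pattern[0]]
    # range replaces the Python 2-only xrange.
    return [pattern[ix:ix+2] for ix in range(len(pattern)-1)]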
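
If tuples are preferred, as in the question's desired output, the same wrap-around pairing can probably be written with zip. This is only a sketch of an alternative; the name get_comb_tuples is hypothetical and not part of the original answer.

```python
def get_comb_tuples(pattern):
    """Pair each element with its successor; the last element pairs with the first."""
    # pattern[1:] + pattern[:1] is the pattern rotated left by one position.
    return list(zip(pattern, pattern[1:] + pattern[:1]))


print(get_comb_tuples(['m', 'x', 'z', 'n']))
# [('m', 'x'), ('x', 'z'), ('z', 'n'), ('n', 'm')]
```

Tuples are hashable, so the result can be turned into a set when only membership tests are needed.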

Then you can get the combinations into your DataFrame:

>>> d['combs'] = d.pattern.map(getCombs)
>>> d.combs
0    [['m', 'x'], ['x', 'x'], ['x', 'n'], ['n', 'm']]
1    [['m', 'x'], ['x', 'x'], ['x', 'n'], ['n', 'm']]
2    [['m', 'x'], ['x', 'x'], ['x', 'n'], ['n', 'm']]
3    [['m', 'x'], ['x', 'x'], ['x', 'n'], ['n', 'm']]
4    [['m', 'x'], ['x', 'x'], ['x', 'n'], ['n', 'm']]
5    [['m', 'x'], ['x', 'x'], ['x', 'n'], ['n', 'm']]
6    [['m', 'x'], ['x', 'x'], ['x', 'n'], ['n', 'm']]
Name: combs, dtype: object

(I display only the "combs" column here because including all columns makes it too wide to show comfortably.)
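
The question also asks for a triplet column and a check on its first and last elements, which the steps above do not cover. One possible approach, sketched here under assumptions, is to shift the key column within each subject instead of looping over rows. The column names prev1, prev2, triplet and first_last_ok and the helper wrap_pairs are hypothetical, and the data is again subject 1 from the question with keys stored as plain strings.

```python
import pandas as pd

d = pd.DataFrame({
    "trial": [1, 2, 3, 4, 5, 6, 7],
    "sid": [1] * 7,
    "key": ["m", "m", "x", "n", "x", "x", "n"],
})


def wrap_pairs(keys):
    """Set of consecutive (wrap-around) pairs from every other key press."""
    pattern = keys[::2]
    return set(zip(pattern, pattern[1:] + pattern[:1]))


# One set of allowed pairs per subject.
combs_by_sid = {sid: wrap_pairs(g["key"].tolist()) for sid, g in d.groupby("sid")}

# Keys from two trials back and one trial back, never crossing subjects.
by_sid = d.groupby("sid")["key"]
d["prev2"] = by_sid.shift(2)
d["prev1"] = by_sid.shift(1)

# The triplet stays missing until a subject has three trials available.
triplets = [
    [a, b, c] if pd.notna(a) else None
    for a, b, c in zip(d["prev2"], d["prev1"], d["key"])
]
d["triplet"] = pd.Series(triplets, index=d.index, dtype=object)

# True when (first, last) of the triplet is one of the subject's allowed pairs.
d["first_last_ok"] = [
    bool(pd.notna(a) and (a, c) in combs_by_sid[s])
    for s, a, c in zip(d["sid"], d["prev2"], d["key"])
]

print(d[["trial", "triplet", "first_last_ok"]])
```

For this sample, trial 3 comes out True and trial 4 False, which matches the expectation stated in the question.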
