简体   繁体   中英

how to swap two columns and flip a third in panda data frame?

I'm conducting an experiment(using python 2.7, panda 0.23.4) where I have three levels of a stimulus {a,b,c} and present all different combinations to participants, and they have to choose which one was rougher? (example: Stimulus 1 = a , Stimulus 2=b, participant choose 1 indicating stimulus 1 was rougher)

After the experiment, I have a data frame with three columns like this:

import pandas as pd

d = {'Stim1':  ['a', 'b', 'a', 'c', 'b', 'c'],
     'Stim2': ['b', 'a', 'c', 'a', 'c', 'b'],
     'Answer': [1, 2, 2, 1, 2, 1]}
df = pd.DataFrame(d)

        Stim1  Stim2  Answer
    0     a     b      1
    1     b     a      2
    2     a     c      2
    3     c     a      1
    4     b     c      2
    5     c     b      1 

For my analysis, the order of which stimulus came first doesn't matter. Stim1= a, Stim2= b is the same as Stim1= b, Stim2= a. I'm trying to figure out how can I swap Stim1 and Stim2 and flip their Answer to be like this:

        Stim1  Stim2  Answer
    0     a     b      1
    1     a     b      1
    2     a     c      2
    3     a     c      2
    4     b     c      2
    5     b     c      2

I read that np.where can be used, but it would do one thing at a time, where I want to do two (swap and flip).

Is there some way to use another function to do swap and flip at the same time?

Can you try if this works for you?

import pandas as pd
import numpy as np

df = pd.DataFrame(d)

# keep a copy of the original Stim1 column
s = df['Stim1'].copy()

# sort the values
df[['Stim1', 'Stim2']] = np.sort(df[['Stim1', 'Stim2']].values)

# exchange the Answer if the order has changed
df['Answer'] = df['Answer'].where(df['Stim1'] == s, df['Answer'].replace({1:2,2:1}))

output:

  Stim1 Stim2  Answer
0     a     b       1
1     a     b       1
2     a     c       2
3     a     c       2
4     b     c       2
5     b     c       2

You can start by building a boolean series that indicates which rows should be swapped or not:

>>> swap = df['Stim1'] > df['Stim2']
>>> swap
0    False
1     True
2    False
3     True
4    False
5     True
dtype: bool

Then build the fully swapped dataframe as follows:

>>> swapped_df = pd.concat([
...     df['Stim1'].rename('Stim2'),
...     df['Stim2'].rename('Stim1'),
...     3 - df['Answer'],
... ], axis='columns')
>>> swapped_df
  Stim2 Stim1  Answer
0     a     b       2
1     b     a       1
2     a     c       1
3     c     a       2
4     b     c       1
5     c     b       2

Finally, use .mask() to select initial rows or swapped rows:

>>> df.mask(swap, swapped_df)
  Stim1 Stim2  Answer
0     a     b       1
1     a     b       1
2     a     c       2
3     a     c       2
4     b     c       2
5     b     c       2

NB .mask is roughly the same as .where , but it replaces rows where the series is True instead of keeping the rows that are True . This is exactly the same:

>>> swapped_df.where(swap, df)
  Stim2 Stim1  Answer
0     b     a       1
1     b     a       1
2     c     a       2
3     c     a       2
4     c     b       2
5     c     b       2

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM