merge several columns into one

Question

I have a data frame like this (I'm adapting it, since the original is in Spanish and copy-pasting doesn't help):

     Question 1 opt. A  Question 1 opt. B  Question 1 opt. C  Question 2 opt. A    Question 2 opt. B  
 0     NaN                    NaN                 yes              NaN                 NaN
 1     NaN                    None                NaN              Uber                NaN
 2     NaN                    NaN                 NaN              Didi                NaN

So, many columns are really answers to the same question, just for different options. What I would like to do is some kind of merge, like this:

    Question 1    Question 2    
 0     yes            NaN                  
 1     None           Uber                  
 2     NaN            Didi

That is, I want to summarize all the answers for each question into a single column (assuming the options are mutually exclusive). Tagging each answer with its option would be a plus. I believe a for loop could do it, but I'm very bad at implementing one, and loops are strongly discouraged in Python.

Answer 1

You can use str.extract to pull the question part out of the column names, then group the dataframe by this extracted series along axis=1 and aggregate with first:

g = df.columns.str.extract(r'(Question \d+)', expand=False)
out = df.groupby(g, axis=1).first()

Result:

  Question 1 Question 2
0        yes        NaN
1       None       Uber
2        NaN       Didi
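
On recent pandas releases the axis=1 argument of DataFrame.groupby is deprecated (as of 2.1), so the snippet above may warn or fail there. A minimal sketch of the same idea that groups the transposed frame instead is shown below; the sample data is borrowed from the question, with None treated as a missing value, and is only an assumption about the real survey data.

```python
import numpy as np
import pandas as pd

# Sample data adapted from the question (None is treated as missing here)
df = pd.DataFrame({
    "Question 1 opt. A": [np.nan, np.nan, np.nan],
    "Question 1 opt. B": [np.nan, None, np.nan],
    "Question 1 opt. C": ["yes", np.nan, np.nan],
    "Question 2 opt. A": [np.nan, "Uber", "Didi"],
    "Question 2 opt. B": [np.nan, np.nan, np.nan],
})

# One label per column, e.g. "Question 1", extracted from the column names
g = df.columns.str.extract(r"(Question \d+)", expand=False)

# Transpose so the columns become rows, group those rows by question,
# keep the first non-null value in each group, then transpose back
out = df.T.groupby(g).first().T
print(out)
```

This relies on the options being mutually exclusive: if two options of the same question are filled in for one row, only the first one survives.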

Answer 2

Try this:

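# Reshape to long form, drop empty rows, then take the max per Option
(pd.wide_to_long(df.reset_index(), ['Question 1', 'Question 2'], 'index', 'Option', sep=' ', suffix='.*')
  .dropna(how='all')
  .groupby(level=1, sort=False).max()  # .max(level=1) was removed in pandas 2.0
  .reset_index())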

Output:

   Option Question 1 Question 2
0  opt. C        yes        NaN
1  opt. A        NaN       Uber
2  opt. B       None        NaN
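
The result above is indexed by option rather than by respondent. If the goal is to keep one row per respondent and also record which option was answered (the "tagging" mentioned in the question), one possible sketch is to melt the frame, split the column name into question and option, and pivot back. The intermediate names (long, parts, tagged) and the sample data are illustrative assumptions, and the pivot only works if each respondent has at most one answered option per question.

```python
import numpy as np
import pandas as pd

# Sample data adapted from the question (None is treated as missing here)
df = pd.DataFrame({
    "Question 1 opt. A": [np.nan, np.nan, np.nan],
    "Question 1 opt. B": [np.nan, None, np.nan],
    "Question 1 opt. C": ["yes", np.nan, np.nan],
    "Question 2 opt. A": [np.nan, "Uber", "Didi"],
    "Question 2 opt. B": [np.nan, np.nan, np.nan],
})

# Long form: one row per (respondent, original column), keeping only real answers
long = (df.reset_index()
          .melt(id_vars="index", var_name="column", value_name="answer")
          .dropna(subset=["answer"]))

# Split "Question 1 opt. C" into question="Question 1" and option="opt. C"
parts = long["column"].str.extract(r"(?P<question>Question \d+) (?P<option>opt\. \w+)")
long = long.join(parts)

# Back to one row per respondent, with the answer and its option side by side
tagged = (long.pivot(index="index", columns="question", values=["answer", "option"])
              .reindex(df.index))
print(tagged)
```

The columns of tagged form a two-level index, so tagged["answer"] gives the merged answers and tagged["option"] gives the option each answer came from.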

Answer 3

Use fillna to replace None and NaN with an empty string. The rest is simple string concatenation.

Code:

import pandas as pd
import numpy as np

data = {'Question 1 opt. A' : [np.nan, np.nan, np.nan],
        'Question 1 opt. B' : [np.nan, None, np.nan],
        'Question 1 opt. C' : ['yes', np.nan, np.nan],
        'Question 2 opt. A' : [np.nan, 'Uber','Didi'],
        'Question 2 opt. B' : [np.nan, np.nan, np.nan]}
        
df = pd.DataFrame(data)
print(df)
df.fillna('', inplace=True)
df['Question 1'] = df['Question 1 opt. A'] + df['Question 1 opt. B'] + df['Question 1 opt. C']
df['Question 2'] =  df['Question 2 opt. A'] + df['Question 2 opt. B']
print(df)

Output:

   Question 1 opt. A  Question 1 opt. B Question 1 opt. C Question 2 opt. A  Question 2 opt. B
0                NaN                NaN               yes               NaN                NaN
1                NaN                NaN               NaN              Uber                NaN
2                NaN                NaN               NaN              Didi                NaN
  Question 1 opt. A Question 1 opt. B Question 1 opt. C Question 2 opt. A Question 2 opt. B Question 1 Question 2
0                                                   yes                                            yes
1                                                                    Uber                                    Uber
2                                                                    Didi                                    Didi
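
The concatenation above names every column by hand, which gets tedious with many questions. A rough sketch of the same empty-string idea that derives the groups from the column names follows; the regular expression and the variable names (prefixes, filled, merged) are assumptions based on the column naming in the question.

```python
import numpy as np
import pandas as pd

# Sample data adapted from the question (None is treated as missing here)
df = pd.DataFrame({
    "Question 1 opt. A": [np.nan, np.nan, np.nan],
    "Question 1 opt. B": [np.nan, None, np.nan],
    "Question 1 opt. C": ["yes", np.nan, np.nan],
    "Question 2 opt. A": [np.nan, "Uber", "Didi"],
    "Question 2 opt. B": [np.nan, np.nan, np.nan],
})

# "Question 1", "Question 1", "Question 1", "Question 2", "Question 2"
prefixes = df.columns.str.extract(r"(Question \d+)", expand=False)

# Replace every missing value with an empty string so rows can be joined as text
filled = df.astype(object).where(df.notna(), "")

# For each question, join the text of all its option columns row by row
merged = pd.DataFrame({
    q: filled.loc[:, prefixes == q].apply("".join, axis=1)
    for q in prefixes.unique()
})
print(merged)
```

As with the hand-written version, unanswered questions end up as empty strings rather than NaN, and two filled-in options for the same question would be glued together.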

3 answers

solution1
1 2020-12-23 15:22:29

solution2
1 2020-12-23 15:22:39

solution3
0 2020-12-23 16:08:41
