I have a data frame like this (I'm adapting it, since the original is in Spanish and copy-paste doesn't help):
   Question 1 opt. A  Question 1 opt. B  Question 1 opt. C  Question 2 opt. A  Question 2 opt. B
0                NaN                NaN                yes                NaN                NaN
1                NaN               None                NaN               Uber                NaN
2                NaN                NaN                NaN               Didi                NaN
So several columns are really answers to the same question, one column per option. What I would like to do is some kind of merge like this:
   Question 1  Question 2
0         yes         NaN
1        None        Uber
2         NaN        Didi
That is, to collapse all the answers for each question into a single column (provided the options are mutually exclusive). Tagging each answer with its option would be a plus. I believe a for loop could do it, but I'm not good at implementing one, and I've read that loops should generally be avoided in Python.
You can use str.extract to extract the question part from the column names, then groupby the dataframe on this extracted series along axis=1 and aggregate using first:
g = df.columns.str.extract(r'(Question \d+)', expand=False)
out = df.groupby(g, axis=1).first()
Result:
   Question 1  Question 2
0         yes         NaN
1        None        Uber
2         NaN        Didi
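Newer pandas releases deprecate the axis=1 argument of groupby, so the snippet above may emit a warning or stop working depending on the installed version. A minimal sketch of the same idea that avoids that argument by transposing, grouping the rows, and transposing back is shown here. The sample frame is rebuilt from the question and assumes "None" is a literal string answer rather than a missing value:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumption: "None" is a real answer string).
df = pd.DataFrame({
    "Question 1 opt. A": [np.nan, np.nan, np.nan],
    "Question 1 opt. B": [np.nan, "None", np.nan],
    "Question 1 opt. C": ["yes", np.nan, np.nan],
    "Question 2 opt. A": [np.nan, "Uber", "Didi"],
    "Question 2 opt. B": [np.nan, np.nan, np.nan],
})

# One label per column: "Question 1", "Question 1", ..., "Question 2".
g = df.columns.str.extract(r"(Question \d+)", expand=False)

# Transpose so the original columns become rows, group those rows by question,
# keep the first non-null value in each group, then transpose back.
out = df.T.groupby(g.to_numpy()).first().T
print(out)
```

As in the original answer, first keeps only the first non-null value per question, so this relies on the options being mutually exclusive.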
Try this, which reshapes with pd.wide_to_long and keeps the option as its own column:
(pd.wide_to_long(df.reset_index(), ['Question 1', 'Question 2'], 'index', 'Option', sep=' ', suffix='.*')\
.dropna(how='all')
.max(level=1)
.reset_index())
Output:
   Option  Question 1  Question 2
0  opt. C         yes         NaN
1  opt. A         NaN        Uber
2  opt. B        None         NaN
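The question also asks for each answer to be tagged with the option it came from, per respondent. One possible sketch of that uses melt to go to long format and a regex to split the column name into question and option. The column names row, column, answer, question and option are illustrative choices, and the sample frame again assumes "None" is a literal string:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumption: "None" is a real answer string).
df = pd.DataFrame({
    "Question 1 opt. A": [np.nan, np.nan, np.nan],
    "Question 1 opt. B": [np.nan, "None", np.nan],
    "Question 1 opt. C": ["yes", np.nan, np.nan],
    "Question 2 opt. A": [np.nan, "Uber", "Didi"],
    "Question 2 opt. B": [np.nan, np.nan, np.nan],
})

# Long format: one row per (respondent, original column), dropping unanswered cells.
long = (
    df.rename_axis("row")
      .reset_index()
      .melt(id_vars="row", var_name="column", value_name="answer")
      .dropna(subset=["answer"])
)

# Split "Question 1 opt. C" into its question and option parts.
parts = long["column"].str.extract(r"^(?P<question>Question \d+) (?P<option>opt\. \w+)$")

tagged = (
    long.join(parts)[["row", "question", "option", "answer"]]
        .sort_values(["row", "question"])
        .reset_index(drop=True)
)
print(tagged)
```

If the wide layout is still wanted afterwards, tagged can be pivoted back with row as the index and question as the columns.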
Use fillna to replace None and NaN with empty strings. The rest is simple string concatenation.
Code:
import pandas as pd
import numpy as np
data = {'Question 1 opt. A' : [np.nan, np.nan, np.nan],
'Question 1 opt. B' : [np.nan, None, np.nan],
'Question 1 opt. C' : ['yes', np.nan, np.nan],
'Question 2 opt. A' : [np.nan, 'Uber','Didi'],
'Question 2 opt. B' : [np.nan, np.nan, np.nan]}
df = pd.DataFrame(data)
print(df)
df.fillna('', inplace=True)
df['Question 1'] = df['Question 1 opt. A'] + df['Question 1 opt. B'] + df['Question 1 opt. C']
df['Question 2'] = df['Question 2 opt. A'] + df['Question 2 opt. B']
print(df)
Output:
   Question 1 opt. A  Question 1 opt. B  Question 1 opt. C  Question 2 opt. A  Question 2 opt. B
0                NaN                NaN                yes                NaN                NaN
1                NaN                NaN                NaN               Uber                NaN
2                NaN                NaN                NaN               Didi                NaN
   Question 1 opt. A  Question 1 opt. B  Question 1 opt. C  Question 2 opt. A  Question 2 opt. B  Question 1  Question 2
0                                                      yes                                               yes
1                                                                        Uber                                       Uber
2                                                                        Didi                                       Didi
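The code above names every option column by hand, which gets tedious with many questions. A rough sketch of the same fill-and-concatenate idea that derives the groups from the column names instead could look like this. It assumes the "Question N opt. X" naming pattern from the question, that options are mutually exclusive, and that "None" is a literal string answer (unlike the Python None used above, which fillna would erase):

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumption: "None" is a real answer string).
df = pd.DataFrame({
    "Question 1 opt. A": [np.nan, np.nan, np.nan],
    "Question 1 opt. B": [np.nan, "None", np.nan],
    "Question 1 opt. C": ["yes", np.nan, np.nan],
    "Question 2 opt. A": [np.nan, "Uber", "Didi"],
    "Question 2 opt. B": [np.nan, np.nan, np.nan],
})

# Replace every missing value with an empty string so rows can be joined.
filled = df.astype(object).fillna("")

# One label per column, e.g. "Question 1" for "Question 1 opt. A".
questions = df.columns.str.extract(r"(Question \d+)", expand=False)

# For each question, concatenate the strings of its option columns row by row.
merged = pd.DataFrame({
    q: filled.loc[:, questions == q].apply("".join, axis=1)
    for q in questions.unique()
})

# Turn the leftover empty strings back into NaN.
merged = merged.mask(merged == "")
print(merged)
```

If two options of the same question were both filled in, their strings would be glued together, so this only makes sense when at most one option per question is answered.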