I have the following dataframe:
A                 B                  C
I am motivated    Agree              4
I am motivated    Strongly Agree     5
I am motivated    Disagree           6
I am open-minded  Agree              4
I am open-minded  Disagree           4
I am open-minded  Strongly Disagree  3
Column A is the question, column B is the answer, and column C is how often that answer ("Strongly Agree", "Agree", "Disagree", or "Strongly Disagree") was given for the question in column A.
How can I convert it into the following dataframe?
                  Strongly Agree  Agree  Disagree  Strongly Disagree
I am motivated                 5      4         6                  0
I am open-minded               0      4         4                  3
I looked at groupby() in other posts but could not figure it out. I am using Python 3.
Use the DataFrame.pivot_table() method:
In [250]: df.pivot_table(index='A', columns='B', values='C', aggfunc='sum', fill_value=0)
Out[250]:
B                 Agree  Disagree  Strongly Agree  Strongly Disagree
A
I am motivated        4         6               5                  0
I am open-minded      4         4               0                  3
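For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's frame and applies the same pivot_table call. The extra duplicate row is hypothetical (it is not in the question's data) and is only there to show what aggfunc='sum' does when a question/answer pair appears more than once.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "A": ["I am motivated"] * 3 + ["I am open-minded"] * 3,
    "B": ["Agree", "Strongly Agree", "Disagree",
          "Agree", "Disagree", "Strongly Disagree"],
    "C": [4, 5, 6, 4, 4, 3],
})

# Hypothetical extra row, NOT part of the question: a second
# ("I am motivated", "Agree") entry with a count of 1.
extra = pd.DataFrame({"A": ["I am motivated"], "B": ["Agree"], "C": [1]})
df_dup = pd.concat([df, extra], ignore_index=True)

# Same call as in the answer; missing pairs are filled with 0.
wide = df.pivot_table(index="A", columns="B", values="C",
                      aggfunc="sum", fill_value=0)

# With the duplicate row, the two "Agree" counts are summed (4 + 1).
wide_dup = df_dup.pivot_table(index="A", columns="B", values="C",
                              aggfunc="sum", fill_value=0)

print(wide)
print(wide_dup)
```

If the pairs are guaranteed to be unique, the aggregation step does no real work, which is the case the set_index/unstack approach relies on.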
Because these are already frequency counts, we can assume that each Question / Opinion pair is unique. So we can use set_index and unstack, since there is nothing to aggregate; skipping the aggregation step should also save some time. We could accomplish the same goal with pivot; however, pivot doesn't have a fill_value option, and fill_value is what lets us preserve the dtype.
df.set_index(['A', 'B']).C.unstack(fill_value=0)
B                 Agree  Disagree  Strongly Agree  Strongly Disagree
A
I am motivated        4         6               5                  0
I am open-minded      4         4               0                  3
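The dtype point can be checked with a small sketch, assuming the frame from the question. It compares unstack(fill_value=0) with plain pivot, which leaves NaN for the missing pairs and therefore needs a fillna plus cast afterwards. The variable names are just illustrative.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "A": ["I am motivated"] * 3 + ["I am open-minded"] * 3,
    "B": ["Agree", "Strongly Agree", "Disagree",
          "Agree", "Disagree", "Strongly Disagree"],
    "C": [4, 5, 6, 4, 4, 3],
})

# unstack raises on duplicate index entries, so confirm the
# uniqueness assumption before reshaping.
has_duplicates = df.duplicated(["A", "B"]).any()

# fill_value=0 fills the missing pairs directly, so the counts stay integers.
wide = df.set_index(["A", "B"])["C"].unstack(fill_value=0)

# pivot has no fill_value: missing pairs become NaN.
via_pivot = df.pivot(index="A", columns="B", values="C")

# Filling and casting afterwards recovers the same integer table.
cleaned = via_pivot.fillna(0).astype("int64")

print(wide.dtypes)
print(via_pivot.dtypes)
```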
Extra Credit
Turn 'B' into a pd.Categorical and the columns will be sorted in category order:
df.B = pd.Categorical(
    df.B,
    categories=['Strongly Disagree', 'Disagree', 'Agree', 'Strongly Agree'],
    ordered=True)
df.set_index(['A', 'B']).C.unstack(fill_value=0)
B                 Strongly Disagree  Disagree  Agree  Strongly Agree
A
I am motivated                    0         6      4               5
I am open-minded                  3         4      4               0
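The same idea can produce the exact column layout the question asked for, by listing the categories from "Strongly Agree" down to "Strongly Disagree". This is a sketch under that assumption about the desired order; the name layout is made up for the example.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "A": ["I am motivated"] * 3 + ["I am open-minded"] * 3,
    "B": ["Agree", "Strongly Agree", "Disagree",
          "Agree", "Disagree", "Strongly Disagree"],
    "C": [4, 5, 6, 4, 4, 3],
})

# Column order taken from the question's desired output.
layout = ["Strongly Agree", "Agree", "Disagree", "Strongly Disagree"]

# An ordered categorical makes unstack emit columns in this order
# instead of alphabetically.
df["B"] = pd.Categorical(df["B"], categories=layout, ordered=True)

wide = df.set_index(["A", "B"])["C"].unstack(fill_value=0)
print(wide)
```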