I want to take this DataFrame:
   collector_id         date_created      row_id  question_id  respondent_id  survey_id
0      24785342  2015-02-25 00:40:00  3055824979    319047238     5004656403  101692922
1      24785342  2015-02-25 00:40:00  3055824979    319047238     5004656404  101692922
2      24785342  2015-02-25 00:40:00  3055824980    319047238     5004656405  101692922
3      24785342  2015-02-25 00:40:00  3055824980    319047238     5004656406  101692922
4      24785342  2015-02-25 00:40:00  3055824980    319047238     5004656407  101692922
5      24785342  2015-02-25 00:40:00  3055824980    319047238     5004656408  101692922
6      24785342  2015-02-25 00:40:00  3055824981    319047238     5004656409  101692922
and turn it into:
   collector_id         date_created   319047238  respondent_id  survey_id
0      24785342  2015-02-25 00:40:00  3055824979     5004656403  101692922
1      24785342  2015-02-25 00:40:00  3055824979     5004656404  101692922
2      24785342  2015-02-25 00:40:00  3055824980     5004656405  101692922
3      24785342  2015-02-25 00:40:00  3055824980     5004656406  101692922
4      24785342  2015-02-25 00:40:00  3055824980     5004656407  101692922
5      24785342  2015-02-25 00:40:00  3055824980     5004656408  101692922
6      24785342  2015-02-25 00:40:00  3055824981     5004656409  101692922
That is, every question_id becomes a column, with the matching row_id values underneath it.
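The sample above only has one question_id, so here is a minimal sketch of the general shape on a small hypothetical frame (all ids below are made up). In `pivot_table`, `columns` supplies the new headers, `values` fills the cells, and `index` gives one output row per distinct value. `aggfunc='first'` is an assumption chosen so each `row_id` is copied through rather than averaged (the default `aggfunc` is `'mean'`).

```python
import pandas as pd

# Hypothetical long-format data: two respondents, each answering two questions.
long_df = pd.DataFrame({
    "respondent_id": [1, 1, 2, 2],
    "question_id": [10, 20, 10, 20],
    "row_id": [100, 200, 101, 201],
})

wide = long_df.pivot_table(
    index="respondent_id",   # one output row per respondent
    columns="question_id",   # each distinct question_id becomes a column header
    values="row_id",         # the row_id fills the cell under its question
    aggfunc="first",         # copy the row_id through instead of averaging it
).reset_index()
wide.columns.name = None     # drop the leftover "question_id" axis label
print(wide)
```

If a respondent had no row for some question, that cell would come out as `NaN`, which also turns the column into floats.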
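This gets close, but the roles are reversed:
df = df.pivot_table(
    values='question_id',                   # cell values
    index=['respondent_id', 'survey_id'],   # one output row per respondent/survey pair
    columns='row_id',                       # each distinct row_id becomes a column
).reset_index()
On a different sample of the data, it returns the row_ids as columns and the question_id as the values: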
row_id  respondent_id  survey_id  3055827274  3055827275  3055827276
0          5004658716  101693626   319047673         NaN         NaN
1          5004658717  101693626   319047673         NaN         NaN
2          5004658718  101693626         NaN   319047673         NaN
3          5004658719  101693626         NaN   319047673         NaN
4          5004658720  101693626         NaN   319047673         NaN
5          5004658721  101693626         NaN   319047673         NaN
6          5004658722  101693626         NaN         NaN   319047673
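Spelled out as keywords, the attempt passes `question_id` as `values` and `row_id` as `columns`, which is why the row_ids end up as headers. A sketch of the swapped version on the sample frame from the top of the question, assuming pandas 1.1 or later (where `DataFrame.pivot` accepts a list for `index`) and that `respondent_id` is unique per row. It uses `pivot` rather than `pivot_table` so that a repeated `(index, question_id)` pair raises a `ValueError` instead of being silently aggregated; `id_cols` and `question_cols` are illustrative names.

```python
import pandas as pd

# The sample frame from the top of the question.
df = pd.DataFrame({
    "collector_id": [24785342] * 7,
    "date_created": pd.to_datetime(["2015-02-25 00:40:00"] * 7),
    "row_id": [3055824979, 3055824979, 3055824980, 3055824980,
               3055824980, 3055824980, 3055824981],
    "question_id": [319047238] * 7,
    "respondent_id": list(range(5004656403, 5004656410)),
    "survey_id": [101692922] * 7,
})

# Every column that identifies one response goes into the index.
id_cols = ["collector_id", "date_created", "respondent_id", "survey_id"]

# question_id -> column labels, row_id -> cell values (the reverse of the attempt).
wide = df.pivot(index=id_cols, columns="question_id", values="row_id").reset_index()
wide.columns.name = None  # drop the leftover "question_id" axis label

# Place the question columns between date_created and respondent_id,
# matching the column order of the desired output.
question_cols = [c for c in wide.columns if c not in id_cols]
wide = wide[id_cols[:2] + question_cols + id_cols[2:]]
print(wide)
```

The final reordering only reproduces the layout of the desired output; it can be dropped if column order does not matter.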