Transpose first row into column and simplify repeating columns in a pandas dataframe
Despite spending half the day on Stack Overflow, I have not found a solution. Working in Python 3.9.0, I need to clean a dataframe. The first row should be transposed into a column, the second row needs to become the header, and the repeating columns ('political_rights', 'civil_liberties', 'status') need to be collapsed into just 3 columns. This can be done by making the values in the column "country" repeat for each year. Whenever I accomplish one thing, I mess up another, so any help/advice is deeply appreciated!
Simplified version of the current dataframe (actual df: 207 rows × 148 columns):
import pandas as pd

df_bad = pd.DataFrame({'col1': ['years', 'country', 'Afghanistan', 'Albania', 'Algeria', 'Andorra'],
                       'col2': [1972, 'political_rights', 4, 7, 6, 4],
                       'col3': [1972, 'civil_liberties', 5, 7, 6, 3],
                       'col4': [1972, 'status', 'PF', 'NF', 'NF', 'NF'],
                       'col5': [1973, 'political_rights', 7, 7, 6, 4],
                       'col6': [1973, 'civil_liberties', 6, 7, 6, 4],
                       'col7': [1973, 'status', 'NF', 'NF', 'NF', 'PF']})
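In this layout the year and the variable name of each data column live in the first two rows, not in the header. The short sketch below uses only df_bad as defined above. It reads those two rows off as (year, variable) pairs, which makes the repeating three-column pattern visible. The names header_pairs and countries are illustrative.

```python
import pandas as pd

df_bad = pd.DataFrame({'col1': ['years', 'country', 'Afghanistan', 'Albania', 'Algeria', 'Andorra'],
                       'col2': [1972, 'political_rights', 4, 7, 6, 4],
                       'col3': [1972, 'civil_liberties', 5, 7, 6, 3],
                       'col4': [1972, 'status', 'PF', 'NF', 'NF', 'NF'],
                       'col5': [1973, 'political_rights', 7, 7, 6, 4],
                       'col6': [1973, 'civil_liberties', 6, 7, 6, 4],
                       'col7': [1973, 'status', 'NF', 'NF', 'NF', 'PF']})

# Row 0 holds the year and row 1 the variable name of every data column (col2..col7)
header_pairs = list(zip(df_bad.iloc[0, 1:], df_bad.iloc[1, 1:]))
# Column 'col1' holds the labels 'years'/'country', then the country names
countries = df_bad['col1'].iloc[2:].tolist()

print(header_pairs)
print(countries)
```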
Simplified version of the desired dataframe (future df: 10250 rows × 5 columns):
df = pd.DataFrame({'country': ['Afghanistan', 'Albania', 'Algeria', 'Afghanistan', 'Albania', 'Algeria'],
                   'years': [1972, 1972, 1972, 1973, 1973, 1973],
                   'political_rights': [4, 7, 6, 7, 7, 6],
                   'civil_liberties': [5, 7, 6, 6, 7, 6],
                   'status': ['PF', 'NF', 'NF', 'NF', 'NF', 'NF']})
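s = df_bad.T                                           # each original column (col1..col7) becomes a row
s.columns = s.loc['col1']                              # row 'col1' ('years', 'country', country names) becomes the header
s = s.drop('col1').set_index(['years', 'country'])     # drop that row; index by (year, variable name)
s = s.stack().rename_axis(['years', None, 'country'])  # long Series keyed by (year, variable, country name)
s = s.unstack(1).reset_index()                         # variables become columns; index moved back to columns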
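The same long layout can also be reached without a transpose. Turn the two header rows into a two-level column index (year, variable), then stack the year level into the rows. This is a different technique from the one walked through step by step below. The sketch assumes the years are always in row 0, the variable names in row 1 and the country names in 'col1'. The names body and tidy are illustrative.

```python
import pandas as pd

df_bad = pd.DataFrame({'col1': ['years', 'country', 'Afghanistan', 'Albania', 'Algeria', 'Andorra'],
                       'col2': [1972, 'political_rights', 4, 7, 6, 4],
                       'col3': [1972, 'civil_liberties', 5, 7, 6, 3],
                       'col4': [1972, 'status', 'PF', 'NF', 'NF', 'NF'],
                       'col5': [1973, 'political_rights', 7, 7, 6, 4],
                       'col6': [1973, 'civil_liberties', 6, 7, 6, 4],
                       'col7': [1973, 'status', 'NF', 'NF', 'NF', 'PF']})

# Keep only the data rows, indexed by country name
body = df_bad.iloc[2:].set_index('col1')
body.index.name = 'country'

# Two-level column labels built from the two header rows: (year, variable)
body.columns = pd.MultiIndex.from_arrays(
    [df_bad.iloc[0, 1:].tolist(), df_bad.iloc[1, 1:].tolist()],
    names=['years', None])

# Move the year level from the columns into the rows: one row per (country, year)
tidy = body.stack(level='years').reset_index()

# Fix the column order explicitly and group the rows by year
tidy = tidy[['country', 'years', 'political_rights', 'civil_liberties', 'status']]
tidy = tidy.sort_values(['years', 'country'], ignore_index=True)
print(tidy)
```

The final column selection controls the output order directly, whichever order stack() happens to leave the variable columns in.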
Transpose the dataframe:
          0                 1            2        3        4        5
col1  years           country  Afghanistan  Albania  Algeria  Andorra
col2   1972  political_rights            4        7        6        4
col3   1972   civil_liberties            5        7        6        3
col4   1972            status           PF       NF       NF       NF
col5   1973  political_rights            7        7        6        4
col6   1973   civil_liberties            6        7        6        4
col7   1973            status           NF       NF       NF       PF
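After the transpose, every original column is a row whose first two fields are its year and its variable name. Row 'col1' holds the future header. The quick check below rebuilds df_bad from the question to confirm this.

```python
import pandas as pd

df_bad = pd.DataFrame({'col1': ['years', 'country', 'Afghanistan', 'Albania', 'Algeria', 'Andorra'],
                       'col2': [1972, 'political_rights', 4, 7, 6, 4],
                       'col3': [1972, 'civil_liberties', 5, 7, 6, 3],
                       'col4': [1972, 'status', 'PF', 'NF', 'NF', 'NF'],
                       'col5': [1973, 'political_rights', 7, 7, 6, 4],
                       'col6': [1973, 'civil_liberties', 6, 7, 6, 4],
                       'col7': [1973, 'status', 'NF', 'NF', 'NF', 'PF']})

s = df_bad.T
# 7 original columns become 7 rows; the 6 original rows become columns labelled 0..5
print(s.shape)
# Row 'col1' holds the future header: 'years', 'country', then the country names
print(s.loc['col1'].tolist())
# Each data row starts with (year, variable name)
print(s.loc['col5', [0, 1]].tolist())
```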
Set the columns to the values of row 'col1', then drop row 'col1' and set the index to 'years' and 'country':
col1                     Afghanistan  Albania  Algeria  Andorra
years  country
1972   political_rights            4        7        6        4
       civil_liberties             5        7        6        3
       status                     PF       NF       NF       NF
1973   political_rights            7        7        6        4
       civil_liberties             6        7        6        4
       status                     NF       NF       NF       PF
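At this point the index level called 'country' actually holds the variable names. The column axis carries the name 'col1', inherited from the row used as the header. That naming is what the later rename_axis call corrects. A small check, rebuilding df_bad from the question:

```python
import pandas as pd

df_bad = pd.DataFrame({'col1': ['years', 'country', 'Afghanistan', 'Albania', 'Algeria', 'Andorra'],
                       'col2': [1972, 'political_rights', 4, 7, 6, 4],
                       'col3': [1972, 'civil_liberties', 5, 7, 6, 3],
                       'col4': [1972, 'status', 'PF', 'NF', 'NF', 'NF'],
                       'col5': [1973, 'political_rights', 7, 7, 6, 4],
                       'col6': [1973, 'civil_liberties', 6, 7, 6, 4],
                       'col7': [1973, 'status', 'NF', 'NF', 'NF', 'PF']})

s = df_bad.T
s.columns = s.loc['col1']
s = s.drop('col1').set_index(['years', 'country'])

# The two index levels
print(s.index.names)
# Name carried over from the 'col1' row that became the header
print(s.columns.name)
# The level labelled 'country' holds variable names, not countries
print(s.index.get_level_values('country').unique().tolist())
```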
Stack the dataframe to reshape it into a MultiIndex Series, then rename the index levels so that the country names are labelled 'country' and the level holding the variable names is left unnamed:
years                    country
1972   political_rights  Afghanistan   4
                         Albania       7
                         Algeria       6
                         Andorra       4
       civil_liberties   Afghanistan   5
                         Albania       7
                         Algeria       6
                         Andorra       3
       status            Afghanistan  PF
                         Albania      NF
                         Algeria      NF
                         Andorra      NF
1973   political_rights  Afghanistan   7
                         Albania       7
                         Algeria       6
                         Andorra       4
       civil_liberties   Afghanistan   6
                         Albania       7
                         Algeria       6
                         Andorra       4
       status            Afghanistan  NF
                         Albania      NF
                         Algeria      NF
                         Andorra      PF
dtype: object
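stack() appends the column axis, which is named 'col1' and holds the country names, as a third index level. rename_axis then relabels the three levels by position. The country names end up called 'country', and the middle level, which holds the variable names, is left unnamed. The sketch below checks the names before and after the rename. The names stacked and renamed are illustrative.

```python
import pandas as pd

df_bad = pd.DataFrame({'col1': ['years', 'country', 'Afghanistan', 'Albania', 'Algeria', 'Andorra'],
                       'col2': [1972, 'political_rights', 4, 7, 6, 4],
                       'col3': [1972, 'civil_liberties', 5, 7, 6, 3],
                       'col4': [1972, 'status', 'PF', 'NF', 'NF', 'NF'],
                       'col5': [1973, 'political_rights', 7, 7, 6, 4],
                       'col6': [1973, 'civil_liberties', 6, 7, 6, 4],
                       'col7': [1973, 'status', 'NF', 'NF', 'NF', 'PF']})

s = df_bad.T
s.columns = s.loc['col1']
s = s.drop('col1').set_index(['years', 'country'])

stacked = s.stack()
# Level names: years, country (the variable names), col1 (the country names)
print(stacked.index.names)

renamed = stacked.rename_axis(['years', None, 'country'])
print(renamed.index.names)
# 2 years x 3 variables x 4 countries
print(len(renamed))
```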
Unstack the Series on level=1 (the variable names) to reshape it back into a dataframe, then reset the index:
   years      country  civil_liberties  political_rights  status
0   1972  Afghanistan                5                 4      PF
1   1972      Albania                7                 7      NF
2   1972      Algeria                6                 6      NF
3   1972      Andorra                3                 4      NF
4   1973  Afghanistan                6                 7      NF
5   1973      Albania                7                 7      NF
6   1973      Algeria                6                 6      NF
7   1973      Andorra                4                 4      PF
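This result still differs from the desired frame in the question in two ways. First, unstack orders the variable columns alphabetically. Second, every value passed through object-dtype data (note the "dtype: object" above), so the scores are still stored as objects. The tidy-up sketch below takes its column list from the desired frame. Converting with pd.to_numeric assumes numeric columns are what downstream code needs.

```python
import pandas as pd

df_bad = pd.DataFrame({'col1': ['years', 'country', 'Afghanistan', 'Albania', 'Algeria', 'Andorra'],
                       'col2': [1972, 'political_rights', 4, 7, 6, 4],
                       'col3': [1972, 'civil_liberties', 5, 7, 6, 3],
                       'col4': [1972, 'status', 'PF', 'NF', 'NF', 'NF'],
                       'col5': [1973, 'political_rights', 7, 7, 6, 4],
                       'col6': [1973, 'civil_liberties', 6, 7, 6, 4],
                       'col7': [1973, 'status', 'NF', 'NF', 'NF', 'PF']})

# The reshape from the answer, unchanged
s = df_bad.T
s.columns = s.loc['col1']
s = s.drop('col1').set_index(['years', 'country'])
s = s.stack().rename_axis(['years', None, 'country'])
result = s.unstack(1).reset_index()

# Convert the year and the two scores from object dtype to numbers
for col in ['years', 'political_rights', 'civil_liberties']:
    result[col] = pd.to_numeric(result[col])

# Column order of the desired frame in the question, rows grouped by year
result = result[['country', 'years', 'political_rights', 'civil_liberties', 'status']]
result = result.sort_values(['years', 'country'], ignore_index=True)

print(result)
print(result.dtypes)
```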
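Notice: The technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.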