Read unstructured data pandas
I have a data set in Excel that is not in a clean table format. Here is a sample:
Country  Male                      Female
         2010 2011 2012 2013 2014  2010 2011 2012 2013 2014
AFG      182  134  94   87   85    120  150  95   75   92
BLZ      200  250  150  125  45    210  140  125  101  21
I want to read this data in Python and put it into a pandas DataFrame like this:
Country Year Male Female
AFG 2010 182 120
...
Is there any way to do this in Python/pandas without manipulating the original data set?
You can find the sample data set here:
https://expirebox.com/download/173bc0880dd9da56ccff2796aa1274ed.html
Thanks
One solution uses the options of pandas' native Excel reader: passing two header rows makes it read the sheet as a DataFrame with MultiIndex columns.
I found the technique here: reading excel sheet as multiindex dataframe through pd.read_excel()
import pandas as pd
df = pd.read_excel('Sample.xlsx', header=[0, 1], index_col=[0, 1])
which gives:
Country              Male                                            Female
                     1990      2000      2010      2015      2016      1990      2000      2010      2015      2016
AFG Afghanistan  127.0000   96.5000   70.0000   58.7000   56.9000  113.2000   84.7000   61.2000   50.8000   49.2000
ALB Albania       38.1000   25.5000   16.4000   13.7000   13.3000   31.0000   20.6000   13.2000   11.1000   10.7000
DZA Algeria       45.0000   36.7000   24.9000   23.2000   22.9000   37.5000   31.1000   22.0000   20.5000   20.2000
AND Andorra        8.0000    4.3000    3.2000    2.7000    2.7000    6.6000    3.7000    2.7000    2.3000    2.3000
AGO Angola       140.6000  132.7000   82.4000   62.5000   60.0000  120.9000  112.8000   68.0000   51.0000   49.0000
To get to the desired layout, use stack(), which moves the innermost column level (the year) into the row index:
df.stack()
Country                 Female      Male
AFG Afghanistan 1990  113.2000  127.0000
                2000   84.7000   96.5000
                2010   61.2000   70.0000
                2015   50.8000   58.7000
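The stacked result still keeps the country and year in the row index, while the question asks for flat Country, Year, Male, Female columns. The following is a minimal sketch of that last step, not part of the original answer. It assumes the frame looks like the question's sample (a single country column, so the equivalent read would use index_col=0), and it builds that frame in memory instead of reading Sample.xlsx. The names `wide`, `tidy`, and the "Year" level name are illustrative choices.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel(..., header=[0, 1], index_col=0),
# built by hand from the sample values in the question.
years = [2010, 2011, 2012, 2013, 2014]
columns = pd.MultiIndex.from_product([["Male", "Female"], years],
                                     names=[None, "Year"])
wide = pd.DataFrame(
    [
        [182, 134, 94, 87, 85, 120, 150, 95, 75, 92],
        [200, 250, 150, 125, 45, 210, 140, 125, 101, 21],
    ],
    index=pd.Index(["AFG", "BLZ"], name="Country"),
    columns=columns,
)

# Move the year level from the columns into the row index.
stacked = wide.stack(level="Year")

# Fix the column order explicitly (stack() may sort it alphabetically),
# then turn the (Country, Year) index into ordinary columns.
tidy = stacked[["Male", "Female"]].reset_index()

print(tidy.head())
```

Selecting the columns by name rather than relying on the order that stack() returns keeps the sketch independent of the pandas version, since newer releases changed how stack() orders its output.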