Python Pandas Reformat stacked columns to long(?) format
I have a csv file that looks like this:
And I want it to look like this:
Basically, the country, month, year, and code would repeat, and the Valor export and Volumen export values would be unique to each row. The ending columns can be in any order.
I have tried a series of melt, pivot, transpose, etc. with no luck. Can anyone please provide any guidance? Any hints would be greatly appreciated. I am just stumped on how to deal with the original csv having the year/month/export value "stacked" in the header rows...
Here is a list of dictionaries that can be used to recreate the original csv as a pandas DataFrame:
import numpy as np
import pandas as pd
df = pd.DataFrame([
    {'Unnamed: 0': np.nan, 'Unnamed: 1': np.nan, '2017': 'Enero', 'Unnamed: 3': np.nan, 'Unnamed: 4': 'Febrero', 'Unnamed: 5': np.nan},
    {'Unnamed: 0': np.nan, 'Unnamed: 1': np.nan, '2017': 'Valor export', 'Unnamed: 3': 'Volumen export', 'Unnamed: 4': 'Valor export', 'Unnamed: 5': 'Volumen export'},
    {'Unnamed: 0': np.nan, 'Unnamed: 1': np.nan, '2017': np.nan, 'Unnamed: 3': np.nan, 'Unnamed: 4': np.nan, 'Unnamed: 5': np.nan},
    {'Unnamed: 0': '080390110000 SA-2017', 'Unnamed: 1': 'USA', '2017': '29200.10725', 'Unnamed: 3': '67198.189', 'Unnamed: 4': '38631.16383', 'Unnamed: 5': '87962.196'},
    {'Unnamed: 0': '090390110000 SA-2017', 'Unnamed: 1': 'Mexico', '2017': '9283.79255', 'Unnamed: 3': '21638.126', 'Unnamed: 4': '9785.40009', 'Unnamed: 5': '22863.867'},
    {'Unnamed: 0': '010390110000 SA-2017 ', 'Unnamed: 1': 'Canada', '2017': '8017.55675', 'Unnamed: 3': '19352.178', 'Unnamed: 4': '11137.27057', 'Unnamed: 5': '27020.428'},
    {'Unnamed: 0': '070390110000 SA-2017', 'Unnamed: 1': 'Brazil', '2017': '3786.44363', 'Unnamed: 3': '8704.871', 'Unnamed: 4': '4553.70795', 'Unnamed: 5': '10583.833'},
    {'Unnamed: 0': '060390110000 SA-2017', 'Unnamed: 1': 'Italy', '2017': '4809.76636', 'Unnamed: 3': '12411.691', 'Unnamed: 4': '4304.02052', 'Unnamed: 5': '11198.063'},
    {'Unnamed: 0': '000390110000 SA-2017 ', 'Unnamed: 1': 'Spain', '2017': '2290.65793', 'Unnamed: 3': '6227.269', 'Unnamed: 4': '3269.41957', 'Unnamed: 5': '9118.595'},
    {'Unnamed: 0': '0990390110000 SA-2017 ', 'Unnamed: 1': 'Costa Rica', '2017': '1855.70035', 'Unnamed: 3': '4687.714', 'Unnamed: 4': '2668.57892', 'Unnamed: 5': '6425.365'},
    {'Unnamed: 0': '0040390110000 SA-2017 ', 'Unnamed: 1': 'Honduras', '2017': '1823.358', 'Unnamed: 3': '4223.521', 'Unnamed: 4': '250.2036', 'Unnamed: 5': '603.392'},
])
Again, any help would be greatly appreciated.
Try:
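# Strip stray trailing whitespace from the code column.
df.iloc[:, 0] = df.iloc[:, 0].str.strip()
# Column header: one (Year, Month, Export) triple per value column.
idx1 = pd.MultiIndex.from_product(
    [("2017",), ("Enero", "Febrero"), ("Valor Export", "Volumen Export")],
    names=("Year", "Month", "Export"),
)
# Row index: code and country from the data rows (the first three rows are header rows).
idx2 = pd.MultiIndex.from_frame(df.iloc[3:, :2], names=("Code", "Country"))
df2 = (
    pd.DataFrame(df.iloc[3:, 2:].values, columns=idx1, index=idx2)
    .stack(level=[0, 1])  # move Year and Month from the columns into the rows
    .reset_index()
)
df2.columns.name = None
# to_markdown needs the optional tabulate package.
print(df2.sort_values(by=["Year", "Month"]).to_markdown(index=False))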
Prints:
| Code | Country | Year | Month | Valor Export | Volumen Export |
|---|---|---|---|---|---|
| 080390110000 SA-2017 | USA | 2017 | Enero | 29200.1 | 67198.2 |
| 090390110000 SA-2017 | Mexico | 2017 | Enero | 9283.79 | 21638.1 |
| 010390110000 SA-2017 | Canada | 2017 | Enero | 8017.56 | 19352.2 |
| 070390110000 SA-2017 | Brazil | 2017 | Enero | 3786.44 | 8704.87 |
| 060390110000 SA-2017 | Italy | 2017 | Enero | 4809.77 | 12411.7 |
| 000390110000 SA-2017 | Spain | 2017 | Enero | 2290.66 | 6227.27 |
| 0990390110000 SA-2017 | Costa Rica | 2017 | Enero | 1855.7 | 4687.71 |
| 0040390110000 SA-2017 | Honduras | 2017 | Enero | 1823.36 | 4223.52 |
| 080390110000 SA-2017 | USA | 2017 | Febrero | 38631.2 | 87962.2 |
| 090390110000 SA-2017 | Mexico | 2017 | Febrero | 9785.4 | 22863.9 |
| 010390110000 SA-2017 | Canada | 2017 | Febrero | 11137.3 | 27020.4 |
| 070390110000 SA-2017 | Brazil | 2017 | Febrero | 4553.71 | 10583.8 |
| 060390110000 SA-2017 | Italy | 2017 | Febrero | 4304.02 | 11198.1 |
| 000390110000 SA-2017 | Spain | 2017 | Febrero | 3269.42 | 9118.59 |
| 0990390110000 SA-2017 | Costa Rica | 2017 | Febrero | 2668.58 | 6425.36 |
| 0040390110000 SA-2017 | Honduras | 2017 | Febrero | 250.204 | 603.392 |
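If the file later grows more months or years, hard-coding the header labels gets brittle. As a rough sketch only (not part of the answer above), the same column MultiIndex could be derived from the frame itself. This assumes the layout shown in the question: the year appears as a real column name, pandas fills the other columns with "Unnamed: N" placeholders, row 0 holds the month, row 1 holds the measure, and row 2 is blank. The helper names `header_levels` and `to_long` are hypothetical, and the sample is reduced to two data rows from the question.

```python
import numpy as np
import pandas as pd

# Reduced sample in the same layout as the question's frame (two data rows only).
raw = pd.DataFrame(
    [
        [np.nan, np.nan, "Enero", np.nan, "Febrero", np.nan],
        [np.nan, np.nan, "Valor export", "Volumen export", "Valor export", "Volumen export"],
        [np.nan] * 6,
        ["080390110000 SA-2017", "USA", "29200.10725", "67198.189", "38631.16383", "87962.196"],
        ["090390110000 SA-2017 ", "Mexico", "9283.79255", "21638.126", "9785.40009", "22863.867"],
    ],
    columns=["Unnamed: 0", "Unnamed: 1", "2017", "Unnamed: 3", "Unnamed: 4", "Unnamed: 5"],
)


def header_levels(frame, n_id_cols=2):
    """Hypothetical helper: build (Year, Month, Export) labels from the frame itself."""
    value_cols = pd.Series(frame.columns[n_id_cols:])
    # Assumption: placeholder names start with "Unnamed"; they inherit the last year seen.
    years = value_cols.where(~value_cols.str.startswith("Unnamed")).ffill()
    # Row 0 names the month once per group of columns, so forward-fill the gaps.
    months = frame.iloc[0, n_id_cols:].ffill()
    # Row 1 names the measure for every value column.
    measures = frame.iloc[1, n_id_cols:]
    return pd.MultiIndex.from_arrays(
        [years.to_numpy(), months.to_numpy(), measures.to_numpy()],
        names=["Year", "Month", "Export"],
    )


def to_long(frame, n_id_cols=2, n_header_rows=3):
    """Hypothetical helper: reshape the stacked-header frame to one row per code/month."""
    body = frame.iloc[n_header_rows:]
    # Strip stray whitespace from the identifier columns (code, country).
    ids = body.iloc[:, :n_id_cols].apply(lambda s: s.str.strip())
    index = pd.MultiIndex.from_frame(ids, names=["Code", "Country"])
    # The values arrive as strings, so convert them to numbers column by column.
    values = body.iloc[:, n_id_cols:].apply(pd.to_numeric)
    wide = pd.DataFrame(values.to_numpy(), index=index, columns=header_levels(frame, n_id_cols))
    # Move Year and Month into the rows; the Export level stays as columns.
    long_df = wide.stack(["Year", "Month"]).reset_index()
    long_df.columns.name = None
    return long_df


long_df = to_long(raw)
print(long_df)
```

Unlike the answer's version, this sketch converts the values to floats and keeps the measure names exactly as they are spelled in the file ("Valor export", "Volumen export"). Some pandas versions may emit a FutureWarning about the stack implementation here.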
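Notice: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.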