
Python Pandas Reformat stacked columns to long(?) format

I have a CSV file that looks like this:

[Image: current CSV]

And I want it to look like this:

[Image: desired format]

Basically, the country, month, year, and code would repeat, and the Valor export and Volumen export values would be unique to each row. The ending columns can be in any order.
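To make the target shape concrete: if the stacked header were already a three-level column `MultiIndex` (year, month, export type), `DataFrame.stack` could move the year and month levels into the rows while the export level stays as columns, which gives the "id fields repeat, Valor/Volumen per row" layout described above. This is a minimal sketch under assumptions: the frame is built by hand (not read from the CSV) and holds only the USA and Mexico rows copied from the question's data.

```python
import pandas as pd

# Assumed input: the stacked header expressed as a 3-level column MultiIndex.
cols = pd.MultiIndex.from_product(
    [["2017"], ["Enero", "Febrero"], ["Valor export", "Volumen export"]],
    names=["Year", "Month", "Export"],
)
# Row index: the id fields that should repeat in the long output.
rows = pd.MultiIndex.from_tuples(
    [("080390110000 SA-2017", "USA"), ("090390110000 SA-2017", "Mexico")],
    names=["Code", "Country"],
)
wide = pd.DataFrame(
    [[29200.10725, 67198.189, 38631.16383, 87962.196],
     [9283.79255, 21638.126, 9785.40009, 22863.867]],
    index=rows,
    columns=cols,
)

# Move Year and Month into the rows; the Export level stays as the columns,
# so each row becomes (Code, Country, Year, Month, Valor export, Volumen export).
long = wide.stack(["Year", "Month"]).reset_index()
long.columns.name = None  # drop the leftover "Export" label on the columns
print(long)
```

Each (Code, Country) pair appears once per (Year, Month), and the two export measures sit side by side in that row.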

I have tried a series of melt, pivot, transpose, etc. with no luck. Can anyone please provide some guidance? Any hints would be greatly appreciated. I am just stumped on how to deal with the original CSV having the year/month/export values "stacked"...
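Melt and pivot can work here, but only once every value column's label carries all three pieces of information (year, month, export type). As a hedged sketch, assuming the labels have already been joined into strings such as `"2017|Enero|Valor export"` (a hypothetical encoding chosen for illustration; the real CSV spreads these over several header rows), the sequence is melt, split the label, then pivot:

```python
import pandas as pd

# Hypothetical wide frame: id columns plus value columns whose labels encode
# "year|month|export type" (values copied from the question's USA/Mexico rows).
wide = pd.DataFrame(
    {
        "Code": ["080390110000 SA-2017", "090390110000 SA-2017"],
        "Country": ["USA", "Mexico"],
        "2017|Enero|Valor export": [29200.10725, 9283.79255],
        "2017|Enero|Volumen export": [67198.189, 21638.126],
        "2017|Febrero|Valor export": [38631.16383, 9785.40009],
        "2017|Febrero|Volumen export": [87962.196, 22863.867],
    }
)

# 1) melt: one row per (Code, Country, original value column)
melted = wide.melt(id_vars=["Code", "Country"], var_name="key", value_name="value")

# 2) split the encoded label back into its three parts
melted[["Year", "Month", "Export"]] = melted["key"].str.split("|", expand=True)

# 3) pivot: export types become columns again, the rest is the row key
long = melted.pivot(
    index=["Code", "Country", "Year", "Month"], columns="Export", values="value"
).reset_index()
long.columns.name = None
print(long)
```

The hard part in the original file is step 0, getting those three pieces into one label per column; after that, melt and pivot behave as expected.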

Here is a dictionary that can be used to recreate the original CSV as a Pandas DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame([{'Unnamed: 0': np.nan, 'Unnamed: 1': np.nan, '2017': 'Enero', 'Unnamed: 3': np.nan, 'Unnamed: 4': 'Febrero', 'Unnamed: 5': np.nan}, {'Unnamed: 0': np.nan, 'Unnamed: 1': np.nan, '2017': 'Valor export', 'Unnamed: 3': 'Volumen export', 'Unnamed: 4': 'Valor export', 'Unnamed: 5': 'Volumen export'}, {'Unnamed: 0': np.nan, 'Unnamed: 1': np.nan, '2017': np.nan, 'Unnamed: 3': np.nan, 'Unnamed: 4': np.nan, 'Unnamed: 5': np.nan}, {'Unnamed: 0': '080390110000 SA-2017', 'Unnamed: 1': 'USA', '2017': '29200.10725', 'Unnamed: 3': '67198.189', 'Unnamed: 4': '38631.16383', 'Unnamed: 5': '87962.196'}, {'Unnamed: 0': '090390110000 SA-2017', 'Unnamed: 1': 'Mexico', '2017': '9283.79255', 'Unnamed: 3': '21638.126', 'Unnamed: 4': '9785.40009', 'Unnamed: 5': '22863.867'}, {'Unnamed: 0': '010390110000 SA-2017 ', 'Unnamed: 1': 'Canada', '2017': '8017.55675', 'Unnamed: 3': '19352.178', 'Unnamed: 4': '11137.27057', 'Unnamed: 5': '27020.428'}, {'Unnamed: 0': '070390110000 SA-2017', 'Unnamed: 1': 'Brazil', '2017': '3786.44363', 'Unnamed: 3': '8704.871', 'Unnamed: 4': '4553.70795', 'Unnamed: 5': '10583.833'}, {'Unnamed: 0': '060390110000 SA-2017', 'Unnamed: 1': 'Italy', '2017': '4809.76636', 'Unnamed: 3': '12411.691', 'Unnamed: 4': '4304.02052', 'Unnamed: 5': '11198.063'}, {'Unnamed: 0': '000390110000 SA-2017 ', 'Unnamed: 1': 'Spain', '2017': '2290.65793', 'Unnamed: 3': '6227.269', 'Unnamed: 4': '3269.41957', 'Unnamed: 5': '9118.595'}, {'Unnamed: 0': '0990390110000 SA-2017 ', 'Unnamed: 1': 'Costa Rica', '2017': '1855.70035', 'Unnamed: 3': '4687.714', 'Unnamed: 4': '2668.57892', 'Unnamed: 5': '6425.365'}, {'Unnamed: 0': '0040390110000 SA-2017 ', 'Unnamed: 1': 'Honduras', '2017': '1823.358', 'Unnamed: 3': '4223.521', 'Unnamed: 4': '250.2036', 'Unnamed: 5': '603.392'}])
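If more months or years get added, hard-coding the header labels becomes brittle. One possible approach, sketched here on a two-row excerpt of the frame above (USA and Canada only), is to build the column `MultiIndex` from the header rows themselves: the year comes from the `2017` column label, the month row is forward-filled because each month name appears only above the first of its two columns, and the export row is used as-is. This assumes the layout stays exactly as shown: two id columns, three header rows, and a single year.

```python
import numpy as np
import pandas as pd

# Two-row excerpt of the question's frame, with the same three header rows.
df = pd.DataFrame(
    [
        [np.nan, np.nan, "Enero", np.nan, "Febrero", np.nan],
        [np.nan, np.nan, "Valor export", "Volumen export", "Valor export", "Volumen export"],
        [np.nan] * 6,
        ["080390110000 SA-2017", "USA", "29200.10725", "67198.189", "38631.16383", "87962.196"],
        ["010390110000 SA-2017 ", "Canada", "8017.55675", "19352.178", "11137.27057", "27020.428"],
    ],
    columns=["Unnamed: 0", "Unnamed: 1", "2017", "Unnamed: 3", "Unnamed: 4", "Unnamed: 5"],
)

year = df.columns[2]              # assumption: a single year, stored as a column label
months = df.iloc[0, 2:].ffill()   # "Enero", NaN, ... -> "Enero", "Enero", ...
exports = df.iloc[1, 2:]          # "Valor export" / "Volumen export"
cols = pd.MultiIndex.from_arrays(
    [[year] * len(months), months, exports], names=["Year", "Month", "Export"]
)

data = df.iloc[3:]                # skip the three header rows
rows = pd.MultiIndex.from_arrays(
    [data.iloc[:, 0].str.strip(), data.iloc[:, 1]],  # strip stray trailing spaces
    names=["Code", "Country"],
)
# The cells are strings in the source, so convert them to floats here.
wide = pd.DataFrame(data.iloc[:, 2:].astype(float).to_numpy(), index=rows, columns=cols)

long = wide.stack(["Year", "Month"]).reset_index()
long.columns.name = None
print(long)
```

With this, adding a "Marzo" pair of columns to the file should need no code change, as long as the header rows keep the same structure.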

Again, any help would be greatly appreciated.

Try:

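import pandas as pd

# Strip the stray trailing spaces from the product codes
df.iloc[:, 0] = df.iloc[:, 0].str.strip()

# Column index: one (Year, Month, Export) tuple per value column, in file order
idx1 = pd.MultiIndex.from_product(
    [("2017",), ("Enero", "Febrero"), ("Valor Export", "Volumen Export")],
    names=("Year", "Month", "Export"),
)
# Row index: (Code, Country) from the data rows (the first 3 rows are headers)
idx2 = pd.MultiIndex.from_frame(df.iloc[3:, :2], names=("Code", "Country"))

df2 = (
    pd.DataFrame(df.iloc[3:, 2:].values, columns=idx1, index=idx2)
    # Move Year and Month into the rows; Export stays as the columns
    .stack(level=[0, 1])
    .reset_index()
)
df2.columns.name = None
# to_markdown() requires the optional "tabulate" package
print(df2.sort_values(by=["Year", "Month"]).to_markdown(index=False))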
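Two follow-ups may be worth considering for the result, sketched on a hypothetical two-row stand-in for `df2`. First, the values come straight from the CSV cells, so they appear to still be strings (the printed table shows rounded numbers because `tabulate` formats number-like strings for display). Second, sorting by `Month` is alphabetical, which happens to put `Enero` before `Febrero` but would misplace, for example, `Abril`. The list of remaining Spanish month names below is an assumption about how the file spells them.

```python
import pandas as pd

# Hypothetical stand-in for df2: two of its rows, values still as strings.
df2 = pd.DataFrame(
    {
        "Code": ["080390110000 SA-2017", "080390110000 SA-2017"],
        "Country": ["USA", "USA"],
        "Year": ["2017", "2017"],
        "Month": ["Febrero", "Enero"],
        "Valor Export": ["38631.16383", "29200.10725"],
        "Volumen Export": ["87962.196", "67198.189"],
    }
)

# Parse the measures; errors="coerce" turns unparsable cells into NaN.
value_cols = ["Valor Export", "Volumen Export"]
df2[value_cols] = df2[value_cols].apply(pd.to_numeric, errors="coerce")
df2["Year"] = df2["Year"].astype(int)

# Give Month an explicit calendar order instead of alphabetical order.
meses = ["Enero", "Febrero", "Marzo", "Abril", "Mayo", "Junio", "Julio",
         "Agosto", "Septiembre", "Octubre", "Noviembre", "Diciembre"]
df2["Month"] = pd.Categorical(df2["Month"], categories=meses, ordered=True)

df2 = df2.sort_values(["Year", "Month"], ignore_index=True)
print(df2.dtypes)
print(df2)
```

With numeric columns, aggregations such as sums per month work directly instead of concatenating strings.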

Prints:

| Code | Country | Year | Month | Valor Export | Volumen Export |
|:---|:---|---:|:---|---:|---:|
| 080390110000 SA-2017 | USA | 2017 | Enero | 29200.1 | 67198.2 |
| 090390110000 SA-2017 | Mexico | 2017 | Enero | 9283.79 | 21638.1 |
| 010390110000 SA-2017 | Canada | 2017 | Enero | 8017.56 | 19352.2 |
| 070390110000 SA-2017 | Brazil | 2017 | Enero | 3786.44 | 8704.87 |
| 060390110000 SA-2017 | Italy | 2017 | Enero | 4809.77 | 12411.7 |
| 000390110000 SA-2017 | Spain | 2017 | Enero | 2290.66 | 6227.27 |
| 0990390110000 SA-2017 | Costa Rica | 2017 | Enero | 1855.7 | 4687.71 |
| 0040390110000 SA-2017 | Honduras | 2017 | Enero | 1823.36 | 4223.52 |
| 080390110000 SA-2017 | USA | 2017 | Febrero | 38631.2 | 87962.2 |
| 090390110000 SA-2017 | Mexico | 2017 | Febrero | 9785.4 | 22863.9 |
| 010390110000 SA-2017 | Canada | 2017 | Febrero | 11137.3 | 27020.4 |
| 070390110000 SA-2017 | Brazil | 2017 | Febrero | 4553.71 | 10583.8 |
| 060390110000 SA-2017 | Italy | 2017 | Febrero | 4304.02 | 11198.1 |
| 000390110000 SA-2017 | Spain | 2017 | Febrero | 3269.42 | 9118.59 |
| 0990390110000 SA-2017 | Costa Rica | 2017 | Febrero | 2668.58 | 6425.36 |
| 0040390110000 SA-2017 | Honduras | 2017 | Febrero | 250.204 | 603.392 |

