Split DataFrame column into two + MultiIndex
I have a DataFrame representing flows of people across borders:
import pandas as pd
flows = pd.DataFrame([[1,2],[3,4]], index=['Monday', 'Tuesday'], columns=['CZ>DE', 'HU>AT'])
         CZ>DE  HU>AT
Monday       1      2
Tuesday      3      4
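I would like to split each column into two, one for the country on each side of the border, holding that country's decrement (people leaving) and increment (people arriving). My current code and the desired result look like this: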
country_from = lambda x: x[:2]
country_to = lambda x: x[3:]
flows_from = -1*flows.copy()
flows_from.columns = pd.MultiIndex.from_tuples([(border, country_from(border)) for border in flows.columns])
flows_to = flows.copy()
flows_to.columns = pd.MultiIndex.from_tuples([(border, country_to(border)) for border in flows.columns])
country_flows = pd.concat([flows_from, flows_to], axis=1)
country_flows = country_flows.groupby(level=[0,1], axis=1).sum()
        CZ>DE    HU>AT
           CZ DE    AT HU
Monday     -1  1     2 -2
Tuesday    -3  3     4 -4
This solution is verbose, and I suspect it can be done better. Any ideas?
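A side note on the code above: recent pandas releases (2.1 and later, as far as I know) deprecate passing axis=1 to groupby. Below is a minimal sketch of the same aggregation that avoids it by grouping the transposed frame and transposing back. It assumes the same flows frame and slicing lambdas as in the question and is not meant as a definitive rewrite.

```python
import pandas as pd

flows = pd.DataFrame([[1, 2], [3, 4]],
                     index=['Monday', 'Tuesday'],
                     columns=['CZ>DE', 'HU>AT'])

# Same fixed-width slicing as in the question (assumes two-letter country codes).
country_from = lambda x: x[:2]
country_to = lambda x: x[3:]

# Outflows are negative for the origin country, inflows positive for the destination.
flows_from = -1 * flows
flows_from.columns = pd.MultiIndex.from_tuples(
    [(border, country_from(border)) for border in flows.columns])
flows_to = flows.copy()
flows_to.columns = pd.MultiIndex.from_tuples(
    [(border, country_to(border)) for border in flows.columns])

combined = pd.concat([flows_from, flows_to], axis=1)

# Group the rows of the transposed frame by both levels, then transpose back.
country_flows = combined.T.groupby(level=[0, 1]).sum().T
print(country_flows)
```

The grouped result comes back sorted by (border, country), which is why AT appears before HU.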
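You can build the tuples that define the levels of the MultiIndex, pairing each border with its two countries:
import numpy as np
tuples = [(col, country) for col in flows.columns for country in col.split('>')]
x = flows.values
Then repeat every column twice and flip the sign of the first copy: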
data = np.multiply(np.tile([-1,1], x.shape), np.repeat(x, 2, axis=1))
pd.DataFrame(data=data, index=flows.index, columns=pd.MultiIndex.from_tuples(tuples))
which yields:
        CZ>DE    HU>AT
           CZ DE    HU AT
Monday     -1  1    -2  2
Tuesday    -3  3    -4  4
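For reference, the steps above can be collected into one self-contained function. This is only a sketch: the name split_flows and its sep parameter are hypothetical, and it assumes every column label has the form 'ORIGIN>DESTINATION'.

```python
import numpy as np
import pandas as pd

def split_flows(flows, sep='>'):
    """Hypothetical helper: turn each 'A>B' column into (-flow for A, +flow for B)."""
    # One (border, country) tuple per output column, origin first.
    tuples = [(col, country) for col in flows.columns for country in col.split(sep)]
    x = flows.to_numpy()
    # Sign pattern [-1, 1, -1, 1, ...]: one pair per original column.
    signs = np.tile([-1, 1], x.shape[1])
    # Duplicate every column, then broadcast the sign row over all rows.
    data = np.repeat(x, 2, axis=1) * signs
    return pd.DataFrame(data, index=flows.index,
                        columns=pd.MultiIndex.from_tuples(tuples))

flows = pd.DataFrame([[1, 2], [3, 4]],
                     index=['Monday', 'Tuesday'],
                     columns=['CZ>DE', 'HU>AT'])
result = split_flows(flows)
print(result)
```

Because the sign row is one-dimensional, it broadcasts across rows, so the sign pattern does not have to be tiled to the full shape of the data.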
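Well, after getting inspired by the question "python/pandas: how to combine two dataframes into one with hierarchical column index?", I solved it by concatenating a dictionary of DataFrames. With my original mapping lambda functions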
country_from = lambda x: x[:2]
country_to = lambda x: x[3:]
the result can be obtained in one line:
pd.concat({col:pd.DataFrame({country_from(col):-1*flows[col], country_to(col):flows[col]}) for col in flows.columns}, axis=1)
        CZ>DE    HU>AT
           CZ DE    AT HU
Monday     -1  1     2 -2
Tuesday    -3  3     4 -4
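A slightly more defensive variant of the same idea, as a sketch: it splits each label on '>' instead of slicing at fixed positions, so it does not rely on two-letter country codes. It also sorts the columns explicitly, on the assumption that the order of the inner columns may differ between pandas versions. The names pieces, origin and destination are only illustrative.

```python
import pandas as pd

flows = pd.DataFrame([[1, 2], [3, 4]],
                     index=['Monday', 'Tuesday'],
                     columns=['CZ>DE', 'HU>AT'])

# Build one small two-column frame per border, keyed by the border label.
pieces = {}
for col in flows.columns:
    origin, destination = col.split('>')  # assumes exactly one '>' per label
    pieces[col] = pd.DataFrame({origin: -flows[col], destination: flows[col]})

# The dict keys become the outer column level; sort for a deterministic order.
country_flows = pd.concat(pieces, axis=1).sort_index(axis=1)
print(country_flows)
```

Sorting puts the inner columns in alphabetical order within each border (CZ, DE and AT, HU), matching the layout shown above.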