
Split DataFrames based on column header prefix

I have a data frame in which groups of columns share a common element: each group has a base column, and the other columns in the group are named by adding a suffix to that base name. I have a list of roughly 100 of these base names. I'd like to iterate over the list to slice the large df, transform each sub-df by grouping, and eventually concatenate the results back together.

I was thinking of using a dictionary, with the entries of the list as keys and the columns sharing each element as values, but I am not sure how to implement this. Below is a simplified version of what I'd like to scale up. In reality there would be around 100 keys, each with 20 associated columns.

   A A_1 A_2 A_3  B B_1 B_2 B_3
0  1   e   f   g  1   x   y   z
1  2   e   f   g  2   x   y   z
2  3   e   f   g  3   x   y   z
3  3   e   f   g  3   x   y   z
4  3   e   f   g  4   x   y   z
5  3   e   f   g  4   x   y   z
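For anyone who wants to run the snippets on this page, the sample frame above can be rebuilt roughly as follows. This is only a sketch that copies the values from the printed table; the integer and string dtypes are an assumption, since the original post does not show how `df` was created.

```python
import pandas as pd

# Rebuild the six-row sample frame shown above (dtypes are assumed).
df = pd.DataFrame({
    "A":   [1, 2, 3, 3, 3, 3],
    "A_1": ["e"] * 6,
    "A_2": ["f"] * 6,
    "A_3": ["g"] * 6,
    "B":   [1, 2, 3, 3, 4, 4],
    "B_1": ["x"] * 6,
    "B_2": ["y"] * 6,
    "B_3": ["z"] * 6,
})

print(df)
```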

df_list = ['A','B']

# Select columns by prefix; str.contains('A') would also match an 'A' anywhere in a name
df_A = df.loc[:, df.columns.str.startswith('A')]

df_B = df.loc[:, df.columns.str.startswith('B')]

calc_A = df_A.groupby(['A']).head(1)
print(calc_A)

   A A_1 A_2 A_3
0  1   e   f   g
1  2   e   f   g
2  3   e   f   g


calc_B = df_B.groupby(['B']).head(1)
print(calc_B)

   B B_1 B_2 B_3
0  1   x   y   z
1  2   x   y   z
2  3   x   y   z
4  4   x   y   z

Please advise how to structure this dictionary: how to iterate through the list to slice the df and assign the columns sharing each key as the value for that key's sub-df. Thank you.

If I understand correctly, you can group on the column prefixes and then build up a dictionary:

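d = {}
# Group columns by the part of the name before the first '_'.
# Note: groupby(axis=1) is deprecated since pandas 2.1.
for i, g in df.groupby(by=lambda x: x.split('_')[0], axis=1):
    d[i] = g.groupby(i).head(1)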
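Because the `axis=1` argument of `DataFrame.groupby` is deprecated in pandas 2.1 and later, a variant that avoids it may be useful. The sketch below picks the columns for each key with plain string matching, driven by the asker's own `df_list`. The helper name `columns_for` is hypothetical, and the rule "the key itself, or the key followed by an underscore" is an assumption about how the real column names are formed.

```python
import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame({
    "A": [1, 2, 3, 3, 3, 3], "A_1": ["e"] * 6, "A_2": ["f"] * 6, "A_3": ["g"] * 6,
    "B": [1, 2, 3, 3, 4, 4], "B_1": ["x"] * 6, "B_2": ["y"] * 6, "B_3": ["z"] * 6,
})
df_list = ["A", "B"]


def columns_for(prefix, columns):
    """Hypothetical helper: the key column plus every column named `prefix_*`."""
    return [c for c in columns if c == prefix or c.startswith(prefix + "_")]


# One sub-frame per key, keeping the first row of each group of the key column.
d = {
    key: df[columns_for(key, df.columns)].groupby(key).head(1)
    for key in df_list
}

for key, sub in d.items():
    print(key, list(sub.columns), sub.index.tolist())
```

Matching on `prefix + "_"` rather than on the bare prefix keeps a key such as `A` from also picking up columns of a longer key such as `AB`, which matters more once there are around 100 keys.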

The same loop can also be written as a dict comprehension:

d = {
    i: g.groupby(i).head(1)
    for i, g in df.groupby(by=lambda x: x.split('_')[0], axis=1)
}

for k, v in d.items():
    print(v, '\n')

   A A_1 A_2 A_3
0  1   e   f   g
1  2   e   f   g
2  3   e   f   g 

   B B_1 B_2 B_3
0  1   x   y   z
1  2   x   y   z
2  3   x   y   z
4  4   x   y   z 

d.keys()
dict_keys(['A', 'B'])
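The question also mentions concatenating the transformed sub-frames back together, which the answer does not show. One possible way, sketched below, is `pd.concat` along the columns. Note that the sub-frames keep different row labels after `head(1)` (row 4 survives only in the `B` block), so the result contains missing values wherever a block has no row. Whether to align on the original labels or to renumber each block first depends on what the combined frame is meant to represent, which the post does not say; both are shown as assumptions.

```python
import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame({
    "A": [1, 2, 3, 3, 3, 3], "A_1": ["e"] * 6, "A_2": ["f"] * 6, "A_3": ["g"] * 6,
    "B": [1, 2, 3, 3, 4, 4], "B_1": ["x"] * 6, "B_2": ["y"] * 6, "B_3": ["z"] * 6,
})

# Rebuild the dictionary of transformed sub-frames, keyed by column prefix.
d = {
    key: df[[c for c in df.columns if c.split("_")[0] == key]].groupby(key).head(1)
    for key in ["A", "B"]
}

# Option 1: align on the original row labels (the A columns are NaN in row 4).
aligned = pd.concat(list(d.values()), axis=1)

# Option 2: renumber each block so rows are paired by position instead.
by_position = pd.concat(
    [sub.reset_index(drop=True) for sub in d.values()], axis=1
)

# Option 3: pass the dict itself to keep the keys as an outer column level.
keyed = pd.concat(d, axis=1)

print(aligned)
print(by_position)
print(keyed.columns.nlevels)
```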

