Create DataFrames iteratively from the columns of another DataFrame

Question

Say I have a df:

import pandas as pd

df = pd.DataFrame({'A.C.1_v': [1, 2, 3], 'B': ['a', 'b', 'c'], 'C.C.1_f': [4, 5, 6], 'D': ['e', 'f', 'g'], 'E': [7, 8, 9]})

Note that the columns of interest are those whose name includes "C.1_<letter>".

I have built a list of the selected columns: col_list = ['A.C.1_v', 'C.C.1_f']

Objective: create several DataFrames as follows (in this illustration only 2 are built, but in practice there could be many more).

The first df

Takes its name from the convention: "df_AC1_v"
Is composed of the values of column A.C.1_v and the values of columns D and E.

So, for df_AC1_v, we would have the following output: [image: output 1, obtained without iteration]

The second df

Takes its name from the convention: "df_CC1_f"
Is composed of the values of column C.C.1_f and the values of columns D and E.

So, for df_CC1_f, we would have the following output: [image: output 2, obtained without iteration]
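For reference, the two target frames built by hand, without any loop, are plain column selections. A minimal sketch, assuming the df defined above:

```python
import pandas as pd

df = pd.DataFrame({'A.C.1_v': [1, 2, 3], 'B': ['a', 'b', 'c'],
                   'C.C.1_f': [4, 5, 6], 'D': ['e', 'f', 'g'], 'E': [7, 8, 9]})

# Each target frame is one "C.1" column plus the shared columns D and E
df_AC1_v = df[['A.C.1_v', 'D', 'E']]
df_CC1_f = df[['C.C.1_f', 'D', 'E']]

print(df_AC1_v)
print(df_CC1_f)
```

The loop only has to reproduce this pattern, with the first column changing on every iteration.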

My goal is to do this iteratively, but so far what I have tried does not work.

Here is the code I have written. It fails in the for loop and I do not understand why. First, I extract the column list and filter it as follows:

col_list = list(df)
list_c1 = list(filter(lambda x:'.C.1' in x, col_list))
list_c1 = [str(r) for r in list_c1]

in: list_c1  out: ['A.C.1_v', 'C.C.1_f']
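The filtering step can also be done directly on the DataFrame with DataFrame.filter, which keeps the columns whose label contains a substring (like=) or matches a regular expression (regex=). A sketch, assuming the df from the question; the stricter regex is an illustrative choice, not part of the original code:

```python
import pandas as pd

df = pd.DataFrame({'A.C.1_v': [1, 2, 3], 'B': ['a', 'b', 'c'],
                   'C.C.1_f': [4, 5, 6], 'D': ['e', 'f', 'g'], 'E': [7, 8, 9]})

# like= keeps every column whose label contains '.C.1'
list_c1 = df.filter(like='.C.1').columns.tolist()
print(list_c1)  # ['A.C.1_v', 'C.C.1_f']

# Stricter: '.C.1' followed by an underscore and one lowercase letter at the end
list_c1_strict = df.filter(regex=r'\.C\.1_[a-z]$').columns.tolist()
print(list_c1_strict)  # ['A.C.1_v', 'C.C.1_f']
```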

Second, I isolate the parts around 'C.1' by splitting each name on '.C.1':

list_c1_bis = []
for element in list_c1:
    stock = element.split('.C.1')
    list_c1_bis.append(stock)

in: list_c1_bis  out: [['A', '_v'], ['C', '_f']]
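The split is only used to build the new names. Since each target name is the column name with its dots removed, str.replace produces it directly. A small sketch, reusing list_c1 from above:

```python
list_c1 = ['A.C.1_v', 'C.C.1_f']

# Dropping the dots turns 'A.C.1_v' into 'AC1_v'; then prefix it with 'df_'
names = ['df_' + col.replace('.', '') for col in list_c1]
print(names)  # ['df_AC1_v', 'df_CC1_f']
```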

So far, so good. The code below is where it goes wrong:

for line in list_c1_bis:
    name1 ='df'+'_'+line[0]+'C1'+line[1]
    vars()[name1] =  df[[list_c1[0],'D','E']]

My outputs are as follows:

in: df_AC1_v ==> correct, out: output1

in: df_CC1_f ==> wrong: it has taken the inappropriate column A.C.1_v instead of the expected C.C.1_f (output2)
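The cause is visible in the loop body: df[[list_c1[0], 'D', 'E']] always indexes element 0, so every iteration selects 'A.C.1_v' and only the generated name changes. A sketch of one fix, keeping the question's own lists and pairing them with zip; a dictionary (the name frames is illustrative) stands in for vars():

```python
import pandas as pd

df = pd.DataFrame({'A.C.1_v': [1, 2, 3], 'B': ['a', 'b', 'c'],
                   'C.C.1_f': [4, 5, 6], 'D': ['e', 'f', 'g'], 'E': [7, 8, 9]})
list_c1 = ['A.C.1_v', 'C.C.1_f']
list_c1_bis = [col.split('.C.1') for col in list_c1]

frames = {}
# zip pairs each original column name with its split parts,
# so the selected column changes on every iteration
for col, line in zip(list_c1, list_c1_bis):
    name1 = 'df' + '_' + line[0] + 'C1' + line[1]
    frames[name1] = df[[col, 'D', 'E']]

print(frames['df_CC1_f'])
```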

Your suggestions are welcome!

Thanks a lot for your time and help, it is truly appreciated.

NB: please feel free to modify the first steps (the ones that work) if you think you have a better solution.

Kindest regards

Answer 1

I strongly discourage you from creating variables dynamically with vars, locals or globals. Prefer a dictionary.

Try:

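for col in df.columns[df.columns.str.contains(r'[A-Z]\.[0-9]_[a-z]')]:
    name = col.replace('.', '')
    # At module level, locals() is the same dict as globals(), so this creates df_AC1_v and df_CC1_f;
    # inside a function, writing to locals() does not create new variables
    locals()[f"df_{name}"] = df[[col, 'D', 'E']]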

Update更新

If f-strings are not available (Python < 3.6), replace locals()[f"df_{name}"] with locals()["df_{}".format(name)].
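To check which case applies and avoid f-strings entirely, here is a sketch of the same loop using str.format. It assumes the df from the question and code running at module level, where globals() is the namespace that locals() returns; writing to globals() just makes that explicit.

```python
import sys

import pandas as pd

# f-strings were added in Python 3.6; older interpreters raise SyntaxError on them
print(sys.version_info[:2])

df = pd.DataFrame({'A.C.1_v': [1, 2, 3], 'B': ['a', 'b', 'c'],
                   'C.C.1_f': [4, 5, 6], 'D': ['e', 'f', 'g'], 'E': [7, 8, 9]})

for col in df.columns[df.columns.str.contains(r'[A-Z]\.[0-9]_[a-z]')]:
    name = col.replace('.', '')
    # str.format instead of an f-string; globals() names the module namespace explicitly
    globals()["df_{}".format(name)] = df[[col, 'D', 'E']]

print(df_CC1_f)
```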

Output:

>>> df_AC1_v
   A.C.1_v  D  E
0        1  e  7
1        2  f  8
2        3  g  9

>>> df_CC1_f
   C.C.1_f  D  E
0        4  e  7
1        5  f  8
2        6  g  9

Alternative with a dictionary:

dfs = {}
for col in df.columns[df.columns.str.contains(r'[A-Z]\.[0-9]_[a-z]')]:
    name = col.replace('.', '')
    dfs[name] = df[[col, 'D', 'E']]

Output:

>>> dfs['AC1_v']
   A.C.1_v  D  E
0        1  e  7
1        2  f  8
2        3  g  9

>>> dfs['CC1_f']
   C.C.1_f  D  E
0        4  e  7
1        5  f  8
2        6  g  9
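The same dictionary can also be built in a single expression. A sketch using a dict comprehension over DataFrame.filter(regex=...), which keeps the columns whose label matches the pattern (the same pattern as in the loop above), assuming the df from the question:

```python
import pandas as pd

df = pd.DataFrame({'A.C.1_v': [1, 2, 3], 'B': ['a', 'b', 'c'],
                   'C.C.1_f': [4, 5, 6], 'D': ['e', 'f', 'g'], 'E': [7, 8, 9]})

pattern = r'[A-Z]\.[0-9]_[a-z]'
# Key: column name without dots; value: that column plus D and E
dfs = {col.replace('.', ''): df[[col, 'D', 'E']]
       for col in df.filter(regex=pattern).columns}

print(sorted(dfs))  # ['AC1_v', 'CC1_f']
```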

Answer 2

Hi Corralien, first let me thank you for your prompt reply; it is truly appreciated.

I have tried the first code:

for col in df.columns[df.columns.str.contains(r'[A-Z]\.[0-9]_[a-z]')]:
    name = col.replace('.', '')
    locals()[f"df_{name}"] = df[[col, 'D', 'E']]

But I get the following error:

  File "", line 3
    locals()[f"df_{name}"] = df[[col, 'D', 'E']]
    ^
SyntaxError: invalid syntax

I have also tried the second proposed code, which stores the DataFrames in a dictionary:

dfs = {}
for col in df.columns[df.columns.str.contains(r'[A-Z]\.[0-9]_[a-z]')]:
    name = col.replace('.', '')
    dfs[name] = df[[col, 'D', 'E']]

It runs without error, but when I check whether the DataFrames exist (in: df_AC1_v), I get the following error: NameError: name 'df_AC1_v' is not defined

I understand that to get the df, I need to write dfs['AC1_v'] instead.

The second solution is acceptable, but I would prefer the first one if it worked.

Kindest regards
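If the df_... names are really needed at the top level of a script or notebook, one option is to build the dictionary as in the second solution and then copy its entries into the module namespace with globals().update. A sketch, written without f-strings so it also runs on older interpreters; it shares the drawbacks of the locals() approach and only makes sense at module level:

```python
import pandas as pd

df = pd.DataFrame({'A.C.1_v': [1, 2, 3], 'B': ['a', 'b', 'c'],
                   'C.C.1_f': [4, 5, 6], 'D': ['e', 'f', 'g'], 'E': [7, 8, 9]})

dfs = {}
for col in df.columns[df.columns.str.contains(r'[A-Z]\.[0-9]_[a-z]')]:
    name = col.replace('.', '')
    dfs[name] = df[[col, 'D', 'E']]

# Publish each entry as a module-level variable named df_<key>
globals().update({'df_' + key: frame for key, frame in dfs.items()})

print(df_AC1_v)
```

The dictionary stays available as well, so code that loops over all the frames can keep using dfs.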

2 answers

Answer 1: 2 votes, accepted, 2022-03-10 10:20:23
Answer 2: 0 votes, 2022-03-10 10:45:57
