Create Dataframes iteratively from columns of another Dataframe
Say I have a df:
import pandas as pd
df = pd.DataFrame({'A.C.1_v': [1, 2, 3], 'B': ['a', 'b', 'c'], 'C.C.1_f': [4, 5, 6], 'D': ['e', 'f', 'g'], 'E': [7, 8, 9]})
Note that the columns of interest are those whose names include "C.1_" followed by a letter (for example A.C.1_v).
I have built a list of the selected columns: col_list = ['A.C.1_v', 'C.C.1_f']
Objective: create several DataFrames as follows (in this illustration only 2 dfs are built, but there could be many more in practice).
The first df
So, for df_AC1_v we would have the following output: [image: output 1 without iteration]
The second df
My goal is to do this iteratively, but so far what I have attempted does not work.
Here is the code I have written. It fails in the for loop and I do not understand why. First I extract the column list and build a filtered list as follows:
col_list = list(df)
list_c1 = list(filter(lambda x:'.C.1' in x, col_list))
list_c1 = [str(r) for r in list_c1]
in: list_c1
out: ['A.C.1_v', 'C.C.1_f']
Second, I isolate the 'C.1':
list_c1_bis = []
for element in list_c1:
    stock = element.split('.C.1')
    list_c1_bis.append(stock)
in: list_c1_bis
out: [['A', '_v'], ['C', '_f']]
Up to this point, everything works. The problem is in the code below:
for line in list_c1_bis:
    name1 = 'df' + '_' + line[0] + 'C1' + line[1]
    vars()[name1] = df[[list_c1[0], 'D', 'E']]
My outputs are as follows:
in: df_AC1_v
out: output1 ==> OK, correct
in: df_CC1_f
out: output2 ==> Wrong: it has taken the inappropriate column A.C.1_v instead of the expected C.C.1_f
Your suggestions are welcome!
Thanks a lot for your time and help, it will be truly appreciated.
NB: please feel free to modify the first steps, which work, if you think you have a better solution.
Kindest regards
I strongly discourage you from creating variables dynamically with vars, locals or globals. Prefer to use a dictionary.
Try:
for col in df.columns[df.columns.str.contains(r'[A-Z]\.[0-9]_[a-z]')]:
    name = col.replace('.', '')
    locals()[f"df_{name}"] = df[[col, 'D', 'E']]
Update
If f-strings are not available (Python < 3.6), replace locals()[f"df_{name}"] with locals()["df_{}".format(name)].
Output:
>>> df_AC1_v
   A.C.1_v  D  E
0        1  e  7
1        2  f  8
2        3  g  9
>>> df_CC1_f
   C.C.1_f  D  E
0        4  e  7
1        5  f  8
2        6  g  9
Alternative with a dictionary:
dfs = {}
for col in df.columns[df.columns.str.contains(r'[A-Z]\.[0-9]_[a-z]')]:
    name = col.replace('.', '')
    dfs[name] = df[[col, 'D', 'E']]
Output:
>>> dfs['AC1_v']
   A.C.1_v  D  E
0        1  e  7
1        2  f  8
2        3  g  9
>>> dfs['CC1_f']
   C.C.1_f  D  E
0        4  e  7
1        5  f  8
2        6  g  9
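As for why the loop in the question picked the wrong column: it appears to be because `list_c1[0]` is always the first matching column ('A.C.1_v'), whichever iteration is running. Below is a minimal sketch of the same loop that uses the loop variable instead and stores the results in a dictionary. The name `frames` is only an illustration and does not come from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'A.C.1_v': [1, 2, 3],
    'B': ['a', 'b', 'c'],
    'C.C.1_f': [4, 5, 6],
    'D': ['e', 'f', 'g'],
    'E': [7, 8, 9],
})

# Same selection as in the question: column names containing '.C.1'.
list_c1 = [col for col in df.columns if '.C.1' in col]

frames = {}  # hypothetical name; maps e.g. 'df_AC1_v' to its DataFrame
for col in list_c1:
    # 'A.C.1_v'.split('.C.1') gives ['A', '_v']
    prefix, suffix = col.split('.C.1')
    name = 'df_' + prefix + 'C1' + suffix
    # Use the loop variable `col`, not list_c1[0], which never changes.
    frames[name] = df[[col, 'D', 'E']]

print(list(frames))
```

With this change each entry is built from its own column, so `frames['df_CC1_f']` should contain C.C.1_f rather than A.C.1_v.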
Hi Corralien, first let me thank you for your prompt reply, which is truly appreciated.
I have tried the first code:
for col in df.columns[df.columns.str.contains(r'[A-Z]\.[0-9]_[a-z]')]:
    name = col.replace('.', '')
    locals()[f"df_{name}"] = df[[col, 'D', 'E']]
But I get the following error:
File "", line 3
    locals()[f"df_{name}"] = df[[col, 'D', 'E']]
    ^
SyntaxError: invalid syntax
I have also tried the second proposed code, which gives the solution as a dictionary:
dfs = {}
for col in df.columns[df.columns.str.contains(r'[A-Z]\.[0-9]_[a-z]')]:
    name = col.replace('.', '')
    dfs[name] = df[[col, 'D', 'E']]
It runs without error, but when I check whether the DataFrames exist with
in: df_AC1_v
I get the following error: NameError: name 'df_AC1_v' is not defined
I understand that to get the df, I need to write: dfs['AC1_v']
The second solution is acceptable, but I would prefer the first solution if it worked.
Kindest regards
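On the preference for standalone variables: with the dictionary approach, every frame can still be reached in a loop, and an individual name can be bound explicitly where one is really wanted. The sketch below assumes the same df and regular expression as the answer; the dict comprehension and the `mask` variable are illustrative choices, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'A.C.1_v': [1, 2, 3],
    'B': ['a', 'b', 'c'],
    'C.C.1_f': [4, 5, 6],
    'D': ['e', 'f', 'g'],
    'E': [7, 8, 9],
})

# Boolean mask over the column names, using the regex from the answer.
mask = df.columns.str.contains(r'[A-Z]\.[0-9]_[a-z]')

# One entry per matching column; the key is the column name without dots.
dfs = {col.replace('.', ''): df[[col, 'D', 'E']] for col in df.columns[mask]}

# Process all frames without knowing their names in advance.
for name, frame in dfs.items():
    print(name, frame.shape)

# Bind a plain variable explicitly when a specific frame is needed.
df_AC1_v = dfs['AC1_v']
```

This keeps the number of frames open-ended while still allowing `df_AC1_v` to be used as an ordinary name afterwards.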