Python: Nested for loops fail on the second loop
I am trying to divide a large data set into smaller parts for analysis. I have been using a for-loop to divide the data set before implementing the decision trees. Please see a small version of the data set below:
ANZSCO4_CODE Skill_name Cluster date
1110 computer S 1
1110 communication C 1
1110 SAS S 2
1312 IT support S 1
1312 SAS C 2
1312 IT support S 1
1312 SAS C 1
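For reference, a frame like the sample above can be built directly (a sketch; column names follow the table, though note that later snippets also refer to the cluster column as SKILL_CLUSTER_TYPE):

```python
import pandas as pd

# Sample data matching the table in the question; the cluster column is
# named 'Cluster' here, as in the printed output further down.
df1 = pd.DataFrame({
    'ANZSCO4_CODE': [1110, 1110, 1110, 1312, 1312, 1312, 1312],
    'Skill_name': ['computer', 'communication', 'SAS',
                   'IT support', 'SAS', 'IT support', 'SAS'],
    'Cluster': ['S', 'C', 'S', 'S', 'C', 'S', 'C'],
    'date': [1, 1, 2, 1, 2, 1, 1],
})
print(df1.shape)  # (7, 4)
```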
First I create an empty dictionary:
d = {}
and the lists:
list = [1110, 1322, 2111]
s_type = ['S','C']
Then run the following loop:
for i in list:
    d[i] = pd.DataFrame(df1[df1['ANZSCO4_CODE'].isin([i])])
The result is a dictionary with 2 data sets inside.
As a next step I would like to subdivide the data sets into S and C. I run the following code:
for i in list:
    d[i] = pd.DataFrame(df1[df1['ANZSCO4_CODE'].isin([i])])
    for b in s_type:
        d[i] = d[i][d[i]['SKILL_CLUSTER_TYPE'] == b]
As a final result I would expect to have 4 separate data sets: 1110 x S, 1110 x C, 1312 x S and 1312 x C.
However, when I run the second piece of code I get only 2 data sets inside the dictionary, and they are empty.
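The emptiness can be reproduced with a tiny frame: the inner loop overwrites d[i] on each pass, so the 'C' filter runs on rows that were already reduced to 'S' only. A minimal sketch (using 'Cluster' as the cluster column, as in the sample table):

```python
import pandas as pd

df1 = pd.DataFrame({'ANZSCO4_CODE': [1110, 1110],
                    'Cluster': ['S', 'C']})

d = {}
for i in [1110]:
    d[i] = df1[df1['ANZSCO4_CODE'].isin([i])]
    for b in ['S', 'C']:
        # each pass overwrites d[i] with a further-filtered copy:
        # after 'S' only S-rows remain, and filtering those for 'C'
        # leaves nothing
        d[i] = d[i][d[i]['Cluster'] == b]

print(len(d[1110]))  # 0
```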
The inner loop overwrites d[i] in place: after keeping only the 'S' rows you filter that result again for 'C', so nothing is left. Maybe something like this works:
from collections import defaultdict

# don't name your list "list"
anzco_list = [1110, 1312]
s_type = ['S', 'C']

d = defaultdict(dict)  # nested dict: d[code][cluster_type] -> DataFrame
for i in anzco_list:
    for b in s_type:
        d[i][b] = df1[(df1['ANZSCO4_CODE'] == i) & (df1['SKILL_CLUSTER_TYPE'] == b)]
Then you can access your DataFrames like this:
d[1110]['S']
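A self-contained sketch of this nested-dict approach, with a hypothetical four-row df1 so the lookups can be checked end to end:

```python
import pandas as pd
from collections import defaultdict

# hypothetical minimal data with one row per (code, cluster) pair
df1 = pd.DataFrame({
    'ANZSCO4_CODE': [1110, 1110, 1312, 1312],
    'SKILL_CLUSTER_TYPE': ['S', 'C', 'S', 'C'],
})

d = defaultdict(dict)  # d[code][cluster_type] -> DataFrame
for i in [1110, 1312]:
    for b in ['S', 'C']:
        d[i][b] = df1[(df1['ANZSCO4_CODE'] == i) &
                      (df1['SKILL_CLUSTER_TYPE'] == b)]

print(len(d[1110]['S']))  # 1
```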
I think you got empty DataFrames because the data does not contain some of the values from your list, here renamed L (don't use the variable name list, because it shadows a Python built-in):
from itertools import product
L = [1110, 1312, 2111]
s_type = ['S','C']
Then create all combinations of both lists:
comb = list(product(L, s_type))
print (comb)
[(1110, 'S'), (1110, 'C'), (1312, 'S'), (1312, 'C'), (2111, 'S'), (2111, 'C')]
And last, create a dictionary of DataFrames:
d = {}
for i, j in comb:
    d['{}x{}'.format(i, j)] = df1[(df1['ANZSCO4_CODE'] == i) & (df1['Cluster'] == j)]
Or use a dictionary comprehension:
d = {'{}x{}'.format(i, j): df1[(df1['ANZSCO4_CODE'] == i) & (df1['Cluster'] == j)]
     for i, j in comb}
print (d['1110xS'])
ANZSCO4_CODE Skill_name Cluster date
0 1110 computer S 1
2 1110 SAS S 2
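Put together, the product-based dictionary runs like this (a sketch over the sample data; keys for codes missing from the data, such as 2111, simply map to empty frames):

```python
import pandas as pd
from itertools import product

df1 = pd.DataFrame({
    'ANZSCO4_CODE': [1110, 1110, 1110, 1312, 1312, 1312, 1312],
    'Skill_name': ['computer', 'communication', 'SAS',
                   'IT support', 'SAS', 'IT support', 'SAS'],
    'Cluster': ['S', 'C', 'S', 'S', 'C', 'S', 'C'],
})

# all (code, cluster) pairs, whether or not they occur in the data
comb = list(product([1110, 1312, 2111], ['S', 'C']))

d = {'{}x{}'.format(i, j): df1[(df1['ANZSCO4_CODE'] == i) &
                               (df1['Cluster'] == j)]
     for i, j in comb}

print(len(d['1110xS']))  # 2: 'computer' and 'SAS'
print(len(d['2111xS']))  # 0: 2111 is not in the data
```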
EDIT:
If you need all combinations that actually occur in the data, grouped by several columns, use groupby:
d = {'{}x{}x{}'.format(i, j, k): df2
     for (i, j, k), df2 in df1.groupby(['ANZSCO4_CODE', 'Cluster', 'date'])}
print (d)
{'1110xCx1': ANZSCO4_CODE Skill_name Cluster date
1 1110 communication C 1, '1110xSx1': ANZSCO4_CODE Skill_name Cluster date
0 1110 computer S 1, '1110xSx2': ANZSCO4_CODE Skill_name Cluster date
2 1110 SAS S 2, '1312xCx1': ANZSCO4_CODE Skill_name Cluster date
6 1312 SAS C 1, '1312xCx2': ANZSCO4_CODE Skill_name Cluster date
4 1312 SAS C 2, '1312xSx1': ANZSCO4_CODE Skill_name Cluster date
3 1312 IT support S 1
5 1312 IT support S 1}
print (d.keys())
dict_keys(['1110xCx1', '1110xSx1', '1110xSx2', '1312xCx1', '1312xCx2', '1312xSx1'])
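Since groupby already yields the group keys as tuples, the string formatting is optional; a sketch that keeps tuple keys instead (and, unlike itertools.product, only creates entries for combinations that exist):

```python
import pandas as pd

df1 = pd.DataFrame({
    'ANZSCO4_CODE': [1110, 1110, 1312],
    'Cluster': ['S', 'C', 'S'],
    'date': [1, 1, 1],
})

# iterating a GroupBy yields (key, group) pairs, so dict() of it
# builds the mapping directly, with tuple keys
d = dict(tuple(df1.groupby(['ANZSCO4_CODE', 'Cluster', 'date'])))
print(sorted(d.keys()))
# [(1110, 'C', 1), (1110, 'S', 1), (1312, 'S', 1)]
```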
A different approach, if you need to process each group, is GroupBy.apply:
def func(x):
    print(x)
    # some code to process each group
    return x
df2 = df1.groupby(['ANZSCO4_CODE','Cluster','date']).apply(func)
print (df2)
The function prints each group as it is processed (the first group appears twice because older pandas versions call the function on the first group an extra time to choose a code path):
ANZSCO4_CODE Skill_name Cluster date
1 1110 communication C 1
ANZSCO4_CODE Skill_name Cluster date
1 1110 communication C 1
ANZSCO4_CODE Skill_name Cluster date
0 1110 computer S 1
ANZSCO4_CODE Skill_name Cluster date
2 1110 SAS S 2
ANZSCO4_CODE Skill_name Cluster date
6 1312 SAS C 1
ANZSCO4_CODE Skill_name Cluster date
4 1312 SAS C 2
ANZSCO4_CODE Skill_name Cluster date
3 1312 IT support S 1
5 1312 IT support S 1
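As a sketch of apply doing real work rather than only printing, here is a hypothetical func that tags every row with the size of its group (group_keys=False keeps the original index instead of adding the group key levels):

```python
import pandas as pd

df1 = pd.DataFrame({
    'ANZSCO4_CODE': [1110, 1110, 1312, 1312],
    'Cluster': ['S', 'S', 'C', 'C'],
    'date': [1, 1, 1, 2],
})

def func(x):
    # hypothetical processing step: record how many rows the group has
    x = x.copy()
    x['group_size'] = len(x)
    return x

df2 = (df1.groupby(['ANZSCO4_CODE', 'Cluster', 'date'], group_keys=False)
          .apply(func))
# each row now carries its group's size: the two (1110, 'S', 1) rows
# get 2, the two singleton (1312, 'C', *) rows get 1
print(df2)
```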