Looping through columns and column values to subset a pandas DataFrame

Question

I have the following DataFrame, df:

ID   color  finish  duration
A1   black  smooth  12
A2   white  matte   8
A3   blue   smooth  20
A4   green  matte   10
B1   black  smooth  12
B2   white  matte   8
B3   blue   smooth
B4   green          10
C1   black  smooth
C2   white  matte   8
C3   blue   smooth
C4   green          10

I want to generate subsets of this DataFrame based on certain conditions. For example, for color = black, finish = smooth, duration = 12, I get the following DataFrame:

ID  color  finish  duration  score
A1  black  smooth  12        1
B1  black  smooth  12        1

For color = blue, finish = smooth, duration = 20, I get the following DataFrame:

ID  color  finish  duration  score
A3  blue   smooth  20        1
B3  blue   smooth            0.666667
C3  blue   smooth            0.666667
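The score values in these two examples match the fraction of the three attribute columns (color, finish, duration) that are filled: A3 has all three (3/3 = 1), while B3 and C3 lack duration (2/3 ≈ 0.666667). A minimal sketch of that calculation, assuming blank cells are stored as NaN and that ID is not counted:

```python
import numpy as np
import pandas as pd

# Sample data from the question; blank cells are assumed to be NaN
df = pd.DataFrame({
    'ID': ['A1', 'A2', 'A3', 'A4', 'B1', 'B2', 'B3', 'B4', 'C1', 'C2', 'C3', 'C4'],
    'color': ['black', 'white', 'blue', 'green'] * 3,
    'finish': ['smooth', 'matte', 'smooth', 'matte', 'smooth', 'matte',
               'smooth', np.nan, 'smooth', 'matte', 'smooth', np.nan],
    'duration': [12, 8, 20, 10, 12, 8, np.nan, 10, np.nan, 8, np.nan, 10],
})

ATTRS = ['color', 'finish', 'duration']  # columns that count toward the score

# notna() marks filled cells; the row-wise mean of those booleans is filled / total
df['score'] = df[ATTRS].notna().mean(axis=1)
print(df.loc[df['color'] == 'blue', ['ID', 'score']])
```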

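The score is calculated as the number of filled columns divided by the total number of columns. I want to do this in a loop over the pandas DataFrame. The following code works for me with two columns: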

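import pandas as pd

# dropna() skips blank (NaN) values, which cannot be concatenated into the key strings
list2 = list(df['color'].dropna().unique())
list3 = list(df['finish'].dropna().unique())

gbl = {}     # gbl is not defined in the snippet; a plain dict holds the subsets here
frames = []  # subsets to combine at the end

for i in range(len(list2)):
    for j in range(len(list3)):
        print('Current Attribute Value:', list2[i], list3[j])

        # rows with this color
        gbl["df_" + list2[i]] = df[df.color == list2[i]]
        # rows with this color and finish; .copy() so adding a column below is safe
        gbl["df_" + list2[i] + list3[j]] = gbl["df_" + list2[i]].loc[gbl["df_" + list2[i]].finish == list3[j]].copy()
        gbl["df_" + list2[i] + list3[j]]['dfattribval'] = list2[i] + list3[j]
        frames.append(gbl["df_" + list2[i] + list3[j]])

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df_final = pd.concat(frames, ignore_index=True)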
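The nested loop above visits every (color, finish) pair, including pairs that never occur together. As a swapped-in alternative, DataFrame.groupby yields only the pairs present in the data, and its key list is just a list of column names. A sketch, assuming blanks are NaN (groupby skips rows whose key is NaN by default) and using sort=False so groups come out in first-appearance order; the dfattribval tag is built the same way as in the loop:

```python
import numpy as np
import pandas as pd

# Sample data from the question; blank cells are assumed to be NaN
df = pd.DataFrame({
    'ID': ['A1', 'A2', 'A3', 'A4', 'B1', 'B2', 'B3', 'B4', 'C1', 'C2', 'C3', 'C4'],
    'color': ['black', 'white', 'blue', 'green'] * 3,
    'finish': ['smooth', 'matte', 'smooth', 'matte', 'smooth', 'matte',
               'smooth', np.nan, 'smooth', 'matte', 'smooth', np.nan],
    'duration': [12, 8, 20, 10, 12, 8, np.nan, 10, np.nan, 8, np.nan, 10],
})

keys = ['color', 'finish']  # grouping columns, given by name

frames = []
# Each iteration yields a tuple of key values and the matching sub-DataFrame
for key, sub in df.groupby(keys, sort=False):
    print('Current Attribute Value:', *key)
    sub = sub.copy()
    sub['dfattribval'] = ''.join(map(str, key))  # e.g. 'blacksmooth'
    frames.append(sub)

df_final = pd.concat(frames, ignore_index=True)
```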

However, I can't loop over the column names. What I want to do is:

lista = ['color','finish']

df_final = pd.DataFrame()
for a in range(len(lista)):
  for i in range(len(list2)):
    for j in range(len(list3)):
       print 'Current Attribute Value:',lista[a],list2[i],lista[a+1],list3[j]
       gbl["df_"+list2[i]] = df[df.lista[a] == list2[i]]
       gbl["df_" + list2[i] + list3[j]] = gbl["df_"+list2[i]].loc[gbl["df_"+list2[i]].lista[a+1] == list3[j]]
       gbl["df_" + list2[i] + list3[j]]['dfattribval'] = list2[i] + list3[j]
       df_final = df_final.append(gbl["df_" + list2[i] + list3[j]], ignore_index=True)

I get the obvious error:

AttributeError: 'DataFrame' object has no attribute 'lista'
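The error comes from df.lista[a]: attribute access looks for a column literally named "lista" instead of evaluating lista[a]. Bracket indexing evaluates the expression first, so the column name can come from a variable. A minimal sketch on a two-row frame:

```python
import pandas as pd

df = pd.DataFrame({'color': ['black', 'blue'], 'finish': ['smooth', 'smooth']})
lista = ['color', 'finish']

# df.lista[0] would fail: pandas looks for an attribute/column called "lista".
# df[lista[0]] evaluates lista[0] first and selects the "color" column.
col = lista[0]
subset = df[df[col] == 'black']
print(subset)

# getattr(df, col) also works, but only for names that are valid Python identifiers
same = df[getattr(df, col) == 'black']
```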

Does anyone know how to loop over both the column names and their values? Thanks in advance!

Answer 1

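I'm not quite sure what you need, but consider using a list comprehension to build every combination of the value lists, which avoids nested loops, and storing the results in a dictionary of DataFrames. The scorecalc() function passed to apply can probably be adjusted to fit your needs: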

import pandas as pd

colorlist = list(df['color'].unique())
finishlist = list(df['finish'].unique())
durationlist = list(df['duration'].unique())

# ALL COMBINATIONS BETWEEN LISTS
allList = [(c, f, d) for c in colorlist for f in finishlist for d in durationlist]

def scorecalc(row):
    # row is the sub-DataFrame for one (color, finish) group;
    # count() is the number of non-missing durations in that group
    return row.assign(score=row['duration'].count())

dfList = []
dfDict = {}
for i in allList:
    # SUBSET DFS
    tempdf = df[(df['color'] == i[0]) & (df['finish'] == i[1]) & (df['duration'] == i[2])]

    if len(tempdf) > 0:  # FOR NON-EMPTY DFS
        print('Current Attribute Value:', i[0], i[1], i[2])
        # group_keys=False keeps the original row index (pandas 2.0+ otherwise prepends the group keys)
        tempdf = tempdf.groupby(['color', 'finish'], group_keys=False).apply(scorecalc)
        tempdf['score'] = tempdf['score'] / len(tempdf)
        print(tempdf)

        key = str(i[0]) + str(i[1]) + str(i[2])
        dfDict[key] = tempdf    # DICTIONARY OF DFS (USE pd.concat(dfDict.values()) FOR FINAL DF)
        dfList.append(tempdf)   # LIST OF DFS (USE pd.concat(dfList) FOR FINAL DF)

# Current Attribute Value: black smooth 12.0
#   ID  color  finish  duration  score
#0  A1  black  smooth      12.0    1.0
#4  B1  black  smooth      12.0    1.0
#Current Attribute Value: white matte 8.0
#   ID  color finish  duration  score
#1  A2  white  matte       8.0    1.0
#5  B2  white  matte       8.0    1.0
#9  C2  white  matte       8.0    1.0
#Current Attribute Value: blue smooth 20.0
#   ID color  finish  duration  score
#2  A3  blue  smooth      20.0    1.0
#Current Attribute Value: green matte 10.0
#   ID  color finish  duration  score
#3  A4  green  matte      10.0    1.0
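The answer's combination idea can be extended to an arbitrary list of column names, which is what the question asks for. The sketch below is an interpretation, not the answer's code: it assumes blanks are NaN and that, as in the question's blue example, a row is kept when each attribute either equals the target value or is missing, with score = filled attributes / number of attributes. The helper subset_with_score and the "keep a combination only if some row matches it completely" rule are hypothetical. Under this rule C1 (black, smooth, missing duration) is also returned for black/smooth/12, which the question's first example does not list; tighten the mask if that row should be excluded.

```python
import itertools

import numpy as np
import pandas as pd

# Sample data from the question; blank cells are assumed to be NaN
df = pd.DataFrame({
    'ID': ['A1', 'A2', 'A3', 'A4', 'B1', 'B2', 'B3', 'B4', 'C1', 'C2', 'C3', 'C4'],
    'color': ['black', 'white', 'blue', 'green'] * 3,
    'finish': ['smooth', 'matte', 'smooth', 'matte', 'smooth', 'matte',
               'smooth', np.nan, 'smooth', 'matte', 'smooth', np.nan],
    'duration': [12, 8, 20, 10, 12, 8, np.nan, 10, np.nan, 8, np.nan, 10],
})


def subset_with_score(df, conditions):
    """Hypothetical helper: keep rows where every column in `conditions`
    equals its target value or is missing, then score each kept row as
    filled condition columns / number of condition columns."""
    cols = list(conditions)
    mask = pd.Series(True, index=df.index)
    for col, value in conditions.items():
        # df[col], not df.col, so the column name can come from a variable
        mask &= (df[col] == value) | df[col].isna()
    out = df[mask].copy()
    out['score'] = out[cols].notna().mean(axis=1)
    return out


attrs = ['color', 'finish', 'duration']  # any list of column names works here

# One array of non-missing values per column; product() replaces the nested loops
values = [df[c].dropna().unique() for c in attrs]

results = {}  # dictionary of DataFrames keyed by the value combination
for combo in itertools.product(*values):
    sub = subset_with_score(df, dict(zip(attrs, combo)))
    # Assumed rule: keep a combination only if some row matches it completely
    if sub['score'].eq(1).any():
        print('Current Attribute Value:', *combo)
        print(sub)
        results[combo] = sub
```

pd.concat(results.values()) turns the dictionary into a single DataFrame if one is needed.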

Solution 1
1 2016-07-22 03:33:24
