Iterating over a Pandas DataFrame while looking at the next n rows

Question

I have this pandas DataFrame df:

station a_d direction
   a     0      0
   a     0      0
   a     1      0
   a     0      0
   a     1      0
   b     0      0
   b     1      0
   c     0      0
   c     1      0
   c     0      1
   c     1      1
   b     0      1
   b     1      1
   b     0      1
   b     1      1
   a     0      1
   a     1      1
   a     0      0
   a     1      0
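
For anyone who wants to reproduce the problem, the table above can be built as a DataFrame like this. It is only a small sketch that transcribes the 19 sample rows and assumes the default 0 to 18 integer index.

```python
import pandas as pd

# Sample data transcribed from the table above (19 rows, default integer index).
df = pd.DataFrame({
    'station': list('aaaaabbccccbbbbaaaa'),
    'a_d': [0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    'direction': [0] * 9 + [1] * 8 + [0] * 2,
})
print(df)
```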

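I want to assign a value_id that increments whenever the direction value changes, and that is set only on the last pair of rows for a station before the station (or direction) changes, where the two rows of the pair have different a_d values ([0, 1]). I can ignore the last rows of the DataFrame (the last two in this example). In other words: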

station a_d direction value_id
   a     0      0
   a     0      0
   a     1      0
   a     0      0        0
   a     1      0        0
   b     0      0        0
   b     1      0        0
   c     0      0        0
   c     1      0        0
   c     0      1        1
   c     1      1        1
   b     0      1         
   b     1      1        
   b     0      1        1
   b     1      1        1
   a     0      1        1
   a     1      1        1
   a     0      0
   a     1      0

I wrote the following script using df.iterrows():

df['value_id'] = ""
value_id = 0
row_iterator = df.iterrows()
for i, row in row_iterator:
    # Row 0 has no previous row; otherwise bump the id when direction changes.
    if i == 0:
        continue
    elif df.loc[i-1, 'direction'] != df.loc[i, 'direction']:
        value_id += 1
    # Look ahead at the following rows before assigning the id to row i.
    for z in range(1, 11):
        if i+z >= len(df)-1:
            break
        elif df.loc[i+1, 'a_d'] == df.loc[i, 'a_d']:
            break
        elif (df.loc[i+1, 'a_d'] != df.loc[i, 'a_d']
              and df.loc[i+2, 'station'] == df.loc[i, 'station']
              and df.loc[i+2, 'direction'] == df.loc[i, 'direction']):
            break
        else:
            df.loc[i, 'value_id'] = value_id

It works, but it is very slow. For a DataFrame with 10*10^6 rows I need a faster approach. Any ideas?

@user5402's code works well, but I noticed that adding a break after the final else also reduces the computation time:

df['value_id'] = ""
value_id = 0
row_iterator = df.iterrows()
for i, row in row_iterator:
    # Row 0 has no previous row; otherwise bump the id when direction changes.
    if i == 0:
        continue
    elif df.loc[i-1, 'direction'] != df.loc[i, 'direction']:
        value_id += 1
    for z in range(1, 11):
        if i+z >= len(df)-1:
            break
        elif df.loc[i+1, 'a_d'] == df.loc[i, 'a_d']:
            break
        elif (df.loc[i+1, 'a_d'] != df.loc[i, 'a_d']
              and df.loc[i+2, 'station'] == df.loc[i, 'station']
              and df.loc[i+2, 'direction'] == df.loc[i, 'direction']):
            break
        else:
            df.loc[i, 'value_id'] = value_id
            # Stop after the first assignment instead of repeating it for every z.
            break

Answer 1

You are not making effective use of z in the inner for loop. You never access row i+z. You access rows i, i+1, and i+2, but never row i+z.

You can replace the inner for loop with:

    # Rows i+1 and i+2 are read below, so skip the last two rows.
    if i+2 > len(df)-1:
        pass
    elif df.loc[i+1, 'a_d'] == df.loc[i, 'a_d']:
        pass
    elif (df.loc[i+2, 'station'] == df.loc[i, 'station']
          and df.loc[i+2, 'direction'] == df.loc[i, 'direction']):
        pass
    else:
        df.loc[i, 'value_id'] = value_id

Note that I also slightly optimized the second elif, because at that point you already know that df.loc[i+1, 'a_d'] is not equal to df.loc[i, 'a_d'].

Not having to loop over z will save a lot of time.
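
Since the question asks for something faster on roughly 10 million rows, the same three checks can probably also be written as whole-column comparisons with shift() instead of a Python loop. The following is only a minimal sketch and not part of the original answer: value_id_vectorized is a hypothetical helper name, it assumes the default 0..n-1 index that the loop's df.loc[i+1] lookups rely on, and it marks unassigned rows with <NA> rather than an empty string.

```python
import numpy as np
import pandas as pd


def value_id_vectorized(df):
    """Hypothetical helper: shift()-based version of the loop's row checks.

    Assumes the default 0..n-1 index, as the loop's df.loc[i+1] lookups do.
    """
    n = len(df)
    # Counter that starts at 0 and grows by 1 whenever direction changes.
    changed = df['direction'] != df['direction'].shift()
    counter = changed.cumsum() - 1
    # Check 1: the next row has a different a_d value.
    next_ad_differs = df['a_d'].shift(-1) != df['a_d']
    # Check 2: two rows ahead is still the same station and direction.
    same_two_ahead = ((df['station'].shift(-2) == df['station'])
                      & (df['direction'].shift(-2) == df['direction']))
    # Check 3: skip row 0 and the last two rows, as the loop does.
    pos = np.arange(n)
    in_bounds = (pos >= 1) & (pos <= n - 3)
    mask = next_ad_differs & ~same_two_ahead & in_bounds
    # Nullable integers: <NA> marks rows that get no id.
    return counter.astype('Int64').where(mask)


# Sample data transcribed from the question.
df = pd.DataFrame({
    'station': list('aaaaabbccccbbbbaaaa'),
    'a_d': [0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    'direction': [0] * 9 + [1] * 8 + [0] * 2,
})
df['value_id'] = value_id_vectorized(df)
print(df)
```

On the sample data this assigns 0 to rows 3 to 8 and 1 to rows 9, 10 and 13 to 16, and leaves the remaining rows empty, which matches the expected table in the question.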

Solution 1
2 Accepted 2014-12-14 17:50:08
