
Is there an efficient way to bypass a nested for loop?

I've got a nested for loop, and I'm wondering if there's a more efficient way to do this, code-wise:

My data looks similar to the following.

  ID  | DEAD     | 2009-10 | ...    | 2016-10
 -----------------------------------------
  1   | 2018-11  | 5.4     | ...    | 6.5 
  2   | 2014-01  | 0.5     | ...    | 5.2
  ...                      
  N   | 2008-11  | 8.6     | ...    | 1.3

The goal is to replace the values with np.NaN as soon as a product expires (when column 'DEAD' < date); otherwise the values should remain the same.

  ID  | DEAD     | 2009-10 | ...    | 2016-10
 -----------------------------------------
  1   | 2018-11  | 5.4     | ...    | 6.5 
  2   | 2014-01  | 0.5     | ...    | NaN
  ...                      
  N   | 2008-11  | 8.6     | ...    | NaN

My initial idea was to apply a nested for loop to check whether the condition 'DEAD' < date is reached. The method works for smaller N, but since my data includes over 20,000 rows and 400 columns it requires too much time.

time = pd.DataFrame(df.columns[2:])        # take the date headers as an index
time.columns = ['Dummy']
time['Dummy'] = pd.to_datetime(time.Dummy) # convert the headers to datetime

df['DEAD'] = pd.to_datetime(df.DEAD)       # convert column 'DEAD' to datetime

lists = []
for i in range(397):                       # one pass per date column
    row = []
    for j in range(20000):                 # one pass per product row
        if time.iloc[i, 0] <= df.iloc[j, 1]:   # date <= DEAD: keep the value
            newlist = df.iloc[j, i + 2]
        else:                                  # product has expired: drop it
            newlist = np.NaN
        row.append(newlist)
    lists.append(row)

lists = pd.DataFrame(lists)
lists = lists.transpose()

Appreciate any suggestions! 感谢任何建议!

You can try to iterate through each column instead:

for column_name in df.drop('DEAD', axis=1):
    column_date = pd.to_datetime(column_name)
    df[column_name] = df[column_name].mask(df['DEAD'] < column_date)

The mask method is also useful here.
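The per-column loop above can also be collapsed into a single broadcast comparison. A minimal sketch, assuming the frame has the layout shown in the question (ID, DEAD, then date columns whose headers parse with pd.to_datetime):

```python
import numpy as np
import pandas as pd

# Small frame in the question's shape (illustrative data)
df = pd.DataFrame({
    'ID': [1, 2],
    'DEAD': pd.to_datetime(['2018-11', '2014-01']),
    '2009-10': [5.4, 0.5],
    '2016-10': [6.5, 5.2],
})

value_cols = df.columns[2:]
col_dates = pd.to_datetime(value_cols)            # one datetime per value column
dead = df['DEAD'].values[:, None]                 # shape (N, 1) for broadcasting
expired = dead < col_dates.values[None, :]        # True where the product is dead

df[value_cols] = df[value_cols].where(~expired)   # NaN out expired cells
```

This touches every cell once in NumPy rather than 400 times in a Python loop, so it should scale to the 20,000 x 400 case.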

If your columns are ordered - for example, in ascending order by date - then you could avoid some of the looping and checking.

  • For each row, find the first column that meets your condition
    • You could do this with a binary search if you really want to optimize
  • Get the index of this column; call it i
  • Update all the subsequent columns with index >= i to the NaN value
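The steps above can be sketched with a vectorized binary search (np.searchsorted) instead of a per-row loop; the column layout is assumed from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2],
    'DEAD': pd.to_datetime(['2018-11', '2014-01']),
    '2009-10': [5.4, 0.5],
    '2013-10': [2.0, 3.0],
    '2016-10': [6.5, 5.2],
})

value_cols = df.columns[2:]
col_dates = pd.to_datetime(value_cols).values     # sorted ascending by assumption

# For each row, index of the first column whose date exceeds DEAD
cut = np.searchsorted(col_dates, df['DEAD'].values, side='right')

vals = df[value_cols].to_numpy(dtype=float)
col_idx = np.arange(len(value_cols))
vals[col_idx[None, :] >= cut[:, None]] = np.nan   # NaN from the cut point onward
df[value_cols] = vals
```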

The update itself is still being done cell-by-cell, which might not perform particularly well.

You might get better performance if you create a second dataframe with the same dimensions that could be used like a bitmask, containing 0 and 1 values indicating whether the value in the underlying dataframe should be retained or removed.
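A sketch of that bitmask idea, assuming the question's layout: build a same-shaped frame where 1 keeps a value and NaN removes it, then multiply element-wise.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2],
    'DEAD': pd.to_datetime(['2018-11', '2014-01']),
    '2009-10': [5.4, 0.5],
    '2016-10': [6.5, 5.2],
})

value_cols = df.columns[2:]
col_dates = pd.to_datetime(value_cols)

# Same-shaped "bitmask" frame: 1.0 retains the value, NaN removes it
keep = pd.DataFrame(
    (df['DEAD'].values[:, None] >= col_dates.values[None, :]).astype(float),
    index=df.index, columns=value_cols,
).replace(0.0, np.nan)

df[value_cols] = df[value_cols] * keep   # x * 1 = x, x * NaN = NaN
```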

If this data is stored in a database, you should work with it in SQL directly; that will be faster.
