How can I check whether many rows all satisfy a condition without iterating over the DataFrame row by row, which takes a long time?

Question

I want to do the following, but I realise that this kind of iterative method is very slow with large DataFrames. What other solutions are there to this problem?

for i in range(len(df)):
    for n in range(1001):
        if df["Close"][(i+n)] > df["MA"][i+n]:
            df["Strategy 1"][i] = "Buy"

What I would expect the code above to do is:

It would substitute n from 0 to 1,000 into line 3 with i = 0, and then, if the condition in line 3 held for every n in the range 0 to 1,000, it would carry out the operation in line 4.

After this it would take i = 1, substitute n from 0 to 1,000 into line 3, and if the condition held for all n in that range, carry out line 4.

After this it would take i = 2, substitute n from 0 to 1,000 into line 3, and if the condition held for all n in that range, carry out line 4.

After this it would take i = 3, substitute n from 0 to 1,000 into line 3, and if the condition held for all n in that range, carry out line 4.

...

Finally it would take the last i, len(df) - 1, substitute n from 0 to 1,000 into line 3, and if the condition held for all n in that range, carry out line 4.

Regardless of whether the code above does what I'd expect, is there a much faster way to compute this for very large, multi-gigabyte DataFrames?
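As a baseline for the rule described above (row i gets "Buy" only if Close > MA on every row from i to i + n), the intended behaviour can be written as a plain reference loop. This is a minimal sketch, assuming columns named Close and MA and assuming that rows whose window would run past the end of the frame are simply left unmarked; the helper name reference_strategy and the tiny n are hypothetical. It is still slow, but it is handy for checking a faster version on small data. Note that it uses all() over the window, whereas the loop in the question sets "Buy" as soon as any single n satisfies the condition, and would also index past the end of the frame.

```python
import pandas as pd


def reference_strategy(df: pd.DataFrame, n: int) -> pd.Series:
    """Slow reference: mark row i "Buy" only if Close > MA on every row i..i+n.

    Hypothetical helper for checking faster versions on small data.
    Rows whose window would run past the end of the frame stay "".
    """
    close = df["Close"].to_numpy()
    ma = df["MA"].to_numpy()
    out = [""] * len(df)
    for i in range(len(df) - n):
        # n + 1 rows: offsets 0..n inclusive, matching range(n + 1) in the question
        if all(close[i + k] > ma[i + k] for k in range(n + 1)):
            out[i] = "Buy"
    return pd.Series(out, index=df.index, name="Strategy 1")


# Tiny made-up example with n = 2 (the question uses n = 1000)
df = pd.DataFrame({"Close": [5, 6, 7, 8, 1, 9], "MA": [4, 4, 4, 4, 4, 4]})
df["Strategy 1"] = reference_strategy(df, n=2)
print(df)
```

Here only rows 0 and 1 get "Buy": rows 0 to 2 and rows 1 to 3 are all above MA, while every later window either contains row 4 or runs past the end.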

Answer 1

Using the .apply function would be faster. For a general example:

import pandas as pd

# only required to create the test dataframe in this example
import numpy as np

# create a dataframe for testing using the numpy import above
df = pd.DataFrame(np.random.randint(100,size=(10, )),columns=['A'])

# create a new column holding the value of column 'A' from the next row
# (shift(-1) moves every value up by one row; the last row becomes NaN)
df['NextRow'] = df['A'].shift(-1)

# create a function to do something, anything, and return that thing
def doMyThingINeedToDo(num, numNext):
    # 'num' is the value of column 'A' in the current row as .apply runs below,
    # and 'numNext' is the value of 'A' in the next row
    if num >= 50 and numNext >= 75:
        return 'Yes'
    else:
        return '...No...'

# create a new column called 'NewColumnName' by applying the function above
# to every row (axis=1 passes each row to the lambda as a Series)
df['NewColumnName'] = df.apply(lambda row: doMyThingINeedToDo(row['A'], row['NextRow']), axis=1)

# output the frame and notice the new column
print(df)

Output (the values are random, so yours will differ):

    A  NextRow NewColumnName
0  67     84.0           Yes
1  84     33.0      ...No...
2  33     59.0      ...No...
3  59     85.0           Yes
4  85     39.0      ...No...
5  39     81.0      ...No...
6  81     76.0           Yes
7  76     83.0           Yes
8  83     60.0      ...No...
9  60      NaN      ...No...

The main point is that you can separate exactly what you want to do per row, contain it in a function (which can be tweaked and updated as required), and call that function for all rows of a frame when required.
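For comparison, the same Yes/No rule from the example can also be written without a per-row function call, using column-wise comparisons and np.where. This swaps in vectorized NumPy operations in place of .apply; it is a sketch that reuses the example's column names A and NextRow, with a seeded generator added so the run is repeatable. On large frames, vectorized comparisons like this are usually much faster than DataFrame.apply with axis=1, which still calls the Python function once per row.

```python
import numpy as np
import pandas as pd

# Same kind of test frame as above, seeded so the result is repeatable
rng = np.random.default_rng(0)
df = pd.DataFrame({"A": rng.integers(0, 100, size=10)})
df["NextRow"] = df["A"].shift(-1)

# Column-wise version of doMyThingINeedToDo: both comparisons run on whole columns.
# NaN in the last NextRow compares as False, so that row gets '...No...'.
condition = (df["A"] >= 50) & (df["NextRow"] >= 75)
df["NewColumnName"] = np.where(condition, "Yes", "...No...")
print(df)
```

The per-row function is still useful for prototyping; once the rule is settled, expressing it as whole-column comparisons is what avoids the Python-level loop.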

Answer 2

You can accomplish what you are attempting with only your Close data, calculating the MA and the 1000 conditions on the fly via vectorization. Maybe try this:

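import numpy as np

ma_window = 1000  # window length of the moving average
n = 1000          # number of consecutive rows that must satisfy the condition

# True where Close is above its own moving average
above_ma = df['Close'] > df['Close'].rolling(window=ma_window).mean()

# a rolling mean of 1 means the condition held on all of the last n rows
df['Strategy 1'] = np.where(above_ma.rolling(window=n).mean() == 1, 'buy', '')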

Play around with this and see if it works for you.
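One detail worth checking: pandas rolling windows are trailing, so the value at row i summarises rows i - n + 1 to i. The loop in the question instead looks forward from row i (and its range(1001) covers 1001 rows, not 1000). If the forward-looking version is wanted, the rolling result can be shifted up by n - 1 rows. This is a minimal sketch on made-up data with a small window; the column names Close and MA and the window size are assumptions. It uses rolling(...).sum() == n, which for a boolean condition is equivalent to the mean() == 1 test above.

```python
import numpy as np
import pandas as pd

n = 3  # window length in rows (small for illustration)

df = pd.DataFrame({"Close": [5, 6, 7, 8, 1, 9, 9, 9],
                   "MA":    [4, 4, 4, 4, 4, 4, 4, 4]})
above = df["Close"] > df["MA"]

# Trailing window: True at row i if rows i-n+1 .. i all satisfy the condition
trailing = above.rolling(window=n).sum() == n

# Forward window: True at row i if rows i .. i+n-1 all satisfy the condition.
# shift(-(n - 1)) moves each trailing result up to the first row of its window;
# the last n - 1 rows have no complete window and are filled with False.
forward = trailing.shift(-(n - 1), fill_value=False)

df["Strategy 1"] = np.where(forward, "Buy", "")
print(df)
```

Both versions cost one rolling pass over the column; the shift only changes which row each window's result is attached to.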

Answer 3

First, let me state how I understand your rule. As near as I can tell, you are trying to get a value of "Buy" in the "Strategy 1" column of the df only if Close was greater than MA on 1000 consecutive rows ending at that row. I think you can get that done simply by using a rolling sum on the comparison:

import pandas as pd
import numpy as np

# build some repeatable sample data
np.random.seed(1)
df = pd.DataFrame({'close': np.cumsum(np.random.randn(10000))})
df['MA'] = df['close'].rolling(1000).mean()

# Apply strategy
npoints = 1000

# object column, so the string "Buy" can be stored next to empty (None) values
df['Strategy 1'] = None
# True where close > MA held on each of the last npoints rows
buypoints = (df['close'] > df['MA']).rolling(npoints).sum() == npoints
df.loc[buypoints, "Strategy 1"] = "Buy"

# just for visualisation, show where the Buys would be (requires matplotlib)
import matplotlib.pyplot as plt
df['Buypoints'] = buypoints * 10
df.plot()
plt.show()

Running this plots close, MA and the buy points; with the same seed it should look the same on your machine too.
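The rolling sum counts how many of the last npoints rows satisfy the condition. An alternative that states the "consecutive" idea directly is to track the length of the current run of True values and compare it with npoints; this is a swapped-in technique, not the one used in the answer. A minimal sketch on the same kind of seeded random-walk data, with smaller windows than the answer so it runs quickly; the names run_id, streak and buypoints_runs are hypothetical.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame({"close": np.cumsum(np.random.randn(2000))})
df["MA"] = df["close"].rolling(100).mean()
npoints = 50

cond = df["close"] > df["MA"]

# Label each run of identical values: the id increases every time cond flips
run_id = (cond != cond.shift()).cumsum()

# Length of the current run of True values ending at each row (0 where cond is False)
streak = cond.astype(int).groupby(run_id).cumsum()

# Buy where the condition has held on at least npoints consecutive rows ending here
buypoints_runs = streak >= npoints

# Cross-check against the rolling-sum formulation from the answer
buypoints_rolling = cond.rolling(npoints).sum() == npoints
print((buypoints_runs == buypoints_rolling).all())
```

The streak column is also informative by itself, since it shows how long the condition has been holding at every row, not just whether it has reached npoints.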

Answer 4

Iteration is a last resort with Pandas.

The solution you are looking for comes from NumPy:

import numpy as np

# "Buy" where Close > MA on that row, otherwise keep the existing value
df["Strategy 1"] = np.where(df["Close"] > df["MA"], "Buy", df["Strategy 1"])
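As written, this np.where call tests each row on its own, so a row gets "Buy" whenever Close > MA on that single row. To require the condition on a whole block of rows, np.where can be fed a windowed mask instead. A minimal sketch, assuming columns Close and MA and a hypothetical window length n; the window count is built here with a cumulative sum in plain NumPy, which is one way to get trailing window counts in a single pass.

```python
import numpy as np
import pandas as pd

n = 3  # hypothetical window length in rows

df = pd.DataFrame({"Close": [5, 6, 7, 1, 8, 9, 9],
                   "MA":    [4, 4, 4, 4, 4, 4, 4]})
above = (df["Close"] > df["MA"]).to_numpy()

# Row-by-row test, as in the answer: only the current row matters
df["Per row"] = np.where(above, "Buy", "")

# Window test with a cumulative sum: c[j] = number of True values in above[:j],
# so c[i + 1] - c[i + 1 - n] counts the True values in rows i-n+1 .. i
c = np.concatenate(([0], np.cumsum(above)))
counts = np.zeros(len(above), dtype=np.int64)
counts[n - 1:] = c[n:] - c[:-n]
df["Whole window"] = np.where(counts == n, "Buy", "")
print(df)
```

Rows whose window is incomplete (the first n - 1) keep a count of 0 and are never marked.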


4 answers

Answer 1: score 1, 2021-07-17 20:44:08
Answer 2: score 1, 2021-07-18 01:40:16
Answer 3: score 1, accepted, 2021-07-18 14:01:53
Answer 4: score 0, 2021-07-17 20:58:59
