Python: looping over a pandas dataframe with IF conditions gives me incomplete results or KeyError

Question

Given a dataframe:

d = {'A': [2, 1, 4, 5, 7, 8, 7, 5], 'B': [5, 7, 7, 6, 10, 9, 12, 10]}
testdf = pd.DataFrame(data=d)


    A   B
0   2   5
1   1   7
2   4   7
3   5   6
4   7   10
5   8   9
6   7   3
7   5   2

I'm comparing both columns row by row. I expect to append 'INSIDE' to an array if A[i] > A[i-1] AND B[i] < B[i-1], and 'BROKEN' otherwise.

array = []

for i in range(1,len(testdf)):
    if testdf.A[i] > testdf.A[i-1]:
        if testdf.B[i] < testdf.B[i-1]:
            array.append('INSIDE')
        else:
            array.append('BROKEN')

The result is:

['BROKEN', 'INSIDE', 'BROKEN', 'INSIDE']

But I expect:

['BROKEN', 'BROKEN', 'INSIDE', 'BROKEN', 'INSIDE', 'BROKEN', 'BROKEN']

I tried different starting points for the loop, for example:

for i in range(len(testdf)-1):

but that only raises KeyError.

How can I improve the code so that it runs as expected?

Answer 1

For a pandas-based approach, you can use diff:

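import numpy as np

# Row-to-row change in each column (NaN in the first row)
m = testdf.diff()
# True where A went up and B went down
m = (m.A > 0) & (m.B < 0)
testdf['new_col'] = np.where(m, 'INSIDE', 'BROKEN')

print(testdf)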
   A   B new_col
0  2   5  BROKEN
1  1   7  BROKEN
2  4   7  BROKEN
3  5   6  INSIDE
4  7  10  BROKEN
5  8   9  INSIDE
6  7   3  BROKEN
7  5   2  BROKEN
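
To make the link between diff and a row-by-row comparison explicit, here is a minimal sketch. It assumes the question's dictionary d is loaded into testdf; the names delta and manual are only illustrative. It checks that diff() gives the same result as subtracting the previous row by hand, so "diff > 0" is the same test as "value > previous value".

```python
import pandas as pd

d = {'A': [2, 1, 4, 5, 7, 8, 7, 5], 'B': [5, 7, 7, 6, 10, 9, 12, 10]}
testdf = pd.DataFrame(data=d)

# diff() computes row i minus row i-1; row 0 has no predecessor and becomes NaN.
delta = testdf.diff()

# The same subtraction written out with shift(), which moves every row down by one.
manual = testdf - testdf.shift()

# equals() treats NaNs in the same position as equal.
print(delta.equals(manual))
print(delta)
```

Because the first row of delta is NaN, neither comparison can be True there, which is why row 0 is always labelled 'BROKEN' with this approach.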

Answer 2

To get the expected output, add an else branch to the outer if, so that 'BROKEN' is also appended when A does not increase:

array = []
for i in range(1,len(testdf)):
    if testdf.A[i] > testdf.A[i-1]:
        if testdf.B[i] < testdf.B[i-1]:
            array.append('INSIDE')
        else:
            array.append('BROKEN')
    else:
        array.append('BROKEN')
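
As for the KeyError with range(len(testdf)-1): it most likely comes from the first iteration, where i = 0 and testdf.A[i-1] asks for the label -1, which does not exist in the default integer index. A sketch of the same loop using positional access with .iloc and one combined condition, assuming the question's data (rising_a and falling_b are illustrative names):

```python
import pandas as pd

d = {'A': [2, 1, 4, 5, 7, 8, 7, 5], 'B': [5, 7, 7, 6, 10, 9, 12, 10]}
testdf = pd.DataFrame(data=d)

array = []
# Start at 1 so that i - 1 is always a valid row position.
for i in range(1, len(testdf)):
    rising_a = testdf['A'].iloc[i] > testdf['A'].iloc[i - 1]
    falling_b = testdf['B'].iloc[i] < testdf['B'].iloc[i - 1]
    # Exactly one label is appended per row, whichever way the test goes.
    array.append('INSIDE' if rising_a and falling_b else 'BROKEN')

print(array)
# ['BROKEN', 'BROKEN', 'INSIDE', 'BROKEN', 'INSIDE', 'BROKEN', 'BROKEN']
```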

A non-loop solution: this also tests the first row, so the result has the same length as the original dataframe. If you need the same output as the loop, drop the first value by indexing with [1:]:

import numpy as np

mask = testdf['A'].gt(testdf['A'].shift()) & testdf['B'].lt(testdf['B'].shift())

out = np.where(mask, 'INSIDE', 'BROKEN').tolist()
print(out)
['BROKEN', 'BROKEN', 'BROKEN', 'INSIDE', 'BROKEN', 'INSIDE', 'BROKEN', 'BROKEN']

out1 = np.where(mask, 'INSIDE', 'BROKEN')[1:].tolist()
print(out1)
['BROKEN', 'BROKEN', 'INSIDE', 'BROKEN', 'INSIDE', 'BROKEN', 'BROKEN']
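
The extra leading 'BROKEN' in the full-length result comes from how shift() and NaN comparisons behave. A small sketch of just the A column, assuming the question's data (prev_a and rising are illustrative names):

```python
import pandas as pd

d = {'A': [2, 1, 4, 5, 7, 8, 7, 5], 'B': [5, 7, 7, 6, 10, 9, 12, 10]}
testdf = pd.DataFrame(data=d)

# shift() moves each value down one row, leaving NaN in row 0.
prev_a = testdf['A'].shift()

# Comparing anything with NaN gives False, so row 0 can never pass the test.
rising = testdf['A'].gt(prev_a)

print(prev_a.tolist())
print(rising.tolist())
```

Row 0 has nothing to be compared with, so slicing it off with [1:] gives one label per pair of neighbouring rows, as in the loop.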

Answer 3

Here you go:

import numpy as np
import pandas as pd

d = {'A': [2, 1, 4, 5, 7, 8, 7, 5], 'B': [5, 7, 7, 6, 10, 9, 12, 10]}
testdf = pd.DataFrame(data=d)

mask1 = testdf.A > testdf.A.shift()
mask2 = testdf.B < testdf.B.shift()

res = np.where(mask1 & mask2, 'INSIDE', 'BROKEN')[1:]
print(res)

Output:

['BROKEN' 'BROKEN' 'INSIDE' 'BROKEN' 'INSIDE' 'BROKEN' 'BROKEN']

Answer 4

You can also copy both columns into plain Python lists and compare each element with the previous one. With the condition from the question (A increases and B decreases), 'INSIDE' appears twice, at rows 3 and 5:

import pandas as pd

d = {'A': [2, 1, 4, 5, 7, 8, 7, 5], 'B': [5, 7, 7, 6, 10, 9, 12, 10]}
testdf = pd.DataFrame(data=d)

# Copy each column into a plain list
dataframearray = [[],[]]
array = []
for number in d['A']:
    dataframearray[0].append(number)

for number in d['B']:
    dataframearray[1].append(number)

# Start at 1 so that every row is compared with the one before it
for x in range(1, len(dataframearray[0])):
    # A must increase and B must decrease
    if dataframearray[0][x] > dataframearray[0][x-1] and dataframearray[1][x] < dataframearray[1][x-1]:
        array.append('INSIDE')
    else:
        array.append('BROKEN')
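
If plain lists are preferred, the index bookkeeping can be avoided altogether by pairing each value with its predecessor using zip. This is only a sketch under the question's data; the names a, b and labels are illustrative and not part of the original answer:

```python
d = {'A': [2, 1, 4, 5, 7, 8, 7, 5], 'B': [5, 7, 7, 6, 10, 9, 12, 10]}
a, b = d['A'], d['B']

# zip(a, a[1:]) yields (previous, current) pairs, so no index can go out of range.
labels = [
    'INSIDE' if cur_a > prev_a and cur_b < prev_b else 'BROKEN'
    for prev_a, cur_a, prev_b, cur_b in zip(a, a[1:], b, b[1:])
]

print(labels)
# ['BROKEN', 'BROKEN', 'INSIDE', 'BROKEN', 'INSIDE', 'BROKEN', 'BROKEN']
```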

Hope this helps.


Answer 1: 2, 2020-07-10 08:32:15
Answer 2: 1, accepted, 2020-07-10 08:26:17
Answer 3: 1, 2020-07-10 08:39:23
Answer 4: 0, 2020-07-10 09:01:05
