
Python loop with IF condition on pandas dataframe gives me incomplete result or KeyError

Given a dataframe:

import pandas as pd

d = {'A': [2, 1, 4, 5, 7, 8, 7, 5], 'B': [5, 7, 7, 6, 10, 9, 12, 10]}
testdf = pd.DataFrame(data=d)


    A   B
0   2   5
1   1   7
2   4   7
3   5   6
4   7   10
5   8   9
6   7   3
7   5   2

I'm comparing each row with the previous one. If A is greater than the previous A (A[i] > A[i-1]) AND B is less than the previous B (B[i] < B[i-1]), I expect to append 'INSIDE' to a list; otherwise I expect to append 'BROKEN'.

array = []

for i in range(1,len(testdf)):
   
    if testdf.A[i] > testdf.A[i-1]:
        
        if testdf.B[i] < testdf.B[i-1]:
        
            array.append('INSIDE')
        
        else:
            
            array.append('BROKEN')

The result is:

['BROKEN', 'INSIDE', 'BROKEN', 'INSIDE']

But I expect:

['BROKEN', 'BROKEN', 'INSIDE', 'BROKEN', 'INSIDE', 'BROKEN', 'BROKEN']

I tried different starting points for the loop, for example:

for i in range(len(testdf)-1):

but that only raises a KeyError.

How can I improve the code so that it runs as expected?

For a pandas-based approach, you can use diff:

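import numpy as np

# df is the DataFrame from the question (called testdf there)
m = df.diff()                  # each row minus the previous row; the first row is NaN
m = (m.A > 0) & (m.B < 0)      # True where A rose and B fell
df['new_col'] = np.where(m, 'INSIDE', 'BROKEN')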

print(df)
   A   B new_col
0  2   5  BROKEN
1  1   7  BROKEN
2  4   7  BROKEN
3  5   6  INSIDE
4  7  10  BROKEN
5  8   9  INSIDE
6  7   3  BROKEN
7  5   2  BROKEN
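
To make it clearer what diff is doing, here is a minimal sketch. It assumes the DataFrame is rebuilt from the question's dictionary (so the last two B values are 12 and 10), and the names delta, same and labels are only illustrative. diff() returns each row minus the previous one, so the first row is NaN and every comparison against it is False. Dropping that first entry with [1:] gives a list of the same length as the loop produces.

```python
import numpy as np
import pandas as pd

d = {'A': [2, 1, 4, 5, 7, 8, 7, 5], 'B': [5, 7, 7, 6, 10, 9, 12, 10]}
testdf = pd.DataFrame(data=d)

delta = testdf.diff()              # row i minus row i-1; row 0 is all NaN
same = testdf - testdf.shift()     # the same differences written out by hand

# NaN > 0 and NaN < 0 are both False, so row 0 can never be 'INSIDE'
mask = (delta['A'] > 0) & (delta['B'] < 0)

# Drop row 0 so the result lines up with a loop that starts at i = 1
labels = np.where(mask, 'INSIDE', 'BROKEN')[1:].tolist()
print(labels)
```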

To get the expected output, add an else branch to the outer if as well, so that rows where A does not increase are also recorded as 'BROKEN':

array = []
for i in range(1,len(testdf)):
    if testdf.A[i] > testdf.A[i-1]:
        if testdf.B[i] < testdf.B[i-1]:
            array.append('INSIDE')
        else:
            array.append('BROKEN')
    else:
        array.append('BROKEN')
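
As for the KeyError: with range(len(testdf)-1) the loop starts at i = 0, so testdf.A[i-1] asks for testdf.A[-1]. On the default integer index that is treated as a lookup of the label -1, which does not exist, rather than as "the last element". The sketch below illustrates this under the assumption that the DataFrame keeps its default RangeIndex; it also shows one possible way to flatten the nested if into a single condition using positional .iloc access. The names raised_key_error, rising_a and falling_b are only illustrative.

```python
import pandas as pd

d = {'A': [2, 1, 4, 5, 7, 8, 7, 5], 'B': [5, 7, 7, 6, 10, 9, 12, 10]}
testdf = pd.DataFrame(data=d)

# Label-based lookup: there is no row labelled -1, so this raises KeyError
try:
    testdf.A[-1]
    raised_key_error = False
except KeyError:
    raised_key_error = True

array = []
for i in range(1, len(testdf)):            # start at 1 so that i-1 is a valid position
    rising_a = testdf.A.iloc[i] > testdf.A.iloc[i - 1]
    falling_b = testdf.B.iloc[i] < testdf.B.iloc[i - 1]
    # One append per row: 'INSIDE' only when both conditions hold
    array.append('INSIDE' if rising_a and falling_b else 'BROKEN')

print(raised_key_error)
print(array)
```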

A non-loop solution: here the first row is tested too (against the NaN produced by shift), so the result has the same length as the original DataFrame. If you need the same output as the loop, remove the first value by indexing with [1:]:

import numpy as np

# True where A rose and B fell relative to the previous row
mask = testdf['A'].gt(testdf['A'].shift()) & testdf['B'].lt(testdf['B'].shift())

out = np.where(mask, 'INSIDE', 'BROKEN').tolist()
print(out)
['BROKEN', 'BROKEN', 'BROKEN', 'INSIDE', 'BROKEN', 'INSIDE', 'BROKEN', 'BROKEN']

out1 = np.where(mask, 'INSIDE', 'BROKEN')[1:].tolist()
print (out1)
['BROKEN', 'BROKEN', 'INSIDE', 'BROKEN', 'INSIDE', 'BROKEN', 'BROKEN']
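
If you would rather keep the labels next to the data and avoid NumPy, a possible variant is to map the boolean mask straight to strings. This is only a sketch; the column name state is a hypothetical choice, not something from the question.

```python
import pandas as pd

d = {'A': [2, 1, 4, 5, 7, 8, 7, 5], 'B': [5, 7, 7, 6, 10, 9, 12, 10]}
testdf = pd.DataFrame(data=d)

# Same mask as above: A rose and B fell compared with the previous row
mask = testdf['A'].gt(testdf['A'].shift()) & testdf['B'].lt(testdf['B'].shift())

# Translate True/False into the two labels and store them as a column
testdf['state'] = mask.map({True: 'INSIDE', False: 'BROKEN'})

# Skip the first row, which has no previous row to compare with
out = testdf['state'].iloc[1:].tolist()
print(out)
```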

Here you go:

import numpy as np
import pandas as pd

d = {'A': [2, 1, 4, 5, 7, 8, 7, 5], 'B': [5, 7, 7, 6, 10, 9, 12, 10]}
testdf = pd.DataFrame(data=d)

mask1 = testdf.A > testdf.A.shift()
mask2 = testdf.B < testdf.B.shift()

res = np.where(mask1 & mask2, 'INSIDE', 'BROKEN')[1:]
print(res)

Output:

['BROKEN' 'BROKEN' 'INSIDE' 'BROKEN' 'INSIDE' 'BROKEN' 'BROKEN']

You can also copy the two columns into plain Python lists and compare each element with the one before it. 'INSIDE' is appended only where A rises and B falls, for example at the 6th row, where B drops from 10 to 9 while A rises from 7 to 8:

import pandas as pd

d = {'A': [2, 1, 4, 5, 7, 8, 7, 5], 'B': [5, 7, 7, 6, 10, 9, 12, 10]}
testdf = pd.DataFrame(data=d)

# One plain list per column: index 0 holds A, index 1 holds B
dataframearray = [testdf['A'].tolist(), testdf['B'].tolist()]
array = []

x = 0
while x < len(dataframearray[0]) - 1:
    x += 1  # the first comparison is row 1 against row 0
    # INSIDE only if A rose and B fell compared with the previous row
    if dataframearray[0][x] > dataframearray[0][x-1] and dataframearray[1][x] < dataframearray[1][x-1]:
        array.append('INSIDE')
    else:
        array.append('BROKEN')

print(array)
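
The same neighbour-by-neighbour comparison over plain lists can also be sketched without any index arithmetic by pairing each list with a copy of itself shifted by one. The lists a and b below are just the question's two columns written out by hand, and the variable names are illustrative.

```python
a = [2, 1, 4, 5, 7, 8, 7, 5]
b = [5, 7, 7, 6, 10, 9, 12, 10]

# zip(a, a[1:]) yields (previous, current) pairs, so no index can run off the end
array = [
    'INSIDE' if cur_a > prev_a and cur_b < prev_b else 'BROKEN'
    for prev_a, cur_a, prev_b, cur_b in zip(a, a[1:], b, b[1:])
]
print(array)
```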

Hope this helps.
