
Find columns negative value in another column - dataframe

I have this code:

import pandas as pd

test = {"number": ['1555','1666','1777', '1888', '1999'],
        "order_amount": ['100.00','200.00','-200.00', '300.00', '-150.00'],
        "number_of_refund": ['','','1666', '', '1888']
        }

df = pd.DataFrame(test)

Which returns the following dataframe:

  number order_amount number_of_refund
0   1555       100.00                 
1   1666       200.00                 
2   1777      -200.00             1666
3   1888       300.00                 
4   1999      -150.00             1888    

I want to remove an order and its refund entry if:

  • "number_of_refund" matches a number in the "number" column (the original order may be missing from the dataframe if the order was placed last month and refunded during the current month)
  • the refund's amount is the negative of the matched order's amount (in this case order 1666 has 200.00 and the refund of 1666 has -200.00, so both rows should be removed)

So the result in this case should be:

  number order_amount number_of_refund
0   1555       100.00
3   1888       300.00
4   1999      -150.00             1888

How do I check whether a value in one column appears in another column with the opposite (negative) amount?

IIUC, you can use a boolean indexing approach:

# ensure numeric values
df['order_amount'] = pd.to_numeric(df['order_amount'], errors='coerce')

# is the row a refund?
m1 = df['number_of_refund'].ne('')
# get mapping of refunds
s = df[m1].set_index('number_of_refund')['order_amount']

# get reimbursements and find which ones will equal the original value
reimb = df['number'].map(s)
m2 = reimb.eq(-df['order_amount'])
m3 = df['number_of_refund'].isin(df.loc[m2, 'number'])

# keep rows that do not match any m2 or m3 mask
df = df[~(m2|m3)]

Output:

  number  order_amount number_of_refund
0   1555         100.0                 
3   1888         300.0                 
4   1999        -150.0             1888
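As a minimal sketch, the same masks can be wrapped in a function that leaves the input dataframe untouched. The name drop_matched_refunds is hypothetical, and the sketch assumes each order is refunded at most once (with several refunds for one order, only the last one would be compared). The extra row 2000, a refund for an order 1444 that is not in the dataframe, is invented here to illustrate the "order was made last month" case from the question.

```python
import pandas as pd

def drop_matched_refunds(df):
    """Drop every order/refund pair whose amounts cancel out (hypothetical helper)."""
    # compare amounts numerically without modifying the caller's dataframe
    amount = pd.to_numeric(df["order_amount"], errors="coerce")
    is_refund = df["number_of_refund"].ne("")

    # refunded order number -> refund amount (assumes one refund per order)
    refund_by_order = dict(zip(df.loc[is_refund, "number_of_refund"],
                               amount[is_refund]))

    # orders whose refund is exactly the negative of the order amount
    refund_amount = df["number"].map(refund_by_order)
    order_cancelled = refund_amount.eq(-amount)

    # refunds that point at one of those cancelled orders
    refund_cancelled = df["number_of_refund"].isin(df.loc[order_cancelled, "number"])

    return df[~(order_cancelled | refund_cancelled)]

# sample data from the question plus one illustrative refund (2000)
# whose original order (1444) is absent from the dataframe
sample = pd.DataFrame({
    "number": ["1555", "1666", "1777", "1888", "1999", "2000"],
    "order_amount": ["100.00", "200.00", "-200.00", "300.00", "-150.00", "-50.00"],
    "number_of_refund": ["", "", "1666", "", "1888", "1444"],
})
kept = drop_matched_refunds(sample)
print(kept)
```

Rows 1666 and 1777 are dropped as a cancelling pair. The refund for the missing order 1444 is kept because there is no order row to compare it against, and 1888/1999 are kept because -150.00 does not cancel 300.00.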

Let's say I change the refunded amount for 1999 to be -200.00:

test = {"number": ['1555','1666','1777', '1888', '1999'],
        "order_amount": ['100.00','200.00','-200.00', '300.00', '-200.00'],
        "number_of_refund": ['','','1666', '', '1888']  }
df = pd.DataFrame(test)
print(df)

  number order_amount number_of_refund
0   1555       100.00                 
1   1666       200.00                 
2   1777      -200.00             1666
3   1888       300.00                 
4   1999      -200.00             1888

Here's another way to do it. I create a unique string by concatenating number_of_refund (with the blanks filled in from the number column) and the absolute order_amount (i.e., without the negative sign), then drop both rows of each duplicate pair:

df['unique'] = df.apply(lambda x: x['order_amount'].replace('-','')+'|'+x['number'] if x['number_of_refund']=='' else x['order_amount'].replace('-','')+'|'+x['number_of_refund'], axis=1)
#df['unique'] = df['order_amount'].str.replace('-','') + '|' + df['number_of_refund'].mask(df['number_of_refund'].eq(''), df['number'])  #the same
print(df)

  number order_amount number_of_refund       unique
0   1555       100.00                   100.00|1555
1   1666       200.00                   200.00|1666    #duplicate
2   1777      -200.00             1666  200.00|1666    #duplicate
3   1888       300.00                   300.00|1888
4   1999      -200.00             1888  200.00|1888

The duplicate rows are easily identified and ready to be dropped (along with the helper column unique):

df = df.drop_duplicates(['unique'], keep=False).drop(columns=['unique'])
print(df)

  number order_amount number_of_refund
0   1555       100.00                 
3   1888       300.00                 
4   1999      -200.00             1888
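The string key above relies on both amounts being formatted identically ('200.00' and '-200.00'). A possible variant, sketched here under the same assumptions, compares the absolute amount numerically and flags pairs with DataFrame.duplicated instead of building a helper column. The names order_key, abs_amount, keys, is_pair and result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "number": ["1555", "1666", "1777", "1888", "1999"],
    "order_amount": ["100.00", "200.00", "-200.00", "300.00", "-200.00"],
    "number_of_refund": ["", "", "1666", "", "1888"],
})

# the order each row belongs to: the refunded order for refunds,
# the row's own number for regular orders
order_key = df["number_of_refund"].mask(df["number_of_refund"].eq(""), df["number"])

# absolute amount as a number, so '200.00' and '-200.0' would compare equal
abs_amount = pd.to_numeric(df["order_amount"], errors="coerce").abs()

# rows sharing both the order key and the absolute amount form a pair
keys = pd.DataFrame({"order_key": order_key, "abs_amount": abs_amount})
is_pair = keys.duplicated(keep=False)

result = df[~is_pair]
print(result)
```

Like the string version, this matches on the absolute amount only, so it does not check that one row of the pair is positive and the other negative.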


