
Drop duplicates of a column where null value is present

I have a dataframe df1 where column 1 (col1) contains customer ids. Col2 is filled with sales figures, and some of the values are missing.

My problem is that I want to drop duplicate customer ids in col1, but only the rows where the sales value is missing.

I tried writing a function like this:

def drop(i):
    if i['col2'] == np.nan:
        i.drop_duplicates(subset='col1')
    else:
        return i['col1']

I am getting an error saying the truth value of a Series is ambiguous.

Thank you for reading. A solution would be appreciated.
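For context on that error, here is a minimal sketch (assuming i inside the function is the whole DataFrame): comparing a Series to np.nan with == yields a Series of False values, because NaN never compares equal to anything, and putting a Series into an if statement is what raises the ambiguity error. An element-wise missing-value check would normally go through isna() instead:

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'col1': [1001, 1001], 'col2': [2.0, np.nan]})

# Comparing a whole Series to np.nan gives a Series of False values
# (NaN is not equal to anything, not even itself).
check = df1['col2'] == np.nan
print(check)

# Using `check` directly in `if check:` is what raises
# "The truth value of a Series is ambiguous".

# Element-wise missing-value check:
print(df1['col2'].isna())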

The following should work, using groupby, apply, dropna, and reset_index.

Assuming your data is something like this:

input:

col1    col2
0   1001    2.0
1   1001    NaN
2   1002    4.0
3   1002    NaN

code:

import pandas as pd
import numpy as np

# Dummy data
data = {
    'col1': [1001, 1001, 1002, 1002],
    'col2': [2, np.nan, 4, np.nan],
}

df = pd.DataFrame(data)

# Solution
df.groupby('col1').apply(lambda group: group.dropna(subset=['col2'])).reset_index(drop=True)

output:

col1    col2
0   1001    2.0
1   1002    4.0
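
If the intent is to drop a NaN-sales row only when the same customer id also appears in another row (so a customer whose only record has missing sales would be kept), a boolean-mask variant along these lines may be closer to the question; the id 1003 below is a hypothetical extra case added just to illustrate that behaviour:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'col1': [1001, 1001, 1002, 1002, 1003],   # 1003 is a hypothetical single-row customer
    'col2': [2, np.nan, 4, np.nan, np.nan],
})

# Keep a row if its sales value is present, or if its customer id is not duplicated at all.
mask = df['col2'].notna() | ~df['col1'].duplicated(keep=False)
print(df[mask].reset_index(drop=True))
#    col1  col2
# 0  1001   2.0
# 1  1002   4.0
# 2  1003   NaN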
