How to drop values from lists inside columns from a Pandas DataFrame
Although it is not good coding practice, I've run into a special kind of problem in which I need to go through a column of lists and erase particular values. I suppose one solution would be to melt the 'neighbors' column, but I believe the code I've written is close to the objective. I've prepared a reproducible example for better understanding:
import pandas as pd
import numpy as np

def removing_nan_neighboors(custom_df):
    nan_list = list(custom_df[custom_df['values'].notna()]['customer'])
    print(nan_list)
    custom_df['neighbors'] = [x for x in custom_df['neighbors'] if x not in nan_list]
    return custom_df

customer = [1, 2, 3, 4, 5, 6]
values = [np.nan, np.nan, 10, np.nan, 11, 12]
neighbors = [[6, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5, 1]]
df = pd.DataFrame({'customer': customer, 'values': values, 'neighbors': neighbors})
df = removing_nan_neighboors(df)
print(df)
   customer  values neighbors
0         1     NaN    [6, 2]
1         2     NaN    [1, 3]
2         3    10.0    [2, 4]
3         4     NaN    [3, 5]
4         5    11.0    [4, 6]
5         6    12.0    [5, 1]
The objective is to erase from each neighbors list the customer numbers whose values are NaN:
   customer  values neighbors
0         1     NaN       [6]
1         2     NaN       [3]
2         3    10.0        []
3         4     NaN    [3, 5]
4         5    11.0       [6]
5         6    12.0       [5]
But I have failed to get that far, because my function doesn't work as intended yet. Help is appreciated.
Try:
df["cust_1"] = np.where(
    np.isnan(np.roll(df["values"], 1)),
    np.nan,
    np.roll(df["customer"], 1),
)
df["cust_2"] = np.where(
    np.isnan(np.roll(df["values"], -1)),
    np.nan,
    np.roll(df["customer"], -1),
)
df["neighbors"] = df[["cust_1", "cust_2"]].agg(
    lambda x: list(x[x.notna()].astype(int)), axis=1
)
df = df.drop(columns=["cust_1", "cust_2"])
print(df)
Prints:
   customer  values neighbors
0         1     NaN       [6]
1         2     NaN       [3]
2         3    10.0        []
3         4     NaN    [3, 5]
4         5    11.0       [6]
5         6    12.0       [5]
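Note that the np.roll approach rebuilds the neighbors from row positions instead of filtering the existing lists, so it appears to rely on every customer's neighbors being the previous and next rows, wrapping around at the ends. A minimal sketch of a check for that assumption on the question's example data follows; the names positional and matches are illustrative only and not part of the answer above.

```python
import numpy as np
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "customer": [1, 2, 3, 4, 5, 6],
    "values": [np.nan, np.nan, 10, np.nan, 11, 12],
    "neighbors": [[6, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5, 1]],
})

customers = df["customer"].to_numpy()
# Previous and next customer for each row, wrapping around at the ends.
prev_cust = np.roll(customers, 1).tolist()
next_cust = np.roll(customers, -1).tolist()
positional = [[p, n] for p, n in zip(prev_cust, next_cust)]

# True only if every neighbors list equals [previous row, next row].
matches = positional == df["neighbors"].tolist()
print(matches)
```

If the check prints False for your real data, one of the approaches that filters the existing lists is probably the safer choice.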
If I understood your objective correctly, you want to remove from every neighbors list the numbers that belong to customer rows where values is NaN. So basically you want to get the result from your last cell.
I attempted to do that with a list comprehension:
df['neighbors_new'] = [[n for n in neighbor
                        if n not in df[df['values'].isna()]['customer'].to_list()]
                       for neighbor in df.neighbors]
And got this:
   customer  values neighbors neighbors_new
0         1     NaN    [6, 2]           [6]
1         2     NaN    [1, 3]           [3]
2         3    10.0    [2, 4]            []
3         4     NaN    [3, 5]        [3, 5]
4         5    11.0    [4, 6]           [6]
5         6    12.0    [5, 1]           [5]
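The comprehension above recomputes the list of NaN customers for every neighbor it tests. A possible variant, sketched here under the assumption that the column names match the question's example, builds that lookup once as a set; the function name drop_nan_neighbors is hypothetical.

```python
import numpy as np
import pandas as pd

def drop_nan_neighbors(frame):
    """Return a copy of frame whose 'neighbors' lists exclude customers with NaN 'values'."""
    # Build the lookup once; membership tests on a set are cheap.
    nan_customers = set(frame.loc[frame["values"].isna(), "customer"])
    out = frame.copy()
    out["neighbors"] = [
        [n for n in nbrs if n not in nan_customers]
        for nbrs in frame["neighbors"]
    ]
    return out

# Example data copied from the question.
df = pd.DataFrame({
    "customer": [1, 2, 3, 4, 5, 6],
    "values": [np.nan, np.nan, 10, np.nan, 11, 12],
    "neighbors": [[6, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5, 1]],
})
result = drop_nan_neighbors(df)
print(result)
```

Because the inner loop walks each row's list directly, the comparison is made per neighbor rather than per list, which is the step the function in the question was missing.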
In your case, use explode, then isin to keep only the neighbors whose values are notna:
s = df['neighbors'].explode()
df['new'] = s[s.isin(df.loc[df['values'].notna(),'customer'])].groupby(level=0).agg(list)
df
Out[36]:
   customer  values neighbors     new
0         1     NaN    [6, 2]     [6]
1         2     NaN    [1, 3]     [3]
2         3    10.0    [2, 4]     NaN
3         4     NaN    [3, 5]  [3, 5]
4         5    11.0    [4, 6]     [6]
5         6    12.0    [5, 1]     [5]
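In the output above, row 2 ends up as NaN rather than an empty list, because none of its neighbors survive the filter and the row therefore has no group. If an empty list is wanted there, as in the question's expected result, one option is to look up each index in the grouped result with a default. This is only a sketch; the names keep and kept are illustrative.

```python
import numpy as np
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "customer": [1, 2, 3, 4, 5, 6],
    "values": [np.nan, np.nan, 10, np.nan, 11, 12],
    "neighbors": [[6, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5, 1]],
})

s = df["neighbors"].explode()
keep = df.loc[df["values"].notna(), "customer"]
kept = s[s.isin(keep)].groupby(level=0).agg(list)

# Rows with no surviving neighbors are absent from kept; use [] for them.
df["new"] = [kept.get(i, []) for i in df.index]
print(df)
```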