How to drop values from lists inside columns from a Pandas DataFrame

Although it is not good coding practice, I've run into a special kind of problem in which I need to go through a column of lists and erase particular values. I suppose one solution would be to melt the 'neighbors' column, but I believe the code I have is close to the objective. I've prepared a reproducible example for better understanding:

import pandas as pd
import numpy as np


def removing_nan_neighboors(custom_df):
    nan_list = list(custom_df[custom_df['values'].notna()]['customer'])
    print(nan_list)
    custom_df['neighbors'] = [x for x in custom_df['neighbors'] if x not in nan_list]
    return custom_df


customer = [1, 2, 3, 4, 5, 6]
values = [np.nan, np.nan, 10, np.nan, 11, 12]
neighbors = [[6, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5, 1]]
df = pd.DataFrame({'customer': customer, 'values': values, 'neighbors': neighbors})
df = removing_nan_neighboors(df)

print(df)

   customer values neighbors
0        1     NaN    [6, 2]
1        2     NaN    [1, 3]
2        3    10.0    [2, 4]
3        4     NaN    [3, 5]
4        5    11.0    [4, 6]
5        6    12.0    [5, 1]

The objective is to remove from each neighbors list the customer numbers whose values are NaN:

   customer values neighbors
0        1     NaN    [6]
1        2     NaN    [3]
2        3    10.0    []
3        4     NaN    [3, 5]
4        5    11.0    [6]
5        6    12.0    [5]

But I haven't gotten that far, because my function doesn't work as intended yet. Any help is appreciated.

Try:

df["cust_1"] = np.where(
    np.isnan(np.roll(df["values"], 1)),
    np.nan,
    np.roll(df["customer"], 1),
)

df["cust_2"] = np.where(
    np.isnan(np.roll(df["values"], -1)),
    np.nan,
    np.roll(df["customer"], -1),
)

df["neighbors"] = df[["cust_1", "cust_2"]].agg(
    lambda x: list(x[x.notna()].astype(int)), axis=1
)
df = df.drop(columns=["cust_1", "cust_2"])

print(df)

Prints:

   customer  values neighbors
0         1     NaN       [6]
1         2     NaN       [3]
2         3    10.0        []
3         4     NaN    [3, 5]
4         5    11.0       [6]
5         6    12.0       [5]
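The answer above never reads the neighbors column; it rebuilds it from np.roll, which shifts an array with wrap-around. A minimal sketch of what the two shifts produce, assuming (as in the example data) that each customer's neighbors are simply the previous and next rows in a ring. The names prev_customer, next_customer and ring_neighbors are illustrative only:

```python
import numpy as np

customer = np.array([1, 2, 3, 4, 5, 6])

# Shift by +1: each position now holds the customer from the previous row,
# and the last element wraps around to the front.
prev_customer = np.roll(customer, 1)

# Shift by -1: each position now holds the customer from the next row,
# and the first element wraps around to the end.
next_customer = np.roll(customer, -1)

# Pairing the two shifts reproduces the ring-shaped neighbors lists.
ring_neighbors = [[int(p), int(n)] for p, n in zip(prev_customer, next_customer)]
print(ring_neighbors)
```

If the neighbors lists in real data do not follow this ring layout, the roll-based approach would not apply, and filtering the existing lists directly is the safer route.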

If I understood your objective correctly, you want to remove from every neighbors list the numbers of those customer rows where values is NaN. So basically you want to get the result from your last cell.

I attempted to do that with a list comprehension:

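# Collect the customers whose value is NaN once, instead of once per neighbor
nan_customers = df.loc[df['values'].isna(), 'customer'].to_list()

df['neighbors_new'] = [[n for n in neighbor if n not in nan_customers]
                       for neighbor in df['neighbors']]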

And got this:

   customer  values neighbors neighbors_new
0         1     NaN    [6, 2]           [6]
1         2     NaN    [1, 3]           [3]
2         3    10.0    [2, 4]            []
3         4     NaN    [3, 5]        [3, 5]
4         5    11.0    [4, 6]           [6]
5         6    12.0    [5, 1]           [5]
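The same filter can be wrapped in a function shaped like the one in the question. This is a minimal sketch, assuming the column names from the example; the function name remove_nan_neighbors is illustrative. Compared with the question's version, it selects customers with isna() rather than notna(), and it filters the elements inside each list rather than testing whole lists for membership:

```python
import numpy as np
import pandas as pd


def remove_nan_neighbors(custom_df):
    """Return a copy in which NaN-valued customers are dropped from every neighbors list."""
    # Customers whose 'values' entry is missing; a set gives fast membership tests.
    nan_customers = set(custom_df.loc[custom_df['values'].isna(), 'customer'])
    out = custom_df.copy()
    # Filter inside each list, keeping only neighbors that are not NaN customers.
    out['neighbors'] = [[n for n in nbrs if n not in nan_customers]
                        for nbrs in out['neighbors']]
    return out


df = pd.DataFrame({
    'customer': [1, 2, 3, 4, 5, 6],
    'values': [np.nan, np.nan, 10, np.nan, 11, 12],
    'neighbors': [[6, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5, 1]],
})
result = remove_nan_neighbors(df)
print(result)
```

Returning a copy leaves the original DataFrame untouched, which is a design choice of this sketch rather than something the question requires.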

In your case, explode the lists, then use isin to keep only the neighbors whose values are notna:

s = df['neighbors'].explode()
df['new'] = s[s.isin(df.loc[df['values'].notna(),'customer'])].groupby(level=0).agg(list)
df
Out[36]: 
   customer  values neighbors     new
0         1     NaN    [6, 2]     [6]
1         2     NaN    [1, 3]     [3]
2         3    10.0    [2, 4]     NaN
3         4     NaN    [3, 5]  [3, 5]
4         5    11.0    [4, 6]     [6]
5         6    12.0    [5, 1]     [5]
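Note that row 2 comes out as NaN rather than an empty list, because a row whose neighbors are all filtered out has no group left after the groupby. A minimal sketch of one way to get [] instead, assuming the same example data; the intermediate names keep and grouped are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'customer': [1, 2, 3, 4, 5, 6],
    'values': [np.nan, np.nan, 10, np.nan, 11, 12],
    'neighbors': [[6, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5, 1]],
})

# One row per (original row, neighbor) pair; the original index is repeated.
s = df['neighbors'].explode()

# True where the neighbor is a customer with a non-NaN value.
keep = s.isin(df.loc[df['values'].notna(), 'customer'])

# Regroup the surviving neighbors by original row; rows with no survivors are absent.
grouped = s[keep].groupby(level=0).agg(list)

# Look up each row label, falling back to an empty list for the absent rows.
df['new'] = [grouped.get(i, []) for i in df.index]
print(df)
```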
