How to update a dataframe based on dependency values in pandas?

Question

I have to update a dataframe based on a dependency value. How can this be done?

For example, given the input dataframe df:

id      dependency
10
20       30
30       40
40
50       10
60       20

Here we have 20 -> 30 and 30 -> 40, so the final result will be 20 -> 40 and 30 -> 40.

In the same way, 60 -> 20 -> 30 -> 40, so the final result will be 60 -> 40.

Final result:

id      dependency   final_dependency
10
20       30            40
30       40            40
40
50       10            10
60       20            40
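
To reproduce the example, the input frame can be built as below. This is only a sketch: it assumes the blank dependency cells are missing values (NaN), which is also why the column ends up as floats.

```python
import numpy as np
import pandas as pd

# Blank cells in the table are assumed to be missing values (NaN).
df = pd.DataFrame({
    "id": [10, 20, 30, 40, 50, 60],
    "dependency": [np.nan, 30, 40, np.nan, 10, 20],
})
print(df)
```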

Answer 1

You can use networkx to do this. First, create a directed graph from the rows that have a dependency:

import networkx as nx
import numpy as np

df_edges = df.dropna(subset=['dependency'])
G = nx.from_pandas_edgelist(df_edges, create_using=nx.DiGraph, source='dependency', target='id')

Now we can find the root ancestor of each node and add it as a new column:

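def find_root(G, node):
    # An id that neither has a dependency nor is depended upon is not in G.
    if node not in G:
        return node
    # Edges run dependency -> id, so a node's ancestors are everything it
    # depends on. Recurse until we reach a node with no ancestors.
    ancestors = list(nx.ancestors(G, node))
    if ancestors:
        return find_root(G, ancestors[0])
    return node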

df['final_dependency'] = df['id'].apply(lambda x: find_root(G, x))
df['final_dependency'] = np.where(df['final_dependency'] == df['id'], np.nan, df['final_dependency'])

Resulting dataframe:

   id  dependency  final_dependency
0  10         NaN               NaN
1  20        30.0              40.0
2  30        40.0              40.0
3  40         NaN               NaN
4  50        10.0              10.0
5  60        20.0              40.0
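
As a variation on the approach above, the recursion can be avoided by pointing the edges from id to dependency and picking the reachable node that has no outgoing edge. This is a minimal sketch, not the answer's own method: it assumes each id has at most one dependency and that there are no cycles, and `final_dependency` is a hypothetical helper name.

```python
import networkx as nx
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [10, 20, 30, 40, 50, 60],
    "dependency": [np.nan, 30, 40, np.nan, 10, 20],
})

# Edges run id -> dependency here (the reverse of the direction used above).
edges = df.dropna(subset=["dependency"]).astype({"dependency": int})
G = nx.from_pandas_edgelist(edges, source="id", target="dependency",
                            create_using=nx.DiGraph)

def final_dependency(node):
    """Return the end of the dependency chain for node, or NaN if it has none."""
    if node not in G:
        return np.nan
    # A sink (no outgoing edges) reachable from node is the end of its chain.
    sinks = [n for n in nx.descendants(G, node) if G.out_degree(n) == 0]
    return sinks[0] if sinks else np.nan

df["final_dependency"] = df["id"].map(final_dependency)
print(df)
```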

Answer 2

One way is to create a custom function:

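# Map each id to its direct dependency, skipping rows without one.
s = df[df["dependency"].notnull()].set_index("id")["dependency"].to_dict()

def func(val):
    # ids without a dependency have no final dependency.
    if val not in s:
        return None
    # Follow the chain until reaching an id with no dependency of its own.
    while val in s:
        val = s[val]
    return val

df["final"] = df["id"].apply(func)

print(df)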

   id  dependency  final
0  10         NaN    NaN
1  20        30.0   40.0
2  30        40.0   40.0
3  40         NaN    NaN
4  50        10.0   10.0
5  60        20.0   40.0
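
One caveat with a loop like the one above: it would not terminate if the dependencies contained a cycle (for example 20 -> 30 -> 20). The following is a minimal sketch of a guarded version, assuming a plain dict from id to direct dependency; `resolve` and `deps` are hypothetical names introduced only for illustration.

```python
def resolve(start, deps):
    """Follow start -> deps[start] -> ... to the end of the chain.

    Returns None when start has no dependency; raises ValueError on a cycle.
    """
    if start not in deps:
        return None
    seen = {start}
    current = start
    while current in deps:
        current = deps[current]
        if current in seen:
            raise ValueError(f"cyclic dependency involving {current}")
        seen.add(current)
    return current

# Same dependencies as in the question, written as a plain dict.
deps = {20: 30, 30: 40, 50: 10, 60: 20}
result = {i: resolve(i, deps) for i in (10, 20, 30, 40, 50, 60)}
print(result)
```

With the dict `s` from the answer, this could be applied as `df["id"].apply(lambda i: resolve(i, s))`.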

Answer 3

You already have a few answers. iterrows() is a fairly expensive approach, but here it is as another option.

import pandas as pd

raw_data = {'id': list(range(10, 61, 10)),
            'dep': [None, 30, 40, None, 10, 20]}
df = pd.DataFrame(raw_data)

# Start from the direct dependency, then look one level further.
df['final_dep'] = df['dep']

for i, r in df.iterrows():
    if pd.notnull(r['dep']):
        # Dependency of the row that this row depends on.
        x = df.loc[df['id'] == r['dep'], 'dep'].values[0]
        if pd.notnull(x):
            df.loc[i, 'final_dep'] = x

print(df)

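The output will be as follows. Note that the loop follows only one level of dependency, so id 60 resolves to 30 rather than the 40 asked for in the question: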

   id   dep final_dep
0  10   NaN       NaN
1  20  30.0        40
2  30  40.0        40
3  40   NaN       NaN
4  50  10.0        10
5  60  20.0        30
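
To follow chains longer than two ids (such as 60 -> 20 -> 30 -> 40 from the question), the lookup can be repeated until nothing changes. This is a minimal sketch using the same column names; the `lookup` dict and the iteration cap of `len(df)` passes are assumptions added here, the cap being a simple guard so that a cyclic input still terminates.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': list(range(10, 61, 10)),
                   'dep': [None, 30, 40, None, 10, 20]})

# id -> direct dependency, only for rows that have one.
lookup = df.dropna(subset=['dep']).set_index('id')['dep'].to_dict()

final = df['dep'].copy()
for _ in range(len(df)):  # a chain cannot be longer than the number of rows
    # Look up the dependency of the current value; NaN when there is none.
    nxt = final.map(lambda v: lookup.get(v, np.nan))
    if nxt.isna().all():
        break  # every chain has reached its end
    # Advance the chains that can still move; keep the others as they are.
    final = nxt.fillna(final)

df['final_dep'] = final
print(df)
```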
