Iterate through a dataframe and a dictionary to update values in the dataframe for matching strings with Python

I have a dictionary:

dict = {"name1": ["name1_a", "name1_b"], "name2": ["name2_a", "name2_b", "name2_c"]}

Then I read in a .csv file as a dataframe with the following structure:

df = pd.read_csv('file.csv')
Name       Value
"name1"    10
"name1_b"  30
"name2_c"  30

I need a function that iterates through the dataframe and the dictionary, searching the dataframe for each name in the dictionary's lists ("name1_a", "name1_b", etc.). Once it finds a match, say for "name1_b", it should add the corresponding value (30) to "name1" in the dataframe. If the key doesn't exist in the dataframe (like "name2" in the example), it should create a new row and assign it the sum of the matching values ("name2_a" + "name2_b", etc.).

So the resulting dataframe should look like this (the value of "name1_b" was added to the value of "name1", and "name2" was created and assigned the value of "name2_c"):

Name       Value
"name1"    40
"name1_b"  30
"name2_c"  30
"name2"    30

Thanks for the help!

You could index df by name and create a separate dataframe that holds the values to be added to df. Some target keys in the dictionary won't be in df, so they need to be added with a default value. It's similar for the addend lists in the dictionary: some addends won't have values in df and need a default.

Once those two are set up, you can loop through the addends, collect the sums, and add them to df.

import pandas as pd

df = pd.DataFrame({"Name":["name1", "name1_b", "name2_c"],
    "Value":[10, 30, 30]})

# map of target:addends to apply to dataframe
mydict = {"name1":["name1_a", "name1_b"], "name2":["name2_a", "name2_b", "name2_c"]}

# index dataframe by name and add missing target names with a default of 0
df.set_index("Name", inplace=True)
unknowns = pd.DataFrame({"Value": 0}, index=sorted(mydict.keys() - set(df.index)))
# DataFrame.append was removed in pandas 2.0; pd.concat replaces it
df = pd.concat([df, unknowns])
del unknowns

# create dataframe of addends, defaulting names that are missing from df to 0
addend_names = sorted({val for values in mydict.values() for val in values})
addends_df = df.reindex(index=addend_names, fill_value=0)

# for each target, add the addends
for target, addends in mydict.items():
    df.loc[target] += addends_df.loc[addends].sum()

print(df)

Iterate through the dictionary items, mask the dataframe to the rows whose name matches the key or any entry in its value list, and get the total with .sum(). If the key already exists in the dataframe, assign the sum to it; otherwise create a new row.

dict_ = {"name1":["name1_a", "name1_b"], "name2":["name2_a", "name2_b", "name2_c"]}

for k,v in dict_.items():
    mask_list = v + [k]
    sum_value = df[df['Name'].isin(mask_list)]['Value'].sum()

    if k in df['Name'].unique():
        df.loc[df['Name'] == k, 'Value'] = sum_value
    else:
        df.loc[len(df.index)] = [k, sum_value] 
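
Since the question asks for a function, the loop above can be wrapped into one. What follows is only a minimal sketch, not part of the original answer: the name add_group_sums and its parameters are hypothetical, and it assumes df has the columns Name and Value. Unlike the loop above, it computes every sum from the original values and returns a new dataframe instead of modifying df in place.

```python
import pandas as pd


def add_group_sums(df, groups, name_col="Name", value_col="Value"):
    """Return a copy of df in which each key of groups holds its own value
    plus the values of the names listed for it.

    Hypothetical helper: the name and signature are illustrative only.
    """
    result = df.copy()
    new_rows = []
    for target, members in groups.items():
        # rows whose name is the target itself or one of its members
        mask = df[name_col].isin(list(members) + [target])
        total = df.loc[mask, value_col].sum()
        if (df[name_col] == target).any():
            # target already present: overwrite its value with the total
            result.loc[result[name_col] == target, value_col] = total
        else:
            # target missing: remember a new row to add at the end
            new_rows.append({name_col: target, value_col: total})
    if new_rows:
        result = pd.concat([result, pd.DataFrame(new_rows)], ignore_index=True)
    return result


df = pd.DataFrame({"Name": ["name1", "name1_b", "name2_c"],
                   "Value": [10, 30, 30]})
groups = {"name1": ["name1_a", "name1_b"],
          "name2": ["name2_a", "name2_b", "name2_c"]}
out = add_group_sums(df, groups)
print(out)
```

Because the sums are taken from the untouched input, the result does not depend on the order of the dictionary keys, which may matter if a key also appears in another key's list.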

You can first use a dict comprehension to invert the dictionary into member:key pairs (dd). Then filter the rows whose 'Name' is present in dd, replace those names with their keys using replace() and assign(), append this new dataframe to the original one, and finally group by 'Name' and calculate the sum:

d={"name1":["name1_a", "name1_b"], "name2":["name2_a", "name2_b", "name2_c"]}
dd={i:k for k,v in d.items() for i in v}
# DataFrame.append was removed in pandas 2.0, so pd.concat is used here
df=(pd.concat([df, df[df['Name'].isin(dd)]
      .assign(Name=lambda x:x['Name'].replace(dd))])
      .groupby('Name',as_index=False).sum())
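
To make the inversion step concrete, here is a small sketch of what the lookup dictionary contains and how it maps matching names back to their keys. The variable names member, target, names and matched are illustrative and not from the original answer.

```python
import pandas as pd

d = {"name1": ["name1_a", "name1_b"],
     "name2": ["name2_a", "name2_b", "name2_c"]}

# invert key -> members into member -> key
dd = {member: target for target, members in d.items() for member in members}
print(dd)

names = pd.Series(["name1", "name1_b", "name2_c"])
# keep only the names that appear as members, then map each to its key
matched = names[names.isin(list(dd))]
mapped = matched.map(dd).tolist()
print(mapped)
```

"name1" itself is not a member of any list, so it is filtered out here; its own value only enters the total later, when the mapped rows are combined with the original dataframe and grouped by name.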

OR

The same approach, but in separate steps:

d={"name1":["name1_a", "name1_b"], "name2":["name2_a", "name2_b", "name2_c"]}
dd={i:k for k,v in d.items() for i in v}
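# .copy() avoids a SettingWithCopyWarning when the names are remapped
df1=df[df['Name'].isin(dd)].copy()
df1['Name']=df1['Name'].map(dd)
# DataFrame.append was removed in pandas 2.0, so pd.concat is used here
df=pd.concat([df,df1],ignore_index=True)
df=df.groupby('Name',as_index=False)['Value'].sum()

Output of df:

    Name        Value
0   name1       40
1   name1_b     30
2   name2       30
3   name2_c     30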

Note: don't assign anything to the name dict in Python, since that shadows the built-in dict function.
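
As a small illustration of why this matters (a sketch; the function names shadowed and not_shadowed are made up for this example), once dict is rebound to a dictionary instance, calling dict(...) in that scope fails:

```python
def shadowed():
    # the local name `dict` now hides the built-in type
    dict = {"name1": ["name1_a", "name1_b"]}
    try:
        return dict(a=1)  # a dict instance is not callable
    except TypeError as exc:
        return f"TypeError: {exc}"


def not_shadowed():
    # using a different name leaves the built-in available
    mydict = {"name1": ["name1_a", "name1_b"]}
    return dict(a=1)


print(shadowed())
print(not_shadowed())
```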
