How to iterate through pandas columns and rows simultaneously?

I have two DataFrames, A and B. I want to iterate through certain columns of B, check the values in every row, and see whether each value exists in one of A's columns. Where it does, I want to fill B's null values with values from A's other columns.

df A:

 country region product
 USA     NY     apple
 USA     NY     orange
 UK      LON    banana
 UK      LON    chocolate
 CANADA  TOR    syrup 
 CANADA  TOR    fish

df B:

 country ID    product1     product2     product3     product4     region 
 USA     123   other stuff  other stuff  apple        NA           NA
 USA     456   orange       other stuff  other stuff  NA           NA
 UK      234   banana       other stuff  other stuff  NA           NA
 UK      766   other stuff  other stuff  chocolate    NA           NA
 CANADA  877   other stuff  other stuff  syrup        NA           NA
 CANADA  109   NA           fish         NA           other stuff  NA

So I want to iterate through dfB and check, for example, whether a dfA.product value (apple) appears in any of dfB's product1-product4 columns. If it does, as in the first row of dfB, I want to copy the matching dfA.region value into dfB's region column, which is currently NA.

Here is the code I have; I am not sure if it is right:

import pandas as pd
from tqdm import tqdm


def fill_null_value(dfA, dfB):
    product_cols = ['product1', 'product2', 'product3', 'product4']
    # Default for rows whose products match nothing in dfA
    dfB['region'] = 'not found'

    for i, row_a in tqdm(dfA.iterrows(), total=len(dfA)):
        for index, row_b in dfB.iterrows():
            # Fill only the matching row, not the whole column
            if any(row_b[col] == row_a['product'] for col in product_cols):
                dfB.loc[index, 'region'] = row_a['region']

    print('outputting data')
    return dfB.to_excel('test.xlsx')

The main issue here seems to be finding a single product column in your second data set that you can join on. It's not clear how exactly you decide which values in the various product columns of df_b are meant to be used as lookup keys and which are ignored.

Assuming, though, that your df_a contains an exhaustive list of product values, and that only one of those values ever occurs in any given row of df_b, you could do something like this (simplifying your example):

import pandas as pd

df_a = pd.DataFrame({'Region':['USA', 'Canada'], 'Product': ['apple', 'banana']})
df_b = pd.DataFrame({'product1': ['apple', 'xyz'], 'product2': ['xyz', 'banana']})

product_cols = ['product1', 'product2']

# Take the first value in each row that also appears in df_a.Product (.iloc[0] is positional)
df_b['Product'] = df_b[product_cols].apply(lambda x: x[x.isin(df_a.Product)].iloc[0], axis=1)
df_b = df_b.merge(df_a, on='Product')

The big thing here is generating a column that you can join on for your lookup.
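The first-match lookup above fails for any row that has no matching product. As a rough sketch of the same idea on the question's data, one option is to map every product cell to a region and take the first hit per row. The names `region_by_product`, `looked_up` and `first_hit` are made up for illustration, and the sketch assumes each product belongs to exactly one region in `df_a`.

```python
import pandas as pd

df_a = pd.DataFrame({
    'country': ['USA', 'USA', 'UK', 'UK', 'CANADA', 'CANADA'],
    'region': ['NY', 'NY', 'LON', 'LON', 'TOR', 'TOR'],
    'product': ['apple', 'orange', 'banana', 'chocolate', 'syrup', 'fish'],
})
df_b = pd.DataFrame({
    'country': ['USA', 'USA', 'UK', 'UK', 'CANADA', 'CANADA'],
    'ID': [123, 456, 234, 766, 877, 109],
    'product1': ['other stuff', 'orange', 'banana', 'other stuff', 'other stuff', None],
    'product2': ['other stuff', 'other stuff', 'other stuff', 'other stuff', 'other stuff', 'fish'],
    'product3': ['apple', 'other stuff', 'other stuff', 'chocolate', 'syrup', None],
    'product4': [None, None, None, None, None, 'other stuff'],
    'region': [None] * 6,
})

product_cols = ['product1', 'product2', 'product3', 'product4']

# Assumption: each product maps to exactly one region in df_a.
region_by_product = df_a.drop_duplicates('product').set_index('product')['region']

# Look up every product cell; cells that are not in df_a become NaN.
looked_up = df_b[product_cols].apply(lambda col: col.map(region_by_product))

# Back-fill across the columns so the first column holds the first hit per row.
first_hit = looked_up.bfill(axis=1).iloc[:, 0]

# Fill only the missing regions; rows with no hit at all get a placeholder.
df_b['region'] = df_b['region'].fillna(first_hit).fillna('not found')
print(df_b[['country', 'ID', 'region']])
```

Rows with no matching product end up as 'not found' instead of raising, which mirrors the else branch in the question's code.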

If I were you, I would create several joins, then concat them and drop duplicates:

# df_A is the left frame, so its key columns go in left_on; df_B's go in right_on
# Note: how='right' also keeps unmatched df_B rows; how='inner' would keep matches only
df_1 = df_A.merge(df_B, left_on=['country', 'product'], right_on=['country', 'product1'], how='right')
df_2 = df_A.merge(df_B, left_on=['country', 'product'], right_on=['country', 'product2'], how='right')
df_3 = df_A.merge(df_B, left_on=['country', 'product'], right_on=['country', 'product3'], how='right')
df_4 = df_A.merge(df_B, left_on=['country', 'product'], right_on=['country', 'product4'], how='right')

df = pd.concat([df_1, df_2, df_3, df_4]).drop_duplicates()
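A possible variation on the joins above, not part of the original answer: reshape df_B into long form with `melt` so that a single merge covers all four product columns, then write the result back by ID. The sketch uses three rows from the question's data. `long_b`, `matches` and `region_by_id` are illustrative names, and it assumes `ID` is unique in df_B.

```python
import pandas as pd

df_A = pd.DataFrame({
    'country': ['USA', 'UK', 'CANADA'],
    'region': ['NY', 'LON', 'TOR'],
    'product': ['apple', 'banana', 'fish'],
})
df_B = pd.DataFrame({
    'country': ['USA', 'UK', 'CANADA'],
    'ID': [123, 234, 109],
    'product1': ['other stuff', 'banana', None],
    'product2': ['other stuff', 'other stuff', 'fish'],
    'product3': ['apple', 'other stuff', None],
    'product4': [None, None, 'other stuff'],
    'region': [None, None, None],
})

product_cols = ['product1', 'product2', 'product3', 'product4']

# One row per (ID, product column) pair instead of four wide columns.
long_b = df_B.melt(id_vars=['country', 'ID'], value_vars=product_cols,
                   value_name='product')

# A single inner join keeps only the cells that match a df_A product.
matches = long_b.merge(df_A, on=['country', 'product'], how='inner')

# Assumption: ID is unique in df_B; if an ID matches twice, keep the first.
region_by_id = matches.drop_duplicates('ID').set_index('ID')['region']

# Fill only the missing regions, leaving any existing values alone.
df_B['region'] = df_B['region'].fillna(df_B['ID'].map(region_by_id))
print(df_B[['country', 'ID', 'region']])
```

Because the join is an inner join, unmatched rows never enter `matches`, so there is nothing to de-duplicate afterwards apart from IDs that match more than once.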
