How to iterate through pandas columns and rows simultaneously?
I have two dfs, A and B. I want to iterate through certain columns of df B, check the values in all of its rows, see whether each value exists in one of the columns of A, and, if it does, fill B's null values with the values from A's other columns.
df A:
country  region  product
USA      NY      apple
USA      NY      orange
UK       LON     banana
UK       LON     chocolate
CANADA   TOR     syrup
CANADA   TOR     fish
df B:
country  ID   product1     product2     product3     product4     region
USA      123  other stuff  other stuff  apple        NA           NA
USA      456  orange       other stuff  other stuff  NA           NA
UK       234  banana       other stuff  other stuff  NA           NA
UK       766  other stuff  other stuff  chocolate    NA           NA
CANADA   877  other stuff  other stuff  syrup        NA           NA
CANADA   109  NA           fish         NA           other stuff  NA
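For anyone who wants to reproduce this, the two tables above can be built as DataFrames roughly as follows. This is a sketch, not part of the original post, and it assumes that NA means a missing value (stored as None) rather than the literal string "NA".

```python
import pandas as pd

# df A: lookup table from product to region
dfA = pd.DataFrame({
    'country': ['USA', 'USA', 'UK', 'UK', 'CANADA', 'CANADA'],
    'region': ['NY', 'NY', 'LON', 'LON', 'TOR', 'TOR'],
    'product': ['apple', 'orange', 'banana', 'chocolate', 'syrup', 'fish'],
})

# df B: the matching product can sit in any of product1..product4
# (assumption: NA cells are missing values, represented here as None)
dfB = pd.DataFrame({
    'country': ['USA', 'USA', 'UK', 'UK', 'CANADA', 'CANADA'],
    'ID': [123, 456, 234, 766, 877, 109],
    'product1': ['other stuff', 'orange', 'banana', 'other stuff', 'other stuff', None],
    'product2': ['other stuff', 'other stuff', 'other stuff', 'other stuff', 'other stuff', 'fish'],
    'product3': ['apple', 'other stuff', 'other stuff', 'chocolate', 'syrup', None],
    'product4': [None, None, None, None, None, 'other stuff'],
    'region': [None] * 6,
})
print(dfB)
```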
So I want to iterate through dfB and, for example, check whether a dfA.product value (apple) appears in any of the columns dfB.product1 to dfB.product4. If it does, as in the first row of dfB, I want to copy the matching dfA.region value into dfB's region column, which is currently NA.
Here is the code I have; I am not sure if it is right:
import pandas as pd
from tqdm import tqdm

def fill_null_value(dfA, dfB):
    product_cols = ['product1', 'product2', 'product3', 'product4']
    regions = []
    # One pass over dfB; for each row, scan dfA until a product matches
    for _, b_row in tqdm(dfB.iterrows(), total=len(dfB)):
        region = "not found"
        for _, a_row in dfA.iterrows():
            # Match if any of dfB's product columns equals dfA's product
            if any(b_row[col] == a_row['product'] for col in product_cols):
                region = a_row['region']
                break
        regions.append(region)
    # Assign the whole column once instead of overwriting dfB['region'] inside the loop
    dfB['region'] = regions
    print('outputting data')
    dfB.to_excel('test.xlsx')
    return dfB
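If each product appears only once in df A (true for the sample data, but an assumption in general), the nested iterrows loops could likely be replaced by a dictionary lookup. The sketch below is not from the original post: the helper fill_region and the toy rows (including the made-up ID 999) are hypothetical. It maps each product column through a product-to-region dict and keeps the first hit per row, falling back to "not found" like the loop version.

```python
import pandas as pd

# Toy data loosely based on the question's tables; ID 999 is a made-up row with no match
dfA = pd.DataFrame({'region': ['NY', 'LON', 'TOR'],
                    'product': ['apple', 'banana', 'fish']})
dfB = pd.DataFrame({'ID': [123, 234, 999],
                    'product1': ['other stuff', 'banana', 'xyz'],
                    'product2': [None, 'other stuff', None],
                    'product3': ['apple', None, 'other stuff']})

def fill_region(dfA, dfB, product_cols):
    """Hypothetical helper: return a copy of dfB with a region column looked up from dfA."""
    # product -> region; assumes each product appears only once in dfA
    lookup = dict(zip(dfA['product'], dfA['region']))
    region = pd.Series(None, index=dfB.index, dtype=object)
    for col in product_cols:
        # .map yields NaN for values missing from the lookup ('other stuff', None, ...);
        # fillna only touches rows still without a region, so the first matching column wins
        region = region.fillna(dfB[col].map(lookup))
    out = dfB.copy()
    out['region'] = region.fillna('not found')
    return out

result = fill_region(dfA, dfB, ['product1', 'product2', 'product3'])
print(result[['ID', 'region']])
```

This does one dictionary lookup per cell instead of comparing every row of df B with every row of df A, so it should scale much better than nested iterrows. It ignores country; if the same product can belong to different countries, the lookup key would need to be the (country, product) pair.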
The main issue here seems to be finding a single product column in your second data set that you can join on. It's not clear exactly how you decide which values in the various product columns of df_b are meant to be used as lookup keys and which are ignored.
Assuming, though, that your df_a contains an exhaustive list of product values and that each of those values only ever occurs once per row, you could do something like this (simplifying your example):
import pandas as pd

df_a = pd.DataFrame({'Region': ['USA', 'Canada'], 'Product': ['apple', 'banana']})
df_b = pd.DataFrame({'product1': ['apple', 'xyz'], 'product2': ['xyz', 'banana']})
product_cols = ['product1', 'product2']
# For each row, take the first product value found in df_a (.iloc selects by position)
df_b['Product'] = df_b[product_cols].apply(lambda x: x[x.isin(df_a.Product)].iloc[0], axis=1)
# Join on the new column to pull in Region
df_b = df_b.merge(df_a, on='Product')
The big thing here is generating a column that you can join on for your lookup.
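Another way to generate that join column, sketched here with DataFrame.melt (a technique not used in the answer itself), is to reshape the product columns into a single long column, merge that against df_a, and map the matches back by row label. It uses the answer's toy data plus a third, made-up row that matches nothing; unlike indexing the first match inside apply, which fails when a row has no match, such rows simply end up as NaN.

```python
import pandas as pd

# The answer's toy data, plus a made-up third row that matches nothing
df_a = pd.DataFrame({'Region': ['USA', 'Canada'], 'Product': ['apple', 'banana']})
df_b = pd.DataFrame({'product1': ['apple', 'xyz', 'xyz'], 'product2': ['xyz', 'banana', 'pqr']})
product_cols = ['product1', 'product2']

# Long format: one row per (df_b row, product column); keep df_b's row labels as 'row'
long = (df_b.melt(value_vars=product_cols, value_name='Product', ignore_index=False)
            .rename_axis('row')
            .reset_index())
# Inner merge keeps only product values that exist in df_a (left order is preserved)
hits = long.merge(df_a, on='Product')
# If a row matched in several columns, keep the first (earliest product column)
hits = hits.drop_duplicates(subset='row')
# Map back onto df_b by row label; rows without a match get NaN
df_b['Region'] = hits.set_index('row')['Region']
print(df_b)
```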
If I were you, I would create several joins, then concat them and drop duplicates:
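import pandas as pd
# Drop df_B's empty region column so it doesn't clash with df_A's region
b = df_B.drop(columns='region')
df_1 = df_A.merge(b, left_on=['country', 'product'], right_on=['country', 'product1'], how='right')
df_2 = df_A.merge(b, left_on=['country', 'product'], right_on=['country', 'product2'], how='right')
df_3 = df_A.merge(b, left_on=['country', 'product'], right_on=['country', 'product3'], how='right')
df_4 = df_A.merge(b, left_on=['country', 'product'], right_on=['country', 'product4'], how='right')
# Rows that found a region sort ahead of NaN ones; keep one row per ID, then restore the order
df = pd.concat([df_1, df_2, df_3, df_4]).sort_values('region').drop_duplicates(subset='ID').sort_index()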
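The four near-identical merges can also be generated in a loop. This is a minimal sketch on a cut-down version of the sample data; it assumes (country, product) pairs are unique in df_A and that ID identifies a row of df_B. It left-merges df_B onto df_A (equivalent to a right merge of df_A with df_B apart from column order), drops df_B's empty region column first so no region_x/region_y suffixes appear, and keeps the matched row for each ID.

```python
import pandas as pd

df_A = pd.DataFrame({'country': ['USA', 'UK', 'CANADA'],
                     'region': ['NY', 'LON', 'TOR'],
                     'product': ['apple', 'banana', 'fish']})
df_B = pd.DataFrame({'country': ['USA', 'UK', 'CANADA'],
                     'ID': [123, 234, 109],
                     'product1': ['other stuff', 'banana', None],
                     'product2': ['other stuff', 'other stuff', 'fish'],
                     'product3': ['apple', 'other stuff', None],
                     'region': [None, None, None]})

product_cols = ['product1', 'product2', 'product3']
b = df_B.drop(columns='region')  # avoid region_x / region_y after the merge
# One left merge per product column: df_B's productN against df_A's product
parts = [b.merge(df_A, left_on=['country', col], right_on=['country', 'product'], how='left')
         for col in product_cols]
# Rows that found a region sort ahead of NaN ones, so keeping the first row per ID keeps the match
df = (pd.concat(parts)
        .sort_values('region', na_position='last')
        .drop_duplicates(subset='ID')
        .sort_values('ID'))
print(df[['ID', 'region']])
```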