How to compare cell values of two different DataFrames in Python?
I have two DataFrames:
Person_df
Name Emplid Country
0 DK 123 India
1 JS 456 India
2 RM 789 China
3 MS 111 China
4 SR 222 China
Target_df
Country Category Target
0 India Marketing Reduce spend by $xy.
1 India R&D Increase spend by $dd.
2 India Infra Reduce spend by $kn.
3 China Marketing Increase spend by $eg.
4 China R&D Increase spend by $cb.
5 China Infra Reduce spend by $mn.
My goal is to create a third DataFrame, based on each person's country, that looks like this:
Individual_df
TargetID Category Target
DK12301 Marketing Reduce spend by $xy.
DK12302 R&D Increase spend by $dd.
DK12303 Infra Reduce spend by $kn.
JS45601 Marketing Reduce spend by $xy.
JS45602 R&D Increase spend by $dd.
JS45603 Infra Reduce spend by $kn.
RM78901 Marketing Increase spend by $eg.
RM78902 R&D Increase spend by $cb.
RM78903 Infra Reduce spend by $mn.
MS11101 Marketing Increase spend by $eg.
MS11102 R&D Increase spend by $cb.
MS11103 Infra Reduce spend by $mn.
SR22201 Marketing Increase spend by $eg.
SR22202 R&D Increase spend by $cb.
SR22203 Infra Reduce spend by $mn.
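Basically, I have to take a person from Person_df, match his/her country against the countries listed in Target_df, and then assign those targets to that person (storing them in Individual_df).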
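The TargetID values in the expected output appear to follow a single rule: the person's Name, then their Emplid, then a two-digit counter that restarts for each person. A minimal sketch of that rule, inferred only from the example table (make_target_id is a hypothetical helper, not part of pandas):

```python
def make_target_id(name: str, emplid: int, seq: int) -> str:
    """Build an ID such as 'DK12301' (hypothetical helper).

    Assumed format, inferred from the example: Name + Emplid +
    a 1-based sequence number zero-padded to two digits.
    """
    return f"{name}{emplid}{seq:02d}"


# DK (Emplid 123) gets one ID per India target row
print([make_target_id("DK", 123, i) for i in range(1, 4)])
# ['DK12301', 'DK12302', 'DK12303']
```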
The problem is that I'm new to Python and can't really figure out how to do the country comparison.
I wrote the following code:
for index, row in Person_df.iterrows():
    for index1, row1 in Target_df.iterrows():
        if Person_df['Country'] == Person_df['Country']:  # I know this is incorrect
            data = []
            # populate data[] with selected values for one person
            # append data[] to Individual_df
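I need help with the following:
1) How can I actually compare each person's country here?
2) Even if I knew how to do the comparison, the code I wrote isn't efficient, because it iterates unnecessarily. Any pointers on how I can improve it?
Thanks!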
Try this:
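import pandas as pd
# Attach every Target_df row for the person's country to that person
Individual_df = pd.merge(Person_df, Target_df, on=['Country'], how='left')
# TargetID = Name + Emplid + two-digit counter within each employee (01, 02, ...)
Individual_df['TargetID'] = Individual_df['Name'] + Individual_df['Emplid'].astype(str) + (Individual_df.groupby('Emplid').cumcount() + 1).astype(str).str.zfill(2)
Individual_df = Individual_df[['TargetID', 'Category', 'Target']]
print(Individual_df)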
Output:
TargetID Category Target
0 DK12301 Marketing Reduce spend by $xy.
1 DK12302 R&D Increase spend by $dd.
2 DK12303 Infra Reduce spend by $kn.
3 JS45601 Marketing Reduce spend by $xy.
4 JS45602 R&D Increase spend by $dd.
5 JS45603 Infra Reduce spend by $kn.
6 RM78901 Marketing Increase spend by $eg.
7 RM78902 R&D Increase spend by $cb.
8 RM78903 Infra Reduce spend by $mn.
9 MS11101 Marketing Increase spend by $eg.
10 MS11102 R&D Increase spend by $cb.
11 MS11103 Infra Reduce spend by $mn.
12 SR22201 Marketing Increase spend by $eg.
13 SR22202 R&D Increase spend by $cb.
14 SR22203 Infra Reduce spend by $mn.
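For anyone who wants to run the answer end to end, here is a self-contained sketch. The two input DataFrames are rebuilt by hand from the tables in the question (Emplid is assumed to be an integer column), followed by the same merge, cumcount and zfill steps:

```python
import pandas as pd

# Example data rebuilt from the question's tables
Person_df = pd.DataFrame({
    'Name': ['DK', 'JS', 'RM', 'MS', 'SR'],
    'Emplid': [123, 456, 789, 111, 222],
    'Country': ['India', 'India', 'China', 'China', 'China'],
})
Target_df = pd.DataFrame({
    'Country': ['India'] * 3 + ['China'] * 3,
    'Category': ['Marketing', 'R&D', 'Infra'] * 2,
    'Target': ['Reduce spend by $xy.', 'Increase spend by $dd.', 'Reduce spend by $kn.',
               'Increase spend by $eg.', 'Increase spend by $cb.', 'Reduce spend by $mn.'],
})

# One row per (person, target) pair whose Country matches
Individual_df = pd.merge(Person_df, Target_df, on='Country', how='left')

# Number each person's rows 1, 2, 3, ... and zero-pad to two digits
seq = (Individual_df.groupby('Emplid').cumcount() + 1).astype(str).str.zfill(2)
Individual_df['TargetID'] = Individual_df['Name'] + Individual_df['Emplid'].astype(str) + seq
Individual_df = Individual_df[['TargetID', 'Category', 'Target']]
print(Individual_df)
```

With how='left', a person whose country has no rows in Target_df is kept, with NaN in Category and Target; how='inner' would drop that person instead.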
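Explanation: pd.merge(..., on=['Country'], how='left') pairs each person with every Target_df row for his/her country; groupby('Emplid').cumcount() + 1 numbers each person's rows 1, 2, 3; .astype(str).str.zfill(2) turns that into 01, 02, 03, which is appended to Name + Emplid to form TargetID.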
Since the OP asked for a way to fetch the rows with a for loop:
# Countries that occur in Person_df
unique_countries = Person_df['Country'].unique().tolist()
for index, row in Target_df.iterrows():
    if row['Country'] in unique_countries:
        print(row.values)
        # do your operation here
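Explanation:
1) Find the unique countries in Person_df.
2) Iterate over Target_df with a for loop.
3) Check whether each row's country is in that list; if it is, perform the required operation.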
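To answer the first question literally: inside the nested loop, compare the scalar values row['Country'] and row1['Country'] rather than whole columns. Person_df['Country'] == ... produces a boolean Series, and using a Series in an if statement raises a "truth value is ambiguous" ValueError. A sketch of the loop version that actually builds Individual_df, using the question's example data (the rows list is an illustrative name):

```python
import pandas as pd

Person_df = pd.DataFrame({
    'Name': ['DK', 'JS', 'RM', 'MS', 'SR'],
    'Emplid': [123, 456, 789, 111, 222],
    'Country': ['India', 'India', 'China', 'China', 'China'],
})
Target_df = pd.DataFrame({
    'Country': ['India'] * 3 + ['China'] * 3,
    'Category': ['Marketing', 'R&D', 'Infra'] * 2,
    'Target': ['Reduce spend by $xy.', 'Increase spend by $dd.', 'Reduce spend by $kn.',
               'Increase spend by $eg.', 'Increase spend by $cb.', 'Reduce spend by $mn.'],
})

rows = []
for index, row in Person_df.iterrows():
    seq = 0  # per-person counter for the last two digits of TargetID
    for index1, row1 in Target_df.iterrows():
        # Compare the two rows' Country values (scalars), not whole columns
        if row['Country'] == row1['Country']:
            seq += 1
            rows.append({
                'TargetID': f"{row['Name']}{row['Emplid']}{seq:02d}",
                'Category': row1['Category'],
                'Target': row1['Target'],
            })

# Build the DataFrame once at the end instead of appending inside the loop
Individual_df = pd.DataFrame(rows, columns=['TargetID', 'Category', 'Target'])
print(Individual_df)
```

This performs one comparison for every (person, target) pair, so for larger frames the pd.merge version is the better answer to the second question.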