How to compare cell values of two different DataFrames in Python?

I have two DataFrames:

Person_df

       Name  Emplid  Country
    0  DK    123     India
    1  JS    456     India
    2  RM    789     China
    3  MS    111     China
    4  SR    222     China

Target_df

       Country  Category   Target
    0  India    Marketing  Reduce spend by $xy.
    1  India    R&D        Increase spend by $dd.
    2  India    Infra      Reduce spend by $kn.
    3  China    Marketing  Increase spend by $eg.
    4  China    R&D        Increase spend by $cb.
    5  China    Infra      Reduce spend by $mn.

My goal is to create a third DataFrame based on each person's country, like this:

Individual_df

    TargetID    Category    Target
    DK12301     Marketing   Reduce spend by $xy.
    DK12302     R&D         Increase spend by $dd.
    DK12303     Infra       Reduce spend by $kn.
    JS45601     Marketing   Reduce spend by $xy.
    JS45602     R&D         Increase spend by $dd.
    JS45603     Infra       Reduce spend by $kn.
    RM78901     Marketing   Increase spend by $eg.
    RM78902     R&D         Increase spend by $cb.
    RM78903     Infra       Reduce spend by $mn.
    MS11101     Marketing   Increase spend by $eg.
    MS11102     R&D         Increase spend by $cb.
    MS11103     Infra       Reduce spend by $mn.
    SR22201     Marketing   Increase spend by $eg.
    SR22202     R&D         Increase spend by $cb.
    SR22203     Infra       Reduce spend by $mn.

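Basically, I have to take a person from Person_df, match his/her country against the countries listed in Target_df, and then assign the matching targets to that person (stored in Individual_df).

The problem is that I am new to Python and can't really figure out how to do the country comparison.

I wrote the code below: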

for index, row in Person_df.iterrows():
    for index1, row1 in Target_df.iterrows():
        if Person_df['Country'] == Person_df['Country']:  # I know this is incorrect
            data = []
            # populate data[] with selected values for one person.
            # append data[] to Individual_df

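I need help with the following points:

1) How can I actually compare the country for each person here?

2) Even if I knew how to do the comparison, the code I wrote is not efficient, because it iterates more than necessary. Any pointers on how to improve it?

Thanks!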

Try this:

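import pandas as pd
# Left-join each person to every target row for their country
Individual_df = pd.merge(Person_df, Target_df, on='Country', how='left')
# Per-person running counter (01, 02, ...) appended to Name + Emplid
seq = (Individual_df.groupby('Emplid').cumcount() + 1).astype(str).str.zfill(2)
Individual_df['TargetID'] = Individual_df['Name'] + Individual_df['Emplid'].astype(str) + seq
Individual_df = Individual_df[['TargetID', 'Category', 'Target']]
print(Individual_df)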

Output:

   TargetID   Category                  Target
0   DK12301  Marketing    Reduce spend by $xy.
1   DK12302        R&D  Increase spend by $dd.
2   DK12303      Infra    Reduce spend by $kn.
3   JS45601  Marketing    Reduce spend by $xy.
4   JS45602        R&D  Increase spend by $dd.
5   JS45603      Infra    Reduce spend by $kn.
6   RM78901  Marketing  Increase spend by $eg.
7   RM78902        R&D  Increase spend by $cb.
8   RM78903      Infra    Reduce spend by $mn.
9   MS11101  Marketing  Increase spend by $eg.
10  MS11102        R&D  Increase spend by $cb.
11  MS11103      Infra    Reduce spend by $mn.
12  SR22201  Marketing  Increase spend by $eg.
13  SR22202        R&D  Increase spend by $cb.
14  SR22203      Infra    Reduce spend by $mn.

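Explanation:

  1. Perform a left join of Person_df with Target_df on Country
  2. Then build TargetID from the name, the employee ID, and a running count of rows per employee ID
  3. Select the required columns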
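
The three steps above can be checked end to end with a small self-contained sketch. The two input frames are rebuilt from the tables in the question; the helper name build_individual_df is hypothetical and only introduced here for illustration.

```python
import pandas as pd

# Input frames rebuilt from the tables in the question.
Person_df = pd.DataFrame({
    "Name": ["DK", "JS", "RM", "MS", "SR"],
    "Emplid": [123, 456, 789, 111, 222],
    "Country": ["India", "India", "China", "China", "China"],
})
Target_df = pd.DataFrame({
    "Country": ["India"] * 3 + ["China"] * 3,
    "Category": ["Marketing", "R&D", "Infra"] * 2,
    "Target": [
        "Reduce spend by $xy.", "Increase spend by $dd.", "Reduce spend by $kn.",
        "Increase spend by $eg.", "Increase spend by $cb.", "Reduce spend by $mn.",
    ],
})


def build_individual_df(persons, targets):
    """Hypothetical helper: one row per (person, target for that person's country)."""
    # Step 1: left join on Country, so every person gets all targets of their country.
    merged = persons.merge(targets, on="Country", how="left")
    # Step 2: cumcount() numbers the rows within each Emplid group starting at 0.
    seq = (merged.groupby("Emplid").cumcount() + 1).astype(str).str.zfill(2)
    merged["TargetID"] = merged["Name"] + merged["Emplid"].astype(str) + seq
    # Step 3: keep only the requested columns.
    return merged[["TargetID", "Category", "Target"]]


Individual_df = build_individual_df(Person_df, Target_df)
print(Individual_df)
```

With how='left', a person whose country has no entry in Target_df would still appear once, with missing Category and Target values; how='inner' would drop such a person instead.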

Since you asked for a way to get the rows with a for loop:

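unique_countries = Person_df['Country'].unique().tolist()

for index, row in Target_df.iterrows():
    if row['Country'] in unique_countries:
        print(row.values)
        # do operation

Explanation:

  1. Find the unique Country values in Person_df

  2. Iterate over the rows of Target_df with a for loop

  3. Check whether the row's country is among them and, if so, perform the required operation.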
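
For the literal question of how to compare countries inside a loop, one option is to compare the whole Country column of Target_df against the current person's country, which gives a boolean mask that selects the matching rows. The sketch below is only an illustration under that assumption: it uses a trimmed-down copy of the sample data, and the records list and loop variable names are made up for the example. On large frames the merge shown earlier will usually be much faster.

```python
import pandas as pd

# Trimmed-down sample data (two people, three targets) for illustration.
Person_df = pd.DataFrame({
    "Name": ["DK", "RM"],
    "Emplid": [123, 789],
    "Country": ["India", "China"],
})
Target_df = pd.DataFrame({
    "Country": ["India", "India", "China"],
    "Category": ["Marketing", "R&D", "Marketing"],
    "Target": ["Reduce spend by $xy.", "Increase spend by $dd.", "Increase spend by $eg."],
})

records = []
for person in Person_df.itertuples(index=False):
    # Comparing a column with a scalar yields a boolean Series, one flag per row;
    # indexing with it keeps only the targets for this person's country.
    matches = Target_df[Target_df["Country"] == person.Country]
    for seq, target in enumerate(matches.itertuples(index=False), start=1):
        records.append({
            "TargetID": f"{person.Name}{person.Emplid}{seq:02d}",
            "Category": target.Category,
            "Target": target.Target,
        })

# Build the result once at the end instead of appending row by row.
Individual_df = pd.DataFrame(records, columns=["TargetID", "Category", "Target"])
print(Individual_df)
```

Collecting plain dicts and constructing the DataFrame once avoids growing a DataFrame inside the loop, which copies the data on every append.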

