
Pandas: lookup values in DataFrame, where source column has multiple members

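I have one DataFrame holding data and a second DataFrame that acts as a database to query and return data from. I can't use a plain merge, because some rows contain multiple lookup keys.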

Data:

import pandas as pd

df_data = pd.DataFrame([[1000, 'Jerry', 'BR1001, BR1003, BR9009','',''], 
                        [1001, 'Buck', 'BR1010, BR1011','',''], 
                        [1002, 'Melanie', 'BR3009','','DPT2002'],
                        [1003, 'Perry','BR4009','',''],
                        [1004, 'Perry2','','DIST1000',''],
                        [1005, 'Eloise','','','DPT9009'],
                        [1005, 'Sharon','','','DPT9009']],
                        columns=['ID', 'Name', 'School Number','District Number','Dept. Number'])

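Given a School Number, I need to be able to pull all associated District Number and Dept. Number values. For now I only want to focus on extracting the District Numbers. The problem is how to iterate over the members when a field holds more than one.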

Data to query:

df_DB = pd.DataFrame([['DIST1000', 'BR1001', 'DPT9009','Physics'], 
                    ['DIST1000', 'BR1003', 'DPT1010','Biology'],
                    ['DIST1000', 'BR1003', 'DPT1011','Sociology'],
                    ['DIST1000', 'BR1010', 'DPT1012','Philosophy'],
                    ['DIST1000', 'BR1011', 'DPT1013','Pre-K'],
                    ['DIST1000', 'BR1012', 'DPT1014','Geology'],
                    ['DIST1001', 'BR9009', 'DPT2001', 'Math'],
                    ['DIST1001', 'BR3009', 'DPT2002', 'Physics'],
                    ['DIST1001', 'BR9009', 'DPT2003', 'Pre-K'],
                    ['DIST1001', 'BR4009', 'DPT2004', 'Economics']],
                    columns=['District Number', 'School Number', 'Dept. Number','Name'])

For example, note the first record in the data above, Jerry. His record has 3 School Numbers assigned.

Desired output (example):

1000, 'Jerry', 'BR1001, BR1003, BR9009','DIST1001, DIST1000','DPT9009, DPT1010, DPT1011, DPT2001, DPT2003'

Do I need a function for this? If I can get the District Numbers, I think I can figure out the departments.

# Changing type from string to list.
df_data['School Number'] = df_data['School Number'].apply(lambda x: x.split(", ")) 

# Expand each list into one row per school and keep only the needed columns from both tables.
# Merge on School Number, group by ID, and take the first Name per ID
# (this assumes each ID maps to exactly one Name).
# The school and district values are de-duplicated with set() and joined with ", ";
# sets are unordered, so the order inside each joined string is not guaranteed.

(df_data.explode('School Number')[['ID', 'Name', 'School Number']]
    .merge(df_DB[['School Number', 'District Number']], on='School Number')
    .groupby('ID')
    .agg({'Name': 'first',
          'School Number': lambda x: ', '.join(set(x)),
          'District Number': lambda x: ', '.join(set(x))}))

Output:

             Name           School Number     District Number
ID
1000    Jerry  BR1001, BR9009, BR1003  DIST1000, DIST1001
1001     Buck          BR1011, BR1010            DIST1000
1002  Melanie                  BR3009            DIST1001
1003    Perry                  BR4009            DIST1001
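
To see why the merge works once the lists are expanded, it can help to look at the intermediate frame. The following is a minimal sketch on a two-row subset of the question's data; the name `small` is only illustrative and not part of the original answer.

```python
import pandas as pd

# Two rows cut down from the question's df_data (illustrative subset).
small = pd.DataFrame(
    [[1000, 'Jerry', 'BR1001, BR1003, BR9009'],
     [1003, 'Perry', 'BR4009']],
    columns=['ID', 'Name', 'School Number'])

# Split the comma-separated string into a list,
# then give every list element its own row.
small['School Number'] = small['School Number'].str.split(', ')
exploded = small.explode('School Number')

# Jerry now occupies three rows (one per school), Perry one row.
print(exploded)
```

After this step every row carries a single School Number, so an ordinary `merge` against `df_DB` applies again.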

Alternatively, with a left join (this keeps IDs that have no matching School Number):

(df_data.explode('School Number')[['ID', 'Name', 'School Number']]
    .merge(df_DB[['School Number', 'District Number']], on='School Number', how='left')
    .groupby('ID')
    .agg({'Name': 'first',
          'School Number': lambda x: ', '.join(set(x)),
          # y == y is False only for NaN, so this drops unmatched rows before joining.
          'District Number': lambda x: ', '.join(set([y for y in x if y == y]))}))

Output:

         Name           School Number     District Number
ID
1000    Jerry  BR1003, BR1001, BR9009  DIST1000, DIST1001
1001     Buck          BR1011, BR1010            DIST1000
1002  Melanie                  BR3009            DIST1001
1003    Perry                  BR4009            DIST1001
1004   Perry2
1005   Eloise
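
The same pattern might be extended to pull the Dept. Number values the question also asks for. The sketch below is one possible way, not part of the original answer. It makes a few assumptions: it uses a trimmed copy of the question's data, it drops the `Name` column from `df_DB` so it does not collide with `df_data['Name']` in the merge, and `join_unique` is a hypothetical helper. The helper uses `dropna()` in place of the `y == y` filter and `dict.fromkeys` in place of `set`, so values stay in first-seen order instead of set order.

```python
import pandas as pd

# Trimmed copies of the question's frames (illustrative subset).
df_data = pd.DataFrame(
    [[1000, 'Jerry', 'BR1001, BR1003, BR9009'],
     [1001, 'Buck', 'BR1010, BR1011'],
     [1004, 'Perry2', '']],
    columns=['ID', 'Name', 'School Number'])

df_DB = pd.DataFrame(
    [['DIST1000', 'BR1001', 'DPT9009'],
     ['DIST1000', 'BR1003', 'DPT1010'],
     ['DIST1000', 'BR1003', 'DPT1011'],
     ['DIST1000', 'BR1010', 'DPT1012'],
     ['DIST1000', 'BR1011', 'DPT1013'],
     ['DIST1001', 'BR9009', 'DPT2001'],
     ['DIST1001', 'BR9009', 'DPT2003']],
    columns=['District Number', 'School Number', 'Dept. Number'])


def join_unique(values):
    """Hypothetical helper: drop NaN and duplicates, keep first-seen order."""
    return ', '.join(dict.fromkeys(values.dropna()))


result = (
    # Split into lists without modifying df_data in place.
    df_data.assign(**{'School Number': df_data['School Number'].str.split(', ')})
    .explode('School Number')
    # Left join so IDs without any school (Perry2) are kept.
    .merge(df_DB, on='School Number', how='left')
    .groupby('ID')
    .agg({'Name': 'first',
          'School Number': join_unique,
          'District Number': join_unique,
          'Dept. Number': join_unique})
)

print(result)
```

With the full `df_DB` from the question, selecting the three lookup columns explicitly (for example `df_DB[['School Number', 'District Number', 'Dept. Number']]`) would serve the same purpose as dropping `Name` here.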

