pandas: select columns from a second DataFrame where another column's values exist in a primary DataFrame

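I'm struggling with a specific problem. I have two pandas DataFrames of different lengths and with different indices. For each item in df1, I want to look in df2 and take a few columns (not present in df1) from the rows where one of df2's columns equals the value in df1. Example: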

import pandas as pd

data_1 = {'TARGET_NAME':['fishinghook', 'doorlock', 'penguin', 'ashtray', 'cat', 'elephant', 'cupcake', 'exercisebench'],
          'FOOBAR':['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
          'ix':[320, 321, 322, 323, 324, 325, 326, 328]}

data_2 = {'IMAGE_NAME':['cat', 'penguin', 'jewelrybox', 'exercisebench', 'doorlock', 'jar', ],
          'VALUES_1':['h', 'h', 'c', 'm', 'h', 'f'],
          'VALUES_2':['hm', 'hl', 'cm', 'ml', 'hh', 'fl'],
          'ix':[616, 617, 618, 619, 620, 621]}

desired = {'TARGET_NAME':['fishinghook', 'doorlock', 'penguin', 'ashtray', 'cat', 'elephant', 'cupcake', 'exercisebench'],
          'FOOBAR':['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
          'PRODUCED_VALUES_1':['DROPPED', 'h', 'h', 'DROPPED', 'h', 'DROPPED', 'DROPPED', 'm'],
          'ix':[320, 321, 322, 323, 324, 325, 326, 328]}

df1 = pd.DataFrame(data_1, index=data_1['ix'])
df2 = pd.DataFrame(data_2, index=data_2['ix'])
desired_df = pd.DataFrame(desired, index=desired['ix'])

df1
Out[2]: 
    FOOBAR    TARGET_NAME   ix
320    foo    fishinghook  320
321    bar       doorlock  321
322    foo        penguin  322
323    bar        ashtray  323
324    foo            cat  324
325    bar       elephant  325
326    foo        cupcake  326
328    bar  exercisebench  328

df2
Out[3]: 
        IMAGE_NAME VALUES_1 VALUES_2   ix
616            cat        h       hm  616
617        penguin        h       hl  617
618     jewelrybox        c       cm  618
619  exercisebench        m       ml  619
620       doorlock        h       hh  620
621            jar        f       fl  621

desired_df
Out[4]: 
    FOOBAR PRODUCED_VALUES_1    TARGET_NAME   ix
320    foo           DROPPED    fishinghook  320
321    bar                 h       doorlock  321
322    foo                 h        penguin  322
323    bar           DROPPED        ashtray  323
324    foo                 h            cat  324
325    bar           DROPPED       elephant  325
326    foo           DROPPED        cupcake  326
328    bar                 m  exercisebench  328

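I want to look at each value in df1['TARGET_NAME'], find where it equals df2['IMAGE_NAME'], pull the VALUES_1 and VALUES_2 columns from df2, and add those details to df1 (or a copy of df1). If a value has no match anywhere in df2 (the row positions differ as well), I want it to write something else instead (for example, 'DROPPED'). Ideally, the df1 index should stay unchanged.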

Any help is appreciated!

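You can rename the key column and outer-merge the two frames, then rename the columns to the names you want, fill the NaN values in PRODUCED_VALUES_1 with 'Dropped', and drop the remaining rows that contain NaN. Finally, set the index to df1's index.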

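# Align the key column names, then outer-merge on the shared key
ndf = df1.merge(df2.rename(columns={'IMAGE_NAME': 'TARGET_NAME'}), how='outer', on='TARGET_NAME')
# Drop the columns that are not needed and restore the desired names
# (the positional axis argument, drop([...], 1), is no longer accepted by recent pandas)
ndf = ndf.drop(columns=['ix_y', 'VALUES_2']).rename(columns={'ix_x': 'ix', 'VALUES_1': 'PRODUCED_VALUES_1'})

# Rows of df1 without a match in df2 get the placeholder
ndf['PRODUCED_VALUES_1'] = ndf['PRODUCED_VALUES_1'].fillna('Dropped')
# Rows that came only from df2 still have NaN in FOOBAR/ix, so dropna removes them
ndf = ndf.dropna().set_index(df1.index)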
    FOOBAR    TARGET_NAME     ix PRODUCED_VALUES_1
320    foo    fishinghook  320.0           Dropped
321    bar       doorlock  321.0                 h
322    foo        penguin  322.0                 h
323    bar        ashtray  323.0           Dropped
324    foo            cat  324.0                 h
325    bar       elephant  325.0           Dropped
326    foo        cupcake  326.0           Dropped
328    bar  exercisebench  328.0                 m
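
The ix column prints as 320.0 above because the outer merge brings in rows that exist only in df2, which puts NaN into ix and forces a float dtype. Depending on the pandas version, an outer merge may also order rows by key rather than by df1's original order, so re-attaching df1.index afterwards relies on the rows still lining up. A minimal sketch of a left-merge variant that avoids both issues, assuming IMAGE_NAME is unique in df2 (true for the sample data); the names `lookup` and `left` are illustrative and not from the original answer:

```python
import pandas as pd

ix1 = [320, 321, 322, 323, 324, 325, 326, 328]
df1 = pd.DataFrame({'TARGET_NAME': ['fishinghook', 'doorlock', 'penguin', 'ashtray',
                                    'cat', 'elephant', 'cupcake', 'exercisebench'],
                    'FOOBAR': ['foo', 'bar'] * 4,
                    'ix': ix1}, index=ix1)

ix2 = [616, 617, 618, 619, 620, 621]
df2 = pd.DataFrame({'IMAGE_NAME': ['cat', 'penguin', 'jewelrybox', 'exercisebench', 'doorlock', 'jar'],
                    'VALUES_1': ['h', 'h', 'c', 'm', 'h', 'f'],
                    'VALUES_2': ['hm', 'hl', 'cm', 'ml', 'hh', 'fl'],
                    'ix': ix2}, index=ix2)

# Keep only the key and the wanted column, already under their final names,
# so no suffixed ix_x / ix_y columns are created by the merge.
lookup = df2[['IMAGE_NAME', 'VALUES_1']].rename(
    columns={'IMAGE_NAME': 'TARGET_NAME', 'VALUES_1': 'PRODUCED_VALUES_1'})

# A left merge keeps every row of df1, in df1's order, and adds no df2-only rows.
left = df1.merge(lookup, how='left', on='TARGET_NAME')
left['PRODUCED_VALUES_1'] = left['PRODUCED_VALUES_1'].fillna('DROPPED')

# merge() resets the index to 0..n-1, so put df1's index back.
left.index = df1.index
print(left)
```

Because no NaN ever reaches the ix column here, it keeps its integer dtype.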
In [34]: df1['PRODUCED_VALUES_1'] = \
             df1['TARGET_NAME'].map(df2.set_index('IMAGE_NAME')['VALUES_1']) \
                               .fillna('DROPPED')

In [35]: df1
Out[35]:
    FOOBAR    TARGET_NAME   ix PRODUCED_VALUES_1
320    foo    fishinghook  320           DROPPED
321    bar       doorlock  321                 h
322    foo        penguin  322                 h
323    bar        ashtray  323           DROPPED
324    foo            cat  324                 h
325    bar       elephant  325           DROPPED
326    foo        cupcake  326           DROPPED
328    bar  exercisebench  328                 m

Or a one-liner similar to @Bharath shetty's solution:

In [26]: df1.merge(df2[['IMAGE_NAME','VALUES_1']].rename(columns={'IMAGE_NAME':'TARGET_NAME'}),
    ...:           how='left') \
    ...:    .fillna('DROPPED') \
    ...:    .rename(columns=lambda c: 'PRODUCED_' + c if c=='VALUES_1' else c) \
    ...:    .set_index(df1.index)
    ...:
Out[26]:
    FOOBAR    TARGET_NAME   ix PRODUCED_VALUES_1
320    foo    fishinghook  320           DROPPED
321    bar       doorlock  321                 h
322    foo        penguin  322                 h
323    bar        ashtray  323           DROPPED
324    foo            cat  324                 h
325    bar       elephant  325           DROPPED
326    foo        cupcake  326           DROPPED
328    bar  exercisebench  328                 m
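
The question also asks for VALUES_2, which the snippets above leave out. One way to pull both columns at once while keeping df1's index is `DataFrame.join` with `on=`, which matches a column of the caller against the index of the other frame. This is only a sketch, again assuming IMAGE_NAME is unique in df2; the names `extra` and `result` and the `PRODUCED_` prefix for the second column are illustrative choices:

```python
import pandas as pd

ix1 = [320, 321, 322, 323, 324, 325, 326, 328]
df1 = pd.DataFrame({'TARGET_NAME': ['fishinghook', 'doorlock', 'penguin', 'ashtray',
                                    'cat', 'elephant', 'cupcake', 'exercisebench'],
                    'FOOBAR': ['foo', 'bar'] * 4,
                    'ix': ix1}, index=ix1)

ix2 = [616, 617, 618, 619, 620, 621]
df2 = pd.DataFrame({'IMAGE_NAME': ['cat', 'penguin', 'jewelrybox', 'exercisebench', 'doorlock', 'jar'],
                    'VALUES_1': ['h', 'h', 'c', 'm', 'h', 'f'],
                    'VALUES_2': ['hm', 'hl', 'cm', 'ml', 'hh', 'fl'],
                    'ix': ix2}, index=ix2)

# Index df2 by the lookup key and keep only the columns to bring over.
extra = df2.set_index('IMAGE_NAME')[['VALUES_1', 'VALUES_2']].add_prefix('PRODUCED_')

# join(on=...) looks up each df1['TARGET_NAME'] in extra's index;
# the result keeps df1's own index and row order.
result = df1.join(extra, on='TARGET_NAME')

# Fill the placeholder only in the newly added columns.
result = result.fillna({col: 'DROPPED' for col in extra.columns})
print(result)
```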
