Creating pandas DataFrame column from selected values from another column
pandas selected columns from second dataframe where another column's values exist in a primary dataframe
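I'm struggling with a specific problem. I have two pandas DataFrames with different lengths and different indexes. For each item in df1, I want to look in df2 and take several columns (not present in df1) from the row where one of df2's columns equals the value in df1. Example: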
import pandas as pd
data_1 = {'TARGET_NAME':['fishinghook', 'doorlock', 'penguin', 'ashtray', 'cat', 'elephant', 'cupcake', 'exercisebench'],
'FOOBAR':['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
'ix':[320, 321, 322, 323, 324, 325, 326, 328]}
data_2 = {'IMAGE_NAME':['cat', 'penguin', 'jewelrybox', 'exercisebench', 'doorlock', 'jar', ],
'VALUES_1':['h', 'h', 'c', 'm', 'h', 'f'],
'VALUES_2':['hm', 'hl', 'cm', 'ml', 'hh', 'fl'],
'ix':[616, 617, 618, 619, 620, 621]}
desired = {'TARGET_NAME':['fishinghook', 'doorlock', 'penguin', 'ashtray', 'cat', 'elephant', 'cupcake', 'exercisebench'],
'FOOBAR':['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
'PRODUCED_VALUES_1':['DROPPED', 'h', 'h', 'DROPPED', 'h', 'DROPPED', 'DROPPED', 'm'],
'ix':[320, 321, 322, 323, 324, 325, 326, 328]}
df1 = pd.DataFrame(data_1, index=data_1['ix'])
df2 = pd.DataFrame(data_2, index=data_2['ix'])
desired_df = pd.DataFrame(desired, index=desired['ix'])
df1
Out[2]:
FOOBAR TARGET_NAME ix
320 foo fishinghook 320
321 bar doorlock 321
322 foo penguin 322
323 bar ashtray 323
324 foo cat 324
325 bar elephant 325
326 foo cupcake 326
328 bar exercisebench 328
df2
Out[3]:
IMAGE_NAME VALUES_1 VALUES_2 ix
616 cat h hm 616
617 penguin h hl 617
618 jewelrybox c cm 618
619 exercisebench m ml 619
620 doorlock h hh 620
621 jar f fl 621
desired_df
Out[4]:
FOOBAR PRODUCED_VALUES_1 TARGET_NAME ix
320 foo DROPPED fishinghook 320
321 bar h doorlock 321
322 foo h penguin 322
323 bar DROPPED ashtray 323
324 foo h cat 324
325 bar DROPPED elephant 325
326 foo DROPPED cupcake 326
328 bar m exercisebench 328
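For each value in df1['TARGET_NAME'], I want to find the row where it equals df2['IMAGE_NAME'], pull the VALUES_1 and VALUES_2 columns from df2, and add those details to df1 (or a copy of df1). If a value doesn't match anywhere in df2 (the rows sit at different positions, so the match has to be by value), I'd like it to write something else instead (e.g. "DROPPED"). Ideally, df1's index should stay unchanged.
Any help is appreciated!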
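A minimal sketch of the lookup described above, pulling both VALUES_1 and VALUES_2 in one left merge. It uses a subset of the question's data. The `validate="many_to_one"` guard is an added assumption, not something from the question: it assumes each IMAGE_NAME appears at most once in df2.

```python
import pandas as pd

# Subset of the question's data.
df1 = pd.DataFrame(
    {"TARGET_NAME": ["fishinghook", "doorlock", "penguin", "cat"],
     "FOOBAR": ["foo", "bar", "foo", "foo"]},
    index=[320, 321, 322, 324],
)
df2 = pd.DataFrame(
    {"IMAGE_NAME": ["cat", "penguin", "doorlock", "jar"],
     "VALUES_1": ["h", "h", "h", "f"],
     "VALUES_2": ["hm", "hl", "hh", "fl"]},
    index=[616, 617, 620, 621],
)

value_cols = ["VALUES_1", "VALUES_2"]
lookup = df2[["IMAGE_NAME"] + value_cols].rename(columns={"IMAGE_NAME": "TARGET_NAME"})

# how="left" keeps every df1 row in df1's order; unmatched rows get NaN.
# validate="many_to_one" raises MergeError if a name repeats in df2,
# which would otherwise add extra rows and break the index restore below.
result = df1.merge(lookup, on="TARGET_NAME", how="left", validate="many_to_one")
result[value_cols] = result[value_cols].fillna("DROPPED")

# merge() returns a new RangeIndex; rows still line up with df1, so reuse its index.
result.index = df1.index
print(result)
```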
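By renaming the key column you can do an outer merge, rename the columns to the names you want, fill the NaNs in PRODUCED_VALUES_1 with 'Dropped', and drop the remaining NaN rows (the entries that exist only in df2). Finally, restore df1's index. The pandas documentation states that an outer merge sorts the join keys lexicographically, so the rows are put back in df1's order through the `ix` column (which mirrors df1's index in this example) rather than by position:
ndf = df1.merge(df2.rename(columns={'IMAGE_NAME': 'TARGET_NAME'}), how='outer', on='TARGET_NAME')
ndf = ndf.drop(columns=['ix_y', 'VALUES_2']).rename(columns={'ix_x': 'ix', 'VALUES_1': 'PRODUCED_VALUES_1'})
ndf['PRODUCED_VALUES_1'] = ndf['PRODUCED_VALUES_1'].fillna('Dropped')
ndf = ndf.dropna().astype({'ix': int}).set_index('ix', drop=False).reindex(df1.index)
FOOBAR TARGET_NAME ix PRODUCED_VALUES_1
320 foo fishinghook 320 Dropped
321 bar doorlock 321 h
322 foo penguin 322 h
323 bar ashtray 323 Dropped
324 foo cat 324 h
325 bar elephant 325 Dropped
326 foo cupcake 326 Dropped
328 bar exercisebench 328 m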
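The `dropna()` step works because rows that exist only in df2 come back from an outer merge with NaN in df1's columns. A sketch that makes each row's origin explicit with merge's `indicator=True` flag, and that carries df1's original index through the merge in a helper column instead of relying on the `ix` column or on row order. The column name `orig_ix` is hypothetical, chosen for illustration, and the data is a subset of the question's.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"TARGET_NAME": ["fishinghook", "doorlock", "penguin", "cat"],
     "FOOBAR": ["foo", "bar", "foo", "foo"]},
    index=[320, 321, 322, 324],
)
df2 = pd.DataFrame(
    {"IMAGE_NAME": ["cat", "penguin", "doorlock", "jar"],
     "VALUES_1": ["h", "h", "h", "f"]},
    index=[616, 617, 620, 621],
)

# Move df1's index into a (hypothetical) helper column so it survives the merge.
left = df1.rename_axis("orig_ix").reset_index()
right = df2.rename(columns={"IMAGE_NAME": "TARGET_NAME"})

# indicator=True adds a "_merge" column: "left_only", "right_only" or "both".
merged = left.merge(right, on="TARGET_NAME", how="outer", indicator=True)

# Keep df1's rows only; right_only rows are df2 names that df1 does not contain.
kept = (merged[merged["_merge"] != "right_only"]
        .drop(columns="_merge")
        .fillna({"VALUES_1": "Dropped"}))

# orig_ix became float because right_only rows held NaN; cast it back,
# then restore df1's labels and row order.
kept = kept.astype({"orig_ix": int}).set_index("orig_ix").reindex(df1.index)
print(kept)
```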
In [34]: df1['PRODUCED_VALUES_1'] = \
df1['TARGET_NAME'].map(df2.set_index('IMAGE_NAME')['VALUES_1']) \
.fillna('DROPPED')
In [35]: df1
Out[35]:
FOOBAR TARGET_NAME ix PRODUCED_VALUES_1
320 foo fishinghook 320 DROPPED
321 bar doorlock 321 h
322 foo penguin 322 h
323 bar ashtray 323 DROPPED
324 foo cat 324 h
325 bar elephant 325 DROPPED
326 foo cupcake 326 DROPPED
328 bar exercisebench 328 m
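`Series.map` with a Series argument looks each TARGET_NAME up in that Series' index and leaves df1's own index untouched, so no index restore is needed. A sketch extending it to both value columns from the question, on a subset of the question's data. It assumes IMAGE_NAME values are unique in df2, since the lookup Series is indexed by them.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"TARGET_NAME": ["fishinghook", "doorlock", "penguin", "cat"],
     "FOOBAR": ["foo", "bar", "foo", "foo"]},
    index=[320, 321, 322, 324],
)
df2 = pd.DataFrame(
    {"IMAGE_NAME": ["cat", "penguin", "doorlock", "jar"],
     "VALUES_1": ["h", "h", "h", "f"],
     "VALUES_2": ["hm", "hl", "hh", "fl"]},
    index=[616, 617, 620, 621],
)

# Index df2 by the lookup key once; assumes IMAGE_NAME has no duplicates.
lookup = df2.set_index("IMAGE_NAME")

for col in ["VALUES_1", "VALUES_2"]:
    # map() aligns on lookup's index; names missing from df2 map to NaN.
    df1["PRODUCED_" + col] = df1["TARGET_NAME"].map(lookup[col]).fillna("DROPPED")

print(df1)
```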
Or, as a one-liner similar to @Bharath shetty's solution:
In [26]: df1.merge(df2[['IMAGE_NAME','VALUES_1']].rename(columns={'IMAGE_NAME':'TARGET_NAME'}),
...: how='left') \
...: .fillna('DROPPED') \
...: .rename(columns=lambda c: 'PRODUCED_' + c if c=='VALUES_1' else c) \
...: .set_index(df1.index)
...:
Out[26]:
FOOBAR TARGET_NAME ix PRODUCED_VALUES_1
320 foo fishinghook 320 DROPPED
321 bar doorlock 321 h
322 foo penguin 322 h
323 bar ashtray 323 DROPPED
324 foo cat 324 h
325 bar elephant 325 DROPPED
326 foo cupcake 326 DROPPED
328 bar exercisebench 328 m