I have two dataframes. I want to iterate over the elements of each list in the Companies column and match them against the company names in my second dataframe, but only if the date from the first dataframe occurs after the date in the second dataframe. The result should contain two columns for the name matches and two columns for the date matches.
import pandas as pd
df = pd.DataFrame({
    'Customer': ['Gold', 'Micro'],
    'Companies': [['Gold Ltd', 'Gold X', 'Gold De'], ['Microf', 'Micro Inc', 'Micre']],
    'Date': ['2019-01-07', '2019-02-10'],
})
   Customer  Companies                    Date
0  Gold      [Gold Ltd, Gold X, Gold De]  2019-01-07
1  Micro     [Microf, Micro Inc, Micre]   2019-02-10
# Built in one call; DataFrame.append was removed in pandas 2.0.
df2 = pd.DataFrame({
    'Companies': ['Gold Ltd', 'Gold X', 'Gold De', 'Microf', 'Micro Inc', 'Micre'],
    'Date': ['2019-01-01', '2020-01-07', '2018-07-07', '2019-02-18', '2017-09-27', '2018-12-11'],
})
   Companies  Date
0  Gold Ltd   2019-01-01
1  Gold X     2020-01-07
2  Gold De    2018-07-07
3  Microf     2019-02-18
4  Micro Inc  2017-09-27
5  Micre      2018-12-11
from datetime import datetime

def match_it(d1, d2):
    for companies in d1['Companies']:
        for company in companies:
            if d2['Companies'].str.contains(company).any():
                mask = d1.Companies.apply(lambda x: company in x)
                dff = d1[mask]
                date1 = datetime.strptime(dff['Date'].values[0], '%Y-%m-%d').date()
                date2 = datetime.strptime(d2[d2['Companies']==company]['Date'].values[0], '%Y-%m-%d').date()
                if date2 < date1:
                    print(d2[d2['Companies']==company])
                    new_row = pd.Series([d2[d2['Companies']==company]['Date'], d2[d2['Companies']==company]['Companies']])
                    return new_row
Desired Output:
Customer  Companies                    Date        Name_1     Date_1      Name_2   Date_2
Gold      [Gold Ltd, Gold X, Gold De]  2019-01-07  Gold Ltd   2019-01-01  Gold De  2018-07-07
Micro     [Microf, Micro Inc, Micre]   2019-02-10  Micro Inc  2017-09-27  Micre    2018-12-11
Start with a more idiomatic pandas way to convert the Date columns in both DataFrames from string to datetime:
df.Date = pd.to_datetime(df.Date)
df2.Date = pd.to_datetime(df2.Date)
Then proceed as follows:
df3 = df.explode('Companies')
df3 = df3.merge(df2, on='Companies', suffixes=('_x', ''))
df3 = df3[df3.Date_x > df3.Date].drop(columns='Date_x')
df3.rename(columns={'Companies': 'Name'}, inplace=True)
df3['idx'] = df3.groupby('Customer').cumcount()
df3 = df3.pivot(index='Customer',columns='idx')
df3 = df3.swaplevel(axis=1)
df3 = df3.sort_index(axis=1, ascending=[True, False])
cols = []
for i in range(1, df3.columns.size // 2 + 1):
    cols.extend(['Name_' + str(i), 'Date_' + str(i)])
df3.columns = cols
result = df.merge(df3, how='left', left_on='Customer', right_index=True)
The result is exactly what you asked for.
To understand the details, run each instruction separately and print the result. Seeing the intermediate output yourself is more instructive than reading a description.
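As an illustration of that advice, here is a minimal sketch of the first three steps (explode, merge, filter) on the sample data from the question. The variable names exploded, merged and kept are mine, chosen only to keep each intermediate result visible; the frames are built with a plain constructor rather than append.

```python
import pandas as pd

# Sample data from the question, with Date already converted to datetime.
df = pd.DataFrame({
    'Customer': ['Gold', 'Micro'],
    'Companies': [['Gold Ltd', 'Gold X', 'Gold De'],
                  ['Microf', 'Micro Inc', 'Micre']],
    'Date': pd.to_datetime(['2019-01-07', '2019-02-10']),
})
df2 = pd.DataFrame({
    'Companies': ['Gold Ltd', 'Gold X', 'Gold De',
                  'Microf', 'Micro Inc', 'Micre'],
    'Date': pd.to_datetime(['2019-01-01', '2020-01-07', '2018-07-07',
                            '2019-02-18', '2017-09-27', '2018-12-11']),
})

# Step 1: one row per (Customer, company) pair instead of one list per Customer.
exploded = df.explode('Companies')
print(exploded)

# Step 2: attach the df2 date to each company; the df date becomes Date_x.
merged = exploded.merge(df2, on='Companies', suffixes=('_x', ''))
print(merged)

# Step 3: keep only companies whose df2 date is earlier than the customer date.
kept = merged[merged.Date_x > merged.Date]
print(kept)
```

With this data, Gold X and Microf are dropped in step 3 because their dates in df2 are later than the corresponding customer dates.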
Caution: explode is a relatively new function, added in pandas 0.25. If you have an older version of pandas, upgrade it first.
The first DataFrame (df) can have more columns.
To test this, I added an Xxx column to df. The only change required in this case is to keep these additional columns from being copied to df3. To do this, append the following to the first instruction (the one calling explode):
.drop(columns=['Xxx'])
(In the general case, replace ['Xxx'] with the actual list of additional columns.)
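One possible way to package the steps, including the extra-column handling, is a small helper. This is only a sketch: the function name match_companies and its extra_cols parameter are hypothetical and not part of pandas, and the column labels are built with a list comprehension instead of the explicit loop. It assumes the Date columns are already datetime.

```python
import pandas as pd

def match_companies(df, df2, extra_cols=()):
    """Hypothetical wrapper around the steps above; extra_cols lists
    columns of df that must not be carried into the pivot."""
    df3 = df.explode('Companies').drop(columns=list(extra_cols))
    df3 = df3.merge(df2, on='Companies', suffixes=('_x', ''))
    df3 = df3[df3.Date_x > df3.Date].drop(columns='Date_x')
    df3 = df3.rename(columns={'Companies': 'Name'})
    # Number the matches within each customer: 0, 1, 2, ...
    df3['idx'] = df3.groupby('Customer').cumcount()
    df3 = df3.pivot(index='Customer', columns='idx')
    # Put idx on the outer level, then order as Name before Date per idx.
    df3 = df3.swaplevel(axis=1).sort_index(axis=1, ascending=[True, False])
    df3.columns = [f'{name}_{i + 1}' for i, name in df3.columns]
    return df.merge(df3, how='left', left_on='Customer', right_index=True)

# Sample data from the question plus the extra Xxx column.
df = pd.DataFrame({
    'Customer': ['Gold', 'Micro'],
    'Companies': [['Gold Ltd', 'Gold X', 'Gold De'],
                  ['Microf', 'Micro Inc', 'Micre']],
    'Date': pd.to_datetime(['2019-01-07', '2019-02-10']),
    'Xxx': ['Xxx1', 'Xxx2'],
})
df2 = pd.DataFrame({
    'Companies': ['Gold Ltd', 'Gold X', 'Gold De',
                  'Microf', 'Micro Inc', 'Micre'],
    'Date': pd.to_datetime(['2019-01-01', '2020-01-07', '2018-07-07',
                            '2019-02-18', '2017-09-27', '2018-12-11']),
})

result = match_companies(df, df2, extra_cols=['Xxx'])
print(result)
```

The extra column stays in the final result because the last merge starts from the original df; it is only excluded from the intermediate df3.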
To check the case of a different number of output columns, I changed the Date for the Gold X company in df2 to 2019-01-06, so that this company is also included in the output.
For your data, with the above changes, the result is:
   Customer  Companies                    Date        Xxx   Name_1     Date_1      Name_2  Date_2      Name_3   Date_3
0  Gold      [Gold Ltd, Gold X, Gold De]  2019-01-07  Xxx1  Gold Ltd   2019-01-01  Gold X  2019-01-06  Gold De  2018-07-07
1  Micro     [Microf, Micro Inc, Micre]   2019-02-10  Xxx2  Micro Inc  2017-09-27  Micre   2018-12-11  NaN      NaT
So, as you can see: