简体   繁体   中英

Combine two tables based on certain criteria using python

I have two tables (table1, table2) of the following:

table1:

ID Filename
12345 12345.txt
12346 12346.txt
12347 12347.txt
12348 12348.txt
12349 12349.txt
12350 12350.txt

table2: contains the path where table 1 files are present

Path
/table/text3/12349.txt
/table/text1/12345.txt
/table/text2/12346.txt
/table/text1/12350.txt
/table/text3/12347.txt
/table/text1/12348.txt

How do I combine these two files, such that, the path and filenames are matched. What I tried so far?

pd.concat([table1, table2])

I also tried pd.merge but it does not match with the filename. How do I solve this?

The desired output:

ID Filename Path
12345 12345.txt /table/text1/12345.txt
12346 12346.txt /table/text2/12346.txt
12347 12347.txt /table/text3/12347.txt
12348 12348.txt /table/text1/12348.txt
12349 12349.txt /table/text3/12349.txt
12350 12350.txt /table/text1/12350.txt

You can extract the Filename and then do the merge:

merged_df = df2.assign(Filename = df2['Path'].str.rsplit('/', 1).str[-1]).merge(df1, on = 'Filename')

Option via pathlib :

from pathlib import Path
merged_df = df2.assign(Filename = df2['Path'].apply(lambda x: Path(x).name)).merge(df1, on = 'Filename')

OUTPUT:

                     Path   Filename     ID
0  /table/text3/12349.txt  12349.txt  12349
1  /table/text1/12345.txt  12345.txt  12345
2  /table/text2/12346.txt  12346.txt  12346
3  /table/text1/12350.txt  12350.txt  12350
4  /table/text3/12347.txt  12347.txt  12347
5  /table/text1/12348.txt  12348.txt  12348

You can make a temporary column Filename on df2 by .assign() and use the resulting copy of df2 with newly added column to merge with df1 using .merge() on the common column Filename , as follows:

df_merge = df1.merge(df2.assign(Filename=df2['Path'].str.split('/').str[-1]), on='Filename')

Result:

print(df_merge)



      ID   Filename                    Path
0  12345  12345.txt  /table/text1/12345.txt
1  12346  12346.txt  /table/text2/12346.txt
2  12347  12347.txt  /table/text3/12347.txt
3  12348  12348.txt  /table/text1/12348.txt
4  12349  12349.txt  /table/text3/12349.txt
5  12350  12350.txt  /table/text1/12350.txt

Edit

If you still want to show the entries of df1 even when there is no matching entries in df2 , you can use left merge with parameter how='left' , as follows:

df_merge = df1.merge(df2.assign(Filename=df2['Path'].str.split('/').str[-1]), on='Filename', how='left')

Result:

If the path /table/text3/12347.txt is missing:

      ID   Filename                    Path
0  12345  12345.txt  /table/text1/12345.txt
1  12346  12346.txt  /table/text2/12346.txt
2  12347  12347.txt                     NaN
3  12348  12348.txt  /table/text1/12348.txt
4  12349  12349.txt  /table/text3/12349.txt
5  12350  12350.txt  /table/text1/12350.txt

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM