简体   繁体   中英

adding new rows to one data frame from another on a condition [python]

I have two csv files and I am trying to combine them in one based on a condition.

they both have the 'Import_ID' column which gives unique id for each row. What I need to do is to check if the Import_ID from df1 already exist in df2, if not add this row, else skip it. It would be best if it would be done without the loops or iterrows().

df1
 Import_ID            State                  ProductName
0     7027514279_100    716        TRANSCEND ESD220C 240GB SSD extern
1   7027514279_200.1    716        INTELLINET 48,3cm 19Zoll Steckdosenleist
2   7027514279_200.2    716        INTELLINET 48,3cm 19Zoll Steckdosenleist
3   7027514279_200.3    716        INTELLINET 48,3cm 19Zoll Steckdosenleist
4   7027514279_200.4    716        INTELLINET 48,3cm 19Zoll Steckdosenleist

and df2
   Import_ID          State                  ProductName
0     7027329459_100    716        BROTHER MFCJ6530DW MULTI-FUNCTION (AT)
1   7027514279_200.2    716        INTELLINET 48,3cm 19Zoll Steckdosenleist
2   7027514279_200.3    716        INTELLINET 48,3cm 19Zoll Steckdosenleist
3     7027329459_400    716        BROTHER LC-3219XLY Tinte Gelb (1500 S)
4     7027329459_500  20010        BROTHER LC-3219XLBK Tinte Schwarz

what I would like it to be is that df1 rows are only then imported when their Import_ID doesn't exist in the df2

df2 updated:
 Import_ID          State                  ProductName
0     7027329459_100    716        BROTHER MFCJ6530DW MULTI-FUNCTION (AT)
1   7027514279_200.2    716        INTELLINET 48,3cm 19Zoll Steckdosenleist
2   7027514279_200.3    716        INTELLINET 48,3cm 19Zoll Steckdosenleist
3     7027329459_400    716        BROTHER LC-3219XLY Tinte Gelb (1500 S)
4     7027329459_500  20010        BROTHER LC-3219XLBK Tinte Schwarz
5     7027514279_100    716        TRANSCEND ESD220C 240GB SSD extern
6   7027514279_200.1    716        INTELLINET 48,3cm 19Zoll Steckdosenleist
7   7027514279_200.4    716        INTELLINET 48,3cm 19Zoll Steckdosenleist

so in the example the df1 rows 1 nad 2 are not copied, because they already exist.

I tried by creating a temp file where I would store all of the rows from df1 that don't exist in df2 but it didn't work.

temp = numpy.where(file_in['Import_ID'] not in finalData['Import_ID'], file_in)

I don't know what else I should try , since I am pretty new to python.

Using merge with indicator more info

s=df1.merge(df2,how='left',indicator=True)
pd.concat([df2,s.loc[s['_merge']=='left_only'].drop('_merge',axis=1)])

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM