
Best approach for uploading CSV files and checking for duplicate lines

Hi, I am building a PHP page where I can upload CSV files from a credit card terminal, to show the shop owner today's sales and produce some statistics. My database tables use the MyISAM engine.

This is a single line from the CSV file, to show what information I have to work with.

Transaction Date: 22-05-2014 00:00:12;

Store: MCdonalds_denmark;

Terminal POS: 00008101;

Last Oper Num: 138;

Host Code: 88135;

PAN: 4571xxxxxxxxxxx5362;

Operation: Authorizazion req;

POS data Code: 5 - ICC;

Amount: 70;

Acquirer: SDID;

Transaction Result: Approved;

How do I avoid duplicate rows in the MySQL database if a user accidentally uploads the same CSV file twice? The filename is not truly unique. Right now I check every line with a NOT EXISTS query through mysqli, but with that check it takes about 8 minutes to upload a CSV file with 500.000 lines.

I can also see that the bigger the table gets, the slower the upload runs, and the table will only keep growing over time.

Are there better options, such as running a cron job at night to look for duplicates, or is it simply the user's problem to avoid uploading the same file twice?

Is there a completely different approach that would solve the problem?

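First thing: are you loading the file like this?

For each line in the file:

Read the line and save its data to the DB;

Read the next line

If so, change that first, so the work is done in one go.

Gather all the lines, split them into reasonably sized chunks, and run each chunk as one batched statement instead of sending one statement per line.

This will save you a lot of time.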
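The chunking idea above can be sketched roughly as follows. This is only an illustration, not the poster's PHP code: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name `transactions`, its columns, and the helper names `chunked` and `bulk_insert` are all hypothetical.

```python
import sqlite3
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_insert(conn, rows, chunk_size=1000):
    """Insert rows chunk by chunk: one executemany call and one commit per chunk."""
    # Hypothetical table and columns, loosely modelled on the CSV sample.
    sql = (
        "INSERT INTO transactions (tx_date, terminal, oper_num, amount) "
        "VALUES (?, ?, ?, ?)"
    )
    inserted = 0
    for chunk in chunked(rows, chunk_size):
        with conn:  # commits the whole chunk as a single transaction
            conn.executemany(sql, chunk)
        inserted += len(chunk)
    return inserted


# In-memory SQLite database used purely as a stand-in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions "
    "(tx_date TEXT, terminal TEXT, oper_num INTEGER, amount INTEGER)"
)
rows = [("22-05-2014 00:00:12", "00008101", n, 70) for n in range(2500)]
total = bulk_insert(conn, rows)
```

The point is only that the database receives a few large batches rather than thousands of single-row round trips; the best chunk size is something to measure, not a fixed rule.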

Duplicates: if I hit serious performance problems, I would insert everything as it is and have a cron task that cleans the table afterwards.
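A cleanup job of the kind described above might look something like this minimal sketch. It again uses Python's sqlite3 as a stand-in for MySQL, and it assumes (this is not stated in the thread) that a transaction is identified by its date, terminal and operation number. The table layout, the `id` column and the function name `remove_duplicates` are hypothetical.

```python
import sqlite3


def remove_duplicates(conn):
    """Delete every row that repeats an earlier row with the same key columns.

    Keeps the row with the lowest id in each group and returns how many
    rows were deleted.
    """
    with conn:  # run the delete in one transaction
        cur = conn.execute(
            """
            DELETE FROM transactions
            WHERE id NOT IN (
                SELECT MIN(id)
                FROM transactions
                GROUP BY tx_date, terminal, oper_num  -- assumed identifying columns
            )
            """
        )
    return cur.rowcount


# In-memory SQLite database as a stand-in; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "id INTEGER PRIMARY KEY, tx_date TEXT, terminal TEXT, "
    "oper_num INTEGER, amount INTEGER)"
)
sample = [
    ("22-05-2014 00:00:12", "00008101", 138, 70),
    ("22-05-2014 00:03:40", "00008101", 139, 25),
]
insert_sql = (
    "INSERT INTO transactions (tx_date, terminal, oper_num, amount) "
    "VALUES (?, ?, ?, ?)"
)
# Simulate the same file being uploaded twice.
conn.executemany(insert_sql, sample + sample)
conn.commit()

deleted = remove_duplicates(conn)
remaining = conn.execute("SELECT COUNT(*) FROM transactions").fetchone()[0]
```

MySQL restricts subqueries that read from the table being deleted from, so the statement would need to be rewritten there (for example with a self-join). The sketch only shows the shape of the job.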

I found a solution to the speed issue: I indexed all the columns that appear in my WHERE clauses. I did not change any SQL commands in my PHP script, and the execution time dropped from 15 minutes to 10 seconds.
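A related option, which goes a step beyond what the poster describes, is to make that index unique so the database itself refuses a second copy of a row. The sketch below uses Python's sqlite3 and SQLite's `INSERT OR IGNORE` as a stand-in; MySQL offers `INSERT IGNORE` for the same purpose. The table, the index name `uq_tx`, the function `load` and the choice of identifying columns are assumptions for illustration only.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions "
    "(tx_date TEXT, terminal TEXT, oper_num INTEGER, amount INTEGER)"
)
# Assumption: these three columns together identify one transaction.
conn.execute(
    "CREATE UNIQUE INDEX uq_tx ON transactions (tx_date, terminal, oper_num)"
)


def count_rows(conn):
    """Return the current number of rows in the table."""
    return conn.execute("SELECT COUNT(*) FROM transactions").fetchone()[0]


def load(conn, rows):
    """Insert rows, silently skipping any that violate the unique index.

    Returns the number of rows actually added.
    """
    before = count_rows(conn)
    with conn:
        conn.executemany(
            "INSERT OR IGNORE INTO transactions VALUES (?, ?, ?, ?)", rows
        )
    return count_rows(conn) - before


rows = [
    ("22-05-2014 00:00:12", "00008101", 138, 70),
    ("22-05-2014 00:03:40", "00008101", 139, 25),
    ("22-05-2014 00:07:02", "00008102", 12, 110),
]
first_upload = load(conn, rows)   # new file: all rows are added
second_upload = load(conn, rows)  # same file again: nothing is added
```

With this in place the per-line NOT EXISTS check is no longer needed, because uploading the same file twice leaves the table unchanged.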

Index the columns.
