简体   繁体   中英

Make MySQL table unique

Hay, I created a spider to crawl through a PDF document and log every word in the document into a table in a MySQL database.

Obviously words like 'the', 'and', 'or' etc appear in a book many, many times.

I'm just wondering what's the quickest method to remove dupe values from a table?

Create a table without indexing the words and put in all the words from the book using mass inserts (you could also use LOAD DATA). When you're done with insertions, add a new Index on the word field

Then create a second table using:

CREATE TABLE newTable SELECT DISTINCT word FROM oldTable

Instead of removing duplicates, you could make sure that no duplicates ever make it into the table.

Presuming your table has only 2 fields, id and word:

INSERT INTO table SELECT null, 'word' FROM table WHERE NOT EXISTS (SELECT * FROM table WHERE word = 'word') LIMIT 1;

This will insert the word into the table only if it's not already in there

If you can rerun the script to populate the database, you could add a unique key on the "word" field and instead of INSERT INTO do a REPLACE INTO. This will delete the previous instance of the record before adding a duplicate field. This may not be the most efficient way to do it, but it's rather simple. See here for more details:

http://dev.mysql.com/doc/refman/5.0/en/replace.html

delete from words where idcolumn not in
  (select min(idcolumn) 
   from words T2 
   where T2.plain = WordsTable.plain)

This works if you added (idcolumn, plain) for every word you found.

If you do not have an id column (pk) then you can use Anax's solution.

In addition to not inserting duplicates (codeburger comment), you can just set a unique index on your plain column.

select distinct on word field, and then delete all rows that have a different id? I'm not a master in subqueries so no example atm :)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM