MySQL: Find similar repeated entries

Question

Given my table

id | name
 01| Test Name
 02| Name Test
 03| Another name
 ...
 ...
 nn| Test string

I'd like to do the following, for each entry, read the first word, until a space, so in this example i would read Test, then, find all similar entries which contain Test anywhere on the string, then continue with Name , Another , and so on.

I don't want to do this manually, as I'll have to make a lot of queries, the idea is the data is imported from an old excel spreadsheet and the client wants to get repeated names, so Test Name , Test Something Name and Name Test are potential similar names.

Answer 1

You want to run a query, select all the names, return that to php, then loop through it, parse the string into separate words, and run a query with a fulltext index.

here's something to get you started. http://www.databasejournal.com/sqletc/article.php/1578331/Using-Fulltext-Indexes-in-MySQL---Part-1.htm

Answer 2

Here's my db solution:

SELECT * 
FROM princess a
INNER JOIN (SELECT 
        DISTINCT CASE
            WHEN name LIKE '% %' 
                THEN SUBSTR(name, 1, LOCATE(' ', name) - 1)
            ELSE name
        END AS 'name'
    FROM princess) b ON a.name LIKE CONCAT('%', b.name ,'%')

This will find the DISTINCT names (before a space) and then JOIN to the original table using LIKE .

You may also consider using INSTR(a.name, b.name) in place of b.name LIKE CONCAT('%', a.name ,'%') depending on how your EXPLAIN looks.

Answer 3

There are a ton of examples of how to split a string into multiple rows in MySQL, eg this one .

After that, you can find the exact matches easily. If you want non-exact matches, check out SOUNDEX()

MySQL: Find similar repeated entries

Question

3 answers

solution1
3 2012-09-06 00:04:24

solution2
2 ACCPTED 2012-09-06 00:24:07

solution3
0 2012-09-06 00:15:53

MySQL: Find similar repeated entries

Question

3 answers

solution1 3 2012-09-06 00:04:24

solution2 2 ACCPTED 2012-09-06 00:24:07

solution3 0 2012-09-06 00:15:53

solution1
3 2012-09-06 00:04:24

solution2
2 ACCPTED 2012-09-06 00:24:07

solution3
0 2012-09-06 00:15:53