I've a problem which is annoying the hell out of me!
I have a database with several thousand users. The data originally came from a database I can't trust, so I imported it into a separate 'clean-up' database to remove duplicate entries.
I performed the query:
SELECT uid, username
FROM users
GROUP BY username
HAVING COUNT(username)>1
This is a sample of my table in its present state:
uid forename surname username
1 Jo Bloggs jobloggs
2 Jo Bloggs jobloggs
3 Jane Doe janedoe
4 Jane Doe janedoe
After performing the query above, I get the following sample result:
uid forename surname username
2 Jo Bloggs jobloggs
As you can see, there are two duplicated users, but the query only displays one of them.
When I run the query against the full table, I get roughly 300 results. If the query isn't returning all the duplicates, I can't trust this result set to be accurate and can't proceed with the clean-up.
Any ideas about what I can try?
Thanks
Phil
There's no good explanation for the result set being returned.
Given your sample data and your query, you should also be getting a second row:
3 janedoe
(Because uid is neither in the GROUP BY nor aggregated, it's arbitrary whether you get a uid value of 3 or 4 returned.)
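To sanity-check the expected result, here is a minimal reproduction of the sample table using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption: SQLite, like MySQL without ONLY_FULL_GROUP_BY, accepts the bare uid column and takes it from an arbitrary row of the group). Both duplicated usernames come back, and replacing the bare uid with MIN(uid)/MAX(uid) makes the reported ids deterministic.

```python
import sqlite3

# Sample data from the question, loaded into an in-memory SQLite database
# as a stand-in for MySQL (table and column names follow the question).
rows = [
    (1, "Jo", "Bloggs", "jobloggs"),
    (2, "Jo", "Bloggs", "jobloggs"),
    (3, "Jane", "Doe", "janedoe"),
    (4, "Jane", "Doe", "janedoe"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (uid INTEGER PRIMARY KEY, forename TEXT, "
    "surname TEXT, username TEXT)"
)
conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?)", rows)

# The question's query: one row per duplicated username; the uid comes
# from an arbitrary row of each group.
original = conn.execute(
    "SELECT uid, username FROM users GROUP BY username "
    "HAVING COUNT(username) > 1"
).fetchall()

# Deterministic variant: lowest uid, highest uid and row count per group.
deterministic = conn.execute(
    "SELECT username, MIN(uid), MAX(uid), COUNT(*) FROM users "
    "GROUP BY username HAVING COUNT(*) > 1 ORDER BY username"
).fetchall()

print(sorted(username for _, username in original))  # ['janedoe', 'jobloggs']
print(deterministic)  # [('janedoe', 3, 4, 2), ('jobloggs', 1, 2, 2)]
```

In MySQL, GROUP_CONCAT(uid) in the select list would additionally list every uid in each group.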
Also, check whether your client is returning only a subset of the rows; e.g., SQLyog has a "Limit rows" feature which limits the number of rows returned.
If that's not the issue, the most likely explanation is that one of the 'janedoe' values includes non-printable characters, or that some wicked character set conversion is going on, where two different encodings display as the same value.
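A small sketch of how a hidden character can make one duplicate "disappear", again using sqlite3 as a stand-in for MySQL. The data is hypothetical: uid 4's username carries an invisible trailing zero-width space (U+200B). Whether a given character splits a group in MySQL depends on the column's collation, so treat this as an illustration rather than a diagnosis.

```python
import sqlite3

# Hypothetical variant of the sample data: row 4's username ends in an
# invisible zero-width space, so it prints exactly like "janedoe".
rows = [
    (1, "jobloggs"),
    (2, "jobloggs"),
    (3, "janedoe"),
    (4, "janedoe\u200b"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (uid INTEGER PRIMARY KEY, username TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", rows)

# Same duplicate-finding logic as in the question.
dupes = conn.execute(
    "SELECT username, COUNT(*) FROM users GROUP BY username "
    "HAVING COUNT(*) > 1"
).fetchall()

# The two "janedoe" values are different strings, so each forms a group
# of one and neither passes HAVING COUNT(*) > 1.
print(dupes)  # [('jobloggs', 2)]
```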
As a quick first step, I'd suggest you check the length of each of those 'janedoe' values (LENGTH() returns the length in bytes, CHAR_LENGTH() the length in characters):
SELECT uid, username, LENGTH(username), CHAR_LENGTH(username) FROM users WHERE uid IN (3,4) ORDER BY uid
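Because MySQL's LENGTH() counts bytes and CHAR_LENGTH() counts characters, a difference between the two rows hints at a hidden multi-byte character. The snippet below is only a Python analogue of that check, assuming a UTF-8 column; the uid 4 value with a trailing no-break space (U+00A0) is hypothetical.

```python
# Python analogue of comparing CHAR_LENGTH() (characters) with LENGTH()
# (bytes) on a UTF-8 column. The uid 4 value is hypothetical: it ends in
# an invisible no-break space (U+00A0).
values = {3: "janedoe", 4: "janedoe\u00a0"}

report = []
for uid, name in sorted(values.items()):
    chars = len(name)                   # like CHAR_LENGTH(username)
    nbytes = len(name.encode("utf-8"))  # like LENGTH(username)
    report.append((uid, chars, nbytes))
    print(uid, repr(name), chars, nbytes)

# Expected:
# 3 'janedoe' 7 7
# 4 'janedoe\xa0' 8 9
```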
Also, you could try displaying the actual encodings using the HEX() function, to see whether there's a difference. (NOTE: It's not clear to me whether a character set translation occurs before or after the HEX. What we're after here is a MySQL equivalent of the Oracle DUMP() function, which displays a byte-by-byte representation of the actual value.)
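As a rough stand-in for a DUMP()-style view, this Python sketch shows what MySQL's HEX() would be expected to report for a plain 'janedoe' versus one with a hypothetical trailing zero-width space, assuming both are stored as UTF-8. The helper names hex_like_mysql and dump_bytes are illustrative only.

```python
def hex_like_mysql(value: str, encoding: str = "utf-8") -> str:
    """Uppercase hex of the string's bytes, similar to MySQL's HEX(str)."""
    return value.encode(encoding).hex().upper()


def dump_bytes(value: str, encoding: str = "utf-8") -> str:
    """Decimal byte-by-byte listing, loosely in the spirit of Oracle's DUMP()."""
    raw = value.encode(encoding)
    return f"Len={len(raw)}: " + ",".join(str(b) for b in raw)


clean = "janedoe"
suspect = "janedoe\u200b"  # hypothetical: trailing zero-width space

print(hex_like_mysql(clean))    # 6A616E65646F65
print(hex_like_mysql(suspect))  # 6A616E65646F65E2808B
print(dump_bytes(suspect))      # Len=10: 106,97,110,101,100,111,101,226,128,139
```

The extra E2808B at the end is the UTF-8 encoding of U+200B, which is invisible on screen but obvious in the hex.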
It's possible that you've got some Latin1 encodings mangled into UTF-8, or vice versa, or some other character set weirdness going on. This may give you some ideas...
SELECT username
, HEX(username)
, HEX(BINARY username)
, CONVERT(BINARY username USING latin1)
, CONVERT(BINARY username USING utf8)
FROM users
WHERE uid IN (3,4)
ORDER BY uid
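To see what "Latin1 mangled into UTF-8" can look like, here is a small Python illustration. The name is hypothetical, and Python's latin-1 codec is used only as an approximation of MySQL's latin1 character set. The same visible text can be stored as different bytes, and reading UTF-8 bytes as Latin-1 produces the typical garbled output.

```python
# The same visible text stored as different bytes, and how a Latin-1/UTF-8
# mix-up produces mojibake. The example name is hypothetical.
name = "zo\u00eb"  # "zoë"

utf8_bytes = name.encode("utf-8")      # b'zo\xc3\xab'
latin1_bytes = name.encode("latin-1")  # b'zo\xeb'

# Decoded with the matching charset, both display the same string...
assert utf8_bytes.decode("utf-8") == latin1_bytes.decode("latin-1") == name
# ...but the stored bytes differ, so a byte-wise comparison sees two values.
print(utf8_bytes.hex().upper(), latin1_bytes.hex().upper())  # 7A6FC3AB 7A6FEB

# Reading the UTF-8 bytes as Latin-1 gives the classic "Ã«"-style garbage...
mangled = utf8_bytes.decode("latin-1")
print(mangled)  # zoÃ«
# ...which can be reversed by re-encoding as Latin-1 and decoding as UTF-8.
print(mangled.encode("latin-1").decode("utf-8"))  # zoë
```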