I have a table that stores a URL in a column. Because URLs are parsed and written differently, the table contains duplicates. How can I select all rows whose URLs share the same domain and path?
I can select duplicates where the URL is an exact match, but that is not what I want.
Examples:
# This is a duplicate
https://www.example.com/example1
https://example.com/example1
# Not a duplicate
https://example.com/example2
https://example.com/example3
# This is a duplicate
https://example.com/example2/
https://example.com/example2
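You can remove the duplicates with a self-join that compares each URL with the other row's URL after stripping the www. prefix and any trailing slash. Note that this deletes the row whose URL is already in the stripped form and keeps the www. or trailing-slash variant: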
DELETE t1
FROM urls t1
INNER JOIN urls t2
    ON t1.id != t2.id
   AND t1.url = TRIM(TRAILING '/' FROM REPLACE(t2.url, '://www.', '://'));
A working example is available here: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=792b0a7870b1abdd91f13cd4c608ab6a
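Since the question asks how to select the duplicates rather than delete them, here is a minimal sketch that applies the same normalization to both sides of the comparison. It assumes the urls(id, url) table used in the statement above and MySQL 8.0; the alias norm_url is a name introduced only for this illustration. The second statement is an optional variation, not the original answer's behavior: it keeps the row with the lowest id in each group and deletes the rest.

```sql
-- Assumption: table urls(id, url) as in the answer above, MySQL 8.0.
-- norm_url is a hypothetical alias used only in this sketch.

-- 1) List every row whose normalized URL occurs more than once.
SELECT u.id, u.url, n.norm_url
FROM urls AS u
INNER JOIN (
    SELECT TRIM(TRAILING '/' FROM REPLACE(url, '://www.', '://')) AS norm_url
    FROM urls
    GROUP BY TRIM(TRAILING '/' FROM REPLACE(url, '://www.', '://'))
    HAVING COUNT(*) > 1
) AS n
    ON TRIM(TRAILING '/' FROM REPLACE(u.url, '://www.', '://')) = n.norm_url
ORDER BY n.norm_url, u.id;

-- 2) Optional variation: normalize both sides and keep the lowest id
--    in each group, deleting the later rows.
DELETE t1
FROM urls AS t1
INNER JOIN urls AS t2
    ON t1.id > t2.id
   AND TRIM(TRAILING '/' FROM REPLACE(t1.url, '://www.', '://'))
     = TRIM(TRAILING '/' FROM REPLACE(t2.url, '://www.', '://'));
```

This only treats the www. prefix and a trailing slash as insignificant. Differences in scheme (http vs https), query strings, or fragments would still count as distinct URLs, so run the SELECT first and check its output before deleting anything.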