Python remove couple of duplicates from List

Question

I know similar questions already have answers, but I think my case is a little different. I have a MySQL database with a big table (40,000+ entries). The table structure is:

    Field    |  Type       |Null |Key  |Default |   Extra   
    -----------------------------------------------------
    Messaggio|  longtext   |NO   |     |NULL    |
    Id       |  bigint(20) |NO   |     |NULL    |
    Data     |  date       |NO   |     |NULL    |
    Partito  |  text       |NO   |     |NULL    |
    Numero   |  bigint(23) |NO   |PRI  |NULL    |auto_increment

I have to remove duplicate rows, meaning rows that have the same values in 'Messaggio', 'Id' and 'Partito'. For example:

    Messaggio  | Id  | Data     | Partito | Numero
    ------------------------------------------------
    long_text1 | 123 | somedate | M5s     | 1
    long_text1 | 123 | somedate | M5s     | 2
    long_text2 | 123 | somedate | M5s     | 3

In this case I have to delete one of the first two entries.

I've tried this:

import MySQLdb

db = MySQLdb.connect(host="localhost", port=xxxxx, user="xxxxxxx", passwd="xxxxxx", db="xxxxx", charset='utf8', use_unicode=True)

db.ping(True)

cursor = db.cursor()

cursor.execute("SET NAMES utf8;")

cursor.execute("SELECT `Messaggio`, `Id`, `Data`, `Partito`, `Numero` FROM `Statuses` WHERE 1")

data = cursor.fetchall()

data2 = dict((x[0], x) for x in data).values()

print (data2)
print (len(data))
print (len(data2))

Output:

- a very long list
- 41804
- 39558

It is not clear to me what this code, dict((x[0], x) for x in data).values(), does (I'm pretty new to Python and I still have to figure out how dictionaries work). My first thought was that it deletes identical rows (rows with the same values in all 5 fields), but this is not possible because the field 'Numero' is auto-increment, so it can't have duplicates (I checked with a query on MySQL and found no duplicates of 'Numero').

My questions:

Why did that code remove about 2,000 items? Does it remove any kind of duplicate?
What is the best way to obtain the result I need?

Answer 1

It removes all rows having the same Messaggio except the very last one. Consider the following code:

>>> {1:2, 1:3}
{1: 3}

You are building a dict with multiple assignments to the same key; only the very last one persists.
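To connect this to the question's table, here is a minimal sketch using made-up rows in the same column order (Messaggio, Id, Data, Partito, Numero). The dates and the third row are invented for illustration; they are not taken from the real data.

```python
# Hypothetical sample rows: (Messaggio, Id, Data, Partito, Numero).
rows = [
    ("long_text1", 123, "2015-01-01", "M5s", 1),
    ("long_text1", 123, "2015-01-01", "M5s", 2),  # true duplicate of row 1
    ("long_text1", 456, "2015-01-02", "PD", 3),   # same text, different Id and Partito
    ("long_text2", 123, "2015-01-01", "M5s", 4),
]

# Keyed on Messaggio only: each later row with the same text
# overwrites the earlier one stored under that key.
by_message = dict((row[0], row) for row in rows)
survivors = list(by_message.values())

print(len(rows), len(survivors))        # 4 2
print(by_message["long_text1"][4])      # 3 (only the last "long_text1" row is kept)
```

Note that row 3 replaces rows 1 and 2 even though its Id and Partito differ, which is probably why more rows disappeared than expected.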

Back to:

dict((x[0], x) for x in data).values()

Starting from the end, it lists the values of a dictionary:

>>> {1:'a', 2:'b'}.values()
['a', 'b']

The dict is created from a generator that yields (key, value) pairs, which behaves much like a tuple of tuples:

>>> dict(((1,'a'),(2,'b')))
{1: 'a', 2: 'b'}

The innermost part is like:

>>> list((x[0], x) for x in [[1,2,3], ['a','b','c']])
[(1, [1, 2, 3]), ('a', ['a', 'b', 'c'])]

So I think you want to use a tuple of the three fields as the key:

dict(((x[0], x[1], x[3]), x) for x in data).values()
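The dict approach keeps the last row for each key and does not tell you which rows were dropped. As a rough sketch of an alternative, the hypothetical helper below (`split_duplicates` is not part of any library) keeps the first row seen for each (Messaggio, Id, Partito) key and collects the rest, so their Numero values are available afterwards. The sample rows are invented and assume the same column order as the SELECT in the question.

```python
def split_duplicates(rows, key_columns=(0, 1, 3)):
    """Return (kept, duplicates), keeping the first row seen for each key.

    key_columns are the positions of Messaggio, Id and Partito in each row.
    """
    seen = set()
    kept, duplicates = [], []
    for row in rows:
        key = tuple(row[i] for i in key_columns)
        if key in seen:
            duplicates.append(row)
        else:
            seen.add(key)
            kept.append(row)
    return kept, duplicates


# Hypothetical sample rows: (Messaggio, Id, Data, Partito, Numero).
rows = [
    ("long_text1", 123, "2015-01-01", "M5s", 1),
    ("long_text1", 123, "2015-01-01", "M5s", 2),  # duplicate of row 1
    ("long_text1", 456, "2015-01-02", "PD", 3),   # same text, different Id and Partito
    ("long_text2", 123, "2015-01-01", "M5s", 4),
]

kept, duplicates = split_duplicates(rows)
numeri_to_delete = [row[4] for row in duplicates]

print(len(kept))            # 3
print(numeri_to_delete)     # [2]
```

The collected Numero values could then be passed to a parameterized DELETE on the `Statuses` table; that step is left out here because it depends on how you want to handle the database side.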


solution1
1 ACCEPTED 2015-11-22 16:52:51
