In the function sqlPull() I pull the most recent 5 entries from a MySQL database every 5 seconds. In the second function dupCatch() I am attempting to remove duplicates that would appear in the n+1 SQL pull when compared to n. I want to save only the unique list of tuples, but right now the function is printing the same list of tuples 5 times every five seconds.
In English, what I am attempting to do with dupCatch() is take the data from sqlPull(), initialize an empty list, and say: for all of the tuples in the variable data, if that tuple is not in the empty list, add it to the newData variable; otherwise, set lastPull equal to the non-unique tuples.
Obviously, my function is wrong, but I'm not sure how to fix it.
import mysql.connector
import datetime
import requests
from operator import itemgetter
import time

run = True

def sqlPull():
    connection = mysql.connector.connect(user='XXX', password='XXX', host='XXXX', database='MeshliumDB')
    cursor = connection.cursor()
    cursor.execute("SELECT TimeStamp, MAC, RSSI FROM wifiscan ORDER BY TimeStamp DESC LIMIT 5;")
    data = cursor.fetchall()
    connection.close()
    time.sleep(5)
    return data

def dupCatch():
    data = sqlPull()
    lastPull = []
    for (TimeStamp, MAC, RSSI) in data:
        if (TimeStamp, MAC, RSSI) not in lastPull:
            newData = data
        else:
            lastPull = data
        print newData

while run == True:
    dupCatch()

This is what the output I am getting now looks like:
[(datetime.datetime(2013, 11, 14, 20, 28, 54), u'E0:CB:1D:36:EE:9D', u' 20'), (datetime.datetime(2013, 11, 14, 20, 28, 53), u'00:1E:8F:75:82:35', u' 21'), (datetime.datetime(2013, 11, 14, 20, 28, 52), u'78:E4:00:0C:50:DF', u' 33'), (datetime.datetime(2013, 11, 14, 20, 28, 52), u'00:1E:4C:03:C0:66', u' 26'), (datetime.datetime(2013, 11, 14, 20, 28, 52), u'78:E4:00:0C:50:DF', u' 33')]
[(datetime.datetime(2013, 11, 14, 20, 28, 54), u'E0:CB:1D:36:EE:9D', u' 20'), (datetime.datetime(2013, 11, 14, 20, 28, 53), u'00:1E:8F:75:82:35', u' 21'), (datetime.datetime(2013, 11, 14, 20, 28, 52), u'78:E4:00:0C:50:DF', u' 33'), (datetime.datetime(2013, 11, 14, 20, 28, 52), u'00:1E:4C:03:C0:66', u' 26'), (datetime.datetime(2013, 11, 14, 20, 28, 52), u'78:E4:00:0C:50:DF', u' 33')]
[(datetime.datetime(2013, 11, 14, 20, 28, 54), u'E0:CB:1D:36:EE:9D', u' 20'), (datetime.datetime(2013, 11, 14, 20, 28, 53), u'00:1E:8F:75:82:35', u' 21'), (datetime.datetime(2013, 11, 14, 20, 28, 52), u'78:E4:00:0C:50:DF', u' 33'), (datetime.datetime(2013, 11, 14, 20, 28, 52), u'00:1E:4C:03:C0:66', u' 26'), (datetime.datetime(2013, 11, 14, 20, 28, 52), u'78:E4:00:0C:50:DF', u' 33')]
[(datetime.datetime(2013, 11, 14, 20, 28, 54), u'E0:CB:1D:36:EE:9D', u' 20'), (datetime.datetime(2013, 11, 14, 20, 28, 53), u'00:1E:8F:75:82:35', u' 21'), (datetime.datetime(2013, 11, 14, 20, 28, 52), u'78:E4:00:0C:50:DF', u' 33'), (datetime.datetime(2013, 11, 14, 20, 28, 52), u'00:1E:4C:03:C0:66', u' 26'), (datetime.datetime(2013, 11, 14, 20, 28, 52), u'78:E4:00:0C:50:DF', u' 33')]
[(datetime.datetime(2013, 11, 14, 20, 28, 54), u'E0:CB:1D:36:EE:9D', u' 20'), (datetime.datetime(2013, 11, 14, 20, 28, 53), u'00:1E:8F:75:82:35', u' 21'), (datetime.datetime(2013, 11, 14, 20, 28, 52), u'78:E4:00:0C:50:DF', u' 33'), (datetime.datetime(2013, 11, 14, 20, 28, 52), u'00:1E:4C:03:C0:66', u' 26'), (datetime.datetime(2013, 11, 14, 20, 28, 52), u'78:E4:00:0C:50:DF', u' 33')]
[(datetime.datetime(2013, 11, 14, 20, 28, 54), u'E0:CB:1D:36:EE:9D', u' 20'), (datetime.datetime(2013, 11, 14, 20, 28, 53), u'00:1E:8F:75:82:35', u' 21'), (datetime.datetime(2013, 11, 14, 20, 28, 52), u'78:E4:00:0C:50:DF', u' 33'), (datetime.datetime(2013, 11, 14, 20, 28, 52), u'00:1E:4C:03:C0:66', u' 26'), (datetime.datetime(2013, 11, 14, 20, 28, 52), u'78:E4:00:0C:50:DF', u' 33')]
Assuming you're only trying to filter out adjacent repeats, not repeats ever seen…
First, the first time you find a tuple that's in lastPull, you're going to set lastPull = data. That means all of the subsequent tuples will automatically be in lastPull.
Meanwhile, you're setting either lastPull or newData each time through the loop. So, one of three things is going to happen: you set newData (repeatedly) and never update lastPull; you set newData and also update lastPull; or you only update lastPull. This can't be the logic you wanted. I think what you want is to use any or all, or to put a break in one of the conditions and put the opposite in an else clause on the for, but I'm honestly not sure what you're trying to do here.
Meanwhile, your code always does a print newData each time through the loop. So, for each tuple, you're going to print all of the tuples. As mentioned above, this will always be the new ones if the first tuple is new, otherwise the previous ones. Again, this can't be what you want, but I'm not sure what you do want. Maybe you want to print newData outside the loop, instead of each time through?
On top of all that, you say you want to add things to the newData list, but in your code you're just replacing the variable over and over. To add things to a list, you need to call append on it. (Or extend, if you have a list of new things to add all in one go.)
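To make the difference between rebinding a name and adding to a list concrete, here is a minimal sketch. The rows in batch are made-up placeholders rather than real query results, and print() is used so the snippet runs on both Python 2 and 3.

```python
# Made-up rows standing in for one sqlPull() result.
batch = [("t1", "AA", "20"), ("t2", "BB", "21")]

# Rebinding: every pass just points new_data at the whole batch again.
new_data = []
for row in batch:
    new_data = batch

# append adds one row at a time to the list that already exists.
appended = []
for row in batch:
    appended.append(row)

# extend adds every row from another iterable in a single call.
extended = []
extended.extend(batch)

print(new_data is batch)    # the name refers to the very same list object
print(appended is batch)    # a separate list with equal contents
print(appended == extended == batch)
```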
Rather than try to figure out what your code is trying to do and fix it, let's go back to your English description:
In English, what I am attempting to do with dupCatch() is take the data from sqlPull(), initialize an empty list, and say: for all of the tuples in the variable data, if that tuple is not in the empty list, add it to the newData variable; otherwise, set lastPull equal to the non-unique tuples.
So:

seen = set()

def dupCatch():
    data = sqlPull()
    new_data = []
    for (TimeStamp, MAC, RSSI) in data:
        if (TimeStamp, MAC, RSSI) not in seen:
            seen.add((TimeStamp, MAC, RSSI))
            new_data.append((TimeStamp, MAC, RSSI))
    print new_data

Or, more concisely:

seen = set()

def dupCatch():
    data = sqlPull()
    # Keep only the rows that have never been seen before.
    new_data = [row for row in data if row not in seen]
    seen.update(new_data)
    print new_data

Either way, the trick here is that we have a set which keeps track of every row we've ever seen. So, for each new row, if it's in that set, we've seen it and can ignore it; otherwise, we have to not ignore it, and add it to the set for later.
The second version just simplifies things by filtering all 5 rows at once, and then update-ing the set with all of the new ones at once, instead of doing it row by row.
The reason that seen has to be global is that a global lives forever, across all runs of the function, so we can use it to keep track of every row we've ever seen; if we made it local to the function, it would be new each time, so we'd only be keeping track of rows we've seen in the current batch, which isn't very useful.
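A small sketch of that difference, using two overlapping made-up batches in place of consecutive sqlPull() results. The names filter_local, filter_global, and seen_global are hypothetical and exist only for this illustration.

```python
# Two overlapping made-up batches standing in for consecutive pulls.
batches = [
    [("t1", "AA"), ("t2", "BB")],
    [("t2", "BB"), ("t3", "CC")],
]

def filter_local(data):
    seen = set()  # recreated on every call, so earlier batches are forgotten
    new_data = [row for row in data if row not in seen]
    seen.update(new_data)
    return new_data

seen_global = set()  # created once, so it survives across calls

def filter_global(data):
    new_data = [row for row in data if row not in seen_global]
    seen_global.update(new_data)
    return new_data

local_results = [filter_local(b) for b in batches]
global_results = [filter_global(b) for b in batches]

print(local_results[1])   # the repeated ("t2", "BB") row is still reported
print(global_results[1])  # only the genuinely new row is reported
```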
In general, globals are bad. However, things like persistent caches are an exception to the "in general" rule. The whole point of them is that they're not local. If you had an object model in mind that made sense, seen would be much better as a member of whatever object dupCatch was a method on than as a global. If you had a good reason to define the function as a closure inside another function, seen would be better as part of that closure. And so on. But otherwise, a global is the best option.
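For comparison, the closure variant might look roughly like the sketch below. make_dup_catcher and fake_pull are hypothetical names, and fake_pull merely replays made-up batches in place of sqlPull().

```python
def make_dup_catcher(pull):
    seen = set()  # lives as long as the returned function does

    def dup_catch():
        # seen is only mutated, never rebound, so no global/nonlocal is needed.
        new_data = [row for row in pull() if row not in seen]
        seen.update(new_data)
        return new_data

    return dup_catch

# Hypothetical stand-in for sqlPull(): hands out made-up batches in order.
_batches = iter([
    [("t1", "AA"), ("t2", "BB")],
    [("t2", "BB"), ("t3", "CC")],
])

def fake_pull():
    return next(_batches)

dup_catch = make_dup_catcher(fake_pull)
first = dup_catch()
second = dup_catch()
print(first)
print(second)
```

Each call to make_dup_catcher gets its own seen set, so two catchers built this way would not share state the way two functions reading one global would.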
If you reorganized your code a bit, you could make this even simpler:

# unique_everseen is a recipe from the itertools documentation, not a built-in.
def pull():
    while True:
        for row in sqlPull():
            yield row

for row in unique_everseen(pull()):
    print row

… or even:

from itertools import chain

for row in unique_everseen(chain.from_iterable(iter(sqlPull, None))):
    print row

See Iterators and the next few tutorial sections, the itertools documentation, and David M. Beazley's presentations to understand what this last version does. But for a novice, you might want to stick with the second version.
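Since unique_everseen is not importable from the standard library, here is a simplified sketch of it (the recipe in the itertools documentation also accepts a key argument, omitted here), wired up the same way as the one-liner. fake_pull is a hypothetical stand-in for sqlPull() that returns None once its made-up batches run out, which is what lets iter(fake_pull, None) stop; with the real sqlPull() the loop would run forever.

```python
from itertools import chain

def unique_everseen(iterable):
    """Yield each item the first time it appears, skipping later repeats."""
    seen = set()
    for item in iterable:
        if item not in seen:
            seen.add(item)
            yield item

# Hypothetical stand-in for sqlPull(): made-up batches, then None.
_batches = iter([
    [("t1", "AA"), ("t2", "BB")],
    [("t2", "BB"), ("t3", "CC")],
])

def fake_pull():
    return next(_batches, None)  # None is the sentinel that ends iter()

# iter(callable, sentinel) calls fake_pull() until it returns None;
# chain.from_iterable flattens the batches into one stream of rows.
rows = list(unique_everseen(chain.from_iterable(iter(fake_pull, None))))
for row in rows:
    print(row)
```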
Try this:

def dupCatch():
    data = sqlPull()
    lastPull = []
    for x in data:
        if x not in lastPull:
            print(x)
            lastPull.append(x)

The problem is that lastPull is a local variable, so it gets set to [] every time, and doesn't persist between function calls. For what you're trying to do, you should use a class and store the last pull there:

import mysql.connector
import datetime
import requests
import time

class SqlPuller(object):
    def __init__(self):
        # Rows returned by the previous pull; empty before the first one.
        self.last_pull = set()

    def pull(self):
        connection = mysql.connector.connect(user='XXX', password='XXX',
                                             host='XXXX', database='MeshliumDB')
        cursor = connection.cursor()
        cursor.execute("SELECT TimeStamp, MAC, RSSI FROM wifiscan ORDER BY TimeStamp DESC LIMIT 5;")
        data = cursor.fetchall()
        connection.close()
        return data

    def pull_new(self):
        # Return only the rows that were not in the previous pull.
        new_data = []
        data = self.pull()
        for item in data:
            if item not in self.last_pull:
                new_data.append(item)
        self.last_pull = set(data)
        return new_data

if __name__ == "__main__":
    sql_puller = SqlPuller()
    while True:
        # Call pull_new(), not pull(), so repeated rows are filtered out.
        for item in sql_puller.pull_new():
            print(item)
        time.sleep(5)

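Note that remembering only the previous pull is not the same as remembering every row ever seen. The sketch below contrasts the two on made-up batches; new_vs_last_pull and new_vs_ever_seen are hypothetical helper names, and whether the difference matters for your table depends on whether a row can drop out of the latest 5 and later reappear.

```python
# Made-up batches: row A is absent from the second batch, then reappears.
A, B, C = ("t1", "AA"), ("t2", "BB"), ("t3", "CC")
batches = [[A, B], [B, C], [A, C]]

def new_vs_last_pull(batches):
    """Report rows that were not in the immediately preceding batch."""
    last_pull = set()
    out = []
    for data in batches:
        out.append([row for row in data if row not in last_pull])
        last_pull = set(data)
    return out

def new_vs_ever_seen(batches):
    """Report rows that were not in any earlier batch."""
    seen = set()
    out = []
    for data in batches:
        new = [row for row in data if row not in seen]
        seen.update(new)
        out.append(new)
    return out

by_last = new_vs_last_pull(batches)
by_ever = new_vs_ever_seen(batches)
print(by_last[2])  # A is reported again, because batch two did not contain it
print(by_ever[2])  # nothing is reported; A and C were both seen earlier
```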