
Concurrent SQLite writes in Python

I've got a Python application (Gtk) which uses threads to fetch information from certain sites and write it to the database.

I've got a thread that checks for new updates at site1; if there are updates I receive a JSON object (json1). I then iterate through json1 and insert the new information into the database. Within json1 there is a result I need to use to fetch more information at site2, where I receive another JSON object (json2).

So the situation is something like this:

def get_more_info(name):
    json2 = get(www.site2.com?=name....)
    etc

for information in json1:
    db.insert(information)
    get_more_info(information.name)

From this situation I see that there are a couple of ways of doing this.

Have get_more_info return the JSON object, so that:

for information in json1:
    db.insert(information)
    json2 = get_more_info(information.name)
    for info in json2:
        db.insert(info)
db.commit()

Have get_more_info do the inserting:

for information in json1:
    db.insert(information)
    get_more_info(information.name)
db.commit()

Both of these ways seem a bit slow, since the main for loop has to wait for get_more_info to complete before carrying on, and both json1 and json2 could be large. There is also the possibility that site2 is unavailable at that moment, causing the whole transaction to fail. The application can still function without json2; that data can be fetched at a later time if needed.

So I was thinking of passing information.name to a queue, so that the main loop can continue, and kicking off a thread that monitors that queue and executes get_more_info. Is this the right approach to take?
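A minimal sketch of that queue-based idea, assuming a single worker thread and a sentinel value to shut it down (the stub get_more_info, the results list, and the sample names are illustrative, not from the original code):

```python
import queue
import threading

# Stub standing in for the real network call to site2.
def get_more_info(name):
    return [{"name": name, "extra": "fetched later"}]

results = []
fetch_queue = queue.Queue()

def worker():
    while True:
        name = fetch_queue.get()
        if name is None:                         # sentinel: stop the worker
            fetch_queue.task_done()
            break
        try:
            # The slow call runs here, off the main loop.
            results.extend(get_more_info(name))
        except Exception:
            pass                                 # site2 down: skip, refetch later
        fetch_queue.task_done()

t = threading.Thread(target=worker, daemon=True)
t.start()

# The main loop only enqueues names and moves on.
for name in ["a", "b", "c"]:
    fetch_queue.put(name)

fetch_queue.put(None)                            # ask the worker to finish
t.join()
print(len(results))                              # 3
```

Because site2 results are merely enqueued work, a failure there no longer aborts the main loop's transaction.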

I know that SQLite does not perform concurrent writes. If I recall correctly, if get_more_info tries to write while the main for loop is busy writing, sqlite will raise OperationalError: database is locked.
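That recollection is roughly right, with one nuance: in Python's sqlite3 module a blocked writer retries for up to the connection's timeout (5 seconds by default) and only then raises OperationalError; there is no built-in write queue. A small demonstration under stated assumptions (rollback-journal mode, temp-file path is illustrative):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

# First connection plays the busy main loop: it holds the write
# lock in the middle of a transaction.
writer = sqlite3.connect(path, isolation_level=None)  # autocommit; txn by hand
writer.execute("CREATE TABLE t (x)")
writer.execute("BEGIN IMMEDIATE")                     # take the write lock now
writer.execute("INSERT INTO t VALUES (1)")

# Second connection plays get_more_info; timeout=0 means
# "fail at once" instead of retrying for 5 seconds.
other = sqlite3.connect(path, timeout=0)
try:
    other.execute("INSERT INTO t VALUES (2)")
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)                                  # typically "database is locked"
other.rollback()                                      # release any lock it acquired

writer.execute("COMMIT")                              # the "main loop" finishes
other.execute("INSERT INTO t VALUES (2)")             # the retried write now succeeds
other.commit()
```

So with the default timeout, a write from get_more_info would usually just stall briefly rather than fail, unless the main loop keeps its transaction open longer than 5 seconds.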

Now what happens to get_more_info at that point? Does it get put into some type of write queue, or does it wait for the main loop to complete? And what happens to the main for loop while get_more_info is busy writing?

Will there be a need to move to another database engine?

Since you are already using threads, you can use another thread to write to the database. To feed it with data you should use a globally accessible Queue.Queue() (queue.Queue() in Python 3) instance. Calling the instance's get() method with block=True will make the thread wait for data to write.
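A minimal sketch of that writer-thread pattern, assuming Python 3 (the info table, the sentinel-based shutdown, and the temp-file path are my own illustrative choices):

```python
import os
import queue
import sqlite3
import tempfile
import threading

path = os.path.join(tempfile.mkdtemp(), "demo.db")
write_queue = queue.Queue()                      # globally accessible

def db_writer():
    # The only thread that ever touches this connection, so SQLite
    # never sees two writers competing for the lock.
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS info (name TEXT)")
    while True:
        row = write_queue.get(block=True)        # sleeps until data arrives
        if row is None:                          # sentinel: flush and stop
            conn.commit()
            conn.close()
            break
        conn.execute("INSERT INTO info VALUES (?)", (row,))

writer = threading.Thread(target=db_writer)
writer.start()

# The main loop, and get_more_info in its own thread, just enqueue rows:
for name in ["a", "b"]:
    write_queue.put(name)
write_queue.put(None)                            # tell the writer to finish
writer.join()
```

Creating the connection inside db_writer also satisfies sqlite3's rule that a connection may only be used from the thread that created it.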
