
How do I lock read/write to MySQL tables so that I can select and then insert without other programs reading/writing to the database?

I am running many instances of a web crawler in parallel.

Each crawler selects a domain from a table, inserts that URL and a start time into a log table, and then starts crawling the domain.

Other parallel crawlers check the log table to see which domains are already being crawled before selecting their own domain to crawl.

I need to prevent other crawlers from selecting a domain that has just been selected by another crawler but doesn't have a log entry yet. My best guess at how to do this is to lock the database from all other reads/writes while one crawler selects a domain and inserts a row in the log table (two queries).

How the heck does one do this? I'm afraid this is terribly complex and relies on many other things. Please help get me started.


This code seems like a good solution (see the error below, however):

INSERT INTO crawlLog (companyId, timeStartCrawling)
VALUES
(
    (
        SELECT companies.id FROM companies
        LEFT OUTER JOIN crawlLog
        ON companies.id = crawlLog.companyId
        WHERE crawlLog.companyId IS NULL
        LIMIT 1
    ),
    now()
)

but I keep getting the following MySQL error:

You can't specify target table 'crawlLog' for update in FROM clause

Is there a way to accomplish the same thing without this problem? I've tried a couple of different ways, including this:

INSERT INTO crawlLog (companyId, timeStartCrawling)
VALUES
(
    (
        SELECT id
        FROM companies
        WHERE id NOT IN (SELECT companyId FROM crawlLog) LIMIT 1
    ),
    now()
)

You can lock tables using the MySQL LOCK TABLES command like this:

LOCK TABLES tablename WRITE;

# Do other queries here

UNLOCK TABLES;

See:

http://dev.mysql.com/doc/refman/5.5/en/lock-tables.html

You probably don't want to lock the table. If you do that you'll have to worry about trapping errors when the other crawlers try to write to the database - which is what you were thinking when you said "...terribly complex and relies on many other things."

Instead you should probably wrap the group of queries in a MySQL transaction (see http://dev.mysql.com/doc/refman/5.0/en/commit.html ) like this:

START TRANSACTION;
SELECT @URL:=url FROM tablewiththeurls WHERE uncrawled=1 ORDER BY somecriterion LIMIT 1;
INSERT INTO loggingtable SET url=@URL;
COMMIT;

Or something close to that.

[edit] I just realized - you could probably do everything you need in a single query and not even have to worry about transactions. Something like this:

INSERT INTO loggingtable (url)
SELECT u.url
FROM tablewiththeurls u
LEFT JOIN loggingtable l ON l.url = u.url
WHERE {some criterion used to pick the url to work on} AND l.url IS NULL;

Well, table locks are one way to deal with that, but they make parallel requests impossible. If the table is InnoDB you could force a row lock instead, using SELECT ... FOR UPDATE within a transaction.

BEGIN;

SELECT ... FROM your_table WHERE domainname = ... FOR UPDATE;

# do whatever you have to do

COMMIT;

Please note that you will need an index on domainname (or whatever column you use in the WHERE clause) for this to work, but this makes sense in general and I assume you will have that anyway.

I wouldn't use locking or transactions.

The easiest way to go is to INSERT a record in the logging table if it's not yet present, and then check for that record.

Assume you have tblcrawels (cra_id) that is filled with your crawlers and tblurl (url_id) that is filled with the URLs, and a table tbllogging (log_cra_id, log_url_id) for your logfile.

You would run the following query if crawler 1 wants to start crawling URL 2:

INSERT INTO tbllogging (log_cra_id, log_url_id)
SELECT 1, url_id FROM tblurl LEFT JOIN tbllogging ON url_id=log_url_id
WHERE url_id=2 AND log_url_id IS NULL;

The next step is to check whether this record has been inserted.

SELECT * FROM tbllogging WHERE log_url_id=2 AND log_cra_id=1;

If you get any results then crawler 1 can crawl this URL. If you don't get any results, another crawler has already inserted a row for the same URL and is crawling it.
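The insert-then-check steps above can be sketched end to end. This is only a minimal sketch, not the answerer's code: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL so that it runs without a server, and the helper names try_claim and make_demo_db are hypothetical. The two SQL statements are the ones from the answer, with parameters in place of the literal 1 and 2.

```python
import sqlite3


def try_claim(conn, crawler_id, url_id):
    """Return True if crawler_id holds the log row for url_id after the attempt."""
    # Step 1: insert a log row only when no row exists yet for this URL.
    conn.execute(
        "INSERT INTO tbllogging (log_cra_id, log_url_id) "
        "SELECT ?, url_id "
        "FROM tblurl LEFT JOIN tbllogging ON url_id = log_url_id "
        "WHERE url_id = ? AND log_url_id IS NULL",
        (crawler_id, url_id),
    )
    conn.commit()
    # Step 2: check whose row is actually in the log.
    row = conn.execute(
        "SELECT 1 FROM tbllogging WHERE log_url_id = ? AND log_cra_id = ?",
        (url_id, crawler_id),
    ).fetchone()
    return row is not None


def make_demo_db():
    """Build a throwaway in-memory database with the tables named in the answer."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tblurl (url_id INTEGER PRIMARY KEY);
        CREATE TABLE tbllogging (log_cra_id INTEGER, log_url_id INTEGER);
        INSERT INTO tblurl (url_id) VALUES (1), (2), (3);
        """
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    print(try_claim(conn, 1, 2))  # crawler 1 asks for URL 2
    print(try_claim(conn, 7, 2))  # crawler 7 asks for the same URL
```

In the demo, crawler 1 gets URL 2 and crawler 7 is turned away because the anti-join finds the existing row. Against a real MySQL server the same two statements would be sent through whatever client library the crawler already uses.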

I got some inspiration from @Eljakim's answer and started this new thread where I figured out a great trick. It doesn't involve locking anything and is very simple.

INSERT INTO crawlLog (companyId, timeStartCrawling)
SELECT id, now()
FROM companies
WHERE id NOT IN
(
    SELECT companyId
    FROM crawlLog AS crawlLogAlias
)
LIMIT 1
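One thing the query above leaves open is how a crawler learns which company it has just claimed. The sketch below shows one possible way, under stated assumptions: crawlLog has an auto-increment id column and a UNIQUE index on companyId (neither appears in the original schema), the helper names claim_next_company and make_crawl_db are hypothetical, and Python's built-in sqlite3 module stands in for MySQL so the example runs without a server (datetime('now') replaces now()). In MySQL the analogous lookup would likely go through LAST_INSERT_ID().

```python
import sqlite3


def claim_next_company(conn):
    """Claim one unclaimed company and return its id, or None if none are left."""
    cur = conn.execute(
        "INSERT INTO crawlLog (companyId, timeStartCrawling) "
        "SELECT id, datetime('now') "
        "FROM companies "
        "WHERE id NOT IN (SELECT companyId FROM crawlLog) "
        "LIMIT 1"
    )
    conn.commit()
    if cur.rowcount < 1:
        return None  # nothing was inserted: every company already has a log row
    # lastrowid is the id of the crawlLog row this connection just inserted.
    row = conn.execute(
        "SELECT companyId FROM crawlLog WHERE id = ?", (cur.lastrowid,)
    ).fetchone()
    return row[0]


def make_crawl_db():
    """Build a throwaway in-memory database (assumed schema, see lead-in)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE companies (id INTEGER PRIMARY KEY);
        CREATE TABLE crawlLog (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            companyId INTEGER NOT NULL UNIQUE,  -- assumed: rejects duplicate claims
            timeStartCrawling TEXT NOT NULL
        );
        INSERT INTO companies (id) VALUES (10), (20), (30);
        """
    )
    return conn


if __name__ == "__main__":
    conn = make_crawl_db()
    while True:
        company_id = claim_next_company(conn)
        if company_id is None:
            break
        print("claimed company", company_id)
```

The UNIQUE index is a safety net rather than part of the original trick: if two crawlers ever picked the same company, the second insert would fail with an error instead of creating a duplicate log row. The query has no ORDER BY, so the order in which companies are claimed is not guaranteed.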

It's better to use a row lock or a transaction-based query so that other parallel requests can still access the table.
