I have two tables in my MySQL database, users and tweets, as follows:
TABLE users (
uid int(7) NOT NULL AUTO_INCREMENT,
twitter_uid int(10) NOT NULL,
screen_name varchar(255) NOT NULL,
`name` varchar(255) NOT NULL,
tweets int(6) NOT NULL,
followers_count int(7) NOT NULL,
statuses_count int(7) NOT NULL,
created_at int(10) NOT NULL,
PRIMARY KEY (uid)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
TABLE tweets (
tweet_id int(11) NOT NULL AUTO_INCREMENT,
`query` varchar(5) NOT NULL,
id_str varchar(18) NOT NULL,
created_at int(10) NOT NULL,
from_user_id int(11) NOT NULL,
from_user varchar(256) NOT NULL,
`text` text NOT NULL,
PRIMARY KEY (tweet_id),
KEY id_str (id_str)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
The tweets table contains over 2 million records. I have put the unique users (taken from tweets.from_user) in the users table. It now contains 94,100 users. I now want to count the number of tweets each user made, as follows (in PHP):
$result = db_query('SELECT uid, screen_name FROM users WHERE tweets = 0 LIMIT 150');
while ($user = db_fetch_object($result)) {
  // One COUNT query per user: this is the slow part.
  $result2 = db_query(
    "SELECT COUNT(tweet_id) FROM tweets WHERE from_user = '%s'",
    $user->screen_name
  );
  $cnt = db_result($result2);
  db_query("UPDATE users SET tweets = %d WHERE uid = %d", $cnt, $user->uid);
}
This code, however, is EXTREMELY slow. It takes about 5 minutes to count the tweets of 150 users. At this rate, it will take about 3 days to complete the task for all users.
My question is: I MUST be missing something here. Perhaps a more efficient query is possible, or I should change something in the database structure? Any help would be greatly appreciated :)
I think the worst problem here is running multiple queries. That's most likely worse than just an issue with indexes. You should try to use a single query.
UPDATE users
SET users.tweets = (SELECT COUNT(tweet_id)
FROM tweets
WHERE tweets.from_user = users.screen_name
)
WHERE users.tweets = 0
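Another way to express the single-query idea is to count all tweets once, grouped by user, and join that result to users. This is only a sketch: it assumes tweets.from_user holds the same value as users.screen_name (as the PHP loop in the question implies), and the aliases u, t and cnt are arbitrary names chosen for illustration.

```sql
-- Count every user's tweets in a single pass over the tweets table,
-- then copy the counts into users via a multi-table UPDATE (MySQL syntax).
UPDATE users AS u
JOIN (
    SELECT from_user, COUNT(*) AS cnt
    FROM tweets
    GROUP BY from_user
) AS t ON t.from_user = u.screen_name
SET u.tweets = t.cnt;
```

Users with no matching rows in tweets are simply not touched by the inner join, so their tweets value stays as it was.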
Have you indexed all relevant attributes? In particular, from_user should have an index!
I'd start by condensing all of that into a single UPDATE statement:
UPDATE users
SET tweets =
( SELECT COUNT(1)
FROM tweets
WHERE tweets.from_user = users.screen_name
)
WHERE users.tweets = 0
LIMIT 150
;
and then I'd look at indices. In particular, make sure there's an index on tweets.from_user. (See http://dev.mysql.com/doc/refman/5.0/en/create-index.html for how to create an index on a table column.)
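For the schema in the question, creating that index could look roughly like the following. The index name idx_tweets_from_user and the literal 'some_screen_name' are placeholders, not anything taken from the original tables.

```sql
-- Add an index on the column used in the WHERE clause of the count query.
-- The index name is an arbitrary choice.
CREATE INDEX idx_tweets_from_user ON tweets (from_user);

-- Inspect the query plan: if the index is being used, it should be
-- listed in the "key" column of the EXPLAIN output.
EXPLAIN SELECT COUNT(*) FROM tweets WHERE from_user = 'some_screen_name';
```

Building the index on a table with a couple of million rows takes some time, but it is a one-off cost.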
While you could significantly speed up the updating of users.tweets by "condensing" these SQL statements into one (as suggested by other answers), what will you do when a user makes a new tweet? How will you know that users.tweets needs to be updated again?
You'll either need to update users.tweets whenever a row is deleted from or inserted into the tweets table, or when tweets.from_user is modified, or drop users.tweets altogether and just count the tweets dynamically on an as-needed basis.
In any case, to speed up the SELECT COUNT(tweet_id) FROM tweets WHERE from_user = '%s' query, you'll need to create an index on {from_user}. Since tweet_id is NOT NULL, COUNT(tweet_id) is equivalent to COUNT(*); otherwise a composite index on {from_user, tweet_id} would be needed.
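One possible way to keep users.tweets in sync when rows are inserted into or deleted from tweets is a pair of triggers. This is a minimal sketch rather than a complete solution: the trigger names are made up, it assumes tweets.from_user matches users.screen_name, and it does not cover the case where tweets.from_user is modified (that would need an additional AFTER UPDATE trigger). An index on users.screen_name, which the question's schema does not define, would be needed for the trigger's UPDATE to stay cheap.

```sql
-- Increment the stored count whenever a tweet is inserted.
CREATE TRIGGER tweets_after_insert AFTER INSERT ON tweets
FOR EACH ROW
    UPDATE users SET tweets = tweets + 1
    WHERE screen_name = NEW.from_user;

-- Decrement the stored count whenever a tweet is deleted.
CREATE TRIGGER tweets_after_delete AFTER DELETE ON tweets
FOR EACH ROW
    UPDATE users SET tweets = tweets - 1
    WHERE screen_name = OLD.from_user;
```

Each trigger body is a single statement, so no BEGIN ... END block or DELIMITER change is required.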
The first step is to add indexes to the columns that are frequently used as search conditions.