I am using this API to retrieve around 24,000 items.
First I fetch the item list from here (warning: slow browsers may crash).
Then I loop over all the items and fetch the info for each one from a URL like:
https://api.guildwars2.com/v2/items/itemidhere
and then insert the info into a MySQL database.
PS: the real question starts below this line.
I'm trying to find the fastest way to get the info from those links and insert it. For this I am using:
-GSON library (easiest and fastest way to control JSON)
-HikariCP (for database connection pools)
-Threads (each thread takes care of 1000 items if there are 24 threads)
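The per-thread split described in the last bullet could be sketched as follows. This is a minimal illustration, not the original code: `PartitionSketch` and `partition` are hypothetical names, and it assumes the ids are already loaded into a list.

```java
import java.util.ArrayList;
import java.util.List;

final class PartitionSketch {
    /**
     * Splits ids into `threads` contiguous slices whose sizes differ by at most one.
     * The slices are views onto the original list (List.subList), not copies.
     */
    static <T> List<List<T>> partition(List<T> ids, int threads) {
        if (threads <= 0) {
            throw new IllegalArgumentException("threads must be positive");
        }
        List<List<T>> slices = new ArrayList<>();
        int base = ids.size() / threads;   // minimum slice size
        int extra = ids.size() % threads;  // the first `extra` slices get one more item
        int from = 0;
        for (int t = 0; t < threads; t++) {
            int size = base + (t < extra ? 1 : 0);
            slices.add(ids.subList(from, from + size));
            from += size;
        }
        return slices;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 24_000; i++) {
            ids.add(i);
        }
        List<List<Integer>> slices = partition(ids, 24);
        System.out.println(slices.size() + " slices, first has " + slices.get(0).size() + " items");
    }
}
```

With a static split like this, a thread that draws slow items finishes late while the others sit idle, which is one reason the total time may not improve as threads are added.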
I did some tests and here are the results for collecting and inserting the 24,000 items:
-Threads: 50, DB pool size: 10, time: 644 seconds
-Threads: 100, DB pool size: 10, time: 607 seconds
-Threads: 250, DB pool size: 15, time: 662 seconds
-Threads: 500, DB pool size: 20, time: 689 seconds
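To compare the runs, the timings above can be converted to items per second. The sketch below only does that arithmetic on the four measurements listed; `ThroughputSketch` is a hypothetical name.

```java
final class ThroughputSketch {
    /** Items processed per second for one test run. */
    static double itemsPerSecond(int items, double seconds) {
        return items / seconds;
    }

    public static void main(String[] args) {
        int items = 24_000;
        int[] threads = {50, 100, 250, 500};
        int[] seconds = {644, 607, 662, 689}; // measured times from the list above
        for (int i = 0; i < threads.length; i++) {
            System.out.printf("%d threads: %.1f items/s%n",
                    threads[i], itemsPerSecond(items, seconds[i]));
        }
    }
}
```

All four runs land at roughly 35 to 40 items per second, so a tenfold change in thread count moves throughput by only a few items per second.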
I know the slowest thing here is the network.
My computer and internet aren't that slow:
-300 Mb/s internet
-Intel 5820k
-16GB DDR4
So what's left may be the code implementation...
HikariConfig config = new HikariConfig();
config.setDriverClassName("com.mysql.jdbc.Driver");
config.setJdbcUrl("jdbc:mysql://localhost:3306/mydb");
config.setUsername("root");
config.setPassword("none");
config.addDataSourceProperty("cachePrepStmts", "true");
config.addDataSourceProperty("prepStmtCacheSize", "250");
config.addDataSourceProperty("prepStmtCacheSqlLimit", "2048");
config.setMaximumPoolSize(poolSize); // poolSize: 10, 15 or 20, depending on the test case
Core.ds = new HikariDataSource(config);
This is the setup for the database connection pool. I start the threads with a CyclicBarrier:
final CyclicBarrier _threadGate = new CyclicBarrier(threadCount); // threadCount: number of threads, depends on the test case
ArrayList<Thread> _threadList = new ArrayList<>();
And then
_threadList.add(new Thread() {
@Override
public void run() {
try {
_threadGate.await();
//Parsing happens a bit later
Here I loop through the whole list and fetch the info from the URL (variable declarations omitted):
_id = _itemList.get(i);
_stringUrl = "https://api.guildwars2.com/v2/items/" + _id;
_responseText = new URL(_stringUrl);
_requestUrl = (HttpURLConnection) _responseText.openConnection();
_requestUrl.connect();
_requestStatus = _requestUrl.getResponseCode();
if(_requestStatus == 200){
_jsonParser = new JsonParser();
_rootElement = _jsonParser.parse(new InputStreamReader((InputStream) _requestUrl.getContent(), "UTF-8"));
_rootObject = _rootElement.getAsJsonObject();
And then, from the _rootObject, I do a lot of parsing and checking whether each JSON field exists, and so on. At the end comes the insertion.
Here is how I start the threads after everything is processed in the main class:
for (int i = 0; i < _threadList.size(); i++) {
_threadList.get(i).start();
}
INFO: see here for why I didn't use a bigger pool size.
What I do NOT understand is:
-why the result is slower when there are more threads
-I mean, come on, maybe the network is slow, but could a few requests really fill 300 Mb/s?
-would a better code implementation make this faster?
I actually see it like this:
-more threads -> slower internet, which makes pulling info slow.
-bigger pool size -> slower insertion due to many connections
-more threads and small connection pool -> inserts queued and stalled
-few threads and small connection pool -> slow pull info
Updates
-tried a classic connection instead of a pooled one (1 connection/pool); results were about 30 seconds slower
-tried ExecutorService instead of CyclicBarrier; results were about 10 seconds slower
(Too many questions for a comment.)
I'm confused -- you are "retrieving data" and you are "inserting" it. Which side should we focus on? Which side do you have control over?
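One way to decide which side to focus on is to time the fetch and the insert separately and compare the totals. The following is a minimal sketch under that assumption; `PhaseTimer` is a hypothetical helper, not part of the code in the question.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

/** Accumulates wall-clock time for one phase (e.g. "fetch" or "insert") across threads. */
final class PhaseTimer {
    private final LongAdder nanos = new LongAdder();
    private final LongAdder calls = new LongAdder();

    /** Runs the work, records how long it took, and returns its result. */
    <T> T time(Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            nanos.add(System.nanoTime() - start);
            calls.increment();
        }
    }

    long calls() {
        return calls.sum();
    }

    double totalMillis() {
        return nanos.sum() / 1_000_000.0;
    }

    /** Mean time per call in milliseconds, or 0 if nothing was timed. */
    double meanMillis() {
        long n = calls.sum();
        return n == 0 ? 0.0 : totalMillis() / n;
    }
}
```

Each worker would wrap its HTTP call in one shared timer and its INSERT in another, then print both `meanMillis()` values at the end. Code that throws checked exceptions has to be wrapped, since `Supplier` cannot throw them.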
You are getting only 30-40 rows inserted per second? That is pathetic.
Let's focus on how you are doing the INSERTs into the MySQL table. Please provide SHOW CREATE TABLE -- I need to see the Engine and indexes and other stuff. Please provide some clues about the INSERTs -- one row at a time vs batched (10x improvement here)? Sequential versus random PRIMARY KEY? How big are the tables? How big is the buffer_pool? What version of MySQL (newer versions have some extra tricks)?
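For reference, batching with plain JDBC might look like the sketch below. Everything specific in it is an assumption for illustration: the table `items`, its columns `id` and `name`, the `Item` class, and the batch size are all hypothetical, since the real schema has not been posted.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class BatchInsertSketch {
    /** Hypothetical row shape; the real table presumably has more columns. */
    static final class Item {
        final int id;
        final String name;
        Item(int id, String name) { this.id = id; this.name = name; }
    }

    /** Splits rows into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> batches(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> out = new ArrayList<>();
        for (int from = 0; from < rows.size(); from += batchSize) {
            int to = Math.min(from + batchSize, rows.size());
            out.add(new ArrayList<>(rows.subList(from, to)));
        }
        return out;
    }

    /** Inserts all items, sending and committing one batch at a time. */
    static void insertAll(Connection conn, List<Item> items, int batchSize) throws SQLException {
        String sql = "INSERT INTO items (id, name) VALUES (?, ?)"; // hypothetical table
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // one commit per batch instead of one per row
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (List<Item> batch : batches(items, batchSize)) {
                for (Item item : batch) {
                    ps.setInt(1, item.id);
                    ps.setString(2, item.name);
                    ps.addBatch();
                }
                ps.executeBatch();
                conn.commit();
            }
        } catch (SQLException e) {
            conn.rollback(); // discard the uncommitted batch
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```

With MySQL Connector/J it may also be worth testing `rewriteBatchedStatements=true` in the JDBC URL, which lets the driver combine a batch into multi-row INSERT statements. How much this helps depends on the table and indexes, so measure before and after.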
There is some contention between threads, so "too many" threads can actually slow down activity. But I think that is a secondary issue.
What percent of the 300 Mb/s is consumed?
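A rough estimate can be computed from the measured throughput. The sketch below assumes an average response body of about 1 KB per item; that size is a guess, not a measurement, and HTTP headers and TLS overhead are ignored. `BandwidthSketch` is a hypothetical name.

```java
final class BandwidthSketch {
    /** Percentage of a link (in megabits per second) used by the given request rate. */
    static double percentOfLink(double itemsPerSecond, double bytesPerItem,
                                double linkMegabitsPerSecond) {
        double bitsPerSecond = itemsPerSecond * bytesPerItem * 8.0;
        return 100.0 * bitsPerSecond / (linkMegabitsPerSecond * 1_000_000.0);
    }

    public static void main(String[] args) {
        double rate = 24_000 / 607.0;        // best run from the question, in items/s
        double assumedBytesPerItem = 1_024;  // ASSUMPTION: about 1 KB of JSON per item
        double linkMbps = 300;               // connection speed stated in the question
        System.out.printf("about %.2f%% of the link%n",
                percentOfLink(rate, assumedBytesPerItem, linkMbps));
    }
}
```

Under that assumption the transfer uses about 0.1 percent of the link, so raw bandwidth is unlikely to be the limit. Per-request latency is a more plausible candidate.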