Multi-threading on a shared List
I have a scenario where there is a list of websites and a code block that crawls them. Is it possible to implement this in a multi-threaded way, so that each thread takes 5 or more websites from the list and crawls them independently, while making sure that no thread takes a website already collected by another thread?
List <String> websiteList;
//crawling code block here
You could use a BlockingQueue that is shared by all interested consumers, for example (note: error handling is skipped for clarity):
import java.util.Queue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingDeque;

// Wrapper class added so the snippet compiles; the name is arbitrary.
public class QueueDemo {
    public static void main(String[] args) throws Exception {
        // for test purposes add 10 integers
        final BlockingQueue<Integer> queue = new LinkedBlockingDeque<Integer>();
        for (int i = 0; i < 10; i++) {
            queue.add(i);
        }
        // three consumers share the same queue
        new Thread(new MyRunnable(queue)).start();
        new Thread(new MyRunnable(queue)).start();
        new Thread(new MyRunnable(queue)).start();
    }

    static class MyRunnable implements Runnable {
        private final Queue<Integer> queue;

        MyRunnable(Queue<Integer> queue) {
            this.queue = queue;
        }

        @Override
        public void run() {
            while (!queue.isEmpty()) {
                // poll() returns null if another thread emptied the queue first
                Integer data = queue.poll();
                if (data != null) {
                    System.out.println(Thread.currentThread().getName() + ": " + data);
                }
            }
        }
    }
}
When the Queue is empty the Threads will exit and the program will end.
As mentioned in the other answers, with a requirement like this you should first look at keeping your websites in one of Java's concurrent abstract data types from the java.util.concurrent package, rather than in a standard list. The BlockingQueue's drainTo method sounds like exactly what you're looking for, given that you want threads to be able to take a bunch of sites at a time.
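A minimal sketch of the drainTo approach, assuming the sites are plain strings and that each worker should grab up to 5 at a time. The class name BatchCrawler, the crawl method, and the helper crawlAll are hypothetical names introduced for illustration; crawl stands in for the real crawling code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BatchCrawler {
    static final int BATCH_SIZE = 5; // from the question: 5 sites per grab

    // Hypothetical stand-in for the real crawling code.
    static void crawl(String site) {
        System.out.println(Thread.currentThread().getName() + " crawling " + site);
    }

    // Crawls every site using threadCount workers and returns the sites
    // in the order they were processed.
    static List<String> crawlAll(List<String> websiteList, int threadCount) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(websiteList);
        List<String> crawled = Collections.synchronizedList(new ArrayList<>());
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            Thread worker = new Thread(() -> {
                List<String> batch = new ArrayList<>(BATCH_SIZE);
                // drainTo moves up to BATCH_SIZE elements into batch and
                // returns how many it moved; 0 means the queue is empty.
                while (queue.drainTo(batch, BATCH_SIZE) > 0) {
                    for (String site : batch) {
                        crawl(site);
                        crawled.add(site);
                    }
                    batch.clear();
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return crawled;
    }

    public static void main(String[] args) {
        List<String> sites = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            sites.add("http://example.com/" + i);
        }
        System.out.println(crawlAll(sites, 3).size() + " sites crawled");
    }
}
```

Because each element is removed from the queue as it is drained, no two workers can receive the same site. The workers stop as soon as the queue is empty, so this sketch only fits the case where all sites are loaded before the threads start.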
You can use a LinkedBlockingQueue: put every entry of websiteList into the queue and share the queue among all the threads. Each thread then polls the queue. A poll with a timeout blocks until an element is available or the timeout elapses, and the queue makes sure that each element is fetched by only one thread.
Something like:
String site;
while((site=queue.poll(timeout, TimeUnit.SECONDS))!=null)
{
//process site
}
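The loop above can be fleshed out into a runnable sketch. This version assumes a fixed thread pool and a timeout given in milliseconds; the names PollingCrawler and crawlAll are hypothetical, and recording the site in a queue stands in for the real processing step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class PollingCrawler {
    // Processes every site with threadCount workers and returns the
    // sites that were processed.
    static Queue<String> crawlAll(List<String> websiteList, int threadCount, long timeoutMillis) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(websiteList);
        Queue<String> processed = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        for (int i = 0; i < threadCount; i++) {
            pool.execute(() -> {
                try {
                    String site;
                    // poll waits up to the timeout and returns null if
                    // nothing arrived, which ends this worker.
                    while ((site = queue.poll(timeoutMillis, TimeUnit.MILLISECONDS)) != null) {
                        processed.add(site); // placeholder for "process site"
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown(); // no new tasks; running workers finish normally
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> sites = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            sites.add("http://example.com/" + i);
        }
        System.out.println(crawlAll(sites, 3, 100).size() + " sites processed");
    }
}
```

The timeout matters mainly when other threads are still adding sites: a worker that finds the queue momentarily empty waits a little before giving up instead of exiting at once.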
You could try a DoubleBufferedList. It lets multiple threads add entries and lists to it, and lets multiple threads take whole lists from it, in a completely lock-free fashion.
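import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicMarkableReference;

public class DoubleBufferedList<T> {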
// Atomic reference so I can atomically swap it through.
// Mark = true means I am adding to it so momentarily unavailable for iteration.
private AtomicMarkableReference<List<T>> list = new AtomicMarkableReference<>(newList(), false);
// Factory method to create a new list - may be best to abstract this.
protected List<T> newList() {
return new ArrayList<>();
}
// Get and replace the current list.
public List<T> get() {
// Atomically grab and replace the list with an empty one.
List<T> empty = newList();
List<T> it;
// Replace an unmarked list with an empty one.
if (!list.compareAndSet(it = list.getReference(), empty, false, false)) {
// Failed to replace!
// It is probably marked as being appended to but may have been replaced by another thread.
// Return empty and come back again soon.
return Collections.<T>emptyList();
}
// Successfully replaced an unmarked list with an empty list!
return it;
}
// Grab and lock the list in preparation for append.
private List<T> grab() {
List<T> it;
// We cannot fail so spin on get and mark.
while (!list.compareAndSet(it = list.getReference(), it, false, true)) {
// Spin on mark - waiting for another grabber to release (which it must).
}
return it;
}
// Release the list.
private void release(List<T> it) {
// Unmark it - should this be a compareAndSet(it, it, true, false)?
if (!list.attemptMark(it, false)) {
// Should never fail because once marked it will not be replaced.
throw new IllegalMonitorStateException("It changed while we were adding to it!");
}
}
// Add an entry to the list.
public void add(T entry) {
List<T> it = grab();
try {
// Successfully marked! Add my new entry.
it.add(entry);
} finally {
// Always release after a grab.
release(it);
}
}
// Add many entries to the list.
public void add(List<T> entries) {
List<T> it = grab();
try {
// Successfully marked! Add my new entries.
it.addAll(entries);
} finally {
// Always release after a grab.
release(it);
}
}
// Add a number of entries.
@SafeVarargs
public final void add(T... entries) {
// Make a list of them.
add(Arrays.<T>asList(entries));
}
}
I suggest one of these 3 solutions:
Keep it simple:
synchronized(list) {
// get and remove 5 websites from the list
}
If you can change the list type, you may use a BlockingQueue.
If you can't change the list type, you may use Collections.synchronizedList(list).
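The "keep it simple" option can be sketched as follows, assuming the shared list is a mutable List of strings and every thread goes through the same helper. The names SynchronizedBatch and takeBatch are hypothetical. The same helper also works on a list wrapped with Collections.synchronizedList, because taking several elements is a compound operation that still needs the explicit synchronized block.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SynchronizedBatch {
    // Removes and returns up to batchSize sites from the head of the
    // shared list; returns an empty list when nothing is left.
    static List<String> takeBatch(List<String> websiteList, int batchSize) {
        synchronized (websiteList) {
            int n = Math.min(batchSize, websiteList.size());
            List<String> head = websiteList.subList(0, n);
            List<String> batch = new ArrayList<>(head); // copy before clearing the view
            head.clear(); // removes those elements from the shared list
            return batch;
        }
    }

    public static void main(String[] args) {
        List<String> websiteList = new ArrayList<>(
                Arrays.asList("a", "b", "c", "d", "e", "f", "g"));
        System.out.println(takeBatch(websiteList, 5)); // first five sites
        System.out.println(takeBatch(websiteList, 5)); // the remaining two
    }
}
```

Each worker thread would call takeBatch in a loop and stop once it gets an empty batch back.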