
Multi-threading on a shared List

I have a scenario with a list of websites and a code block that crawls them. Is it possible to implement this in a multi-threaded way, so that each thread takes 5 or more websites from the list and crawls them independently, while making sure no thread takes a website that has already been taken by another thread?

List<String> websiteList;

//crawling code block here

You could use a BlockingQueue which could be shared by all interested consumers, for example (note, error handling skipped for clarity):

public static void main(String[] args) throws Exception {
    // For test purposes, add 10 integers to the shared queue.
    final BlockingQueue<Integer> queue = new LinkedBlockingDeque<Integer>();
    for (int i = 0; i < 10; i++) {
        queue.add(i);
    }

    new Thread(new MyRunnable(queue)).start();
    new Thread(new MyRunnable(queue)).start();
    new Thread(new MyRunnable(queue)).start();

}

static class MyRunnable implements Runnable {
    private Queue<Integer> queue;

    MyRunnable(Queue<Integer> queue) {
        this.queue = queue;
    }

    @Override
    public void run() {
        Integer data;
        // poll() returns null once the queue is empty, so no separate isEmpty() check is needed.
        while ((data = queue.poll()) != null) {
            System.out.println(Thread.currentThread().getName() + ": " + data);
        }
    }
}

When the Queue is empty the Threads will exit and the program will end.

As mentioned in the other answers, with a requirement like this you should initially look at keeping your websites in one of Java's concurrent abstract data types from the java.util.concurrent package, rather than in a standard list. The BlockingQueue's drainTo method sounds like exactly what you're looking for given that you want threads to be able to take a bunch of sites at a time.
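A minimal sketch of the drainTo idea, assuming the sites are plain strings and the batch size is 5 as in the question. The class name BatchDrainDemo, the method crawlAll, and the processed collection are hypothetical; the line that adds to processed stands in for the real crawling code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;

class BatchDrainDemo {
    // Assumed batch size, taken from the question ("5 or more websites").
    static final int BATCH_SIZE = 5;

    // Starts workerCount threads that each repeatedly drain up to BATCH_SIZE
    // sites from one shared queue. Returns every site that was processed.
    static List<String> crawlAll(List<String> websiteList, int workerCount) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(websiteList);
        ConcurrentLinkedQueue<String> processed = new ConcurrentLinkedQueue<>();

        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < workerCount; i++) {
            Thread worker = new Thread(() -> {
                List<String> batch = new ArrayList<>(BATCH_SIZE);
                // drainTo moves at most BATCH_SIZE elements into batch and
                // returns how many it moved; 0 means the queue is empty.
                while (queue.drainTo(batch, BATCH_SIZE) > 0) {
                    for (String site : batch) {
                        processed.add(site); // placeholder for the crawling code
                    }
                    batch.clear();
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for workers", e);
            }
        }
        return new ArrayList<>(processed);
    }

    public static void main(String[] args) {
        List<String> sites = new ArrayList<>();
        for (int i = 0; i < 23; i++) {
            sites.add("http://example.com/site" + i);
        }
        System.out.println(crawlAll(sites, 3).size() + " sites crawled");
    }
}
```

Because each element is removed from the queue as it is drained, no two threads receive the same site. The last batch may hold fewer than 5 entries.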

You can use a LinkedBlockingQueue: put every entry of websiteList into the queue and share the queue among the threads. Each thread then polls the queue. The queue is thread-safe, so each element is fetched by exactly one thread. When called with a timeout, poll blocks for at most that long and returns null if no element became available.

Something like:

String site;
while ((site = queue.poll(timeout, TimeUnit.SECONDS)) != null) {
    // process site
}
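The loop above can be wrapped in a Runnable so that several threads share one queue. This is only a sketch: the class name PollingWorker, the 100 ms timeout, and the processed list are assumptions for illustration, and the call to processed.add stands in for the crawling code.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

class PollingWorker implements Runnable {
    private final BlockingQueue<String> queue;
    private final List<String> processed; // hypothetical sink for results; must be thread-safe

    PollingWorker(BlockingQueue<String> queue, List<String> processed) {
        this.queue = queue;
        this.processed = processed;
    }

    @Override
    public void run() {
        try {
            String site;
            // Wait up to 100 ms for a site; null means the queue stayed empty, so stop.
            while ((site = queue.poll(100, TimeUnit.MILLISECONDS)) != null) {
                processed.add(site); // placeholder for the crawling code
            }
        } catch (InterruptedException e) {
            // poll(timeout, unit) can be interrupted; restore the flag and exit.
            Thread.currentThread().interrupt();
        }
    }
}
```

Each worker would be started with new Thread(new PollingWorker(queue, processed)).start(), passing the same queue to every worker.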

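You could try a DoubleBufferedList such as the one below. It lets multiple threads add entries and lists to it, and lets multiple threads take lists from it, without using locks: it spins on an atomic reference instead.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicMarkableReference;

public class DoubleBufferedList<T> {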
  // Atomic reference so I can atomically swap it through.
  // Mark = true means I am adding to it so momentarily unavailable for iteration.
  private AtomicMarkableReference<List<T>> list = new AtomicMarkableReference<>(newList(), false);

  // Factory method to create a new list - may be best to abstract this.
  protected List<T> newList() {
    return new ArrayList<>();
  }

  // Get and replace the current list.
  public List<T> get() {
    // Atomically grab and replace the list with an empty one.
    List<T> empty = newList();
    List<T> it;
    // Replace an unmarked list with an empty one.
    if (!list.compareAndSet(it = list.getReference(), empty, false, false)) {
      // Failed to replace! 
      // It is probably marked as being appended to but may have been replaced by another thread.
      // Return empty and come back again soon.
      return Collections.<T>emptyList();
    }
    // Successfully replaced an unmarked list with an empty list!
    return it;
  }

  // Grab and lock the list in preparation for append.
  private List<T> grab() {
    List<T> it;
    // We cannot fail so spin on get and mark.
    while (!list.compareAndSet(it = list.getReference(), it, false, true)) {
      // Spin on mark - waiting for another grabber to release (which it must).
    }
    return it;
  }

  // Release the list.
  private void release(List<T> it) {
    // Unmark it - should this be a compareAndSet(it, it, true, false)?
    if (!list.attemptMark(it, false)) {
      // Should never fail because once marked it will not be replaced.
      throw new IllegalMonitorStateException("It changed while we were adding to it!");
    }
  }

  // Add an entry to the list.
  public void add(T entry) {
    List<T> it = grab();
    try {
      // Successfully marked! Add my new entry.
      it.add(entry);
    } finally {
      // Always release after a grab.
      release(it);
    }
  }

  // Add many entries to the list.
  public void add(List<T> entries) {
    List<T> it = grab();
    try {
      // Successfully marked! Add my new entries.
      it.addAll(entries);
    } finally {
      // Always release after a grab.
      release(it);
    }
  }

  // Add a number of entries.
  @SafeVarargs
  public final void add(T... entries) {
    // Make a list of them.
    add(Arrays.<T>asList(entries));
  }
}

I suggest one of these three solutions:

1. Keep it simple and synchronize on the list:

synchronized(list) {
    // get and remove 5 websites from the list
}

2. If you can change the list type, you may use a BlockingQueue.

3. If you can't change the list type, you may wrap it with Collections.synchronizedList(list).
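The synchronized block and the synchronizedList wrapper both come down to holding the list's monitor while a batch is removed. Below is a minimal sketch, assuming a mutable list of strings; the class name SynchronizedBatchTaker and the method takeBatch are hypothetical. Note that even with Collections.synchronizedList, a compound "read then remove" step still needs an explicit synchronized block on the wrapper, because only the individual calls are synchronized.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SynchronizedBatchTaker {
    // Atomically removes and returns up to batchSize websites from the shared list.
    // An empty result means the list has been exhausted.
    static List<String> takeBatch(List<String> sharedList, int batchSize) {
        synchronized (sharedList) {
            int count = Math.min(batchSize, sharedList.size());
            List<String> head = sharedList.subList(0, count);
            List<String> batch = new ArrayList<>(head); // copy before removing
            head.clear(); // clearing the view removes those elements from sharedList
            return batch;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> websiteList = Collections.synchronizedList(new ArrayList<>());
        for (int i = 0; i < 12; i++) {
            websiteList.add("http://example.com/site" + i);
        }

        Runnable worker = () -> {
            List<String> batch;
            while (!(batch = takeBatch(websiteList, 5)).isEmpty()) {
                // placeholder for the crawling code
                System.out.println(Thread.currentThread().getName() + " took " + batch);
            }
        };
        Thread t1 = new Thread(worker);
        Thread t2 = new Thread(worker);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```

Every thread must go through takeBatch (or synchronize on the same list object) for this to be safe; any unsynchronized access to a plain ArrayList from another thread would break the guarantee.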


 