Synchronous multithreading in Java (Apache HTTPClient)

I am wondering how to go about this. Say I load a list of 1,000 words, create a thread for each word, and have each thread run a Google search on its word. The problem here is obvious: I can't have 1,000 threads, can I? Keep in mind that I am extremely new to threads and synchronization. So basically, I am wondering how to use fewer threads. I assume I have to set the number of threads to a fixed value and synchronize them. I would like to know how to do this with Apache HttpClient, using GetThread and then running it. In run(), I fetch the data from the web page, turn it into a String, and check whether it contains a certain word.

You can certainly create as many threads as you want. In general, though, it is not recommended to use more threads than there are processing cores on your computer. Also, don't forget that opening 1,000 internet sessions at once affects your network. A single Google page is nearly 0.3 megabytes in size. Are you really going to download 300 megabytes of data at once?
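To make those two numbers concrete, here is a minimal sketch that prints the core count of the current machine and the rough download volume. The class name LoadEstimate and the method totalMegabytes are hypothetical, and the 0.3 MB page size is simply the estimate quoted above, not a measured value.

```java
public class LoadEstimate {

    // Total megabytes transferred if every word triggers one page download.
    // Both inputs are estimates supplied by the caller.
    public static double totalMegabytes(int wordCount, double megabytesPerPage) {
        return wordCount * megabytesPerPage;
    }

    public static void main(String[] args) {
        // Number of processing cores the JVM can see on this machine.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("Processing cores: " + cores);

        // 1,000 words at roughly 0.3 MB per page (the figure assumed above).
        System.out.println("Estimated download in MB: " + totalMegabytes(1000, 0.3));
    }
}
```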

By the way,

There is a funny thing about concurrency. Some people say: "synchronization is like concurrency". It is not true. Synchronization is the opposite of concurrency. Concurrency is when lots of things happen in parallel. Synchronization is when I am blocking you. (Joshua Bloch)

Maybe you can look at the problem this way.

You have 1,000 words, and for each word you are going to carry out a search. In other words, there are 1,000 tasks to execute, and they are not related to each other. So this problem needs no synchronization, according to the following definition from Wiki.

"In computer science, synchronization refers to one of two distinct but related concepts: synchronization of processes, and synchronization of data. Process synchronization refers to the idea that multiple processes are to join up or handshake at a certain point, in order to reach an agreement or commit to a certain sequence of action. Data Synchronization refers to the idea of keeping multiple copies of a dataset in coherence with one another, or to maintain data integrity"

So in this problem you do not have to synchronize the 1,000 processes that execute the word searches, since they can run independently and don't need to join up. It is therefore not a case of process synchronization.

It is not a case of data synchronization either, since the data of each search is independent of the other 999 searches.

Hence, if synchronization is "when I am blocking you", as Joshua puts it, there is no need for blocking in this case.

Yes, all the tasks can be executed concurrently in different threads. Of course, your system may not have the resources to run 1,000 threads concurrently (read: at the same time). So you need a concept like a pool, which holds a certain number of threads. Say it has 10 threads: those 10 will start 10 independent searches on 10 words from your list. Whenever one of them finishes its task, it picks up the next available word-search task, and the process goes on.
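The pool idea described above could be sketched with the standard java.util.concurrent classes roughly as follows. This is only an illustration under stated assumptions: the class name WordSearchPool, the method searchAll, and the fetcher parameter are hypothetical. The fetcher stands in for whatever code actually downloads the page for a word (for example, code built on Apache HttpClient, which is not shown here), so the sketch runs without network access.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class WordSearchPool {

    // Submits one task per word to a fixed-size pool. Each task fetches the
    // page text for its word (via the caller-supplied fetcher) and checks
    // whether that text contains the target string.
    public static Map<String, Boolean> searchAll(List<String> words,
                                                 String target,
                                                 int poolSize,
                                                 Function<String, String> fetcher)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Boolean>> futures = new ArrayList<>();
            for (String word : words) {
                // Tasks are independent, so no locking is needed between them.
                Callable<Boolean> task = () -> fetcher.apply(word).contains(target);
                futures.add(pool.submit(task));
            }
            // Collect the results in the same order as the input words.
            Map<String, Boolean> results = new LinkedHashMap<>();
            for (int i = 0; i < words.size(); i++) {
                results.put(words.get(i), futures.get(i).get());
            }
            return results;
        } finally {
            // Stop accepting new tasks and let the worker threads exit.
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for a real HTTP fetch.
        Function<String, String> fakeFetcher = word -> "results page for " + word;
        List<String> words = Arrays.asList("apple", "banana", "cherry");
        System.out.println(searchAll(words, "an", 2, fakeFetcher));
    }
}
```

With a pool size of 10 and a list of 1,000 words, at most 10 searches would be in flight at any moment, and each worker thread takes the next queued word as soon as it finishes the previous one.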
