SolrJ - Asynchronously indexing documents with ContentStreamUpdateRequest
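I am using the SolrJ API 4.8 to index rich documents into Solr, but I want to index these documents asynchronously. The function I wrote sends the files synchronously, and I don't know how to change it to work asynchronously. Any ideas?
The function: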
public Boolean indexDocument(HttpSolrServer server, String PathFile, InputReader external)
{
    ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
    try {
        up.addFile(new File(PathFile), "text");
    } catch (IOException e) {
        Logger.getLogger(ANOIndexer.class.getName()).log(Level.SEVERE, null, e);
        return false;
    }
    up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
    try {
        server.request(up);
    } catch (SolrServerException e) {
        Logger.getLogger(ANOIndexer.class.getName()).log(Level.SEVERE, null, e);
        return false;
    } catch (IOException e) {
        Logger.getLogger(ANOIndexer.class.getName()).log(Level.SEVERE, null, e);
        return false;
    }
    return true;
}
Solr server: version 4.8
It sounds like you may want to use an ExecutorService and FutureTask to do this:
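import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

// Enclosing class name taken from the logger calls in the question.
public class ANOIndexer {
    // Must be initialized with your Solr URL before any task is submitted.
    private static HttpSolrServer server;
    private static int threadPoolSize = 4; // Set this to something appropriate for your environment

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(threadPoolSize);
        ArrayList<FutureTask<Boolean>> taskList = new ArrayList<FutureTask<Boolean>>();
        ArrayList<String> paths = new ArrayList<String>();
        // Initialize your list of paths here
        // Submit one task per file; the pool runs up to threadPoolSize of them at once.
        for (String path : paths) {
            FutureTask<Boolean> futureTask = new FutureTask<Boolean>(new IndexDocumentTask(path));
            taskList.add(futureTask);
            executor.execute(futureTask);
        }
        // get() blocks until the corresponding task has finished.
        for (int i = 0; i < taskList.size(); i++) {
            FutureTask<Boolean> futureTask = taskList.get(i);
            try {
                System.out.println("Index Task " + i + (futureTask.get() ? " finished successfully." : " encountered an error."));
            } catch (ExecutionException e) {
                System.out.println("An Execution Exception occurred with Index Task " + i);
            } catch (InterruptedException e) {
                System.out.println("An Interrupted Exception occurred with Index Task " + i);
                Thread.currentThread().interrupt(); // preserve the interrupt status
            }
        }
        executor.shutdown();
    }

    static class IndexDocumentTask implements Callable<Boolean> {
        private String pathFile;

        public IndexDocumentTask(String pathFile) {
            this.pathFile = pathFile;
        }

        @Override
        public Boolean call() {
            return indexDocument(pathFile);
        }

        public Boolean indexDocument(String pathFile) {
            ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
            try {
                up.addFile(new File(pathFile), "text");
            } catch (IOException e) {
                Logger.getLogger(ANOIndexer.class.getName()).log(Level.SEVERE, null, e);
                return false;
            }
            up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
            try {
                server.request(up);
            } catch (SolrServerException e) {
                Logger.getLogger(ANOIndexer.class.getName()).log(Level.SEVERE, null, e);
                return false;
            } catch (IOException e) {
                Logger.getLogger(ANOIndexer.class.getName()).log(Level.SEVERE, null, e);
                return false;
            }
            return true;
        }
    }
}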
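This is untested code, so I'm not sure whether calling server.request(up) is thread-safe. I think using a single HttpSolrServer instance is cleaner, but you could also create a new HttpSolrServer instance in each task.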
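If you like, you could extend IndexDocumentTask to implement Callable<Tuple<String, Boolean>> (Tuple here would be a small class of your own, since the JDK does not provide one), so that you can retrieve the file name of the document being indexed along with whether indexing succeeded.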
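To illustrate the idea of returning the file name together with the outcome, here is a minimal sketch. IndexResult, DocumentIndexer, ReportingIndexTask and ReportingIndexDemo are hypothetical names introduced for this example, and DocumentIndexer stands in for the real server.request(up) call so that the sketch runs without a Solr server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical result holder: pairs a file path with the outcome of indexing it.
final class IndexResult {
    final String path;
    final boolean success;

    IndexResult(String path, boolean success) {
        this.path = path;
        this.success = success;
    }
}

// Hypothetical stand-in for the real indexing call (assumption, not SolrJ API).
interface DocumentIndexer {
    boolean index(String path);
}

// A task that reports which file it handled as well as whether it succeeded.
final class ReportingIndexTask implements Callable<IndexResult> {
    private final String path;
    private final DocumentIndexer indexer;

    ReportingIndexTask(String path, DocumentIndexer indexer) {
        this.path = path;
        this.indexer = indexer;
    }

    @Override
    public IndexResult call() {
        return new IndexResult(path, indexer.index(path));
    }
}

final class ReportingIndexDemo {
    // Submits one task per path and returns the results in submission order.
    static List<IndexResult> indexAll(List<String> paths, DocumentIndexer indexer, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Future<IndexResult>> futures = new ArrayList<>();
            for (String path : paths) {
                futures.add(executor.submit(new ReportingIndexTask(path, indexer)));
            }
            List<IndexResult> results = new ArrayList<>();
            for (int i = 0; i < futures.size(); i++) {
                try {
                    results.add(futures.get(i).get());
                } catch (ExecutionException e) {
                    // The task threw; record a failure for the matching path.
                    results.add(new IndexResult(paths.get(i), false));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting for results", e);
                }
            }
            return results;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> paths = new ArrayList<>();
        paths.add("report.pdf");
        paths.add("broken.doc");
        // Simulated indexer: fails for any path that starts with "broken".
        for (IndexResult r : indexAll(paths, p -> !p.startsWith("broken"), 2)) {
            System.out.println(r.path + (r.success ? " indexed" : " failed"));
        }
    }
}
```

In real use, the DocumentIndexer implementation would wrap the ContentStreamUpdateRequest logic from indexDocument above.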
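Although I don't think sending several requests to the Solr server at once should be a problem, you may want to throttle your requests so that you don't overload the Solr server.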
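The fixed thread pool above already caps concurrency at threadPoolSize. If you want a limit on in-flight requests that is independent of the pool size, one possible approach is a Semaphore, sketched below. ThrottledRunner and ThrottleDemo are hypothetical names, and the sleeping task is only a stand-in for a real Solr request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: wraps tasks so that at most N of them run at the same time.
final class ThrottledRunner {
    private final Semaphore permits;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    ThrottledRunner(int maxConcurrentRequests) {
        this.permits = new Semaphore(maxConcurrentRequests);
    }

    <T> Callable<T> throttle(final Callable<T> task) {
        return new Callable<T>() {
            @Override
            public T call() throws Exception {
                permits.acquire(); // blocks while the limit is reached
                try {
                    // Track the highest number of simultaneously running tasks.
                    peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                    return task.call();
                } finally {
                    inFlight.decrementAndGet();
                    permits.release();
                }
            }
        };
    }

    int peakInFlight() {
        return peak.get();
    }
}

final class ThrottleDemo {
    // Runs `tasks` simulated requests and returns the peak concurrency observed.
    static int runAndMeasurePeak(int tasks, int poolSize, int limit) {
        ThrottledRunner runner = new ThrottledRunner(limit);
        ExecutorService executor = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Boolean>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                futures.add(executor.submit(runner.throttle(new Callable<Boolean>() {
                    @Override
                    public Boolean call() throws Exception {
                        Thread.sleep(20); // simulated request latency
                        return Boolean.TRUE;
                    }
                })));
            }
            for (Future<Boolean> f : futures) {
                f.get(); // wait for every task to finish
            }
            return runner.peakInFlight();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("Peak concurrent requests: " + runAndMeasurePeak(8, 4, 2));
    }
}
```

Note that threads waiting on the semaphore still occupy pool slots, so simply choosing a smaller threadPoolSize is often the simpler option.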