SolrJ - Indexing documents asynchronously with ContentStreamUpdateRequest

Question

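I am using the SolrJ API 4.8 to index rich documents into Solr, but I want to index them asynchronously. The function I wrote sends documents synchronously, and I don't know how to change it to make it asynchronous. Any ideas?

Function: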

public Boolean indexDocument(HttpSolrServer server, String PathFile, InputReader external)
{
    ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");

    try {
        up.addFile(new File(PathFile), "text");
    } catch (IOException e) {
        Logger.getLogger(ANOIndexer.class.getName()).log(Level.SEVERE, null, e);
        return false;
    }

    up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);

    try {
        server.request(up);
    } catch (SolrServerException e) {
        Logger.getLogger(ANOIndexer.class.getName()).log(Level.SEVERE, null, e);
        return false;
    } catch (IOException e) {
        Logger.getLogger(ANOIndexer.class.getName()).log(Level.SEVERE, null, e);
        return false;
    }
    return true;
}

Solr server: version 4.8

Answer 1

It sounds like you might want to use an ExecutorService and FutureTask to do this:

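import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;
import java.util.logging.Level;
import java.util.logging.Logger;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ANOIndexer {

    private static HttpSolrServer server;
    private static int threadPoolSize = 4;  // Set this to something appropriate for your environment

    public static void main(String[] args) {
        // Assumption: placeholder URL; point this at your own Solr core
        server = new HttpSolrServer("http://localhost:8983/solr");
        ExecutorService executor = Executors.newFixedThreadPool(threadPoolSize);
        ArrayList<FutureTask<Boolean>> taskList = new ArrayList<FutureTask<Boolean>>();
        ArrayList<String> paths = new ArrayList<String>();
        // Initialize your list of paths here

        // Queue one task per file; the pool runs up to threadPoolSize of them at once
        for (String path : paths) {
            FutureTask<Boolean> futureTask = new FutureTask<Boolean>(new IndexDocumentTask(path));
            taskList.add(futureTask);
            executor.execute(futureTask);
        }

        // Wait for each task in submission order and report its outcome
        for (int i = 0; i < taskList.size(); i++) {
            FutureTask<Boolean> futureTask = taskList.get(i);

            try {
                System.out.println("Index Task " + i + (futureTask.get() ? " finished successfully." : " encountered an error."));
            } catch (ExecutionException e) {
                System.out.println("An Execution Exception occurred with Index Task " + i);
            } catch (InterruptedException e) {
                System.out.println("An Interrupted Exception occurred with Index Task " + i);
                Thread.currentThread().interrupt();  // Preserve the interrupt status
                break;  // Every later get() would be interrupted immediately as well
            }
        }

        executor.shutdown();
        server.shutdown();
    }

    static class IndexDocumentTask implements Callable<Boolean> {

        private String pathFile;

        public IndexDocumentTask(String pathFile) {
            this.pathFile = pathFile;
        }

        @Override
        public Boolean call() {
            return indexDocument(pathFile);
        }

        // Sends one file to the extracting handler; returns false on any failure
        public Boolean indexDocument(String pathFile) {
            ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");

            try {
                up.addFile(new File(pathFile), "text");
            } catch (IOException e) {
                Logger.getLogger(ANOIndexer.class.getName()).log(Level.SEVERE, null, e);
                return false;
            }

            up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);

            try {
                server.request(up);
            } catch (SolrServerException e) {
                Logger.getLogger(ANOIndexer.class.getName()).log(Level.SEVERE, null, e);
                return false;
            } catch (IOException e) {
                Logger.getLogger(ANOIndexer.class.getName()).log(Level.SEVERE, null, e);
                return false;
            }
            return true;
        }
    }
}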

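This is untested code, so I'm not sure whether calling server.request(up) is thread-safe. I think using a single HttpSolrServer instance is cleaner, but you could also create a new HttpSolrServer instance in each task.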
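If sharing one client across threads turns out not to be safe in your setup, a middle ground between one shared instance and a new instance per task is one instance per worker thread. The sketch below is only an illustration of that idea: ClientFactory and PerThreadClient are hypothetical names, not part of SolrJ, and the client type is left generic so the snippet compiles without SolrJ on the classpath.

```java
/** Hypothetical factory; in real code create() might return new HttpSolrServer(url). */
interface ClientFactory<C> {
    C create();
}

/** Hands each worker thread its own lazily created client. */
final class PerThreadClient<C> {
    private final ThreadLocal<C> local;

    PerThreadClient(final ClientFactory<C> factory) {
        this.local = new ThreadLocal<C>() {
            @Override
            protected C initialValue() {
                return factory.create();  // Runs once per thread, on that thread's first get()
            }
        };
    }

    /** Returns the calling thread's client, creating it on first use. */
    C get() {
        return local.get();
    }
}
```

Inside IndexDocumentTask.call(), a call such as clients.get() would then take the place of the static server field. This sketch never closes the clients it creates, so real code would also need to keep track of them and shut them down once the pool is finished.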

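If you like, you can extend IndexDocumentTask to implement Callable<Tuple<String, Boolean>> so that you can retrieve the file name of the document being indexed along with whether indexing succeeded.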
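A rough sketch of that idea follows. Java has no built-in Tuple type, so this uses a small hypothetical IndexResult class instead. PathIndexer is likewise a hypothetical stand-in for the indexDocument method above, so the snippet compiles without SolrJ; none of these names come from the SolrJ API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for indexDocument(String); not part of SolrJ. */
interface PathIndexer {
    boolean index(String pathFile);
}

/** Hypothetical result holder pairing a file path with its outcome. */
final class IndexResult {
    final String pathFile;
    final boolean success;

    IndexResult(String pathFile, boolean success) {
        this.pathFile = pathFile;
        this.success = success;
    }
}

/** Task that reports which file it handled as well as whether it succeeded. */
final class ReportingIndexTask implements Callable<IndexResult> {
    private final PathIndexer indexer;
    private final String pathFile;

    ReportingIndexTask(PathIndexer indexer, String pathFile) {
        this.indexer = indexer;
        this.pathFile = pathFile;
    }

    @Override
    public IndexResult call() {
        return new IndexResult(pathFile, indexer.index(pathFile));
    }
}

final class ReportingIndexDemo {
    /** Runs one task per path and returns the results in submission order. */
    static List<IndexResult> indexAll(PathIndexer indexer, List<String> paths, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Future<IndexResult>> futures = new ArrayList<Future<IndexResult>>();
            for (String path : paths) {
                futures.add(executor.submit(new ReportingIndexTask(indexer, path)));
            }
            List<IndexResult> results = new ArrayList<IndexResult>();
            for (int i = 0; i < futures.size(); i++) {
                try {
                    results.add(futures.get(i).get());
                } catch (ExecutionException e) {
                    // The task threw instead of returning; record it as a failure
                    results.add(new IndexResult(paths.get(i), false));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting for index tasks", e);
                }
            }
            return results;
        } finally {
            executor.shutdown();
        }
    }
}
```

With this shape, the reporting loop can print result.pathFile instead of a task number, which makes failures easier to trace back to a file.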

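Although I don't think sending multiple requests to the Solr server at once should be a problem, you may want to throttle your requests so that you don't overload the Solr server.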
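The fixed thread pool above already caps concurrent requests at threadPoolSize. If you want a tighter or separately tunable limit, one possible approach is a Semaphore that bounds how many tasks are queued or running at once. The sketch below assumes that approach; ThrottledSubmitter and maxInFlight are hypothetical names chosen for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

/** Wraps an executor so that at most maxInFlight tasks are queued or running. */
final class ThrottledSubmitter {
    private final ExecutorService executor;
    private final Semaphore permits;

    ThrottledSubmitter(ExecutorService executor, int maxInFlight) {
        this.executor = executor;
        this.permits = new Semaphore(maxInFlight);
    }

    /** Blocks the caller until a permit is free, then submits the task. */
    <T> Future<T> submit(final Callable<T> task) throws InterruptedException {
        permits.acquire();
        try {
            return executor.submit(new Callable<T>() {
                @Override
                public T call() throws Exception {
                    try {
                        return task.call();
                    } finally {
                        permits.release();  // Free the slot whether the task succeeded or threw
                    }
                }
            });
        } catch (RuntimeException e) {
            permits.release();  // Submission was rejected, so the wrapper will never run
            throw e;
        }
    }
}
```

Because submit blocks while all permits are taken, the loop that walks the list of paths slows down to the pace at which Solr completes requests instead of queueing every file up front.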

Solution 1
1 accepted 2014-11-11 16:28:26
