How to use Lucene to search for keywords in a sentence

Question

I receive each vendor's name and address as a string, which I index in Lucene like this: , , , . From another servlet I then receive text that contains vendor names and addresses, for example "I have problem in using the credit card, xxxxx, in the shop , " or "my credit card is declined in the shop , ". I remove stop words such as I, the, in, problem, and shop, which leaves a clean text of " ". I need to find every vendor whose vendor_name or area appears in the given text.

This is how I index the vendor details. Every line in the file is one vendor, with its details separated by commas: , , ,

// Stored and indexed, but NOT tokenized: each line becomes one single term.
FieldType keywordFieldType = new FieldType();
keywordFieldType.setStored(true);
keywordFieldType.setIndexed(true);
keywordFieldType.setTokenized(false);
writer = new IndexWriter(dir, iwc);
BufferedReader reader = new BufferedReader(new FileReader(
        VENDOR_DETAILS));
String line = reader.readLine();
// One document per line of the vendor file.
while (line != null) {
    Document document = new Document();
    document.add(new Field("content", line.toLowerCase(),
            keywordFieldType));
    writer.addDocument(document);
    line = reader.readLine();
}
writer.commit();

This is how I search the index:

QueryParser queryParser = new QueryParser(VERSION, "content",
        new WhitespaceAnalyzer(VERSION));

String special = "content:" + stringToQuery.trim();
try {
    if (searcherManager == null) {
        searcherManager = new SearcherManager(
                FSDirectory.open(new File(INDEX_DIRECTORY)),
                new SearcherFactory());
    }
    searcher = searcherManager.acquire();
    TopDocs docs = searcher.search(queryParser.parse(special), 100);
    int hitCount = docs.totalHits;

How do I query Lucene to meet the requirement above? What type of Query should I use to find the vendor details inside the given text?

Answer 1

You are adding your documents without tokenization but tokenizing the query, so your analysis at index time does not match your analysis at query time. Since the field appears to be free text, tokenizing it is important for effective searching. Rather than specifying a FieldType at all, I would recommend just using TextField. You can use WhitespaceAnalyzer on both sides, as you already do at query time, but I would consider StandardAnalyzer a better starting point.
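To make the mismatch concrete, here is a minimal sketch in plain Java that deliberately avoids Lucene itself. It models the index as a toy map from term to line ids, so it is an illustration of the idea and not how Lucene is implemented. The class name, the helper methods, the vendor lines, and the query are all hypothetical. The query side always tokenizes and treats terms as optional (OR), which mirrors the default behaviour of the classic QueryParser. In real Lucene code the corresponding change would be to add the line with `new TextField("content", line, Field.Store.YES)` and to pass the same analyzer to both the IndexWriterConfig and the QueryParser.

```java
import java.util.*;

public class TokenizationMismatchDemo {

    // Split text into lowercase word tokens (a rough stand-in for an analyzer).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Toy inverted index: term -> ids of the lines that contain the term.
    // With tokenized == false the whole line is stored as one single term,
    // which is what setTokenized(false) does to the "content" field.
    static Map<String, Set<Integer>> buildIndex(List<String> lines, boolean tokenized) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < lines.size(); id++) {
            String line = lines.get(id).toLowerCase(Locale.ROOT);
            List<String> terms = tokenized ? tokenize(line) : Collections.singletonList(line);
            for (String term : terms) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // The query is always tokenized; a line matches if ANY query term matches.
    static Set<Integer> search(Map<String, Set<Integer>> index, String query) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : tokenize(query)) {
            hits.addAll(index.getOrDefault(term, Collections.emptySet()));
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical vendor lines and cleaned query text.
        List<String> vendors = Arrays.asList(
                "Acme Books, 12 Main Street, Springfield",
                "Globex Cafe, 4 Elm Road, Shelbyville");
        String query = "credit card declined acme books springfield";

        System.out.println(search(buildIndex(vendors, false), query)); // prints []
        System.out.println(search(buildIndex(vendors, true), query));  // prints [0]
    }
}
```

With the untokenized index, the only terms are the two complete lines, so no individual query word can ever match. Once both sides are split into the same kind of tokens, the first vendor is found through "acme", "books", and "springfield".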

Answer 1, dated 2014-03-26 15:45:36