
How do I go about indexing a customer using Lucene?

I have a web application that stores customers' usernames, emails, and phone numbers. To start with, and just to understand the whole Lucene concept, I want customers to be able to search for other users by email, phone, or username. Later on I will add functionality to search the items a given user has posted. I am following this example: www.lucenetutorial.com/lucene-in-5-minutes.html

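import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

import java.io.IOException;

public class HelloLucene {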
  public static void main(String[] args) throws IOException, ParseException {
    // 0. Specify the analyzer for tokenizing text.
    //    The same analyzer should be used for indexing and searching
    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);

    // 1. create the index
    Directory index = new RAMDirectory();

    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40, analyzer);

    IndexWriter w = new IndexWriter(index, config);
    addDoc(w, "Lucene in Action", "193398817");
    addDoc(w, "Lucene for Dummies", "55320055Z");
    addDoc(w, "Managing Gigabytes", "55063554A");
    addDoc(w, "The Art of Computer Science", "9900333X");
    w.close();

    // 2. query
    String querystr = args.length > 0 ? args[0] : "lucene";

    // the "title" arg specifies the default field to use
    // when no field is explicitly specified in the query.
    Query q = new QueryParser(Version.LUCENE_40, "title", analyzer).parse(querystr);

    // 3. search
    int hitsPerPage = 10;
    IndexReader reader = DirectoryReader.open(index);
    IndexSearcher searcher = new IndexSearcher(reader);
    TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
    searcher.search(q, collector);
    ScoreDoc[] hits = collector.topDocs().scoreDocs;

    // 4. display results
    System.out.println("Found " + hits.length + " hits.");
    for(int i=0;i<hits.length;++i) {
      int docId = hits[i].doc;
      Document d = searcher.doc(docId);
      System.out.println((i + 1) + ". " + d.get("isbn") + "\t" + d.get("title"));
    }

    // reader can only be closed when there
    // is no need to access the documents any more.
    reader.close();
  }

  private static void addDoc(IndexWriter w, String title, String isbn) throws IOException {
    Document doc = new Document();
    doc.add(new TextField("title", title, Field.Store.YES));

    // use a string field for isbn because we don't want it tokenized
    doc.add(new StringField("isbn", isbn, Field.Store.YES));
    w.addDocument(doc);
  }
}

I want new customers to be added to the index automatically on registration. The customerId is a timestamp. So should I add a new document for each field of the customer's details, or should I concatenate all the fields into one string and add that as a single document? Please go easy on me, I am really new to this.

This is a good place to start with the Lucene indexing mechanism: http://www.ibm.com/developerworks/library/wa-lucene/

The bottom line is that when Lucene indexes a document, it first converts it into Lucene's document form. A Lucene document consists of a set of fields, and each field is a set of terms. Terms are nothing but streams of bytes.

The document to be indexed is passed to an analyzer, which forms these terms out of it. These terms are the keywords that are matched during the search process.
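To make the document, field, and term relationship concrete, here is a minimal library-free sketch. It does not use Lucene: `AnalysisSketch`, `analyze`, and `toTerms` are hypothetical names, and the "analyzer" only lowercases and splits on non-alphanumeric characters, which is far simpler than what a real Lucene analyzer does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class AnalysisSketch {
    // Hypothetical stand-in for an analyzer: lowercase the text and split it
    // on anything that is not a letter or a digit.
    public static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Turn a document (field name -> raw text) into field name -> list of terms.
    public static Map<String, List<String>> toTerms(Map<String, String> document) {
        Map<String, List<String>> termsByField = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : document.entrySet()) {
            termsByField.put(field.getKey(), analyze(field.getValue()));
        }
        return termsByField;
    }

    public static void main(String[] args) {
        Map<String, String> document = new LinkedHashMap<>();
        document.put("title", "Lucene in Action");
        document.put("isbn", "193398817");
        // prints {title=[lucene, in, action], isbn=[193398817]}
        System.out.println(toTerms(document));
    }
}
```

Unlike this sketch, the tutorial code stores isbn in a StringField precisely so that it is not tokenized; here every field goes through the same toy analyzer for brevity.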

When we perform a search, the query is analyzed through the same analyzer and then matched against the terms. So you don't have to create a document for each field. Rather, you should create a single document for each user, with each attribute (username, email, phone) as its own field, just as the tutorial's addDoc adds title and isbn as two fields of one document.
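The "one document per user" idea can be sketched without Lucene as a tiny in-memory inverted index. Everything here is an assumption made for illustration: the class `CustomerIndexSketch`, its methods, the whitespace-only analyzer, and the sample customers. In Lucene itself the equivalent would be one Document per customer with one Field per attribute, added through an IndexWriter as in the tutorial code above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class CustomerIndexSketch {
    // "field:term" -> ids of the customers whose field contains that term
    private final Map<String, Set<Long>> postings = new HashMap<>();

    // Hypothetical analyzer shared by indexing and searching:
    // lowercase, then split on whitespace so an email stays one term.
    public static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // One logical document per customer; each attribute is its own field.
    public void addCustomer(long customerId, String username, String email, String phone) {
        indexField(customerId, "username", username);
        indexField(customerId, "email", email);
        indexField(customerId, "phone", phone);
    }

    private void indexField(long customerId, String field, String value) {
        for (String term : analyze(value)) {
            postings.computeIfAbsent(field + ":" + term, k -> new TreeSet<>()).add(customerId);
        }
    }

    // Analyze the query with the same analyzer, then collect every customer
    // whose given field contains at least one of the query terms.
    public Set<Long> search(String field, String query) {
        Set<Long> hits = new TreeSet<>();
        for (String term : analyze(query)) {
            hits.addAll(postings.getOrDefault(field + ":" + term, new TreeSet<>()));
        }
        return hits;
    }

    public static void main(String[] args) {
        CustomerIndexSketch index = new CustomerIndexSketch();
        // Made-up customers; the ids stand in for the timestamp-based customerId.
        index.addCustomer(1001L, "alice", "alice@example.com", "5550100");
        index.addCustomer(1002L, "bob", "bob@example.com", "5550101");

        System.out.println(index.search("email", "Bob@Example.com")); // prints [1002]
        System.out.println(index.search("username", "alice"));        // prints [1001]
        System.out.println(index.search("phone", "alice"));           // prints []
    }
}
```

Because each attribute is indexed under its own field name, a search can target email, phone, or username separately. Concatenating everything into one string would lose that distinction, and adding a customer at registration time is a single `addCustomer` call.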
