Java Lucene StandardAnalyzer's default delimiters?

Original post: 2011-06-03 07:01:23 · Tags: java, lucene, delimiter

I am looking for all the delimiters on which Java Lucene's StandardAnalyzer tokenizes the input string.

I need to know every delimiter it uses for tokenizing by default.

1 Answer

I know (from Lucene in Action) that every character that is not a-zA-Z, or a variant of a-zA-Z with diacritics, is used as a delimiter, and that includes digits.
So "Mc'Donald" might be split into "Mc" and "Donald", "Web2.0" might be tokenized as just "Web", and so on.
The best approach is to run a test: feed in all kinds of characters, then post your results here.
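As a starting point for the test suggested above, here is a plain-Java model of the splitting rule the answer describes. It is a hypothetical sketch that does not use Lucene: the class LatinLetterSplitter and its tokenize method are illustrative names, and the code only encodes the answer's recollection (a token is a run of Latin letters, with or without diacritics; digits and everything else act as delimiters). It is not Lucene's StandardTokenizer. Real behavior depends on the Lucene version: since Lucene 3.1, StandardTokenizer follows the Unicode UAX #29 word-break rules, so strings such as "Web2.0" or "Mc'Donald" may be tokenized differently from this model. Comparing this model with what StandardAnalyzer actually produces for the same inputs is one way to run that test.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Lucene) modelling the rule described in the
 * answer: a token is a maximal run of Latin-script letters (a-z, A-Z and
 * accented variants such as e-acute); every other character, including digits
 * and punctuation, acts as a delimiter.
 */
class LatinLetterSplitter {

    /** True if the code point is a letter belonging to the Latin script. */
    static boolean isLatinLetter(int codePoint) {
        return Character.isLetter(codePoint)
                && Character.UnicodeScript.of(codePoint) == Character.UnicodeScript.LATIN;
    }

    /**
     * Splits the input into maximal runs of Latin letters.
     * Note: decomposed accents (a base letter followed by a combining mark) are
     * split apart, because combining marks are not letters; normalize to NFC first
     * if that matters.
     */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isLatinLetter(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                // Any non-Latin-letter character ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString()); // flush the trailing token
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The last sample spells "Cafe" with an e-acute via a Unicode escape.
        String[] samples = {"Mc'Donald", "Web2.0", "Caf\u00e9 au lait"};
        for (String s : samples) {
            System.out.println(s + " -> " + tokenize(s));
        }
    }
}
```

Under this model, tokenize("Mc'Donald") returns [Mc, Donald] and tokenize("Web2.0") returns [Web], matching the examples in the answer. If StandardAnalyzer gives different results for the same strings, the answer's rule does not describe the Lucene version you are using.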

Related Questions

Scrub Lucene search terms with the Standard Analyzer

Remove Space Character from Lucene Standard Analyzer

java lucene custom analyzer and tokenizer creating problem in termvector offsets?

Lucene Analyzer for Indexing and Searching

Lucene custom analyzer

Extending Lucene Analyzer

Creating the Lucene Analyzer object

How to test a Lucene Analyzer?

Custom Analyzer in Lucene 8.5

Arabic analyzer Lucene

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact yoyou2525@163.com.

Accepted answer (solution 1): score 0, posted 2011-06-03 07:55:36