
NHibernate.Search / Lucene.Net multi-lingual analyser

I am trying to integrate NHibernate.Search into a multi-lingual website. The website has a class Article which is multilingual. This is done with a separate class, Article_CultureInfo, which stores the language-specific content. The fields of Article are:

Article
-------
ID
Name

The fields of Article_CultureInfo are:

Article_CultureInfo
-------
ID
ArticleId
CultureCode
PageTitle
Content

I am using NHibernate.Search.Mapping to map the field/document information. I would like to incorporate search features such as stemming and synonym analysis where possible, based on the language. Is there any way the Lucene analyser can be specified at run-time, rather than at compile time / initialisation?

Say we are analysing the content of PageTitle, which is to be stored in the respective Lucene index. This content can be English, French, Italian, etc., based on the value of CultureCode, so the analyser should change based on this value. I have tried implementing a custom MultilingualAnalyser; however, the only data available to it is the string to be analysed, i.e. the value of PageTitle. From that alone, I cannot deduce the language. (I could look into language detection techniques, but that is out of scope: I already know exactly what the language is, and detection would be overkill and not 100% reliable.)

If, apart from the tokens, I had access to the instance of the object, I could get the CultureCode value out of it and analyse accordingly. Any ideas would be greatly appreciated. I really wish to avoid using Lucene.Net directly, since NHibernate.Search looks to integrate very nicely.

Thanks!

I ended up with a work-around for this. It is quite an overkill, but it works.

I've created a new implementation of IGetter, used for multilingual properties, which I called MultilingualGetter. It is basically the same as BasicGetter; I couldn't extend that class because for some reason it is sealed, so I copied the code.

What this IGetter does: when the Get() method is called on it, it is given the target object, i.e. the instance of the class that contains the property. I check that the target implements an interface for multilingual objects which I've created, IMultilingualContentInfo. The getter then retrieves the current culture from the IMultilingualContentInfo and prepends it to the actual text, e.g. [en]Hello World!.
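The prefixing step on its own can be sketched as below. This is only an illustration of the [en]Hello World! format described above; the CultureTag class and its Prefix method are hypothetical names, not part of NHibernate.Search or of the original code.

```csharp
using System;

// Hypothetical helper (name and members are assumptions for illustration).
public static class CultureTag
{
    // Prepends "[culture]" to the text, so ("en", "Hello World!")
    // becomes "[en]Hello World!".
    public static string Prefix(string cultureCode, string text)
    {
        if (string.IsNullOrWhiteSpace(cultureCode))
            throw new ArgumentException("A culture code is required.", "cultureCode");

        // Leave empty values untouched so that a bare tag is never indexed.
        if (string.IsNullOrWhiteSpace(text))
            return text;

        return "[" + cultureCode + "]" + text;
    }
}
```

The tag travels inside the field value itself, which is why the analyzer needs no access to the entity.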

This text is then passed on to a custom analyzer I created, which parses the culture prefix and so knows the language. It then uses a SnowballFilter to stem the text based on that language.
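The analyzer itself is not shown here, but the parsing half of it might look roughly like the sketch below. Everything in it is an assumption: CultureTagParser, TrySplit and StemmerFor are hypothetical names, and the culture-to-stemmer table only covers the three languages mentioned in the question. The returned name would be what gets handed to SnowballFilter; check the stemmer names accepted by your Lucene.Net version.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (names are assumptions for illustration).
public static class CultureTagParser
{
    // Assumed mapping from language codes to Snowball stemmer names.
    private static readonly Dictionary<string, string> StemmerNames =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "en", "English" },
            { "fr", "French" },
            { "it", "Italian" },
        };

    // Splits "[en]Hello World!" into culture "en" and text "Hello World!".
    // Returns false and leaves the text unchanged when no tag is present.
    public static bool TrySplit(string tagged, out string culture, out string text)
    {
        culture = null;
        text = tagged;
        if (string.IsNullOrEmpty(tagged) || tagged[0] != '[')
            return false;

        int close = tagged.IndexOf(']');
        if (close < 2) // no closing bracket, or an empty "[]" tag
            return false;

        culture = tagged.Substring(1, close - 1);
        text = tagged.Substring(close + 1);
        return true;
    }

    // Returns the stemmer name for a culture such as "en" or "en-GB",
    // or null when the language is not in the table.
    public static string StemmerFor(string culture)
    {
        if (string.IsNullOrEmpty(culture))
            return null;

        int dash = culture.IndexOf('-');
        string language = dash > 0 ? culture.Substring(0, dash) : culture;

        string name;
        return StemmerNames.TryGetValue(language, out name) ? name : null;
    }
}
```

When StemmerFor returns null, the analyzer could fall back to plain tokenizing without stemming rather than failing.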

Below is the code for the Get() method of the custom IGetter implementation, which operates on IMultilingualContentInfo targets:

    /// <summary>
    /// Gets the value of the Property from the object.
    /// </summary>
    /// <param name="target">The object to get the Property value from.</param>
    /// <returns>
    /// The value of the Property for the target.
    /// </returns>
    public object Get(object target)
    {
        // This getter only makes sense for objects that expose their culture.
        if (!(target is IMultilingualContentInfo))
        {
            throw new InvalidOperationException("Multilingual Getter is only available on IMultilingualContentInfo objects");
        }

        try
        {
            IMultilingualContentInfo multiLingualTarget = (IMultilingualContentInfo)target;
            string s = (string)property.GetValue(target, new object[0]);
            if (!string.IsNullOrWhiteSpace(s))
            {
                MultilingualLuceneTextContent mlText = new MultilingualLuceneTextContent();
                mlText.Culture = multiLingualTarget.CultureInfo.GetCultureCode();
                // Assumption: the class exposes a Text property. The original snippet
                // never passed the text in, so the prefixed result would have lost it.
                mlText.Text = s;
                // Produces the culture-prefixed text, e.g. "[en]Hello World!".
                s = mlText.GetTextIncCulture();
            }
            return s;
        }
        catch (Exception e)
        {
            throw new PropertyAccessException(e, "Exception occurred", false, clazz, propertyName);
        }
    }
