Azure Search Documents Add Custom Analyzers, Tokenizers and TokenFilters

I'm migrating from the Microsoft.Azure.Search SDK (v10) to Azure.Search.Documents (v11).

With v10, we could create indexes with custom analyzers, tokenizers, and so on using the C# SDK, like this:

var index = new Microsoft.Azure.Search.Models.Index(
                name: GetIndexName(),
                defaultScoringProfile: defaultScoringProfile,
                fields: AzureQuestionItemDefinition.GetQuestionItemFieldsDefinition(),
                analyzers: new[] {
                    new CustomAnalyzer
                    {
                        Name = "standardAnalyzer",
                        Tokenizer = TokenizerName.Standard,
                        TokenFilters = new[]
                        {
                            TokenFilterName.Lowercase,
                            TokenFilterName.AsciiFolding,
                            TokenFilterName.Phonetic,
                        }
                    },
                    new CustomAnalyzer
                    {
                        Name = "prefixAnalyzer",
                        Tokenizer = TokenizerName.Standard,
                        TokenFilters = new[]
                        {
                            TokenFilterName.Lowercase,
                            TokenFilterName.AsciiFolding,
                            TokenFilterName.Phonetic,
                            "edgeNgramTokenFilter"
                        }
                    },
                },
                tokenFilters: new[]
                {
                    new EdgeNGramTokenFilterV2("edgeNgramTokenFilter", minGram: 2, maxGram: 10, EdgeNGramTokenFilterSide.Front),
                },
                scoringProfiles: new[]
                {
                    new ScoringProfile(defaultScoringProfile)
                    {
                        TextWeights = new TextWeights()
                        {
                            Weights = new Dictionary<string, double>() {
                                { nameof(QuestionItem.Text), 5.0 },
                                { nameof(QuestionItem.Context), 5.0 },
                                { $"{nameof(QuestionItem.Asker)}/{nameof(QuestionItem.Asker.Name)}", 3.0 },
                                { $"{nameof(QuestionItem.Answers)}/{nameof(AnswerItem.Text)}", 2.0 },
                                { $"{nameof(QuestionItem.Answers)}/{nameof(AnswerItem.AnswererName)}", 2.0 }
                            }
                        }
                    }
                });

After moving to Azure.Search.Documents v11, I couldn't find a way to create my index this way using the C# SDK.

I found that the collection properties of SearchIndex (Analyzers, Tokenizers, TokenFilters, ScoringProfiles...) are read-only:

//
    // Summary:
    //     Represents a search index definition, which describes the fields and search behavior
    //     of an index.
    public class SearchIndex : IUtf8JsonSerializable
    {
        //
        // Summary:
        //     Initializes a new instance of the Azure.Search.Documents.Indexes.Models.SearchIndex
        //     class.
        //
        // Parameters:
        //   name:
        //     The name of the index.
        //
        // Exceptions:
        //   T:System.ArgumentException:
        //     name is an empty string.
        //
        //   T:System.ArgumentNullException:
        //     name is null.
        public SearchIndex(string name);
        //
        // Summary:
        //     Initializes a new instance of the Azure.Search.Documents.Indexes.Models.SearchIndex
        //     class.
        //
        // Parameters:
        //   name:
        //     The name of the index.
        //
        //   fields:
        //     Fields to add to the index.
        //
        // Exceptions:
        //   T:System.ArgumentException:
        //     name is an empty string.
        //
        //   T:System.ArgumentNullException:
        //     name or fields is null.
        public SearchIndex(string name, IEnumerable<SearchField> fields);

        //
        // Summary:
        //     The name of the scoring profile to use if none is specified in the query. If
        //     this property is not set and no scoring profile is specified in the query, then
        //     default scoring (tf-idf) will be used.
        public string DefaultScoringProfile { get; set; }
        //
        // Summary:
        //     Options to control Cross-Origin Resource Sharing (CORS) for the index.
        public CorsOptions CorsOptions { get; set; }
        //
        // Summary:
        //     A description of an encryption key that you create in Azure Key Vault. This key
        //     is used to provide an additional level of encryption-at-rest for your data when
        //     you want full assurance that no one, not even Microsoft, can decrypt your data
        //     in Azure Cognitive Search. Once you have encrypted your data, it will always
        //     remain encrypted. Azure Cognitive Search will ignore attempts to set this property
        //     to null. You can change this property as needed if you want to rotate your encryption
        //     key; Your data will be unaffected. Encryption with customer-managed keys is not
        //     available for free search services, and is only available for paid services created
        //     on or after January 1, 2019.
        public SearchResourceEncryptionKey EncryptionKey { get; set; }
        //
        // Summary:
        //     The type of similarity algorithm to be used when scoring and ranking the documents
        //     matching a search query. The similarity algorithm can only be defined at index
        //     creation time and cannot be modified on existing indexes. If null, the ClassicSimilarity
        //     algorithm is used.
        public SimilarityAlgorithm Similarity { get; set; }
        //
        // Summary:
        //     Gets the name of the index.
        [CodeGenMemberAttribute("name")]
        public string Name { get; }
        //
        // Summary:
        //     Gets the analyzers for the index.
        public IList<LexicalAnalyzer> Analyzers { get; }
        //
        // Summary:
        //     Gets the character filters for the index.
        public IList<CharFilter> CharFilters { get; }
        //
        // Summary:
        //     Gets or sets the fields in the index. Use Azure.Search.Documents.Indexes.FieldBuilder
        //     to define fields based on a model class, or Azure.Search.Documents.Indexes.Models.SimpleField,
        //     Azure.Search.Documents.Indexes.Models.SearchableField, and Azure.Search.Documents.Indexes.Models.ComplexField
        //     to manually define fields. Index fields have many constraints that are not validated
        //     with Azure.Search.Documents.Indexes.Models.SearchField until the index is created
        //     on the server.
        public IList<SearchField> Fields { get; set; }
        //
        // Summary:
        //     Gets the scoring profiles for the index.
        public IList<ScoringProfile> ScoringProfiles { get; }
        //
        // Summary:
        //     Gets the suggesters for the index.
        public IList<SearchSuggester> Suggesters { get; }
        //
        // Summary:
        //     Gets the token filters for the index.
        public IList<TokenFilter> TokenFilters { get; }
        //
        // Summary:
        //     Gets the tokenizers for the index.
        public IList<LexicalTokenizer> Tokenizers { get; }
        //
        // Summary:
        //     The Azure.ETag of the Azure.Search.Documents.Indexes.Models.SearchIndex.
        public ETag? ETag { get; set; }
    }

My question is: how do I set custom Tokenizers, TokenFilters, ScoringProfiles...?

Collection properties are initialized by default in the new Azure .NET client libraries. Although you can't set the properties, you can still call Add on each one:

var index = new SearchIndex("myindex");
index.ScoringProfiles.Add(new ScoringProfile(...));

I personally find this less convenient since I like to write expression-based code, so I've already passed along this feedback to the Azure SDK team.
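
That said, C# object and collection initializers call Add on get-only collection properties, so an expression-style definition should still be possible. Below is a rough sketch of how the v10 index from the question might translate, assuming the v11 types CustomAnalyzer, LexicalTokenizerName, EdgeNGramTokenFilter and TextWeights behave as I remember them. IndexDefinition.Build is a hypothetical helper, the "defaultScoringProfile" name is a placeholder, and the text weights are reduced to two of the fields from the question for brevity; check the signatures against the package version you actually use.

```csharp
using System.Collections.Generic;
using Azure.Search.Documents.Indexes.Models;

public static class IndexDefinition
{
    // Hypothetical helper: builds the index definition only, it does not call the service.
    public static SearchIndex Build(string indexName, IEnumerable<SearchField> fields)
    {
        const string scoringProfileName = "defaultScoringProfile"; // placeholder name
        const string edgeNGramFilterName = "edgeNgramTokenFilter";

        return new SearchIndex(indexName, fields)
        {
            DefaultScoringProfile = scoringProfileName,

            // "Analyzers = { ... }" calls Add on the existing list; it does not assign it.
            Analyzers =
            {
                new CustomAnalyzer("standardAnalyzer", LexicalTokenizerName.Standard)
                {
                    TokenFilters =
                    {
                        TokenFilterName.Lowercase,
                        TokenFilterName.AsciiFolding,
                        TokenFilterName.Phonetic,
                    },
                },
                new CustomAnalyzer("prefixAnalyzer", LexicalTokenizerName.Standard)
                {
                    TokenFilters =
                    {
                        TokenFilterName.Lowercase,
                        TokenFilterName.AsciiFolding,
                        TokenFilterName.Phonetic,
                        // The string converts implicitly to TokenFilterName.
                        edgeNGramFilterName,
                    },
                },
            },
            TokenFilters =
            {
                new EdgeNGramTokenFilter(edgeNGramFilterName)
                {
                    MinGram = 2,
                    MaxGram = 10,
                    Side = EdgeNGramTokenFilterSide.Front,
                },
            },
            ScoringProfiles =
            {
                new ScoringProfile(scoringProfileName)
                {
                    TextWeights = new TextWeights(new Dictionary<string, double>
                    {
                        ["Text"] = 5.0,
                        ["Context"] = 5.0,
                    }),
                },
            },
        };
    }
}
```

The resulting SearchIndex could then be passed to SearchIndexClient.CreateOrUpdateIndex, with the fields argument built from FieldBuilder or from SimpleField/SearchableField as the SearchIndex documentation above suggests.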
