Adding custom stopwords to IBM Watson Discovery

I'm trying to add a custom stopword list to a Watson Discovery collection, but I only get the error 500 "Error when creating 'stopwords'.". The same happens on both the web and the API (curl).

I've checked and tried the following (see https://cloud.ibm.com/docs/services/discovery?topic=discovery-query-concepts&locale=en ):

  • Advanced plans - ok
  • The size limit is one million characters - ok
  • Only one custom stopword list per collection - ok
  • All stopwords should be lowercase. - ok
  • Delete and create a new collection - ok
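Two of the checks above (the size limit and the lowercase rule) can be verified locally before uploading. The following is only a minimal sketch: the function name `check_stopword_text` is hypothetical, and it assumes the file holds one stopword per line, which the post does not state.

```python
from typing import List

# Size limit quoted in the checklist above (one million characters).
MAX_CHARS = 1_000_000


def check_stopword_text(text: str) -> List[str]:
    """Return a list of problems found in a stopword list (empty if none).

    Assumption: one stopword per line; blank lines are ignored.
    """
    problems = []
    if len(text) > MAX_CHARS:
        problems.append(f"too long: {len(text)} characters (limit {MAX_CHARS})")
    for number, line in enumerate(text.splitlines(), start=1):
        word = line.strip()
        if not word:
            continue  # skip blank lines
        if word != word.lower():
            problems.append(f"line {number}: {word!r} is not lowercase")
    return problems


if __name__ == "__main__":
    # Small inline sample; for a real file, read it with encoding="utf-8" first.
    for problem in check_stopword_text("de\nPara\num\n"):
        print(problem)
```

An empty result only means these two local checks passed; it says nothing about the plan type or about the one-list-per-collection rule, which have to be checked on the service side.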

Also, I ran curl with an invalid collection and environment to check the API validation (unnecessary, I know), and it returned 404 "Could not find listed collection" as expected, so that part is working.

Am I missing something? What more can I check?

curl command:

curl -X POST -u "apikey":"..." --data-binary @custom_stopwords_pt.txt "https://gateway.watsonplatform.net/discovery/api/v1/environments/.../collections/.../word_lists/stopwords?version=2019-04-30"

Thanks

I am using the Python SDK with the following code to upload a custom stopword list. Discovery queries seem to use the custom stopwords, and the search results update without any error. I did not even need to delete and create a new collection.

from ibm_watson import DiscoveryV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator('<your api key>')
discovery = DiscoveryV1(
    version='2020-01-15',
    authenticator=authenticator)
discovery.set_service_url('<service url>')

# Get the writable environment id (as eid) and the collection id (as cid)
# where you want to upload the stopwords list.
eid = '<environment id>'
cid = '<collection id>'

# Open the stopword file in binary mode and upload it.
# (The local path is an assumption; point it at your own file.)
with open('custom_stopwords.txt', 'rb') as stopwords:
    discovery.create_stopword_list(environment_id=eid,
                                   collection_id=cid,
                                   stopword_file=stopwords,
                                   stopword_filename='custom_stopwords.txt')

# It takes some time to update the stopword list.
# Once it's done, the 'status' returned by the next API call should be 'active'.
discovery.get_stopword_list_status(environment_id=eid,
                                   collection_id=cid).get_result()
# {'status': 'active', 'type': 'stopwords'}
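
Since the list takes some time to become active, the status check can be wrapped in a small polling loop. This is only a sketch: `wait_until_active`, its `timeout` and `interval` parameters, and their defaults are hypothetical and not part of the SDK. It treats any status other than 'active' as "not ready yet".

```python
import time
from typing import Callable, Dict


def wait_until_active(get_status: Callable[[], Dict[str, str]],
                      timeout: float = 300.0,
                      interval: float = 5.0) -> Dict[str, str]:
    """Call get_status() repeatedly until it reports status 'active'.

    get_status is any zero-argument callable returning a dict such as
    {'status': 'active', 'type': 'stopwords'}.
    Raises TimeoutError if the status is still not 'active' after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = get_status()
        if result.get('status') == 'active':
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"stopword list still not active: {result}")
        time.sleep(interval)


# Usage with the client from the snippet above (not executed here):
# status = wait_until_active(
#     lambda: discovery.get_stopword_list_status(environment_id=eid,
#                                                collection_id=cid).get_result())
```

Passing the status call in as a callable keeps the helper independent of the SDK, so it can be tried out with a stub before pointing it at a real collection.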
