
How does sklearn-crfsuite handle strings?

I've been following the sklearn-crfsuite tutorial.

A sample of the features used to train the CRF model is shown below.

{'+1:postag': 'Fpa',
 '+1:postag[:2]': 'Fp',
 '+1:word.istitle()': False,
 '+1:word.isupper()': False,
 '+1:word.lower()': '(',
 'BOS': True,
 'bias': 1.0,
 'postag': 'NP',
 'postag[:2]': 'NP',
 'word.isdigit()': False,
 'word.istitle()': True,
 'word.isupper()': False,
 'word.lower()': 'melbourne',
 'word[-2:]': 'ne',
 'word[-3:]': 'rne'}

How does sklearn-crfsuite convert strings like 'melbourne' to floats? As far as I know, the features for CRFs should be only floats, and I can't find any mention of this in the documentation.

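sklearn-crfsuite features are in the python-crfsuite format. A string value is not converted to a float itself. Instead, the key and the string value together form the feature name, and that feature gets the value 1.0: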

    * {"string_key": "string_value", ...} dict; that's the same as
      {"string_key=string_value": 1.0, ...}
    * ["string_key1", "string_key2", ...] list; that's the same as
      {"string_key1": 1.0, "string_key2": 1.0, ...}

You can find more here: https://github.com/scrapinghub/python-crfsuite/blob/master/pycrfsuite/_pycrfsuite.pyx
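
To make the mapping above concrete, here is a minimal pure-Python sketch of it. The helper name `expand_features` and its `sep` parameter are hypothetical and are not part of sklearn-crfsuite or python-crfsuite. The "=" separator follows the documentation quoted above; the compiled library may join key and value with a different character internally. Treating booleans and numbers as the feature weight is also an assumption of this sketch.

```python
def expand_features(features, sep="="):
    """Hypothetical helper mimicking the documented dict/list -> {name: float} mapping."""
    if not isinstance(features, dict):
        # A list of strings: every string is a feature name with value 1.0.
        return {name: 1.0 for name in features}
    expanded = {}
    for key, value in features.items():
        if isinstance(value, str):
            # The string value becomes part of the feature name.
            expanded[f"{key}{sep}{value}"] = 1.0
        else:
            # Assumption: booleans and numbers are used as the value directly
            # (True -> 1.0, False -> 0.0).
            expanded[key] = float(value)
    return expanded


sample = {
    "word.lower()": "melbourne",
    "word.istitle()": True,
    "bias": 1.0,
    "postag": "NP",
}
expanded = expand_features(sample)
print(expanded)
```

Under this scheme 'melbourne' never becomes a number. Each distinct (key, value) pair is its own indicator feature, much like one-hot encoding.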
