
How do I add strings to a ClassificationDataSet?

How do I build a dataset with strings using ClassificationDataSet.addSample() from pybrain.datasets? I'm getting an error that says "could not convert string to float: gas".

Am I missing something, like an index value or a defined link between the input and target? I'm not sure how to read the documentation on this. Thanks for your help.

import pybrain
from pybrain.datasets import ClassificationDataSet

#set up input and target variables
ds = ClassificationDataSet(inp=2, target=1)

#add data to dataset
ds.addSample(('gas', 'blue'), ('car',))
ds.addSample(('diesel', 'brown'), ('truck',))

# error
ValueError: could not convert string to float: gas

It looks like pybrain only works with float values, so every string has to be turned into a number before it goes into the dataset. One option is to create a unique float value for each unique string. Another is to apply the ord() function to each character of each string in the tuple. For that, a list comprehension is usually more readable than map() with a lambda.
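One way to get "a unique float value for each unique string" is a lookup table that numbers each distinct string in order of first appearance. The sketch below is plain Python and does not call pybrain; the helper name `build_index` is hypothetical, and the rows are the sample data from the question.

```python
def build_index(values):
    """Map each distinct string to 0, 1, 2, ... in order of first appearance."""
    index = {}
    for value in values:
        if value not in index:
            index[value] = len(index)
    return index

# Sample rows from the question: (fuel, colour) -> vehicle type
rows = [(('gas', 'blue'), 'car'),
        (('diesel', 'brown'), 'truck')]

# One lookup table per input column, and one for the target labels
fuel_index = build_index(inp[0] for inp, _ in rows)
colour_index = build_index(inp[1] for inp, _ in rows)
label_index = build_index(label for _, label in rows)

# Numeric rows: floats for the inputs, an integer class index for the target
encoded = [((float(fuel_index[inp[0]]), float(colour_index[inp[1]])),
            (label_index[label],))
           for inp, label in rows]

print(encoded)  # [((0.0, 0.0), (0,)), ((1.0, 1.0), (1,))]
```

Each encoded pair could then be passed to `ds.addSample(inp, target)`. `ClassificationDataSet` also takes `nb_classes` and `class_labels` arguments, so `class_labels=['car', 'truck']` would keep the human-readable names alongside the integer targets. Keep in mind that numbering categories 0, 1, 2, ... implies an ordering the data does not have; for nominal inputs, one 0/1 column per category (one-hot encoding) is the usual alternative.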

>>> ord('a')
97
>>> ord('\u00c2')
194

Or, for every character in a string:

>>> [ord(c) for c in 'Hello World!']
[72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 33]

Applied to the tuple from the question, that could look like this:

x = [('gas', 'blue')]

for var in x:
    # each word in the tuple
    for word in var:
        # ord() value of each character in the word
        codes = [ord(ch) for ch in word]
        # turn each code into a string and join them into one string of digits
        number = ''.join(str(code) for code in codes)
        print(word, number)

# output:
# gas 10397115
# blue 98108117101
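One thing worth checking before feeding these concatenated numbers to pybrain: they grow by two or three digits per character, and a Python float (an IEEE 754 double) can only represent integers exactly up to 2**53, about 9.0e15. A short word like 'gas' survives the conversion to float, but 'diesel' from the question already has 18 digits. The helper name `ord_digits` below is hypothetical and simply repeats the scheme above.

```python
def ord_digits(word):
    # Same scheme as above: concatenate the decimal ord() value of each character
    return ''.join(str(ord(ch)) for ch in word)

for word in ('gas', 'blue', 'diesel'):
    digits = ord_digits(word)
    exact = int(digits)
    # Round-trip through float and compare with the exact integer
    survives = int(float(digits)) == exact
    print(word, digits, survives)

# output:
# gas 10397115 True
# blue 98108117101 True
# diesel 100105101115101108 False
```

So for longer words, two different strings can end up as the same float, and values of this size would usually need rescaling before training as well.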

Representing strings as floats, possibly combined with the Natural Language Toolkit (NLTK) to count word occurrences, may help you prepare the data for training a neural network.
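For the "occurrences of words" idea, a bag-of-words count vector is one common form: one float per vocabulary word, holding how often that word appears. The sketch below uses the standard library's `collections.Counter` rather than NLTK so it runs without extra installs; the helper `count_vector`, the vocabulary, and the sample sentence are made up for illustration.

```python
from collections import Counter

def count_vector(text, vocabulary):
    # Lower-case, split on whitespace, and count each token
    counts = Counter(text.lower().split())
    # One float per vocabulary word; words outside the vocabulary are ignored
    return [float(counts[word]) for word in vocabulary]

# Hypothetical vocabulary and input text
vocabulary = ['gas', 'diesel', 'blue', 'brown']
print(count_vector('Blue car runs on gas gas', vocabulary))  # [2.0, 0.0, 1.0, 0.0]
```

With NLTK installed, `nltk.word_tokenize` could replace the plain `split()` for better tokenization (it needs the punkt tokenizer data downloaded first).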

Python3 convert Unicode String to int representation

https://stackoverflow.com/questions/36680250/pybrain-neural-network-nominal-string-inputs

https://datascience.stackexchange.com/questions/869/neural-network-parse-string-data

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM