How to use split() on the Python numpy.bytes_ type? (reading a dictionary from a file)

I want to read data from a (very large, whitespace-separated, two-column) text file into a Python dictionary. I tried to do this with a for-loop, but that was too slow. Reading it with NumPy's loadtxt into a structured array and then converting that to a dictionary is MUCH faster:

import numpy as np

data = np.loadtxt('filename.txt', dtype=[('field1', 'a20'), ('field2', int)], ndmin=1)  # 'a20': byte string, max 20 chars
result = dict(data)  # keys: first column, values: second column
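A minimal sketch of what the loaded records actually contain. It assumes a small in-memory structured array with hypothetical values in place of filename.txt, and uses 'S20', the byte-string type code equivalent to 'a20'. The first field comes back as bytes, so dictionary keys are bytes unless they are decoded:

```python
import numpy as np

# Hypothetical records standing in for the contents of filename.txt;
# 'S20' is the same fixed-width byte-string type as 'a20'.
data = np.array([(b'foo-1', 10), (b'bar-2', 20)],
                dtype=[('field1', 'S20'), ('field2', int)])

# Each first-column element is a numpy.bytes_, which subclasses Python bytes.
print(type(data[0]['field1']))

# .tolist() converts to plain Python bytes/int; decoding the keys gives str keys.
raw = dict(zip(data['field1'].tolist(), data['field2'].tolist()))
decoded = {k.decode('ascii'): v for k, v in raw.items()}

print(raw)      # {b'foo-1': 10, b'bar-2': 20}
print(decoded)  # {'foo-1': 10, 'bar-2': 20}
```

With bytes keys, lookups must also use bytes, e.g. raw[b'foo-1'] rather than raw['foo-1'].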

But this is surely not the best way? Any advice?

The main reason I need something else is that the following does not work:

data[0]['field1'].split(sep='-')

It leads to the error message:

TypeError: Type str doesn't support the buffer API

If the split() method exists, why can't I use it? Should I use a different dtype? Or is there a different (fast) way to read the text file? Is there anything else I am missing?

Versions: Python 3.3.2, NumPy 1.7.1

Edit: changed data['field1'].split(sep='-') to data[0]['field1'].split(sep='-')
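On the question of a different dtype, one option, sketched here under the assumption of a recent NumPy running on Python 3 (loadtxt's handling of text dtypes has changed since 1.7), is to read the first column with a unicode dtype such as 'U20' instead of 'a20'. The elements are then numpy.str_, so str separators work. The io.StringIO sample is a hypothetical stand-in for filename.txt:

```python
import io
import numpy as np

# Hypothetical sample standing in for 'filename.txt': whitespace-separated, two columns.
sample = io.StringIO("foo-1 10\nbar-2 20\nbaz-3 30\n")

# 'U20' stores the first column as unicode text (numpy.str_), not bytes.
data = np.loadtxt(sample, dtype=[('field1', 'U20'), ('field2', int)], ndmin=1)

# Build the dictionary from the two fields, converted to plain Python types.
result = dict(zip(data['field1'].tolist(), data['field2'].tolist()))

print(result)                        # {'foo-1': 10, 'bar-2': 20, 'baz-3': 30}
print(data[0]['field1'].split('-'))  # ['foo', '1']
```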

The built-in str.split returns a variable number of items, depending on how many times the separator occurs in the string, and is therefore not well suited to array operations. By the way, my NumPy char arrays (I'm running 1.7) do not have a split method.
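To illustrate the variable-length problem, here is a small sketch using np.char.split, an element-wise module function (not an array method); the sample strings are hypothetical. Because each element splits into a different number of pieces, the result is an object array of Python lists rather than a regular 2-D array:

```python
import numpy as np

# Hypothetical byte strings with different numbers of '-' separators.
a = np.array([b'a-b', b'c-d-e', b'f'], dtype='S5')

# np.char.split applies bytes.split element-wise, so the separator is bytes here.
pieces = np.char.split(a, b'-')

print(pieces.dtype)     # object
print(pieces.tolist())  # [[b'a', b'b'], [b'c', b'd', b'e'], [b'f']]
```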

You do have np.core.defchararray.partition, which is similar but poses no problems for vectorization, along with all the other string operations:

>>> a = np.array(['a - b', 'c - d', 'e - f'], dtype=np.string_)
>>> a
array(['a - b', 'c - d', 'e - f'], 
      dtype='|S5')
>>> np.core.defchararray.partition(a, '-')
array([['a ', '-', ' b'],
       ['c ', '-', ' d'],
       ['e ', '-', ' f']], 
      dtype='|S2')

Because type(data[0]['field1']) gives <class 'numpy.bytes_'>, the split() method does not work when it is given a "normal" string (str) as its argument (is this a bug?)

The way I solved it: data[0]['field1'].split(sep=b'-'). The key is to put the b in front of '-', which makes the separator a bytes literal.
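This is not a bug: numpy.bytes_ subclasses Python 3's bytes type, and bytes.split() only accepts a bytes-like separator. A minimal sketch with a hypothetical value, showing the failing call, the b'-' fix, and decoding to str as an alternative:

```python
import numpy as np

# A numpy.bytes_ scalar, like data[0]['field1'] in the question (value is hypothetical).
value = np.bytes_(b'foo-1')

# A str separator raises TypeError in Python 3 (the exact message varies by version).
try:
    value.split(sep='-')
except TypeError as exc:
    print('TypeError:', exc)

print(value.split(sep=b'-'))             # [b'foo', b'1']

# Alternatively, decode to str first and split with an ordinary str separator.
print(value.decode('ascii').split('-'))  # ['foo', '1']
```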

And of course Jaime's suggestion to use np.core.defchararray.partition(a, '-') was very helpful, but in this case too b'-' is needed to make it work.
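A sketch of the partition approach under Python 3, assuming the same sample strings as in Jaime's session but written as bytes literals. It uses np.char, which exposes the same functions as np.core.defchararray. Because the array holds bytes, the separator is b'-':

```python
import numpy as np

# Same sample as in the session above, written as bytes literals for Python 3.
a = np.array([b'a - b', b'c - d', b'e - f'], dtype='S5')

# Each row of the result is (before, separator, after), so the shape stays regular.
parts = np.char.partition(a, b'-')

print(parts.shape)     # (3, 3)
print(parts.tolist())  # [[b'a ', b'-', b' b'], [b'c ', b'-', b' d'], [b'e ', b'-', b' f']]
```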

In fact, a related question was answered here: "Type str doesn't support the buffer API", although at first sight I did not realise it was the same issue.
