
Converting a 1D numpy array to a list of lists

I want to split a 1D numpy array into a list of lists, but I am not sure how I could do that.

Basically I am dealing with an array that is filled with tags:

array(['java database servlets derby', 'java graphics groovy awt basic',
       'java lucene', ..., 'javascript android',
       'iphone ios ipad file uiimage',
       'javascript jquery transition effect'], dtype=object)

with shape:

(5000L,)

As you can see, every row contains tags separated by whitespace. I want to store every row as a list with all the tags as separate elements, and combine those lists into a list of lists. The result should look like this:

list_of_lists = [["tag","tag","tag"],["tag","tag","tag"]...]

How could I achieve this? And if you guys know a better method to achieve what I want (namely a data structure where I can access every tag as an element of the specified row) I would be glad to hear it.

Thanks in advance.

Use a list comprehension with str.split:

>>> from numpy import array
>>> a = array(['java database servlets derby', 'java graphics groovy awt basic',
...            'java lucene', 'javascript android',
...            'iphone ios ipad file uiimage',
...            'javascript jquery transition effect'])
>>> list_of_lists = [x.split() for x in a]
>>> list_of_lists
[['java', 'database', 'servlets', 'derby'],
 ['java', 'graphics', 'groovy', 'awt', 'basic'],
 ['java', 'lucene'],
 ['javascript', 'android'],
 ['iphone', 'ios', 'ipad', 'file', 'uiimage'],
 ['javascript', 'jquery', 'transition', 'effect']]

There is a subtle difference between the array with dtype=object and the version in falsetru's answer, which has dtype='|S35'. The first is an array of pointers to strings; the other holds 6 fixed-width strings of length 35, for a total of 210 bytes. The [x.split() for x in a] expression works the same for both. But the object array also allows in-place replacement:

for i in range(6): a[i]=a[i].split()

producing

array([['java', 'database', 'servlets', 'derby'],
       ['java', 'graphics', 'groovy', 'awt', 'basic'], ['java', 'lucene'],
       ['javascript', 'android'],
       ['iphone', 'ios', 'ipad', 'file', 'uiimage'],
       ['javascript', 'jquery', 'transition', 'effect']], dtype=object)
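As a rough sketch of that difference (the variable names obj_arr and fixed_arr are made up for this illustration, and the tag list is shortened), the same strings can be stored either way, but only the object array can hold the split lists in place:

```python
import numpy as np

tags = ['java database servlets derby', 'java lucene', 'javascript android']

# Object array: each element is a reference to an arbitrary Python object.
obj_arr = np.array(tags, dtype=object)

# Default conversion: a fixed-width string array (bytes 'S' on Python 2,
# Unicode 'U' on Python 3), sized to the longest string.
fixed_arr = np.array(tags)

print(obj_arr.dtype)
print(fixed_arr.dtype)

# Because the elements are plain object references, each string can be
# replaced by the list that str.split returns.
for i in range(len(obj_arr)):
    obj_arr[i] = obj_arr[i].split()

print(obj_arr[1])
```

The fixed-width array has no room for a list in each slot, which is why the in-place loop is only shown for the object array.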

If all those sublists were of the same length, or padded to the same length, they could be put in a structured array, e.g.:

array([('java', 'database', 'servlets', 'derby', ''),
       ('java', 'graphics', 'groovy', 'awt', 'basic'),
       ('java', 'lucene', '', '', ''),
       ('javascript', 'android', '', '', ''),
       ('iphone', 'ios', 'ipad', 'file', 'uiimage'),
       ('javascript', 'jquery', 'transition', 'effect', '')], 
      dtype=[('f0', 'S10'), ('f1', 'S10'), ('f2', 'S10'), ('f3', 'S10'), ('f4', 'S10')])

Then, calling that structured array a2, you could access a specific field across all 'rows' by name:

a2['f0']
# array(['java', 'java', 'java', 'javascript', 'iphone', 'javascript'],dtype='|S10')

http://docs.scipy.org/doc/numpy/user/basics.rec.html
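One possible way to build such a structured array from the split lists is sketched below. The padding step, the 'U10' field width, and the names width and padded are assumptions for this illustration ('U10' is used so the fields are str on Python 3; the output above shows 'S10'). Tags longer than 10 characters would be truncated.

```python
import numpy as np

list_of_lists = [['java', 'database', 'servlets', 'derby'],
                 ['java', 'lucene'],
                 ['iphone', 'ios', 'ipad', 'file', 'uiimage']]

# Pad every row with empty strings up to the longest row.
width = max(len(row) for row in list_of_lists)
padded = [tuple(row + [''] * (width - len(row))) for row in list_of_lists]

# One field per tag position: f0, f1, ... each a 10-character string.
dtype = [('f%d' % i, 'U10') for i in range(width)]
a2 = np.array(padded, dtype=dtype)

# First tag of every row, selected by field name.
print(a2['f0'])
```

Each row has to be a tuple, not a list, for NumPy to treat it as one structured record.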

This is numpy, please don't use loops :P You can use np.char.split to just apply split to all elements of the array at once:

A = np.char.split(A)

You don't need to make it a list if you really just want

a data structure where I can access every tag as an element of the specified row

just the array works fine for that:

>>> A = np.char.split(A)
>>> A[0]
['java', 'database', 'servlets', 'derby']
>>> A
array([['java', 'database', 'servlets', 'derby'],
       ['java', 'graphics', 'groovy', 'awt', 'basic'], ['java', 'lucene'],
       ['javascript', 'android'],
       ['iphone', 'ios', 'ipad', 'file', 'uiimage'],
       ['javascript', 'jquery', 'transition', 'effect']], dtype=object)

But you can convert to a list with:

>>> A.tolist()
[['java', 'database', 'servlets', 'derby'],
 ['java', 'graphics', 'groovy', 'awt', 'basic'],
 ['java', 'lucene'],
 ['javascript', 'android'],
 ['iphone', 'ios', 'ipad', 'file', 'uiimage'],
 ['javascript', 'jquery', 'transition', 'effect']]

(Note that if your dtype is object, np.char.split needs a string array, so convert first with A = A.astype('S'). On Python 3, A.astype(str) gives a Unicode array rather than bytes.)
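A minimal sketch of that conversion, assuming Python 3 and a shortened tag list (A_str and result are names chosen for this example):

```python
import numpy as np

# An object array like the one in the question.
A = np.array(['java lucene', 'javascript android'], dtype=object)

# Convert to a fixed-width Unicode array so np.char functions accept it.
A_str = A.astype(str)

# Split every element on whitespace; the result is an object array of lists.
split = np.char.split(A_str)

# Plain list of lists, if that is what the rest of the code expects.
result = split.tolist()
print(result)
```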

To be honest, on a length-5000 array this seems to be about the same speed as the list comprehension, though. np.char probably isn't doing much different under the hood.
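If you want to check this on your own data, a rough timing sketch could look like the following. The 5000 rows here are synthetic repeats of two tag strings, and the function names are made up for this example; actual timings will depend on the machine and NumPy version.

```python
import timeit
import numpy as np

# Synthetic stand-in for the 5000-row tag array.
rows = ['java database servlets derby',
        'javascript jquery transition effect'] * 2500
A = np.array(rows)

def with_comprehension():
    # Plain Python loop over the array elements.
    return [x.split() for x in A]

def with_np_char():
    # Vectorized-looking call, converted to a list for a fair comparison.
    return np.char.split(A).tolist()

print('list comprehension:', timeit.timeit(with_comprehension, number=20))
print('np.char.split:     ', timeit.timeit(with_np_char, number=20))
```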

By the way, you can read the text in with numpy itself if you're not using pandas for anything else. If your file looks like:

java database servlets derby
java graphics groovy awt basic
java lucene
javascript android
iphone ios ipad file uiimage
javascript jquery transition effect

Then:

A = np.genfromtxt('tags.txt', dtype='S', delimiter='\n')
A = np.char.split(A)
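If all you need in the end is the list of lists, a plain-Python alternative (not the genfromtxt route above) is to split each line while reading the file. This sketch writes a small temporary file first so it can run on its own; the sample contents are a shortened copy of the file shown above.

```python
import os
import tempfile

sample = ("java database servlets derby\n"
          "java lucene\n"
          "javascript android\n")

# Write a throwaway tags file so the example is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as f:
    f.write(sample)
    path = f.name

# One list of tags per non-empty line.
with open(path) as fh:
    list_of_lists = [line.split() for line in fh if line.strip()]

os.remove(path)
print(list_of_lists)
```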
