Converting a 1D numpy array to a list of lists

I want to split a 1D numpy array into a list of lists, but I am not sure how to do that.

Basically, I am dealing with an array that is filled with tags:

array(['java database servlets derby', 'java graphics groovy awt basic',
       'java lucene', ..., 'javascript android',
       'iphone ios ipad file uiimage',
       'javascript jquery transition effect'], dtype=object)

with shape:

(5000L,)

As you can see, every row contains tags separated by whitespace. I want to store every row as a list with all the tags as separate elements, and combine those lists into a list of lists. The result should then look like this:

list_of_lists = [["tag","tag","tag"],["tag","tag","tag"]...]

How could I achieve this? And if you guys know a better method to achieve what I want (namely a data structure where I can access every tag as an element of the specified row), I would be glad to hear it.

Thanks in advance.

Using a list comprehension and str.split:

>>> from numpy import array
>>> a = array(['java database servlets derby', 'java graphics groovy awt basic',
...            'java lucene', 'javascript android',
...            'iphone ios ipad file uiimage',
...            'javascript jquery transition effect'])
>>> list_of_lists = [x.split() for x in a]
>>> list_of_lists
[['java', 'database', 'servlets', 'derby'],
 ['java', 'graphics', 'groovy', 'awt', 'basic'],
 ['java', 'lucene'],
 ['javascript', 'android'],
 ['iphone', 'ios', 'ipad', 'file', 'uiimage'],
 ['javascript', 'jquery', 'transition', 'effect']]
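
To show how the resulting list of lists covers the "access every tag as an element of the specified row" requirement, here is a minimal sketch. The three-row sample array and the names first_row and second_tag are illustrative assumptions, not part of the original question.

```python
import numpy as np

# Small stand-in for the 5000-element tag array from the question.
a = np.array(['java database servlets derby',
              'java graphics groovy awt basic',
              'java lucene'], dtype=object)

# One list of tags per row of the array.
list_of_lists = [row.split() for row in a]

first_row = list_of_lists[0]       # every tag of row 0
second_tag = list_of_lists[1][1]   # tag at position 1 of row 1

print(first_row)
print(second_tag)
```

Indexing is plain Python list indexing, so rows of different lengths are not a problem.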

There is a subtle difference between the array with dtype=object and the version in falsetru's answer, which has dtype='|S35'. The first is an array of pointers to strings; the other is 6 strings of length 35, for a total of 210 bytes. The [x.split() for x in a] is the same for both. But the object array allows:

for i in range(6): a[i]=a[i].split()

producing

array([['java', 'database', 'servlets', 'derby'],
       ['java', 'graphics', 'groovy', 'awt', 'basic'], ['java', 'lucene'],
       ['javascript', 'android'],
       ['iphone', 'ios', 'ipad', 'file', 'uiimage'],
       ['javascript', 'jquery', 'transition', 'effect']], dtype=object)
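
A self-contained sketch of that in-place replacement, assuming a short sample array created explicitly with dtype=object (the sample data is made up for illustration):

```python
import numpy as np

# Each slot of an object array holds a reference to an arbitrary Python object.
a = np.array(['java lucene',
              'javascript android',
              'iphone ios ipad'], dtype=object)

# Overwrite each string with its list of tags; the array stays 1-D.
for i in range(len(a)):
    a[i] = a[i].split()

print(a.dtype, a.shape)
print(a[2])
```

Using len(a) instead of a hard-coded 6 keeps the loop valid for the full 5000-element array.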

If all those sublists were of the same length, or padded to the same length, they could be put in a structured array, e.g.

array([('java', 'database', 'servlets', 'derby', ''),
       ('java', 'graphics', 'groovy', 'awt', 'basic'),
       ('java', 'lucene', '', '', ''),
       ('javascript', 'android', '', '', ''),
       ('iphone', 'ios', 'ipad', 'file', 'uiimage'),
       ('javascript', 'jquery', 'transition', 'effect', '')], 
      dtype=[('f0', 'S10'), ('f1', 'S10'), ('f2', 'S10'), ('f3', 'S10'), ('f4', 'S10')])

Then, with that structured array stored as a2, you could access specific fields, across all 'rows', by name:

a2['f0']
# array(['java', 'java', 'java', 'javascript', 'iphone', 'javascript'],dtype='|S10')
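
The answer does not show how a2 is built, so the following is only one possible construction, under stated assumptions: rows are padded with empty strings to the longest row, and the fields use 'U10' (unicode) rather than the 'S10' byte strings shown above, since this sketch targets Python 3. The names rows, width and padded are hypothetical.

```python
import numpy as np

rows = [['java', 'database', 'servlets', 'derby'],
        ['java', 'lucene'],
        ['iphone', 'ios', 'ipad', 'file', 'uiimage']]

# Pad every row with '' up to the length of the longest row.
width = max(len(r) for r in rows)
padded = [tuple(r + [''] * (width - len(r))) for r in rows]

# One field per tag position: f0, f1, ... (assumption: tags fit in 10 characters;
# longer tags would be truncated by the fixed-width 'U10' type).
dtype = [('f%d' % i, 'U10') for i in range(width)]
a2 = np.array(padded, dtype=dtype)

first_tags = a2['f0']   # the first tag of every row
print(first_tags)
```

Each record must be a tuple, not a list, for numpy to treat it as one structured row.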

http://docs.scipy.org/doc/numpy/user/basics.rec.html

This is numpy, please don't use loops :P You can use np.char.split to apply split to all elements of the array at once:

A = np.char.split(A)

You don't need to make it a list if you really just want

a data structure where I can access every tag as an element of the specified row

The array alone works fine for that:

>>> A = np.char.split(A)
>>> A[0]
['java', 'database', 'servlets', 'derby']
>>> A
array([['java', 'database', 'servlets', 'derby'],
       ['java', 'graphics', 'groovy', 'awt', 'basic'], ['java', 'lucene'],
       ['javascript', 'android'],
       ['iphone', 'ios', 'ipad', 'file', 'uiimage'],
       ['javascript', 'jquery', 'transition', 'effect']], dtype=object)

But you can convert to a list with:

>>> A.tolist()
[['java', 'database', 'servlets', 'derby'],
 ['java', 'graphics', 'groovy', 'awt', 'basic'],
 ['java', 'lucene'],
 ['javascript', 'android'],
 ['iphone', 'ios', 'ipad', 'file', 'uiimage'],
 ['javascript', 'jquery', 'transition', 'effect']]

(Note that if your dtype is object, use A = A.astype('S') first to make it a string array; on Python 3, A.astype(str) gives a unicode string array instead of bytes.)
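
A short sketch of that conversion for an object array like the one in the question, assuming Python 3 (hence astype(str) rather than 'S'). The names A_str and result are illustrative.

```python
import numpy as np

# Object array, as in the question.
A = np.array(['java lucene', 'javascript android'], dtype=object)

# np.char functions expect a string dtype, so convert first.
A_str = A.astype(str)

# Returns an object array whose elements are lists of tags.
split = np.char.split(A_str)

result = split.tolist()
print(result)
```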

To be honest, though, on a length-5000 array this seems to be about the same speed as the list comprehension. np.char probably isn't doing anything much different under the hood.
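
If you want to check the timing on your own machine, a rough sketch with timeit could look like this. The synthetic 5000-element array and the repeat count of 20 are arbitrary assumptions, and no particular result is implied.

```python
import timeit
import numpy as np

# Synthetic stand-in for the 5000-row tag array.
tags = np.array(['java database servlets derby',
                 'javascript jquery transition effect'] * 2500)

def with_comprehension():
    return [x.split() for x in tags]

def with_np_char():
    return np.char.split(tags).tolist()

# Time 20 runs of each approach.
t_comp = timeit.timeit(with_comprehension, number=20)
t_char = timeit.timeit(with_np_char, number=20)

print('list comprehension: %.4f s' % t_comp)
print('np.char.split:      %.4f s' % t_char)
```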

By the way, you can read the text in with numpy itself if you're not using pandas for anything else. If your file looks like:

java database servlets derby
java graphics groovy awt basic
java lucene
javascript android
iphone ios ipad file uiimage
javascript jquery transition effect

Then:

A = np.genfromtxt('tags.txt', dtype='S', delimiter='\n')
A = np.char.split(A)
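
As an alternative to genfromtxt (not the method above, just a sketch that sidesteps delimiter and bytes handling on Python 3), the lines can be read with plain Python and then passed to np.char.split. The temporary file stands in for the tags.txt assumed above.

```python
import os
import tempfile
import numpy as np

sample = "java database servlets derby\njava lucene\njavascript android\n"

with tempfile.TemporaryDirectory() as tmp:
    # Write a small stand-in for tags.txt.
    path = os.path.join(tmp, 'tags.txt')
    with open(path, 'w') as f:
        f.write(sample)

    # Read one string per non-empty line.
    with open(path) as f:
        lines = [line.strip() for line in f if line.strip()]

# Split every line into its list of tags.
A = np.char.split(np.array(lines))
print(A.tolist())
```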
