Flattening a list of lists

Question

I have the following data structure:

 a= [
       [u'happy', u'thursday', u'from', u'my', u'big', u'sweater', u'and', u'this', 
        u'ART', u'@', u'East', u'Village', u',', u'Manhattan', u'https', 
        u':', u'//t.co/5k8PUInmqK'],
       [u'RT', u'@', u'MayorKev', u':', u'IM', u'SO', u'HYPEE', u'@', u'calloutband', 
        u'@', u'FreakLikeBex', u'#', u'Callout', u'#', u'TheBitterEnd', u'#',
        u'Manhattan', u'#', u'Music', u'#', u'LiveMusic', u'#', u'NYC', 
         u'#', u'NY', u'#',
        u'Jersey', u'#', u'NJ', u'http', u':', u'//t.co/0\u2026']
     ]

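As I understand it, this is a list of lists of strings, except that it is enclosed in a pair of [] rather than (). The outer [] is generated automatically, as the result of: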

a = [nltk.tokenize.word_tokenize(tweetL) for tweetL in tweetList]

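Ultimately, I need to flatten this structure into a single list of strings and then run some regular expressions and counting operations on the words, but the outer [] gets in the way.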

I tried using:

list.extend()

and

ll = len(a)
for n in xrange(ll):
    print 'list - ', a[n], 'number = ', n

but I still get the same result:

list - [ number =  1
list - u number =  2
list - ' number =  3
list - h number =  4
list - a number =  5
list - p number =  6
list - p number =  7

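As you can see, the code treats every character of the string as a list element, instead of treating each whole string as one element.

What is an effective way to do this?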

I also tried this:

flat_list = [i for sublist in a for i in sublist] 
for i in flat_list:
    print 'element - ', i

Result (partial):

element -  h
element -  a
element -  p
element -  p
element -  y
element -   
element -  t

Answer 1

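I'm not sure I fully understand your question, so let me know if I'm off the mark, but based on the input you provided, you have a list of lists. If that is the structure you always have, you can simply pull out the part you need: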

a = a[0]

That gives you just one list (the first inner list).

You can then simply iterate over it:

for i in a:
    print(i)

However, if this is only an example and you actually have something like:

[[],[],[],[]]

and you want to flatten it completely into a single list, then the comprehension you want is:

flat_list = [i for sublist in a for i in sublist]

That leaves you with a single flat list, for example: [1, 2, 3, 4]
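As a minimal sketch of that comprehension on a small input (the name `nested` and the sample numbers are made up for illustration), read the two `for` clauses left to right, outer loop first:

```python
# Illustrative input: a list of lists (values chosen only for this example).
nested = [[1, 2], [3], [4]]

# The first `for` walks the outer list, the second walks each inner list.
flat_list = [item for sublist in nested for item in sublist]

print(flat_list)  # [1, 2, 3, 4]
```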

Then you simply iterate over it:

for i in flat_list:
    print(i)

Or, if you also want to print the index, you can do this:

for i, v in enumerate(flat_list):
    print("{}: {}".format(i, v))

Regarding your last comment about using extend:

The help text for the extend method reads:

extend(...)
    L.extend(iterable) -- extend list by appending elements from the iterable

So it "extends" a list in place, as in this example:

a = [1, 2, 3]
b = [4, 5, 6]
a.extend(b)
# a will now be [1, 2, 3, 4, 5, 6]
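Because extend appends every element of whatever iterable it receives, it can also flatten one level by hand. The sketch below uses made-up sample data. It also shows that passing a string to extend adds its characters one by one, which is one possible explanation for the single-letter output in the question (an assumption, since the question does not show how the list was actually built).

```python
nested = [['happy', 'thursday'], ['RT', '@']]

flat = []
for sublist in nested:
    flat.extend(sublist)   # adds each token of the sublist
print(flat)   # ['happy', 'thursday', 'RT', '@']

# A string is itself an iterable of characters, so extend splits it up.
chars = []
chars.extend('happy')
print(chars)  # ['h', 'a', 'p', 'p', 'y']
```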

Running your input:

a = [[u'happy', u'thursday', u'from', u'my', u'big', u'sweater', u'and', u'this', u'ART', u'@', u'East', u'Village', u',', u'Manhattan', u'https', u':', u'//t.co/5k8PUInmqK'], [u'RT', u'@', u'MayorKev', u':', u'IM', u'SO', u'HYPEE', u'@', u'calloutband', u'@', u'FreakLikeBex', u'#', u'Callout', u'#', u'TheBitterEnd', u'#', u'Manhattan', u'#', u'Music', u'#', u'LiveMusic', u'#', u'NYC', u'#', u'NY', u'#', u'Jersey', u'#', u'NJ', u'http', u':', u'//t.co/0\u2026']]

through my code produces the following output:

0: happy
1: thursday
2: from
3: my
4: big
5: sweater
6: and
7: this
8: ART
9: @
10: East
11: Village
12: ,
13: Manhattan
14: https
15: :
16: //t.co/5k8PUInmqK

Answer 2

a= [[u'happy', u'thursday', u'from', u'my', u'big', u'sweater', u'and', u'this', u'ART', u'@', u'East', u'Village', u',', u'Manhattan', u'https', u':', u'//t.co/5k8PUInmqK'], [u'RT', u'@', u'MayorKev', u':', u'IM', u'SO', u'HYPEE', u'@', u'calloutband', u'@', u'FreakLikeBex', u'#', u'Callout', u'#', u'TheBitterEnd', u'#', u'Manhattan', u'#', u'Music', u'#', u'LiveMusic', u'#', u'NYC', u'#', u'NY', u'#', u'Jersey', u'#', u'NJ', u'http', u':', u'//t.co/0\u2026']]

from itertools import chain

flat_a = list(chain.from_iterable(a))

print(flat_a)

['happy', 'thursday', 'from', 'my', 'big', 'sweater', 'and', 'this', 'ART', '@', 'East', 'Village', ',', 'Manhattan', 'https', ':', '//t.co/5k8PUInmqK', 'RT', '@', 'MayorKev', ':', 'IM', 'SO', 'HYPEE', '@', 'calloutband', '@', 'FreakLikeBex', '#', 'Callout', '#', 'TheBitterEnd', '#', 'Manhattan', '#', 'Music', '#', 'LiveMusic', '#', 'NYC', '#', 'NY', '#', 'Jersey', '#', 'NJ', 'http', ':', '//t.co/0…']
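Since the question mentions counting words afterwards, here is a small sketch (the sample data is invented for illustration) showing that chain.from_iterable returns a lazy iterator, so it can be passed straight to collections.Counter without building the intermediate list first:

```python
from collections import Counter
from itertools import chain

# Made-up token lists standing in for the tokenized tweets.
tweets = [['RT', '@', 'MayorKev', '#', 'NYC'], ['#', 'NYC', '#', 'NJ']]

# Counter consumes the flattened stream of tokens one at a time.
counts = Counter(chain.from_iterable(tweets))

print(counts['#'])    # 3
print(counts['NYC'])  # 2
```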

Answer 3

a= [[u'happy', u'thursday', u'from', u'my', u'big', u'sweater', u'and', u'this', u'ART', u'@', u'East', u'Village', u',', u'Manhattan', u'https', u':', u'//t.co/5k8PUInmqK'], [u'RT', u'@', u'MayorKev', u':', u'IM', u'SO', u'HYPEE', u'@', u'calloutband', u'@', u'FreakLikeBex', u'#', u'Callout', u'#', u'TheBitterEnd', u'#', u'Manhattan', u'#', u'Music', u'#', u'LiveMusic', u'#', u'NYC', u'#', u'NY', u'#', u'Jersey', u'#', u'NJ', u'http', u':', u'//t.co/0\u2026']]
for L in a:
    for e in L:
        print("element " + e)


element happy
element thursday
element from
element my
element big
element sweater
element and
element this
element ART
element @
element East

Answer 4

A nested list comprehension should solve your first problem.

a = [token for tweetL in tweetList for token in nltk.tokenize.word_tokenize(tweetL)]

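This construct lets you iterate over the elements produced by nested for loops. The outermost for loop always comes first, then the second-outermost, and so on, with the innermost for loop last.

It may help to look at the equivalent explicit loops: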

a = []
for tweetL in tweetList:
    for token in nltk.tokenize.word_tokenize(tweetL):
        a.append(token)
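For readers without NLTK installed, the same pattern can be tried with a stand-in tokenizer. In this sketch `str.split` replaces `nltk.tokenize.word_tokenize` (an assumption made only so the example runs on its own; the real tokenizer splits punctuation differently), the sample tweets are invented, and the regular expression is a hypothetical filter that keeps purely alphabetic tokens.

```python
import re

# Invented sample input standing in for tweetList.
tweet_list = ['happy thursday from my big sweater', 'IM SO HYPEE # NYC']

# Comprehension form: outer loop over tweets, inner loop over tokens.
tokens = [token for tweet in tweet_list for token in tweet.split()]

# Equivalent explicit loops.
tokens_loop = []
for tweet in tweet_list:
    for token in tweet.split():
        tokens_loop.append(token)

print(tokens == tokens_loop)  # True

# Hypothetical follow-up step: keep only alphabetic tokens.
words = [t for t in tokens if re.match(r'^[A-Za-z]+$', t)]
print(words)
```

Either form yields one flat list of tokens, so regular expressions and counting can be applied to it directly.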

In Python 2, you can encode unicode strings as UTF-8. This converts them from the unicode type to the str type, which should resolve the UnicodeEncodeError.

Example:

u'\u2713'.encode('utf-8')
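As a small sketch of what that call produces, the snippet below encodes the check mark U+2713 into its three UTF-8 bytes. It is written to run under Python 3, where the result is a bytes object rather than a Python 2 str, so the type names differ from the description above; the variable names are illustrative only.

```python
check = u'\u2713'                # the u prefix is also accepted by Python 3.3+
encoded = check.encode('utf-8')  # text -> UTF-8 bytes

print(repr(encoded))                      # b'\xe2\x9c\x93' on Python 3
print(encoded.decode('utf-8') == check)   # True: decoding round-trips
```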

For more on Unicode in Python 2, see: https://docs.python.org/2/howto/unicode.html

4 answers

Answer 1
2 2015-10-15 23:22:53

Answer 2
1 2015-10-15 23:45:38

Answer 3
1 2015-10-16 00:12:33

Answer 4
1 accepted 2015-10-16 00:33:33
