MapReduce-style Python code raises 'string index out of range'
My data looks like this:
1 1.45
1 1.153
2 2.179
2 2.206
2 2.59
2 2.111
3 3.201
3 3.175
4 4.228
4 4.161
4 4.213
The output I want is:
1 2 (1 occurs 2 times)
2 4
3 2
4 3
To get it, I run the following code:
SubPatent2count = {}
for line in data.split('\n'):
    for num in line.split('\t'):
        Mapper_data = ["%s\t%d" % (num[0], 1) ]
        for line in Mapper_data:
            Sub_Patent,count = line.strip().split('\t',1)
            try:
                count = int(count)
            except ValueError:
                continue
            try:
                SubPatent2count[Sub_Patent] = SubPatent2count[Sub_Patent]+count
            except:
                SubPatent2count[Sub_Patent] = count
for Sub_Patent in SubPatent2count.keys():
    print ('%s\t%s'% ( Sub_Patent, SubPatent2count[Sub_Patent] ))
In the end I get this error:
3 for num in line.split('\t'):
4 #print(num[0])
----> 5 Mapper_data = ["%s\t%d" % (num[0], 1) ]
6 #print(Mapper_data)
7 for line in Mapper_data:
IndexError: string index out of range
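If you have any idea how to handle this error, please help. Thanks!
num is probably an empty string, which is why num[0] raises the index out of range error. This happens, for example, when data contains a blank line or ends with a newline. Another possibility is that the numbers on each line are actually separated by spaces rather than tabs.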
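To make the failure concrete, here is a minimal sketch. The sample string is an assumption (same shape as the question's data, ending with a newline), and safe_first_char is a hypothetical helper, not part of the original code.

```python
# Assumed sample: same shape as the question's data, with a trailing newline.
data = "1 1.45\n1 1.153\n2 2.179\n"

lines = data.split("\n")
# The trailing newline leaves an empty string as the last element,
# and "".split("\t") returns [""], so num becomes "" on that pass.
empty_fields = [num for line in lines for num in line.split("\t") if num == ""]


def safe_first_char(field):
    """Hypothetical helper: first character, or None for an empty field."""
    return field[0] if field else None


first_chars = [safe_first_char(num) for line in lines for num in line.split("\t")]
print(empty_fields)
print(first_chars)
```

Indexing the empty field directly, as num[0] does in the question, is what raises IndexError. Skipping empty fields (or empty lines) before indexing avoids it.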
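In any case, your code looks a bit odd. For example, you encode the data as a string inside a one-element list (Mapper_data) and then decode it again to process it. That is unnecessary and you should avoid it.
Try this code:
from collections import Counter

# Take the first field of every non-empty line and count how often each value occurs.
decoded_data = [int(l.split(' ', 1)[0]) for l in data.split('\n') if len(l) > 0]
SubPatent2count = Counter(decoded_data)
for k in SubPatent2count:
    print(k, SubPatent2count[k])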
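The same count can also be written with a plain dict. The sketch below is only an illustration under assumptions: the sample data and the function name count_first_column are made up, and it splits on any whitespace so that both tab-separated and space-separated lines work. It also keeps the whole first field rather than only its first character, which matters once keys have more than one digit.

```python
# Assumed sample: mixed separators, a blank line, and a two-digit key.
data = "1 1.45\n1\t1.153\n\n12 12.179\n12 12.206\n"


def count_first_column(text):
    """Count occurrences of the first whitespace-separated field per line."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()  # no argument: split on any run of whitespace
        if not fields:  # blank line, nothing to count
            continue
        key = fields[0]
        counts[key] = counts.get(key, 0) + 1
    return counts


counts = count_first_column(data)
for key, n in counts.items():
    print("%s\t%d" % (key, n))
```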
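Just to suggest another approach: have you tried a list comprehension combined with itertools.groupby? Note that groupby only merges consecutive equal keys, which works here because the data is already sorted by its first column.
from itertools import groupby
print([(key, len(list(group))) for key, group in groupby([x.split(' ')[0] for x in data.split('\n') if x])])
# where [x.split(' ')[0] for x in data.split('\n') if x] generates a list of the starting number of every non-empty line
# and groupby groups consecutive equal numbers so they can be counted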
Or, if you want the exact output:
from itertools import groupby
mylist = [(key, len(list(group))) for key, group in groupby([x.split(' ')[0] for x in data.split('\n') if x])]
for key, repetition in mylist:
    print(key, repetition)
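One caveat with groupby: it only groups runs of adjacent equal keys. A small sketch with an assumed, deliberately unsorted sample shows the difference, and why sorting first is the usual remedy.

```python
from itertools import groupby

# Assumed sample: same format as the question's data, but not sorted by key.
data = "2 2.179\n1 1.45\n2 2.206\n1 1.153"

keys = [x.split(' ')[0] for x in data.split('\n') if x]

# Without sorting, each run of equal keys becomes its own group.
unsorted_counts = [(k, len(list(g))) for k, g in groupby(keys)]
# Sorting first puts equal keys next to each other.
sorted_counts = [(k, len(list(g))) for k, g in groupby(sorted(keys))]

print(unsorted_counts)
print(sorted_counts)
```

The keys here are strings, so sorted orders them lexicographically. Convert them with int first if numeric order matters.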
Thank you all, your suggestions really helped me. I changed the code as follows:
SubPatent2count = {}
for line in data.split('\n'):
    # '\o' is not a recognized escape, so Python keeps it as a literal
    # backslash followed by 'o'; newer Python versions warn about it.
    Mapper_data = ["%s\o%d" % (line.split(' ')[0], 1) ]
    for line in Mapper_data:
        Sub_Patent,count = line.strip().split('\o',1)
        try:
            count = int(count)
        except ValueError:
            continue
        try:
            SubPatent2count[Sub_Patent] = SubPatent2count[Sub_Patent]+count
        except:
            SubPatent2count[Sub_Patent] = count
for Sub_Patent in SubPatent2count.keys():
    print ('%s\t%s'% ( Sub_Patent, SubPatent2count[Sub_Patent] ))
It gives the following result:
1 2 (1 occurs 2 times)
2 4
3 2
4 3
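Since the question is framed as MapReduce, the two phases can also be kept as separate functions. This is only a sketch under assumptions, not the poster's code: the names mapper and reducer are illustrative, and the sorted call stands in for the shuffle/sort step that a real MapReduce framework would perform between the two phases.

```python
from itertools import groupby


def mapper(lines):
    """Emit (key, 1) for the first field of every non-blank line."""
    for line in lines:
        fields = line.split()
        if fields:
            yield fields[0], 1


def reducer(pairs):
    """Sum the counts per key; sorting stands in for the shuffle step."""
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


# The data from the question, with a trailing newline added.
data = ("1 1.45\n1 1.153\n2 2.179\n2 2.206\n2 2.59\n2 2.111\n"
        "3 3.201\n3 3.175\n4 4.228\n4 4.161\n4 4.213\n")

result = list(reducer(mapper(data.splitlines())))
for key, total in result:
    print("%s\t%d" % (key, total))
```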