
Difficulty using mincemeat in Python for map-reduce to compute word counts across multiple files

Here is the code:

import glob
import mincemeat
import re

text_files = glob.glob('finalcount/1/*')
def file_contents(file_name):
    f = open(file_name)
    try:
        return f.read()
    finally:
        f.close()

source = dict((file_name, file_contents(file_name))
          for file_name in text_files)

def mapfn(key, value):
    for line in value.splitlines():
        list1 = [ ]
        for temp in re.split('[\t]+',line):
            list1.append(temp)
        x = int(list1[1].strip());
        yield [list1[0],x]

def reducefn(key, value):
    return key, sum(value)

s = mincemeat.Server()
s.datasource = source
s.mapfn = mapfn
s.reducefn = reducefn

results = s.run_server(password="wola")
print results

This code is meant to compute word counts across multiple files, but it keeps returning this error:

error: uncaptured python exception, closing channel <__main__.Client connected at 0x25c1990> 
(<type 'exceptions.ValueError'>:invalid literal for int() with base 10: '' 
 [C:\Python27\lib\asyncore.py|read|83] 
 [C:\Python27\lib\asyncore.py|handle_read_event|444] 
 [C:\Python27\lib\asynchat.py|handle_read|140] 
 [mincemeat.py|found_terminator|97] 
 [mincemeat.py|process_command|195] 
 [mincemeat.py|call_mapfn|171] 
 [projcount.py|mapfn|21])

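The input files I am processing look like the sample below. I want to combine the words across the different files and sum the numbers next to them.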

fawn    24
gai 1
nunnery 11
sowell  3
sonja   29
woods   591
clotted 1
spiders 84
hanging 522
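
To make the goal concrete, here is a minimal sketch in plain Python (no mincemeat involved) of the aggregation described above: each word's counts are summed across every file. The helper name `sum_counts` and the contents of the second file are hypothetical, made up only for illustration; the first file reuses a few lines from the sample.

```python
from collections import defaultdict

def sum_counts(texts):
    """Sum the number that follows each word across several file contents."""
    totals = defaultdict(int)
    for text in texts:
        for line in text.splitlines():
            parts = line.split()      # split on any run of tabs or spaces
            if len(parts) != 2:
                continue              # skip blank or malformed lines
            word, count = parts
            totals[word] += int(count)
    return dict(totals)

file_a = "fawn\t24\ngai\t1\nwoods\t591\n"
file_b = "fawn\t6\nwoods\t9\n\n"      # hypothetical second file, ends with a blank line
totals = sum_counts([file_a, file_b])
print(totals)
```

Under these assumptions `fawn` sums to 30, `gai` stays at 1 and `woods` sums to 600. This is the result the map-reduce job is expected to reproduce.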

After replacing re.split with line.split(), I got this error instead:

error: uncaptured python exception, closing channel <__main__.Client connected at 0x2531990> 
(<type 'exceptions.IndexError'>:list index out of range 
 [C:\Python27\lib\asyncore.py|read|83] 
 [C:\Python27\lib\asyncore.py|handle_read_event|444] 
 [C:\Python27\lib\asynchat.py|handle_read|140] 
 [mincemeat.py|found_terminator|97]
 [mincemeat.py|process_command|195] 
 [mincemeat.py|call_mapfn|171] 
 [projcount.py|mapfn|21]) 
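
Both tracebacks point at the same line of mapfn and are consistent with input lines that do not contain exactly a word and a number. With the tab split, a line whose second field is empty gives `int('')` and a ValueError. With `line.split()`, a blank line gives a list with fewer than two items and an IndexError. The following is a sketch of a more defensive mapfn, assuming such lines can simply be skipped. The reducefn returns only the sum, on the assumption that the server already keys its results by word. The sample text fed to it is hypothetical.

```python
def mapfn(key, value):
    # key: file name, value: the full text of that file
    for line in value.splitlines():
        parts = line.split()          # split on any run of whitespace
        if len(parts) < 2:
            continue                  # blank line, or a word with no count
        try:
            count = int(parts[1])
        except ValueError:
            continue                  # second field is not an integer
        yield parts[0], count

def reducefn(key, values):
    # values: every count emitted for this word, across all files
    return sum(values)

# Local check that needs no mincemeat server or client
sample = "fawn\t24\n\ngai 1\nbroken\nwoods\tx\n"   # hypothetical messy input
pairs = list(mapfn("sample.txt", sample))
print(pairs)
```

Here only the `fawn` and `gai` lines survive. The blank line, the line with no count and the line with a non-numeric count are dropped without raising. Whether to skip bad lines or log them depends on how much the input files can be trusted.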

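I ran into this error on a different occasion and found that the problem appears when you use Python 3.3. I removed 3.3 and installed 2.7.5 ( http://python.org/download/ ), and it now works fine. :)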

