
How to write a wordcount program using Python without using map reduce

Actually I'm new to Hadoop and also to Python, so my doubt is: how do I run a Python script in Hadoop? I was also writing a wordcount program using Python. Can we execute this script without using map reduce? I wrote the code and I can see the output below:

Darkness 1
Heaven 2
It 3
Light 4
age 5
age 6
all 7
all 8
authorities 9
before 10
before 11
being 12
belief 13
best 14
comparison 15
degree 16
despair 17
direct 18
direct 19

It is counting the number of words in a list, but what I have to achieve is grouping the words, deleting the duplicates, and counting the number of times each one occurs.

Below is my code. Can somebody please tell me where I have made a mistake?

********************************************************
   Wordcount.py
********************************************************

import urllib2
import random
from operator import itemgetter

current_word = {}
current_count = 0
story = 'http://sixty-north.com/c/t.txt'
request = urllib2.Request(story)
response = urllib2.urlopen(request)
each_word = []
words = None
count = 1
same_words ={}
word = []
""" looping the entire file """
for line in response:
    line_words = line.split()
    for word in line_words:  # looping each line and extracting words
        each_word.append(word)
        random.shuffle(each_word)
        Sort_word = sorted(each_word)
for words in Sort_word:
    same_words = words.lower(),int(count)
    #print same_words
    #print words
    if not words in current_word :
        current_count = current_count +1
        print '%s\t%s' % (words, current_count)
    else:
        current_count = 1
        #if Sort_word == words.lower():
            #current_count += count
        current_count = count
        current_word = word
        #print '2. %s\t%s' % (words, current_count)

For running Python-based MR tasks, have a look at:

http://hadoop.apache.org/docs/r1.1.2/streaming.html
http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/

You need to design your code in terms of Mapper - Reducer to enable Hadoop to execute your Python script. Read up on the Map-Reduce programming paradigm before you jump into writing the code. It's important to understand the MR programming paradigm and the role of {key, value} pairs in solving the problem.
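To illustrate the paradigm, here is a minimal, self-contained sketch of the two phases of a word count in plain Python, simulating what Hadoop Streaming does with two separate mapper and reducer scripts (the function names `mapper` and `reducer` are my own, not a Hadoop API):

```python
from itertools import groupby

def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word seen."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reducer(pairs):
    """Reduce phase: sum the counts for each distinct word.
    Hadoop delivers the pairs sorted by key; groupby relies on that,
    so we sort here to simulate the shuffle/sort step."""
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == '__main__':
    # Standalone demo on a hard-coded sample instead of the story URL.
    text = ["It was the best of times", "it was the worst of times"]
    for word, count in reducer(mapper(text)):
        print('%s\t%s' % (word, count))
```

In real Hadoop Streaming the mapper script would read lines from `sys.stdin` and print tab-separated `word 1` pairs, Hadoop would sort them by key between the phases, and the reducer script would read the sorted stream and print the totals.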

#Modified your above code to generate the required output
#(Python 2 code: urllib2 was replaced by urllib.request in Python 3)
import urllib2
import random
from operator import itemgetter

current_word = {}
current_count = 0
story = 'http://sixty-north.com/c/t.txt'
request = urllib2.Request(story)
response = urllib2.urlopen(request)
each_word = []
words = None
count = 1
same_words ={}
word = []
""" looping the entire file """
#Collect All the words into a list
for line in response:
    #print "Line = " , line
    line_words = line.split()
    for word in line_words:  # looping each line and extracting words
        each_word.append(word)

#for every word collected, in dict same_words
#if a key exists, such that key == word then increment Mapping Value by 1
# Else add word as new key with mapped value as 1
for words in each_word:
    if words.lower() not in same_words.keys() :
        same_words[words.lower()]=1
    else:
        same_words[words.lower()]=same_words[words.lower()]+1

for each in same_words.keys():
    print "word = ",each, ", count = ",same_words[each]
