
Search a given word in a text file using MapReduce in Java on Ubuntu 16.04

I have to build a project that searches for a given word (string). The string is entered by the user, and the program should then find the occurrences of that word in a particular text file stored in HDFS. The output should report whether the word string is present.
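Stripped of Hadoop, the matching step the mapper needs amounts to scanning each line's tokens for the search string. A minimal plain-Java sketch of that logic (the class and `countOccurrences` helper are illustrative names, not from the question):

```java
import java.util.Scanner;

public class WordSearch {
    // Count whitespace-separated tokens that contain the search string.
    static int countOccurrences(String text, String search) {
        int count = 0;
        Scanner scanner = new Scanner(text);
        while (scanner.hasNext()) {
            // Always advance the scanner, then test the token.
            if (scanner.next().contains(search)) {
                count++;
            }
        }
        scanner.close();
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOccurrences("the Lord of the rings", "Lord")); // prints 1
    }
}
```

Note that the scanner is advanced on every iteration; this matters once the same loop is moved into a mapper.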

package stringSearchJob;
import java.io.IOException;
import java.util.Scanner;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class StringSearch{
    public static void main(String argv[]) throws Exception {
        try {
            if (argv.length<3) {
                System.err.println("Give the input/ output/ keyword!");
                return;
            }
            JobConf conf = new JobConf(StringSearch.class);
            Job job = new Job(conf,"StringSearch");

            FileInputFormat.addInputPath(job, new Path(argv[0]));
            FileOutputFormat.setOutputPath(job, new Path(argv[1]));
            conf.set("search", argv[2]);

            job.setJarByClass(StringSearch.class);
            job.setMapperClass(WordMapper.class);
            job.setNumReduceTasks(0);

            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(IntWritable.class);

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            JobClient.runJob(conf); 
            job.waitForCompletion(true);
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            try {
                Configuration conf = context.getConfiguration();
                String search = conf.get("search");
                String line = value.toString();
                Scanner scanner = new Scanner(line);
                while (scanner.hasNext()) {
                    if (line.contains(search)) {
                        String line1 = scanner.next();
                        context.write(new Text(line1), new IntWritable(1));
                    }
                }
                scanner.close();
            }
            catch (IOException e){
                e.printStackTrace();
            }
            catch (InterruptedException e){
                e.printStackTrace();
            }
        }
    }
}

Is my code wrong? The output I get on the Ubuntu 16.04 terminal is not correct. The steps I followed are as follows:

  1. After writing the above code, I exported it into a runnable JAR file named StringSearch.jar . The class name was StringSearch .
  2. Now, on the terminal I wrote the following commands:

     hadoop fs -mkdir /user
     hadoop fs -mkdir /user/hduser
     hadoop fs -mkdir /user/hduser/StringSearch
     hadoop fs -mkdir Stringsearch/input
     hadoop -fs -copyFromLocal sample.txt StringSearch/input
     hadoop jar StringSearchNew.jar StringSearch /user/hduser/StringSearch/input user/hduser/StringSearch/output 'Lord'
  3. And I am getting the following errors:

     17/08/20 19:17:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
     17/08/20 19:17:41 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
     17/08/20 19:17:41 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
     17/08/20 19:17:41 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
     Exception in thread "main" org.apache.hadoop.mapred.InvalidJobConfException: Output directory not set in JobConf.
         at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:117)
         at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:268)
         at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
         at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
         at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:422)
         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
         at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
         at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
         at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:422)
         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:870)
         at stringSearchJob.StringSearch.main(StringSearch.java:43)
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
         at java.lang.reflect.Method.invoke(Method.java:498)
         at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
         at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
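As a side note, separate from the stack trace above: the commands in step 2 look like they contain a few transcription slips (`hadoop -fs` instead of `hadoop fs`, the inconsistent `Stringsearch`/`StringSearch` casing, and an output path missing its leading slash). A cleaned-up sequence might look like this (jar name and paths assumed from the question):

```shell
# Create the input directory in one step with consistent casing,
# copy the sample file in with "hadoop fs" (not "hadoop -fs"),
# and pass absolute HDFS paths to the job.
hadoop fs -mkdir -p /user/hduser/StringSearch/input
hadoop fs -copyFromLocal sample.txt /user/hduser/StringSearch/input
hadoop jar StringSearch.jar StringSearch \
    /user/hduser/StringSearch/input /user/hduser/StringSearch/output 'Lord'
```

These are environment-dependent commands, so treat them as a sketch rather than a verified recipe.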

I basically learned how to use Hadoop MapReduce from the Internet only. When I tried to write the program in Java after going through all the other similar answers, it didn't give the correct output. I am a complete newbie to Hadoop and would be grateful if you could help me resolve the issue. I don't get what's wrong here!


After reading the answer, I edited the code and got the following errors:

    17/08/24 05:01:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
    Caused by: java.io.IOException: No FileSystem for scheme: hdfs
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:172)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:357)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.addInputPath(FileInputFormat.java:520)
        at stringSearchJob.StringSearch.main(StringSearch.java:28)
        ... 11 more
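(Editorial note on this follow-up error: the `org.eclipse.jdt.internal.jarinjarloader` frame in the trace shows the job was packaged as an Eclipse "Runnable JAR". When dependency JARs are merged that way, the `META-INF/services` entry that maps the `hdfs` scheme to its `FileSystem` implementation is often lost, which produces exactly "No FileSystem for scheme: hdfs". A commonly used workaround, offered here only as a hedged configuration sketch and assuming the hadoop-hdfs classes are on the classpath, is to register the implementations explicitly:)

```java
Configuration conf = new Configuration();
// Re-register the filesystem bindings that the merged JAR's
// service files may have dropped.
conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
conf.set("fs.file.impl", "org.apache.hadoop.fs.LocalFileSystem");
```

Alternatively, exporting a plain (non-runnable) JAR and launching it with `hadoop jar` usually avoids the problem entirely, since the cluster's own classpath then supplies the filesystem bindings.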

Set your input and output directories on the JobConf object, not the Job object.

You must change it as below:

 FileInputFormat.setInputPaths(conf /* from job to conf */, new Path(argv[0]));
 FileOutputFormat.setOutputPath(conf /* from job to conf */, new Path(argv[1]));

So the modified code should look like this:

            if (argv.length < 3) {
                System.err.println("Give the input/ output/ keyword!");
                return;
            }
            JobConf conf = new JobConf(StringSearch.class);
            Job job = new Job(conf, "StringSearch");

            FileInputFormat.setInputPaths(conf, new Path(argv[0]));
            FileOutputFormat.setOutputPath(conf, new Path(argv[1]));
            conf.set("search", argv[2]);

            job.setJarByClass(StringSearch.class);
            job.setMapperClass(WordMapper.class);
            job.setNumReduceTasks(0);

            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(IntWritable.class);

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            JobClient.runJob(conf);
            job.waitForCompletion(true);
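(Editorial note: rather than patching the `JobConf` calls, the root problem, mixing the old `org.apache.hadoop.mapred` API with the new `org.apache.hadoop.mapreduce` one, can be avoided by staying on the new API throughout. The sketch below is untested against a cluster, but it also addresses two other bugs in the question's code: `conf.set("search", argv[2])` was called after `new Job(conf)`, and `Job` copies the `Configuration`, so the mapper never saw the keyword; and the mapper's `while` loop only called `scanner.next()` inside the `if`, so a line without the search word never advanced and the loop spun forever.)

```java
package stringSearchJob;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class StringSearch {
    public static void main(String[] argv) throws Exception {
        if (argv.length < 3) {
            System.err.println("Give the input/ output/ keyword!");
            System.exit(2);
        }
        Configuration conf = new Configuration();
        // Set "search" BEFORE constructing the Job: Job copies the
        // Configuration, so values set afterwards never reach the mappers.
        conf.set("search", argv[2]);

        Job job = Job.getInstance(conf, "StringSearch");
        job.setJarByClass(StringSearch.class);
        job.setMapperClass(WordMapper.class);
        job.setNumReduceTasks(0);            // map-only job
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Paths go on the Job when using the mapreduce.lib classes.
        FileInputFormat.addInputPath(job, new Path(argv[0]));
        FileOutputFormat.setOutputPath(job, new Path(argv[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    public static class WordMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String search = context.getConfiguration().get("search");
            // Advance through the tokens unconditionally; only emit the
            // ones that contain the search word.
            for (String token : value.toString().split("\\s+")) {
                if (token.contains(search)) {
                    context.write(new Text(token), new IntWritable(1));
                }
            }
        }
    }
}
```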

