Word counter program not producing correct number of words

I'm new to reading text from a file. I've got a task for which I need to print the number of words in a file.

I'm using TextEdit on macOS, which saves files ending in .rtf.

When I run the following program, I get the output 5 even when the document is empty. When I add words, the count doesn't increment correctly.

Thanks.

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class Analyze {

    public static void main(String[] args) throws FileNotFoundException {
        Scanner console = new Scanner(System.in);
        int words = 0;
        System.out.println("This is a word counter");
        System.out.println("File name");
        String filename = console.next();
        File name = new File(filename);

        Scanner int2 = new Scanner(name);

        // Count every whitespace-separated token in the file.
        while (int2.hasNext()) {
            String temp = int2.next();
            words++;
        }

        System.out.println(words);
    }
}

The problem is that you are reading an RTF file.

A "blank" (as in no entered text) RTF file generated with TextEdit looks like this:

{\rtf1\ansi\ansicpg1252\cocoartf1404\cocoasubrtf130
{\fonttbl}
{\colortbl;\red255\green255\blue255;}
\margl1440\margr1440\vieww10800\viewh8400\viewkind0
}

As you can see, the five lines correspond to the output of 5: none of the lines contains a space, so the Scanner sees each line as exactly one token.
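To check where the 5 comes from, a minimal sketch can feed the blank RTF text quoted above straight into a Scanner. The class name `BlankRtfTokens` and the helper `countTokens` are made up for this illustration; the counting loop is the same as in the question.

```java
import java.util.Scanner;

public class BlankRtfTokens {

    // The "blank" TextEdit document quoted above, one source line per string.
    static final String BLANK_RTF =
            "{\\rtf1\\ansi\\ansicpg1252\\cocoartf1404\\cocoasubrtf130\n"
          + "{\\fonttbl}\n"
          + "{\\colortbl;\\red255\\green255\\blue255;}\n"
          + "\\margl1440\\margr1440\\vieww10800\\viewh8400\\viewkind0\n"
          + "}\n";

    // Counts whitespace-separated tokens, the same way the loop in the question does.
    static int countTokens(String text) {
        int tokens = 0;
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                scanner.next();
                tokens++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(countTokens(BLANK_RTF)); // prints 5
    }
}
```

Every token here is RTF markup rather than a word, which is why the count is wrong even before any text is typed.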

Either parse RTF in your program, which I doubt you want to do, or switch TextEdit to plain-text mode (Format > Make Plain Text) and save the file as plain text.
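If the program should at least notice that it was handed an RTF file, one possible guard is to look for the `{\rtf` signature that RTF documents start with. This is only a sketch: the class `PlainTextWordCount` and its methods are hypothetical names, and reading the file as UTF-8 is an assumption about how the plain-text file was saved.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Scanner;

public class PlainTextWordCount {

    // RTF documents begin with this signature.
    static final String RTF_SIGNATURE = "{\\rtf";

    static boolean looksLikeRtf(String content) {
        return content.startsWith(RTF_SIGNATURE);
    }

    // Counts whitespace-separated words; refuses content that looks like RTF.
    static int countWords(String content) {
        if (looksLikeRtf(content)) {
            throw new IllegalArgumentException("RTF input; save the file as plain text first");
        }
        int words = 0;
        try (Scanner scanner = new Scanner(content)) {
            while (scanner.hasNext()) {
                scanner.next();
                words++;
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PlainTextWordCount <file>");
            return;
        }
        // Assumption: the plain-text file is encoded as UTF-8.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(countWords(new String(bytes, StandardCharsets.UTF_8)));
    }
}
```

The guard does not parse RTF; it only turns a silently wrong count into an explicit error.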

The file you're trying to count is an RTF file? Does it support italics, bold, font selection and things like that? In that case, it probably contains some data, even if there is no text. Your program does not care about the file format, so it naïvely reads everything as text.

Try running od or hexdump on your file (not sure if these exist on Mac OS X?). They print the exact bytes of a file. A truly empty file should not yield any output.

If your computer doesn't have the od or hexdump programs, you could try cat. It doesn't print the contents as numbers, so it doesn't give a 100% accurate view of special characters, but it should be able to show you whether your file is empty or not.
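The same check can be done from Java if no dump tool is at hand. The following is a rough sketch of a byte dump, not a replacement for od or hexdump; the class name `ByteDump` and the helper `toHex` are invented for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ByteDump {

    // Formats each byte as two lowercase hex digits, separated by single spaces.
    static String toHex(byte[] bytes) {
        StringBuilder out = new StringBuilder();
        for (byte b : bytes) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("%02x", b & 0xff));
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ByteDump <file>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        // A truly empty file reports 0 bytes and an empty dump line.
        System.out.println(bytes.length + " bytes");
        System.out.println(toHex(bytes));
    }
}
```

Running it on the "empty" .rtf file should report a non-zero byte count, which confirms that the file holds formatting data even without visible text.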

Besides the RTF problem, also note this from the Java documentation:

A Scanner breaks its input into tokens using a delimiter pattern, which by default matches whitespace.

with whitespace defined as

A whitespace character: [ \t\n\x0B\f\r]

so tabs, newlines, etc. act as word separators too, not only blanks.
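A small sketch can show the default delimiter in action. The class name `WhitespaceTokens` and the sample string are chosen only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class WhitespaceTokens {

    // Collects the tokens a Scanner produces with its default delimiter.
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A tab, a newline and a run of three spaces all act as separators.
        System.out.println(tokens("one\ttwo\nthree   four")); // prints [one, two, three, four]
    }
}
```

Runs of whitespace collapse into a single separator, so blank lines or repeated spaces do not add to the count.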
