Character counts not adding up in Java

Question

I am writing a Java program that goes through a file and provides counts of characters. The problem I'm having is that my counts aren't adding up. When I add the isAlphabetic(char c) and isDigit(char c) counts together, the sum does not equal the isLetterOrDigit(char c) count (please forgive me if I'm using the wrong terminology).

What am I missing? Here is a copy of my code so far.

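import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class CharCount {
    public static void main(String[] args) throws IOException {
        // Setup reconstructed for completeness: only the loop body was posted.
        // The file name and the split-on-spaces word count are assumptions.
        BufferedReader reader = new BufferedReader(new FileReader("Lab.docx"));
        int charCount = 0, alphnumCount = 0, alphabetCount = 0, numericCount = 0;
        int wordCount = 0, lineCount = 0;

        String currentLine = reader.readLine();
        while (currentLine != null) {
            lineCount++;
            String[] words = currentLine.split(" ");
            wordCount += words.length;
            for (String word : words) {
                char[] ch = word.toCharArray();
                for (int i = 0; i < word.length(); i++) {
                    // Always true: every char value lies in the Basic Multilingual Plane
                    if (Character.isBmpCodePoint(ch[i])) {
                        charCount++;
                        if (Character.isLetterOrDigit(ch[i])) {
                            alphnumCount++;
                        }
                        if (Character.isAlphabetic(ch[i])) {
                            alphabetCount++;
                        }
                        if (Character.isDigit(ch[i])) {
                            numericCount++;
                        }
                    }
                }
            }
            // Reading next line into currentLine
            currentLine = reader.readLine();
        }
        // Printing the character, word and line counts
        System.out.println("Number Of Chars In..Lab.docx File : " + charCount);
        System.out.println("Number Of Alph+Numeric Chars In..Lab.docx File : " + alphnumCount);
        System.out.println("Number Of Alphabet Chars In..Lab.docx File : " + alphabetCount);
        System.out.println("Number Of Numeric Chars In..Lab.docx File : " + numericCount);
        System.out.println("Number Of Words In..Lab.docx File : " + wordCount);
        System.out.println("Number Of Lines In..Lab.docx File : " + lineCount);
        // Expected to print 0 if the three character counts are consistent
        System.out.println(alphabetCount + numericCount - alphnumCount);

        reader.close(); // Closing the reader
    }
}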

Answer 1

I think the problem here is that you are reading a *.docx file.

If I use a simple text file with the following content, your script works:

Test123
7asdf

The output is:

Number Of Chars In..CSCI_1136_Lab6.docx File : 12
Number Of Alph+Numeric Chars In..CSCI_1136_Lab6.docx File : 12
Number Of Alphabet Chars In..CSCI_1136_Lab6.docx File : 8
Number Of Numeric Chars In..CSCI_1136_Lab6.docx File : 4
Number Of Words In..CSCI_1136_Lab6.docx File : 2
Number Of Lines In..CSCI_1136_Lab6.docx File : 2
0
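To reproduce this without a file on disk, here is a minimal sketch of the same counting loop run over an in-memory string. The post does not show how `words` is built, so splitting each line on single spaces is an assumption, and the class name `PlainTextCounts` is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class PlainTextCounts {
    // Returns {chars, alphanumeric, alphabetic, numeric, words, lines},
    // using the same Character predicates as the question's loop.
    public static int[] count(String text) {
        int[] c = new int[6];
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                c[5]++;
                String[] words = line.split(" "); // assumption: words are space-separated
                c[4] += words.length;
                for (String word : words) {
                    for (char ch : word.toCharArray()) {
                        c[0]++;
                        if (Character.isLetterOrDigit(ch)) c[1]++;
                        if (Character.isAlphabetic(ch)) c[2]++;
                        if (Character.isDigit(ch)) c[3]++;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return c;
    }

    public static void main(String[] args) {
        int[] c = count("Test123\n7asdf");
        System.out.println(Arrays.toString(c)); // [12, 12, 8, 4, 2, 2]
        System.out.println(c[2] + c[3] - c[1]);  // 0
    }
}
```

For plain ASCII text like this, isAlphabetic() and isLetter() agree, which is why the last line is 0.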

If you want to count the characters in a *.docx file, you cannot do it this way, because you are interpreting the bytes of that file as a String, which they are not.

DOCX is written in an XML format, which consists of a ZIP archive file containing XML and binaries.

From forensicswiki.

So *.docx files are not stored as plain text, which is what your code expects.
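A minimal sketch of what that means in practice, using only java.util.zip. It builds a tiny stand-in archive in memory (the `fakeDocx` helper is hypothetical) instead of reading a real file, and assumes the body text sits in the `word/document.xml` entry, which is where Word usually puts it. Stripping tags with a regex is a crude simplification; a dedicated library such as Apache POI is the more robust route.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class DocxIsZip {
    // Hypothetical helper: a tiny stand-in for a .docx,
    // i.e. a ZIP archive holding a single word/document.xml entry.
    public static byte[] fakeDocx(String xml) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Unzips the word/document.xml entry and crudely strips XML tags
    // (a regex, not a real XML parser).
    public static String extractText(byte[] docx) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(docx))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("word/document.xml")) {
                    String xml = new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                    return xml.replaceAll("<[^>]+>", "");
                }
            }
            return "";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] docx = fakeDocx("<w:document><w:body><w:p><w:r><w:t>Test123</w:t></w:r></w:p></w:body></w:document>");
        // The raw bytes begin with the ZIP signature "PK", not with the document's text
        System.out.println(new String(docx, 0, 2, StandardCharsets.ISO_8859_1)); // PK
        System.out.println(extractText(docx)); // Test123
    }
}
```

Reading such bytes line by line with a text reader, as the question does, counts compressed data rather than the words in the document.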

Another point is that you are using Character.isAlphabetic() instead of Character.isLetter():

From the docs for Character.isAlphabetic():

Determines if the specified character (Unicode code point) is an alphabet.

A character is considered to be alphabetic if its general category type, provided by getType(codePoint), is any of the following:

UPPERCASE_LETTER
LOWERCASE_LETTER
TITLECASE_LETTER
MODIFIER_LETTER
OTHER_LETTER
LETTER_NUMBER

or it has contributory property Other_Alphabetic as defined by the Unicode Standard.

From the docs for Character.isLetter():

Determines if the specified character (Unicode code point) is a letter.

A character is considered to be a letter if its general category type, provided by Character.getType(ch), is any of the following:

UPPERCASE_LETTER
LOWERCASE_LETTER
TITLECASE_LETTER
MODIFIER_LETTER
OTHER_LETTER

Not all letters have case. Many characters are letters but are neither uppercase nor lowercase nor titlecase.

So there is a difference between the two methods: isAlphabetic() additionally accepts LETTER_NUMBER characters and characters with the Other_Alphabetic property. The method Character.isLetterOrDigit() is built on isLetter() and isDigit():

Determines if the specified character (Unicode code point) is a letter or digit.

A character is considered to be a letter or digit if either isLetter(codePoint) or isDigit(codePoint) returns true for the character.

From the docs for Character.isLetterOrDigit().
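The quoted rule can be checked directly against the JDK. This sketch (class name hypothetical) walks every code point, confirms that isLetterOrDigit() agrees with isLetter() || isDigit(), and counts the code points that isAlphabetic() accepts but isLetter() rejects. The exact count depends on the Unicode version your JDK implements, so no number is given here.

```java
public class LetterOrDigitCheck {
    // Verifies isLetterOrDigit(cp) == (isLetter(cp) || isDigit(cp)) for every code point.
    public static boolean ruleHolds() {
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            boolean expected = Character.isLetter(cp) || Character.isDigit(cp);
            if (Character.isLetterOrDigit(cp) != expected) {
                return false;
            }
        }
        return true;
    }

    // Counts code points accepted by isAlphabetic but rejected by isLetter.
    // Each occurrence of one of these in the input adds 1 to the difference
    // printed by the question's last println.
    public static int alphabeticButNotLetter() {
        int count = 0;
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (Character.isAlphabetic(cp) && !Character.isLetter(cp)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("rule holds: " + ruleHolds());
        System.out.println("alphabetic but not letter: " + alphabeticButNotLetter());
    }
}
```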

So if you use Character.isLetter() instead of Character.isAlphabetic(), your result should be correct: no character is both a letter and a digit, so alphabetCount + numericCount will then equal alphnumCount.
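A small example of the difference, assuming a string that contains U+2160 ROMAN NUMERAL ONE, whose general category is LETTER_NUMBER: isAlphabetic() accepts it but isLetter() does not, so the question's final check prints 1 instead of 0. The `mismatch` helper is hypothetical and mirrors the question's last println.

```java
public class AlphabeticVsLetter {
    // Returns (alphabetic-or-letter count) + digit count - letterOrDigit count,
    // i.e. the value the question prints on its last line.
    public static int mismatch(String text, boolean useIsLetter) {
        int alphnum = 0, alpha = 0, digits = 0;
        for (char c : text.toCharArray()) {
            if (Character.isLetterOrDigit(c)) alphnum++;
            if (useIsLetter ? Character.isLetter(c) : Character.isAlphabetic(c)) alpha++;
            if (Character.isDigit(c)) digits++;
        }
        return alpha + digits - alphnum;
    }

    public static void main(String[] args) {
        char roman = '\u2160'; // ROMAN NUMERAL ONE, category LETTER_NUMBER
        System.out.println(Character.getType(roman) == Character.LETTER_NUMBER); // true
        System.out.println(Character.isAlphabetic(roman)); // true
        System.out.println(Character.isLetter(roman));     // false
        String text = "Test123 \u2160";
        System.out.println(mismatch(text, false)); // 1 (isAlphabetic)
        System.out.println(mismatch(text, true));  // 0 (isLetter)
    }
}
```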

This is my result for a *.docx file using Character.isLetter():

Number Of Chars In..CSCI_1136_Lab6.docx File : 5923
Number Of Alph+Numeric Chars In..CSCI_1136_Lab6.docx File : 1758
Number Of Alphabet Chars In..CSCI_1136_Lab6.docx File : 1550
Number Of Numeric Chars In..CSCI_1136_Lab6.docx File : 208
Number Of Words In..CSCI_1136_Lab6.docx File : 66
Number Of Lines In..CSCI_1136_Lab6.docx File : 48
0

Answer 1: accepted, score 0, 2019-03-16 13:00:40
