
Tracking Frequencies of Characters in a Text File

So, for my current Data Structures project, I have to read a text file, eventually Huffman encode its characters, and then return the newly encoded file (I'm nowhere near any of this yet).

The first thing I have to do is scan through the file and determine the frequency of each character, and then create an ordered list of all the characters and their frequencies. However, I'm having trouble coming up with a good way of keeping a running total of the characters seen so far and their frequencies. I thought a Hashtable might be a good idea, where each key is a character and the value it maps to is that character's frequency.
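For illustration, here is a minimal sketch of the hashtable idea described above, using `java.util.HashMap` rather than the older `Hashtable` class. The class name `CharFrequencyMap`, the method names, and the sample string are assumptions made up for this example, and the ordering (ascending count, ties broken by character) is just one reasonable choice for the "ordered list".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CharFrequencyMap {

    // Tally how often each character occurs in the given text.
    static Map<Character, Integer> countChars(String text) {
        Map<Character, Integer> freq = new HashMap<>();
        for (int i = 0; i < text.length(); i++) {
            // merge() stores 1 for a new key, otherwise adds 1 to the existing count.
            freq.merge(text.charAt(i), 1, Integer::sum);
        }
        return freq;
    }

    // Copy the entries into a list ordered by ascending count, ties broken by character.
    static List<Map.Entry<Character, Integer>> sortedByFrequency(Map<Character, Integer> freq) {
        List<Map.Entry<Character, Integer>> entries = new ArrayList<>(freq.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(a.getValue(), b.getValue());
            return byCount != 0 ? byCount : Character.compare(a.getKey(), b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        // Sample input only; a real program would pass in the file's contents.
        Map<Character, Integer> freq = countChars("abracadabra");
        for (Map.Entry<Character, Integer> e : sortedByFrequency(freq)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Each lookup and update is expected constant time, so the whole count is linear in the length of the text; the sort only touches the distinct characters, which is usually a small set.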

Is there a more efficient way to do this?

Thanks in advance!

Here is my code for finding the frequency of each letter in a text file.

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.Locale;


public class Frequency2 {

    private static char[] myarry;
    private static int[] count = new int[26];
    private static double[] precntage=new double[26];
    private static double totCount=0;


    public static void main(String[] args) throws FileNotFoundException, IOException {

        // readFile is an instance method of this class, so create a Frequency2
        String ss = new Frequency2().readFile("E:/look.txt"); // path of the text file
        //System.out.println(ss);
        ss = ss.toLowerCase(Locale.ENGLISH); // convert all text to lowercase

        myarry = ss.toCharArray();  // copy the text into a char array
        //count the characters
        for (int i = 0; i < myarry.length; i++) {
            char c = myarry[i];
            // 'a'..'z' are contiguous, so c - 'a' gives the index 0..25;
            // anything that is not a lowercase ASCII letter is skipped
            if (c >= 'a' && c <= 'z') {
                count[c - 'a']++;
            }
        }



        for (int i = 0; i <count.length; i++) {
            totCount+=count[i];
        }

        System.out.println("tot "+ totCount);


        // calculate each letter's percentage of the total letter count
        for (int i = 0; i <count.length; i++) {
               precntage[i]=((count[i]/totCount)*100);
               precntage[i]=Math.round(precntage[i]);
        }




        char s1='A';
        System.out.println("Letter\tPercentage\tFrequency");
        for (int i = 0; i < count.length; i++) {
            String gs=Character.toString(s1++);
            System.out.println(gs+"\t"+precntage[i]+"%"+"\t\t"+count[i]);
        }


    }

    String readFile(String fileName) throws IOException {
        BufferedReader buff = new BufferedReader(new FileReader(fileName));
        try {
            StringBuilder sb = new StringBuilder();
            String l = buff.readLine();

            while (l != null) {
                sb.append(l);
                sb.append("\n");
                l = buff.readLine();
            }
            return sb.toString();
        } finally {
            buff.close();
        }
    }

}

Hope this helps.
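The code above only counts the letters a to z, while Huffman encoding a file needs a count for every character that appears in it. One possible extension of the same array idea, sketched here under the assumption that one counter per `char` value (65,536 ints) is acceptable, reads the input one character at a time instead of loading it into a string first. The class name `CharFrequencyArray` and the `StringReader` sample input are hypothetical; a real program would wrap a `FileReader` in a `BufferedReader`.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class CharFrequencyArray {

    // Count every char read from the stream; the char's numeric value is the array index.
    static int[] countChars(Reader in) throws IOException {
        int[] counts = new int[Character.MAX_VALUE + 1]; // one slot per char value 0..65535
        int c;
        // Reader.read() returns the next char as an int, or -1 at end of stream.
        while ((c = in.read()) != -1) {
            counts[c]++;
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Sample input only; swap in a BufferedReader over a FileReader for a real file.
        try (Reader in = new StringReader("hello world")) {
            int[] counts = countChars(in);
            for (int c = 0; c < counts.length; c++) {
                if (counts[c] > 0) {
                    System.out.println((char) c + "\t" + counts[c]);
                }
            }
        }
    }
}
```

Compared with a hash map, this trades a fixed block of memory for plain array indexing with no hashing or boxing, which tends to matter only for large inputs.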


 