
Why do I only get one TF-IDF result?

// Calculating term frequency
    System.out.println("Please enter the required word  :");
    Scanner scan = new Scanner(System.in);
    String word = scan.nextLine();

    String[] array = word.split(" ");
    int filename = 11;
    String[] fileName = new String[filename];
    int a = 0;
    int totalCount = 0;
    int wordCount = 0;


    for (a = 0; a < filename; a++) {

        try {
            System.out.println("The word inputted is " + word);
            File file = new File(
                    "C:\\Users\\user\\fypworkspace\\TextRenderer\\abc" + a
                            + ".txt");
            System.out.println(" _________________");

            System.out.print("| File = abc" + a + ".txt | \t\t \n");

            for (int i = 0; i < array.length; i++) {

                totalCount = 0;
                wordCount = 0;

                Scanner s = new Scanner(file);
                {
                    while (s.hasNext()) {
                        totalCount++;
                        if (s.next().equals(array[i]))
                            wordCount++;

                    }

                    System.out.print(array[i] + " ---> Word count =  "
                            + "\t\t " + "|" + wordCount + "|");
                    System.out.print("  Total count = " + "\t\t " + "|"
                            + totalCount + "|");
                    System.out.printf("  Term Frequency =  | %8.4f |",
                            (double) wordCount / totalCount);

                    System.out.println("\t ");

                }
            }
        } catch (FileNotFoundException e) {
            System.out.println("File is not found");

        }

    }

// Calculating IDF and TF-IDF
    System.out.println("Please enter the required word  :");
    Scanner scan2 = new Scanner(System.in);
    String word2 = scan2.nextLine();
    String[] array2 = word2.split(" ");
    int numofDoc;

    for (int b = 0; b < array2.length; b++) {

        numofDoc = 0;

        for (int i = 0; i < filename; i++) {

            try {

                BufferedReader in = new BufferedReader(new FileReader(
                        "C:\\Users\\user\\fypworkspace\\TextRenderer\\abc"
                                + i + ".txt"));

                int matchedWord = 0;

                Scanner s2 = new Scanner(in);

                {

                    while (s2.hasNext()) {
                        if (s2.next().equals(array2[b]))
                            matchedWord++;
                    }

                }
                if (matchedWord > 0)
                    numofDoc++;

            } catch (IOException e) {
                System.out.println("File not found.");
            }

        }
        System.out.println(array2[b]
                + " --> This number of files that contain the term  "
                + numofDoc);
        double inverseTF = Math.log10((float) numDoc / numofDoc);
        System.out.println(array2[b] + " --> IDF " +  inverseTF );
        double TFIDF = (((double) wordCount / totalCount) * inverseTF );
        System.out.println(array2[b] + " --> TFIDF " + TFIDF);
    }
}

Hi, this is my code for calculating term frequency and TF-IDF. The first part calculates the term frequency of a given string for each file. The second part is supposed to calculate TF-IDF for each file using the values from the first part, but I only get one value. It should print a TF-IDF value for each document.

Example output for term frequency :

The word input is 'is'


| File = abc0.txt |
is ---> Word count = |2| Total count = |150| Term Frequency = | 0.0133 |

The word inputted is 'is'


| File = abc1.txt |
is ---> Word count = |0| Total count = |9| Term Frequency = | 0.0000 |

The TF-IDF

is --> This number of files that contain the term 7

is --> IDF 0.1962946357308887

is --> TFIDF 0.0028607962606519654

I expect one value per file: with 10 files, it should give me 10 different values, one for each file. But it prints only one result. Can someone point out my mistake?

The println statement you expect to be repeated per file is

double TFIDF = (((double) wordCount / totalCount) * inverseTF );
System.out.println(array2[b] + " --> TFIDF " + TFIDF);

but it is contained in the single loop

for (int b = 0; b < array2.length; b++)

only. If you want to print this line per file, you have to surround this statement with another loop over all files.

Since this is homework I won't include the final code, but I'll give you another hint: you also use the variables wordCount and totalCount in the calculation of TFIDF, but these are unique to each filename/word pair. Therefore you need to save them not just once, but per filename/word pair, or recalculate them in your final loop.

The part that prints the TF-IDF needs to be moved inside the for loop that loops over all the files.

i.e.:

    System.out.println(array2[b]
            + " --> This number of files that contain the term  "
            + numofDoc);
    double inverseTF = Math.log10((float) numDoc / numofDoc);
    System.out.println(array2[b] + " --> IDF " +  inverseTF );
    double TFIDF = (((double) wordCount / totalCount) * inverseTF );
    System.out.println(array2[b] + " --> TFIDF " + TFIDF);
}

} }
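
As an illustration of the hints above, here is a minimal sketch rather than the asker's final program. It assumes each file has already been read into a list of tokens, and the names TfIdfSketch, termFrequency, inverseDocumentFrequency and tfIdfPerDocument are hypothetical. Because the IDF depends on how many documents contain the term, it is computed once after all documents have been scanned; the term frequency is then recomputed for each document in a second loop, which gives one TF-IDF value per file. Returning 0 when no document contains the term is an assumption made here to avoid dividing by zero.

```java
import java.util.Arrays;
import java.util.List;

class TfIdfSketch {

    // Term frequency: occurrences of the term divided by the token count of ONE document.
    static double termFrequency(List<String> tokens, String term) {
        if (tokens.isEmpty()) {
            return 0.0;
        }
        int wordCount = 0;
        for (String token : tokens) {
            if (token.equals(term)) {
                wordCount++;
            }
        }
        return (double) wordCount / tokens.size();
    }

    // Inverse document frequency: log10(total documents / documents containing the term).
    static double inverseDocumentFrequency(List<List<String>> docs, String term) {
        int numOfDoc = 0;
        for (List<String> doc : docs) {
            if (doc.contains(term)) {
                numOfDoc++;
            }
        }
        if (numOfDoc == 0) {
            return 0.0; // assumption: treat an unseen term as contributing nothing
        }
        return Math.log10((double) docs.size() / numOfDoc);
    }

    // One TF-IDF value per document: the IDF is shared, the TF is recomputed per document.
    static double[] tfIdfPerDocument(List<List<String>> docs, String term) {
        double idf = inverseDocumentFrequency(docs, term);
        double[] scores = new double[docs.size()];
        for (int i = 0; i < docs.size(); i++) {
            scores[i] = termFrequency(docs.get(i), term) * idf;
        }
        return scores;
    }

    public static void main(String[] args) {
        // Hypothetical in-memory stand-ins for abc0.txt, abc1.txt, abc2.txt.
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("this", "is", "a", "test", "is"),
                Arrays.asList("nothing", "here"),
                Arrays.asList("it", "is", "fine"));
        double[] scores = tfIdfPerDocument(docs, "is");
        for (int i = 0; i < scores.length; i++) {
            System.out.printf("abc%d.txt: is --> TFIDF %.4f%n", i, scores[i]);
        }
    }
}
```

Run against the three sample documents in main, this prints one line per document. In the question's code, by contrast, wordCount and totalCount still hold the counts from the last file and last word processed by the first loop, so only a single value is printed.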
