Comparing Sentences From a Read-In File - Java

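I need to read in a file that contains 2 sentences, compare them, and return a number between 0 and 1. If the sentences are exactly the same it should return 1 for true, and if they are total opposites it should return 0 for false. If the sentences are similar but words have been changed to synonyms or close equivalents, it should return .25, .5 or .75. The text file is formatted like this: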

______________________________________
Text: Sample 

Text 1: It was a dark and stormy night. I was all alone sitting on a red chair. I was not completely alone as I had three cats.

Text 20: It was a murky and stormy night. I was all alone sitting on a crimson chair. I was not completely alone as I had three felines
// Should score high point but not 1

Text 21: It was a murky and tempestuous night. I was all alone sitting on a crimson cathedra. I was not completely alone as I had three felines
// Should score lower than text20

Text 22: I was all alone sitting on a crimson cathedra. I was not completely alone as I had three felines. It was a murky and tempestuous night.
// Should score lower than text21 but NOT 0

Text 24: It was a dark and stormy night. I was not alone. I was not sitting on a red chair. I had three cats.
// Should score a 0!
________________________________________________

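I have a file reader, but I am not sure of the best way to store each line so that I can compare them. Right now I just read the file and print it to the screen. What is the best way to store the lines and then compare them to get the number I want?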

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class implement
{
    public static void main(String[] args)
    {
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(new FileReader("textfile.txt")))
        {
            String strLine;

            // Read the file line by line and echo each line to the screen
            while ((strLine = br.readLine()) != null)
            {
                System.out.println(strLine);
            }
        }
        catch (IOException e)
        {
            System.err.println("Error: " + e.getMessage());
        }
    }
}

Store them in an ArrayList.

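// Needs: import java.util.ArrayList; import java.util.List;
List<String> list = new ArrayList<>();
// Read the file
// Inside the while loop, store each line instead of printing it:
list.add(strLine);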
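
Since the sample file labels each passage as "Text N:", it may be more convenient to key the stored lines by that number than to keep a flat list. The following is only a sketch under that assumption; the class name `TextStore`, the method `load`, and the regular expression are illustrative and not part of the original code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TextStore {
    // Matches lines such as "Text 20: It was a murky ..." (label format assumed from the sample file)
    private static final Pattern LABELLED = Pattern.compile("^Text\\s+(\\d+):\\s*(.*)$");

    /** Reads every labelled text from the source, keyed by its number, in file order. */
    static Map<Integer, String> load(Reader source) {
        Map<Integer, String> texts = new LinkedHashMap<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String strLine;
            while ((strLine = br.readLine()) != null) {
                Matcher m = LABELLED.matcher(strLine.trim());
                if (m.matches()) {
                    texts.put(Integer.parseInt(m.group(1)), m.group(2));
                }
                // Other lines (blank lines, "// Should score ..." notes, rulers) are skipped
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return texts;
    }

    public static void main(String[] args) {
        String sample = "Text: Sample\n\n"
                + "Text 1: It was a dark and stormy night.\n\n"
                + "Text 20: It was a murky and stormy night.\n"
                + "// Should score high point but not 1\n";
        Map<Integer, String> texts = load(new StringReader(sample));
        System.out.println(texts.get(1));
        System.out.println(texts.get(20));
    }
}
```

To read the file from the question, a `java.io.FileReader` for "textfile.txt" could be passed to `load` in place of the `StringReader`; its constructor throws `FileNotFoundException`, which the caller would need to handle.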

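To check each word in a sentence, strip the punctuation, split the sentence on spaces, and search for each word in the sentence you are comparing against. I would suggest ignoring words of 2 or 3 characters, but that is up to your discretion.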
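
The word-by-word check described above could look roughly like the sketch below. The class name `WordOverlap`, the `minLength` parameter, and the choice to score by the fraction of matching words are assumptions made for illustration; the answer itself does not prescribe a formula.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class WordOverlap {
    /** Lower-cases the text, replaces punctuation with spaces, splits on whitespace,
     *  and keeps only words of at least minLength characters. */
    static Set<String> keyWords(String text, int minLength) {
        String cleaned = text.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9\\s]", " ");
        Set<String> words = new HashSet<>();
        for (String w : cleaned.trim().split("\\s+")) {
            if (w.length() >= minLength) {
                words.add(w);
            }
        }
        return words;
    }

    /** Fraction of the key words of a that also occur in b (assumed scoring rule). */
    static double overlap(String a, String b, int minLength) {
        Set<String> wordsA = keyWords(a, minLength);
        Set<String> wordsB = keyWords(b, minLength);
        if (wordsA.isEmpty()) {
            return wordsB.isEmpty() ? 1.0 : 0.0;
        }
        int found = 0;
        for (String w : wordsA) {
            if (wordsB.contains(w)) {
                found++;
            }
        }
        return (double) found / wordsA.size();
    }

    public static void main(String[] args) {
        String text1 = "It was a dark and stormy night. I was all alone sitting on a red chair.";
        String text20 = "It was a murky and stormy night. I was all alone sitting on a crimson chair.";
        // Words shorter than 4 characters ("it", "was", "red", ...) are ignored here
        System.out.println(overlap(text1, text20, 4));
    }
}
```

A plain word overlap like this ignores word order and negation, so on its own it would not give Text 24 a score of 0.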

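Then save these lines into an array and compare them however you need to. To compare similar words, you will need a database that lets you look words up efficiently, in other words a hash table. Once you have that, you can search it for a word quickly. Next, the hash table needs a set of synonyms (a thesaurus entry) associated with each word. Take the similar words for the key words in each sentence, and search for those words in the sentence you are comparing against. Obviously, you should compare the two actual sentences directly before you search for similar words. In the end you will need a more advanced data structure, and you will have to do more work yourself than a straight comparison.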
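
As a rough sketch of the hash table with attached synonyms, a `HashMap` from each word to a set of similar words can be consulted whenever a word has no exact match. Everything here is hypothetical: the class name `SynonymScore`, the three hand-written synonym groups, and the weight of 0.5 for a synonym match are arbitrary choices for illustration, and a real thesaurus would have to be loaded from an external word list.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class SynonymScore {
    // Assumed weight for a synonym match; an exact match counts 1.0
    static final double SYNONYM_WEIGHT = 0.5;

    // Hypothetical, hand-written thesaurus: word -> set of similar words
    static final Map<String, Set<String>> THESAURUS = new HashMap<>();
    static {
        addSynonyms("dark", "murky");
        addSynonyms("red", "crimson");
        addSynonyms("cats", "felines");
    }

    /** Registers each word in the group as a synonym of all the others. */
    static void addSynonyms(String... group) {
        for (String w : group) {
            Set<String> entry = THESAURUS.computeIfAbsent(w, k -> new HashSet<>());
            for (String other : group) {
                if (!other.equals(w)) {
                    entry.add(other);
                }
            }
        }
    }

    /** Lower-cases, strips punctuation, and splits the text into a set of words. */
    static Set<String> words(String text) {
        String cleaned = text.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9\\s]", " ").trim();
        return new HashSet<>(Arrays.asList(cleaned.split("\\s+")));
    }

    /** Averages per-word scores over the words of a: exact match first, then synonyms. */
    static double score(String a, String b) {
        Set<String> wordsA = words(a);
        Set<String> wordsB = words(b);
        double total = 0.0;
        for (String w : wordsA) {
            if (wordsB.contains(w)) {
                total += 1.0;               // direct comparison comes first
                continue;
            }
            Set<String> synonyms = THESAURUS.getOrDefault(w, Collections.<String>emptySet());
            for (String syn : synonyms) {
                if (wordsB.contains(syn)) {
                    total += SYNONYM_WEIGHT; // fall back to the thesaurus lookup
                    break;
                }
            }
        }
        return total / wordsA.size();
    }

    public static void main(String[] args) {
        System.out.println(score("I had three cats.", "I had three felines"));
    }
}
```

Rounding the result to the nearest of 0, .25, .5, .75 and 1 would map it onto the scale the question asks for, although handling reordered sentences and negation would need more than this lookup.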
