Read a text file containing strings and float data and store it in a HashMap

Question

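I have a text file in which each line starts with a word followed by 50 floating-point numbers that represent the word's vector description (its embedding). I'm trying to read the file and store each word and its embedding in a hash table. The problem is that I get a NumberFormatException or, sometimes, an ArrayIndexOutOfBoundsException. How can I read each word and its embedding and store them in a hash map?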

sNode class:

public class sNode { // Node class for hash map
public String word;
public float[] embedding;
public sNode next;

public sNode(String S, float[] E, sNode N) { // Constructor
    word = S;
    embedding = new float[50];
    for (int i = 0; i < 50; i++)
        embedding[i] = E[i]; // copy the 50 embedding values
    next = N;
}
}

hashTableStrings class:

import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Scanner;

public class hashTableStrings{ 
private static sNode [] H;
private int TABLE_SIZE;
private int size; 

public hashTableStrings(int n){ // Initialize all lists to null
    size = 0;
    TABLE_SIZE = n;
    H = new sNode[TABLE_SIZE]; 
    for(int i=0;i<TABLE_SIZE;i++) 
        H[i] = null;
}

public int getSize(){ // Function to get number of key-value pairs
    return size;
}


public static void main (String [] args) throws IOException{
    Scanner scanner = new Scanner(new FileReader("glove.6B.50d.txt"));

    HashMap<String, Float> table = new HashMap<String, Float>();

    while (scanner.hasNextLine()) {
        String[] words = scanner.nextLine().split("\t\t"); // split space between word and float number embedding
        for (int i=0; i<50;i++){
            table.put(words[0], Float.parseFloat(words[i]));
        }
    }

    System.out.println(table);

}
}
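The hashTableStrings class above has no methods for adding or looking up entries. A minimal sketch of how a separate-chaining table built from sNode-style nodes could store word/embedding pairs follows; the class name, put, get, and the use of Math.floorMod on String.hashCode() as the bucket index are assumptions for illustration. It keeps the bucket array as an instance field: in the question, H is static, so every hashTableStrings instance shares the same array, and each constructor call replaces it.

```java
// Hypothetical sketch: a separate-chaining hash table mapping a word to its embedding.
class ChainedEmbeddingTable {
    // One node per word; nodes that land in the same bucket form a singly linked list.
    static class Node {
        final String word;
        float[] embedding;
        Node next;

        Node(String word, float[] embedding, Node next) {
            this.word = word;
            this.embedding = embedding;
            this.next = next;
        }
    }

    private final Node[] buckets; // instance field, not static
    private int size;             // number of distinct words stored

    ChainedEmbeddingTable(int tableSize) {
        buckets = new Node[tableSize]; // Java initializes every slot to null
    }

    private int indexFor(String word) {
        // floorMod keeps the index non-negative even when hashCode() is negative.
        return Math.floorMod(word.hashCode(), buckets.length);
    }

    void put(String word, float[] embedding) {
        int index = indexFor(word);
        for (Node n = buckets[index]; n != null; n = n.next) {
            if (n.word.equals(word)) { // word already present: replace its embedding
                n.embedding = embedding;
                return;
            }
        }
        buckets[index] = new Node(word, embedding, buckets[index]); // prepend to the chain
        size++;
    }

    float[] get(String word) {
        for (Node n = buckets[indexFor(word)]; n != null; n = n.next) {
            if (n.word.equals(word)) {
                return n.embedding;
            }
        }
        return null; // word not in the table
    }

    int getSize() {
        return size;
    }
}
```

Prepending to the chain keeps put at constant time per node visited; the scan before it is what prevents the same word from being stored twice.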

Txt file sample:

The file can be found at https://nlp.stanford.edu/projects/glove/

Download the archive glove.6B.zip, then open the text file glove.6B.50d.txt.

Answer 1

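You get the ArrayIndexOutOfBoundsException because you split the string on "\t\t" (two tabs), while the values are actually separated by a single space. As a result, each line is not split into multiple words; it stays one complete string, and you get an array of length 1.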

 String[] words = scanner.nextLine().split("\t\t");
// words.length will be 1, since the array contains only a single String (the whole line).
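A small hypothetical demo of the point above: it builds a line in the same shape as the GloVe file (placeholder values, not real embeddings) and counts the tokens each delimiter produces. The split("\\s+") variant is not in the original answer; it is included as an assumption-free alternative that also tolerates runs of spaces.

```java
// Hypothetical demo: how the delimiter passed to split() changes the token count.
class SplitDemo {
    // Builds a line shaped like glove.6B.50d.txt: a word followed by
    // 50 space-separated numbers (placeholder values, not a real embedding).
    static String sampleLine() {
        StringBuilder sb = new StringBuilder("truecar.com");
        for (int i = 0; i < 50; i++) {
            sb.append(' ').append("0.5");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String line = sampleLine();
        // The line contains no double tab, so split returns the whole line as one element.
        System.out.println(line.split("\t\t").length); // 1
        // Splitting on a single space yields the word plus the 50 numbers.
        System.out.println(line.split(" ").length);    // 51
        // A regex for "one or more whitespace characters" gives the same result here.
        System.out.println(line.split("\\s+").length); // 51
    }
}
```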

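Replacing split("\t\t") with split(" ") should solve the problem. By the way, each line contains 51 tokens in total (counting the word at the start of the line), so your loop should use i < 51, not i < 50.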

  for (int i = 1; i < 51; i++) {
      // Do your work...
  }

  // i starts at index 1 because index 0 holds the word at the start of the line; the floating-point values start at index 1.
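A sketch of that loop as a helper, assuming the tokens come from split(" ") on a 50-dimensional line. The name parseEmbedding and the length check are hypothetical; the check turns a malformed line into a clear error instead of an ArrayIndexOutOfBoundsException. Starting at index 1 also avoids the NumberFormatException from the question, which comes from calling Float.parseFloat on the word at index 0.

```java
// Hypothetical helper: converts the tokens of one line into a float[] embedding.
class EmbeddingParser {
    static final int DIMENSIONS = 50; // glove.6B.50d.txt stores 50 numbers per word

    static float[] parseEmbedding(String[] tokens) {
        if (tokens.length != DIMENSIONS + 1) {
            throw new IllegalArgumentException(
                "expected " + (DIMENSIONS + 1) + " tokens but got " + tokens.length);
        }
        float[] embedding = new float[DIMENSIONS];
        // Start at index 1: index 0 is the word, and Float.parseFloat on it
        // would throw NumberFormatException.
        for (int i = 1; i <= DIMENSIONS; i++) {
            embedding[i - 1] = Float.parseFloat(tokens[i]);
        }
        return embedding;
    }

    public static void main(String[] args) {
        String[] tokens = new String[DIMENSIONS + 1];
        tokens[0] = "the"; // the word
        for (int i = 1; i <= DIMENSIONS; i++) {
            tokens[i] = "0.25"; // placeholder values, not a real embedding
        }
        float[] embedding = parseEmbedding(tokens);
        System.out.println(embedding.length); // 50
    }
}
```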

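However, as @Satish Thulva pointed out, there are still some problems with your code. With the way you are using the HashMap, each key (word) ends up holding only the last float value of the line rather than all of them. For example, for this line: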

truecar.com -0.23163 0.39098 -0.7428 1.5123 -1.2368 -0.89173 -0.051826 -1.1305 0.96384 -0.12672 -0.8412 -0.76053 0.10582 -0.23173 0.11274 0.26327 0.053071 0.66657 0.9423 -0.78162 1.6225 0.097435 -0.67124 0.46235 0.3226 1.3423 0.87102 0.2217 -0.068228 0.73468 -1.0692 -0.85722 -0.49683 -1.4468 -1.1979 -0.49506 -0.36319 0.53553 -0.046529 1.5829 -0.1326 -0.55717 -0.17242 0.99214 0.73551 -0.51421 0.29743 0.19933 0.87613 0.63135

your code would produce

 Key: truecar.com  value: 0.63135
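The overwrite happens because HashMap.put replaces the existing value when the key is already present. A minimal reproduction of the question's storage loop, using a shortened hypothetical line with only three values:

```java
import java.util.HashMap;
import java.util.Map;

// Demo: calling put() repeatedly with the same key replaces the previous value,
// which is why a HashMap<String, Float> keeps only the last number of each line.
class OverwriteDemo {
    static Map<String, Float> storeLikeQuestion(String[] tokens) {
        Map<String, Float> table = new HashMap<>();
        for (int i = 1; i < tokens.length; i++) {
            table.put(tokens[0], Float.parseFloat(tokens[i])); // same key on every iteration
        }
        return table;
    }

    public static void main(String[] args) {
        String[] tokens = "truecar.com -0.23163 0.39098 0.63135".split(" ");
        Map<String, Float> table = storeLikeQuestion(tokens);
        System.out.println(table.size()); // 1
        System.out.println(table.get("truecar.com"));
    }
}
```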

To store all the float values as the value for a key, use a HashMap<String, Float[]>:

// table is declared as: HashMap<String, Float[]> table = new HashMap<String, Float[]>();
String[] words = scanner.nextLine().split(" "); // split on the space between the word and the embedding values

// An array of Float that keeps the values for the word
Float[] values = new Float[words.length - 1]; // minus 1 because the word itself is not stored as a value
for (int i = 1; i < words.length; i++) {
    values[i - 1] = Float.parseFloat(words[i]);
}

// All the values are now stored in the array; store it in the map.
table.put(words[0], values);
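Putting the pieces together, a sketch of a complete loader, assuming whitespace-separated lines and a UTF-8 encoded file. It uses a primitive float[] instead of Float[] to avoid boxing every value, and BufferedReader instead of Scanner; both are choices made for this sketch, not part of the original answer. The names GloveLoader and load are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical loader: each line is assumed to be "word v1 v2 ... vN", whitespace-separated.
class GloveLoader {
    static Map<String, float[]> load(Reader source) throws IOException {
        Map<String, float[]> table = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (trimmed.isEmpty()) {
                    continue; // skip blank lines
                }
                String[] tokens = trimmed.split("\\s+");
                float[] embedding = new float[tokens.length - 1];
                // Index 0 is the word; the numbers start at index 1.
                for (int i = 1; i < tokens.length; i++) {
                    embedding[i - 1] = Float.parseFloat(tokens[i]);
                }
                table.put(tokens[0], embedding); // one entry per word, holding all its values
            }
        }
        return table;
    }

    public static void main(String[] args) throws IOException {
        // File name taken from the question; UTF-8 is an assumption about the file's encoding.
        Map<String, float[]> table =
            load(Files.newBufferedReader(Paths.get("glove.6B.50d.txt"), StandardCharsets.UTF_8));
        System.out.println(table.size() + " words loaded");
        float[] the = table.get("the");
        if (the != null) {
            System.out.println("dimensions of \"the\": " + the.length);
        }
    }
}
```

Because load accepts any Reader, it can be exercised with a StringReader instead of the real file.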

Solution 1
Score: 0 · Accepted · 2017-10-27 05:44:26
