
Read and Store text file with string and float data into a hashmap

I have a text file in which each line starts with a word followed by 50 floating-point numbers that represent the word's vector description (its embedding). I am trying to read the file and store each word and its embedding in a hash table. The problem is that I get a NumberFormatException, or sometimes an ArrayIndexOutOfBoundsException. How can I read and store each word and its embedding in a hash map?

sNode class:

public class sNode { // Node class for hash map
    public String word;
    public float[] embedding;
    public sNode next;

    public sNode(String S, float[] E, sNode N) { // Constructor
        word = S;
        embedding = new float[50];
        for (int i = 0; i < 50; i++)
            embedding[i] = E[i];
        next = N;
    }
}

hashTableStrings class:

import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Scanner;

public class hashTableStrings {
    private static sNode[] H;
    private int TABLE_SIZE;
    private int size;

    public hashTableStrings(int n) { // Initialize all lists to null
        size = 0;
        TABLE_SIZE = n;
        H = new sNode[TABLE_SIZE];
        for (int i = 0; i < TABLE_SIZE; i++)
            H[i] = null;
    }

    public int getSize() { // Function to get number of key-value pairs
        return size;
    }

    public static void main(String[] args) throws IOException {
        Scanner scanner = new Scanner(new FileReader("glove.6B.50d.txt"));

        HashMap<String, Float> table = new HashMap<String, Float>();

        while (scanner.hasNextLine()) {
            String[] words = scanner.nextLine().split("\t\t"); // split space between word and float number embedding
            for (int i = 0; i < 50; i++) {
                table.put(words[0], Float.parseFloat(words[i]));
            }
        }

        System.out.println(table);
    }
}

Txt File Sample: [image]

The file can be found at the following link: https://nlp.stanford.edu/projects/glove/

Download the file glove.6B.zip and open the glove.6B.50d.txt text file.

The reason you are getting the "array out of bounds" exception is that you are splitting each line on "\t\t" (two consecutive tabs), whereas the values in the file are separated by a single space. Because of this, a line is not divided into multiple words. It comes back as one whole string, so you get an array of length 1.

    String[] words = scanner.nextLine().split("\t\t");
    // words.length is 1, since the array contains a single String (the whole line).
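
As a quick check of this point, the following sketch compares the two delimiters on a short, made-up line. The class name SplitDemo, the helper tokenCount, and the sample values are illustrative and are not taken from the file.

```java
public class SplitDemo {
    // Number of tokens produced by splitting `line` on the regex `delimiter`.
    static int tokenCount(String line, String delimiter) {
        return line.split(delimiter).length;
    }

    public static void main(String[] args) {
        String line = "word 0.1 -0.2 0.3"; // made-up line: a word plus three floats
        // No double tab occurs in the line, so the whole line is one token.
        System.out.println(tokenCount(line, "\t\t"));
        // One token for the word and one per float.
        System.out.println(tokenCount(line, " "));
    }
}
```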

Replacing split("\t\t") with split(" ") should fix the problem. By the way, there are 51 tokens in total on each line (counting the word at the start of the line), so the loop bound should be i < 51, not i < 50.

    for (int i = 1; i < 51; i++) {
        // Do your work...
    }

    // i starts at index 1 because index 0 holds the word itself; the floating-point values start at index 1.
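
Starting at index 1 also matters for the NumberFormatException mentioned in the question: with i = 0, Float.parseFloat is handed the word itself. A minimal sketch, using a hypothetical helper isFloat and a made-up line:

```java
public class IndexDemo {
    // Returns true if `token` can be parsed as a float, false otherwise.
    static boolean isFloat(String token) {
        try {
            Float.parseFloat(token);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] words = "word 0.1 -0.2 0.3".split(" "); // made-up line
        System.out.println(isFloat(words[0])); // index 0 is the word, not a number
        System.out.println(isFloat(words[1])); // index 1 is the first float
    }
}
```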

However, as @Satish Thulva has pointed out, there is still a problem with your code. With the HashMap used the way you are using it, each key (word) ends up with only the last floating-point value on its line as its value, not the whole list of values. For example, take this line:

truecar.com  -0.23163  0.39098  -0.7428  1.5123  -1.2368  -0.89173  -0.051826  -1.1305  0.96384  -0.12672  -0.8412  -0.76053  0.10582  -0.23173  0.11274  0.26327  0.053071  0.66657  0.9423  -0.78162  1.6225  0.097435  -0.67124  0.46235  0.3226  1.3423  0.87102  0.2217  -0.068228  0.73468  -1.0692  -0.85722  -0.49683  -1.4468  -1.1979  -0.49506  -0.36319  0.53553  -0.046529  1.5829  -0.1326  -0.55717  -0.17242  0.99214  0.73551  -0.51421  0.29743  0.19933  0.87613  0.63135

In your case, the result will be

 Key: truecar.com  value: 0.63135
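
This happens because HashMap.put replaces any value already stored under the same key. A small sketch with made-up values (the class name OverwriteDemo and the helper putAll are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class OverwriteDemo {
    // Mimics the loop in the question: every value is put under the same key.
    static Map<String, Float> putAll(String key, float[] values) {
        Map<String, Float> table = new HashMap<>();
        for (float v : values) {
            table.put(key, v); // each call replaces the previous value for `key`
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, Float> table = putAll("word", new float[] {0.1f, -0.2f, 0.3f});
        System.out.println(table.size());      // the map holds a single entry
        System.out.println(table.get("word")); // only the last value put remains
    }
}
```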

To store all the floating-point values as the value for a key, use HashMap<String, Float[]>:

    String[] words = scanner.nextLine().split(" "); // split on the space between the word and each float of the embedding

    // An array of Float that will hold the values for the word.
    Float[] values = new Float[words.length - 1]; // length - 1 because the word itself is not stored as a value
    for (int i = 1; i < words.length; i++) {
        values[i - 1] = Float.parseFloat(words[i]);
    }

    // Now all the values are stored in the array.
    // Store it in the map.
    table.put(words[0], values);
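
Putting the pieces together, a complete loader might look like the sketch below. It is not the asker's code: the class name GloveLoader and the methods parseLine and load are hypothetical, it uses BufferedReader rather than Scanner, it stores a primitive float[] instead of Float[] to avoid boxing, and it assumes every non-empty line is a word followed by floats separated by single spaces.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class GloveLoader {
    // Parses one "word f1 f2 ... fn" line and stores word -> [f1..fn] in `table`.
    static void parseLine(String line, Map<String, float[]> table) {
        String[] words = line.split(" ");
        float[] values = new float[words.length - 1]; // the word itself is not a value
        for (int i = 1; i < words.length; i++) {
            values[i - 1] = Float.parseFloat(words[i]);
        }
        table.put(words[0], values);
    }

    // Reads the whole file line by line into a map from word to embedding.
    static Map<String, float[]> load(String path) throws IOException {
        Map<String, float[]> table = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) { // skip blank lines
                    parseLine(line, table);
                }
            }
        }
        return table;
    }

    public static void main(String[] args) throws IOException {
        Map<String, float[]> table = load("glove.6B.50d.txt");
        System.out.println(table.size() + " words loaded");
    }
}
```

Note that printing a map of arrays with System.out.println(table) shows array identities rather than the numbers; java.util.Arrays.toString(table.get(someWord)) prints the values of one vector.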

