How to read and extract data from .vec file in python

Question

How to read and extract data from .vec file in python?

f = open("test.vec", "r")  # opens the file named "test.vec"
print(f.read())
f.close()

But I can't extract the information. I want the data in test.vec to be stored in individual arrays.

Answer 1

I think you can get some inspiration from this project here. The important part for you starts at line 131, i.e.:

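import struct

...
with open(f, 'rb') as vecfile:  # f is the path to the .vec file
    content = vecfile.read()  # keep the raw bytes; str(line) would mangle them in Python 3
    val = struct.unpack('<iihh', content[:12])  # 12-byte header: two int32 + two int16
...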
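As a sketch of how the rest of the file could be read into per-sample arrays: this assumes the file was written by OpenCV's opencv_createsamples tool, which is the layout the '<iihh' header above suggests. Under that assumption the header holds an int32 sample count, an int32 number of values per sample and two unused int16 fields, and each sample is one padding byte followed by that many little-endian int16 pixel values. The function name read_opencv_vec is hypothetical, and the layout should be checked against your own files.

```python
import struct

# Assumed OpenCV createsamples layout: count, values per sample, two unused shorts
HEADER_FORMAT = '<iihh'
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 12 bytes


def read_opencv_vec(path):
    """Hypothetical helper: return one list of int16 values per sample."""
    with open(path, 'rb') as vecfile:
        content = vecfile.read()  # raw bytes, no text decoding
    count, vecsize, _, _ = struct.unpack_from(HEADER_FORMAT, content, 0)
    sample_format = '<%dh' % vecsize  # vecsize little-endian int16 values
    record_size = 1 + struct.calcsize(sample_format)  # padding byte + values
    samples = []
    for i in range(count):
        offset = HEADER_SIZE + i * record_size + 1  # skip the padding byte
        samples.append(list(struct.unpack_from(sample_format, content, offset)))
    return samples
```

Usage: samples = read_opencv_vec("test.vec") gives one flat list per sample; reshaping it into an image needs the width and height the samples were created with, which this header does not store separately.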

Answer 2

This is my dataset: https://www.kaggle.com/datasets/yekenot/fasttext-crawl-300d-2m

It is the 4.2 GB Common Crawl .vec file.

Since the file is too big to display in an IDE, I read it line by line and export the words to a CSV file (17 MB):

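import io
import pandas as pd

def load_vectors(fname):
    all = []  # note: this name shadows the built-in all() inside the function
    with io.open(fname, 'r', encoding='utf-8', newline='\n', errors='ignore') as fin:
        for i, line in enumerate(fin):
            x = line.split()
            if i == 0 and len(x) == 2:
                continue  # fastText .vec files usually start with a "<word count> <dimension>" header
            if x:
                all.append(x[0])  # keep only the word, drop its vector values
    df = pd.DataFrame(all)
    df.to_csv('.../output/ft.csv', index=False)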

Then call the function:

FASTTEXT_DATASET_PATH = '/content/drive/MyDrive/Colab Notebooks/pretrained/crawl-300d-2M.vec'
load_vectors(FASTTEXT_DATASET_PATH)

The data read into x has dimension (1999995, 300).

Here I print the first line: [',', '-0.0282', '-0.0557', ... '-0.0042']

In my case, I only want to export the first element of every list (the word itself). So I append x[0] to a list named 'all', then convert it to a DataFrame and export it to a CSV file.
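If you need the vectors as well as the words, a sketch along the lines of the load_vectors example in the fastText documentation could keep them in two parallel arrays. It assumes the usual text .vec layout: a first line "<number of words> <dimension>", then one word per line followed by its values separated by single spaces. The function name read_text_vec and the max_words parameter are hypothetical; max_words only lets you stop early instead of parsing all 2 million lines of a 4.2 GB file.

```python
import io


def read_text_vec(fname, max_words=None):
    """Hypothetical helper: return (words, vectors) as two parallel lists."""
    words = []
    vectors = []
    with io.open(fname, 'r', encoding='utf-8', newline='\n', errors='ignore') as fin:
        header = fin.readline().split()  # assumed "<number of words> <dimension>"
        dim = int(header[1])
        for line in fin:
            tokens = line.rstrip().split(' ')  # rstrip drops a trailing space/newline
            if len(tokens) != dim + 1:
                continue  # skip lines that do not have exactly one word + dim values
            words.append(tokens[0])
            vectors.append([float(v) for v in tokens[1:]])
            if max_words is not None and len(words) >= max_words:
                break  # stop early on very large files
    return words, vectors
```

If NumPy is available, np.array(vectors, dtype=np.float32) turns the second list into a matrix of shape (len(words), dimension).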

For those interested in what the fastText pretrained dataset looks like, I've uploaded it to Kaggle. Dataset details: crawl-300d-2M.vec.zip: 2 million word vectors trained on Common Crawl (600B tokens), cased.

Answer 3

with open("file.txt", "r") as ins:
    array = []
    for line in ins:
        array.append(line)

Try this one, although it is a bit more involved. Otherwise, try this simpler one:

with open('filename') as f:
    lines = f.readlines()
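Both snippets above give one string per line. To get the "individual arrays" asked about in the question, a minimal sketch could split each line on whitespace and transpose the rows into columns with zip. It assumes a plain-text file in which every non-empty line has the same number of fields; lines_to_columns is a hypothetical helper name, and the values stay as strings.

```python
def lines_to_columns(path):
    """Hypothetical helper: return one list per whitespace-separated column."""
    with open(path) as f:
        rows = [line.split() for line in f if line.strip()]  # skip blank lines
    # zip(*rows) groups the first field of every row, then the second, and so on;
    # it stops at the shortest row, so ragged lines would lose trailing fields
    return [list(column) for column in zip(*rows)]
```

Convert a column with, for example, [float(v) for v in column] if it holds numbers.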

3 answers

Answer 1: score 0, 2016-03-08 09:33:36

Answer 2: score 0, 2022-08-08 10:41:21

Answer 3: score -1, 2016-03-08 09:30:14
