
Python matching data in LARGE txt file?

I am using a text file to store values so I don't run out of memory. I read the text file line by line and need a way to check whether a new value is already in the file. The dilemma is that I can't load the whole file into Python without a memory error. The only option seems to be to open the file with a context manager and iterate line by line, trying to match the value.

def contains_value(new_data):
    with open('C:/*.txt', 'r') as file:
        for line in file:
            # strip the trailing newline before comparing
            if line.rstrip('\n') == new_data:
                return True
        return False

Is this the best method, or is there a more elegant way? I know of SQLite, but not much about it. Would it be better to switch to that or keep the text file?

Side questions:

What method do large databases at actual companies use? I know they use batch processing or chunking; are they doing the same thing I am, just over those chunks? And furthermore, is there a way to chunk the file and search multiple chunks at a time?

Have you tried pandas?

import pandas as pd

my_giant_file = pd.read_csv(filePath, chunksize=50000, low_memory=False, header=0)
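With chunksize set, read_csv returns an iterator of DataFrames, so you can scan one chunk at a time without holding the whole file in memory. A minimal sketch, assuming the stored values live in a single column named value (the column name and file path are placeholders, not from the original post):

def contains_value(file_path, new_data, column='value'):
    # Read the file in 50,000-row chunks; only one chunk lives in memory at a time
    for chunk in pd.read_csv(file_path, chunksize=50000, header=0):
        # Compare the new value against the chunk's column of stored values
        if (chunk[column].astype(str) == str(new_data)).any():
            return True
    return False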

As @superstew suggested, use SQLite. Managing data via text files is rarely the right answer, and SQLite is very easy to use. Someday you might find a reason to move on to a more full-featured DBMS like MySQL or Postgres, but SQLite will work very well for your use case.
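Not the answerer's code, just a minimal sketch of the kind of membership check SQLite makes cheap; the database file, table, and column names below are placeholders. With a PRIMARY KEY on the value column, the lookup is an index search rather than a scan of every row:

import sqlite3

conn = sqlite3.connect('values.db')  # placeholder database file
conn.execute('CREATE TABLE IF NOT EXISTS data (value TEXT PRIMARY KEY)')

def add_value(value):
    # INSERT OR IGNORE skips the insert if the value is already stored
    conn.execute('INSERT OR IGNORE INTO data (value) VALUES (?)', (value,))
    conn.commit()

def contains_value(value):
    # The PRIMARY KEY index makes this a fast lookup even with millions of rows
    cur = conn.execute('SELECT 1 FROM data WHERE value = ? LIMIT 1', (value,))
    return cur.fetchone() is not None

SQLite ships with the Python standard library, so there is nothing extra to install.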
