I am using a text file to store values so I don't run out of memory. I read the file line by line and need a way to check whether a new value is already in it. The dilemma is that I can't load the whole file into Python without a memory error, so the only option is to open a context manager and iterate line by line, trying to match the value.
with open('C:/*.txt', 'r') as file:
    for line in file:
        # strip the trailing newline, otherwise the comparison never matches
        if line.rstrip('\n') == new_data:
            return True
return False
Is this the best method, or is there a more elegant way? I know of SQLite, but not much about it. Would it be better to switch to that or keep the text file?
Side questions:
What method do large databases at actual companies use? I know they use batch processing or chunking; are they doing the same thing I am, just across those chunks? And furthermore, is there a way to chunk the data and search multiple chunks at a time?
Have you tried pandas? Passing chunksize to read_csv makes it return an iterator of DataFrames instead of loading the whole file into memory at once:

import pandas as pd

my_giant_file = pd.read_csv(filePath, chunksize=50000, low_memory=False, header=0)
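To turn that iterator into a membership check, you can scan each chunk and stop as soon as a match is found. A minimal sketch (the function name value_in_file and the column name are my own, hypothetical choices; an in-memory buffer stands in for your large file on disk):

```python
import io
import pandas as pd

def value_in_file(path_or_buf, column, target, chunksize=50000):
    # Stream the file in chunks so at most `chunksize` rows are in memory at once.
    for chunk in pd.read_csv(path_or_buf, chunksize=chunksize):
        # .isin() vectorises the comparison across the whole chunk.
        if chunk[column].isin([target]).any():
            return True  # early exit: the remaining chunks are never read
    return False

# Usage with a small in-memory CSV standing in for the big file:
csv_data = io.StringIO("value\napple\nbanana\ncherry\n")
print(value_in_file(csv_data, "value", "banana", chunksize=2))  # True
```

This is still a linear scan, just like the plain-file loop, but each chunk comparison is vectorised and the scan stops early on a hit.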
As @superstew suggested, use SQLite. Managing data via text files is rarely the right answer, and SQLite is very easy to use. Someday you might find a reason to move on to a more full-featured DBMS like MySQL or Postgres, but SQLite will work very well for your use case.