
Python matching data in LARGE txt file?

I am using a text file to store values so I don't run out of memory. I read the text file line by line and need a way to check whether a new value is already in the file. The dilemma is that I can't load the whole file into Python without a memory error. The only option seems to be to open the file with a context manager and iterate line by line, trying to match the value.

def contains_value(new_data):
    with open('C:/*.txt', 'r') as file:
        for line in file:
            # strip the trailing newline before comparing
            if line.rstrip('\n') == new_data:
                return True
        return False

Is this the best method, or is there a more elegant way? I know of SQLite, but not much about it. Would it be better to switch to that or keep the text file?

Side questions:

What method do large databases at actual companies use? I know they use batch processing or chunking; are they doing the same thing I am, just over those chunks? And furthermore, is there a way to chunk the file and search multiple chunks at a time?

Have you tried pandas?

import pandas as pd

my_giant_file = pd.read_csv(filePath, chunksize=50000, low_memory=False, header=0)
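With chunksize set, read_csv returns an iterator of DataFrames, so you can scan one chunk at a time without holding the whole file in memory. A minimal sketch, assuming the stored values live in a single column named value (the column name and file path are placeholders, not from the original post):

def contains_value(file_path, new_data, column='value'):
    # Read the file in 50,000-row chunks; only one chunk lives in memory at a time
    for chunk in pd.read_csv(file_path, chunksize=50000, header=0):
        # Compare the new value against the chunk's column of stored values
        if (chunk[column].astype(str) == str(new_data)).any():
            return True
    return False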

As @superstew suggested, use SQLite. Managing data via text files is rarely the right answer, and SQLite is very easy to use. Someday you might find a reason to move on to a more full-featured DBMS like MySQL or Postgres, but SQLite will work very well for your use case.
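Not the answerer's code, just a minimal sketch of the kind of membership check SQLite makes cheap; the database file, table, and column names below are placeholders. With a PRIMARY KEY on the value column, the lookup is an index search rather than a scan of every row:

import sqlite3

conn = sqlite3.connect('values.db')  # placeholder database file
conn.execute('CREATE TABLE IF NOT EXISTS data (value TEXT PRIMARY KEY)')

def add_value(value):
    # INSERT OR IGNORE skips the insert if the value is already stored
    conn.execute('INSERT OR IGNORE INTO data (value) VALUES (?)', (value,))
    conn.commit()

def contains_value(value):
    # The PRIMARY KEY index makes this a fast lookup even with millions of rows
    cur = conn.execute('SELECT 1 FROM data WHERE value = ? LIMIT 1', (value,))
    return cur.fetchone() is not None

SQLite ships with the Python standard library, so there is nothing extra to install.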
