
How to code a PHP search engine for searching through multiple SQLite databases

I'm planning to code a search engine in PHP that will allow my company to search for text contained in multiple projects, each project stored in its own SQLite database file.

Since at some point there will be well over 100 projects (over 100 SQLite databases), I was wondering which (if any) of the following would be a smarter programming choice:

  1. During the initial page load, loop over all databases and read all the content we want searchable from each one into a PHP object/array/whatever. Then the user types in a search phrase and we just search through that PHP object.
  2. Upon the user clicking "search", each database is searched one at a time to locate the content they're interested in.

I really don't know how long it will take to do either option, or which is better practice. Most of the database files are <1MB.

Thanks a lot!

I haven't done anything like this, but in your case I would probably create one database that includes the contents of all the others. If the data is quite dynamic, that option only works if you run a script that copies the data over with a cron job, say every midnight or weekly or whatever. It also assumes the databases are alike, by which I mean they have a similar structure (which would make sense given the search). It's hard to tell without knowing how complex the databases are.
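If it helps, here is a minimal sketch of such a consolidation script, assuming each project DB has a documents(id, title, body) table (hypothetical names and paths) and using SQLite's ATTACH DATABASE to copy rows in bulk; run it from cron as often as the data requires:

<?php
// Hedged sketch: rebuild a central search DB from every project DB.
// The documents(id, title, body) schema and the paths are assumptions.
$central = new PDO('sqlite:/var/data/search.sqlite');
$central->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$central->exec('DROP TABLE IF EXISTS documents');
$central->exec('CREATE TABLE documents (project TEXT, id INTEGER, title TEXT, body TEXT)');

foreach (glob('/var/data/projects/*.sqlite') as $file) {
    $project = basename($file, '.sqlite');
    // ATTACH lets a single INSERT...SELECT copy a whole table across files.
    $central->exec('ATTACH DATABASE ' . $central->quote($file) . ' AS src');
    $central->prepare('INSERT INTO documents (project, id, title, body)
                       SELECT :project, id, title, body FROM src.documents')
            ->execute([':project' => $project]);
    $central->exec('DETACH DATABASE src');
}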

First of all: don't forget that unless you're developing some graphical application in e.g. PHP-GTK, PHP page loads are stateless. That means that if you chose option 1 you'd need to cache the data somewhere (for example in a different DB). I wouldn't keep it in memory anyway.

Also, it depends on what kind of indexes you have set up. 100 text searches can be really fast if the databases have full-text indexes.
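As an illustration (not from the answer itself), this is roughly what a full-text index looks like in SQLite using the FTS5 extension. It only works if your SQLite build includes FTS5, and the documents table and the 'invoice' search term are assumptions:

<?php
// Sketch: build a full-text index over the searchable columns, then query it.
$db = new PDO('sqlite:/var/data/projects/example.sqlite');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->exec('CREATE VIRTUAL TABLE IF NOT EXISTS documents_fts USING fts5(title, body)');
$db->exec('DELETE FROM documents_fts');  // rebuild the index from scratch
$db->exec('INSERT INTO documents_fts (rowid, title, body)
           SELECT id, title, body FROM documents');

// MATCH queries against the FTS table stay fast even on large text columns.
$stmt = $db->prepare('SELECT rowid, title FROM documents_fts WHERE documents_fts MATCH :q');
$stmt->execute([':q' => 'invoice']);
$hits = $stmt->fetchAll(PDO::FETCH_ASSOC);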

So, looping through the files is an option. There might be some overhead from having to open 100 different SQLite files, and you shouldn't forget to close each file after you're done with it to keep memory usage down. You'll also need to make sure all the SQLite DBs are properly indexed.
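A rough sketch of that loop, again assuming a hypothetical documents(id, title, body) table in every file:

<?php
// Sketch: open each project DB in turn, search it, and close the handle
// before moving on, so only one file is open at a time.
function searchAllProjects(string $dir, string $term): array
{
    $results = [];
    foreach (glob($dir . '/*.sqlite') as $file) {
        $db = new PDO('sqlite:' . $file);
        $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
        $stmt = $db->prepare('SELECT id, title FROM documents WHERE body LIKE :q');
        $stmt->execute([':q' => '%' . $term . '%']);
        foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
            $row['project'] = basename($file, '.sqlite');
            $results[] = $row;
        }
        $db = null; // release the file handle before opening the next DB
    }
    return $results;
}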

Another possibility is to create a local DB with all the searchable data, plus metadata recording which SQLite file each piece of data originally came from and the timestamp it was last checked. Then on every request you can check the last-modification timestamps of the SQLite files, copy any new data from modified files into your local DB, update the timestamps, and search the local DB. Performance in this case will depend on how often the SQLite files are updated and how much data has to be synced, but I believe that in your case it will suffice.
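A sketch of that sync step, under the same assumed documents schema as above; a sync_state table records each file's last-seen modification time so unchanged databases are skipped:

<?php
// Sketch: re-sync only the project DBs whose files changed since the last check.
$local = new PDO('sqlite:/var/data/search.sqlite');
$local->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$local->exec('CREATE TABLE IF NOT EXISTS sync_state (file TEXT PRIMARY KEY, mtime INTEGER)');

foreach (glob('/var/data/projects/*.sqlite') as $file) {
    $mtime = filemtime($file);
    $stmt = $local->prepare('SELECT mtime FROM sync_state WHERE file = :f');
    $stmt->execute([':f' => $file]);
    $seen = $stmt->fetchColumn();
    if ($seen !== false && (int)$seen >= $mtime) {
        continue; // file unchanged since the last sync
    }
    // Re-copy this project's rows with the same ATTACH technique as above.
    $project = basename($file, '.sqlite');
    $local->prepare('DELETE FROM documents WHERE project = :p')->execute([':p' => $project]);
    $local->exec('ATTACH DATABASE ' . $local->quote($file) . ' AS src');
    $local->prepare('INSERT INTO documents (project, id, title, body)
                     SELECT :p, id, title, body FROM src.documents')
          ->execute([':p' => $project]);
    $local->exec('DETACH DATABASE src');
    $local->prepare('INSERT OR REPLACE INTO sync_state (file, mtime) VALUES (:f, :m)')
          ->execute([':f' => $file, ':m' => $mtime]);
}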

I would definitely not read in all of the content and then search it in PHP; that is pretty inefficient. It would be much more efficient to write some really effective queries and run them on all of the databases once you have the user's search term. If you can notify the user of the status of the search, that would be pretty helpful, e.g. show which DB you are currently searching and how many remain.
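One hedged way to surface that progress, assuming output buffering is off so each status line reaches the browser before the next query runs (table and parameter names are placeholders):

<?php
// Sketch: print a status line per database, flushing it out before the query.
$files = glob('/var/data/projects/*.sqlite');
$total = count($files);
$results = [];
foreach ($files as $i => $file) {
    printf("Searching %s (%d of %d)...<br>\n", basename($file), $i + 1, $total);
    flush(); // push the status line out before the (possibly slow) query runs
    $db = new PDO('sqlite:' . $file);
    $stmt = $db->prepare('SELECT id, title FROM documents WHERE body LIKE :q');
    $stmt->execute([':q' => '%' . ($_GET['q'] ?? '') . '%']);
    $results = array_merge($results, $stmt->fetchAll(PDO::FETCH_ASSOC));
    $db = null;
}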

Create an index of all the databases and update it regularly. As that's read-only it shouldn't be a big deal.

A simple word index can just be

word[ [document,occurrences], [document,occurrences] ... ]

as in: the word "foo" appears three times in document 1 and five times in document 4.

foo[ [1,3] , [4,5] ]

That won't let you do exact phrase searching but it's simple and fast.
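For illustration, that index structure built in PHP, assuming documents are plain strings keyed by document id:

<?php
// Sketch: map each word to [document id => occurrence count].
function buildIndex(array $documents): array
{
    $index = [];
    foreach ($documents as $docId => $text) {
        // Lowercase, then split on runs of non-word characters.
        $words = preg_split('/\W+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($words as $word) {
            $index[$word][$docId] = ($index[$word][$docId] ?? 0) + 1;
        }
    }
    return $index;
}

// "foo" in document 1 three times and in document 4 five times comes out as:
// $index['foo'] === [1 => 3, 4 => 5]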
