
Should I use MySQL blob field type?

I am struggling to decide if I should be using the MySQL blob field type in an upcoming project I have.

My basic requirements: certain database records can be viewed and can have multiple files uploaded and "attached" to them. Access to those records can be limited to certain people on a case-by-case basis. Any type of file can be uploaded, with virtually no restrictions.

So looking at it one way, if I go the MySQL route, I don't have to worry about viruses creeping in or random PHP files getting uploaded and somehow executed. I also have a much easier path for permissions and for keeping data tied closely to a record.

The other obvious route is storing the data in a specific folder structure outside of the webroot. In this case I'd have to come up with a special naming convention for folders/files to keep track of what they reference inside the database.

Is there a performance hit with using the MySQL blob field type? I'm concerned about choosing a solution that will hinder future growth of the website, as well as choosing a solution that won't be easy to maintain.

Is there a performance hit with using MySQL blob field type?

Not inherently, but if you have big BLOBs clogging up your tables and memory cache that will certainly result in a performance hit.

The other obvious route is storing the data in a specific folder structure outside of the webroot. In this case I'd have to come up with a special naming convention for folders/files to keep track of what they reference inside the database.

Yes, this is a common approach. You'd usually have folders named after each table they're associated with, containing filenames based only on the primary key (ideally an integer; certainly never anything user-submitted).
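As a rough illustration of that naming convention, here is a minimal Python sketch. The storage root and the table names are made up for the example; the only point is that the path is derived from a known table name and an integer primary key, never from anything the user typed.

```python
from pathlib import Path

# Hypothetical storage root; it should live outside the webroot.
STORAGE_ROOT = Path("/var/app-data/uploads")

# Hypothetical table names; use a fixed allow-list, never user input.
ALLOWED_TABLES = {"invoices", "tickets"}


def attachment_path(table: str, record_id: int, root: Path = STORAGE_ROOT) -> Path:
    """Build the on-disk path from a table name and an integer primary key."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unknown table: {table!r}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(record_id, bool) or not isinstance(record_id, int) or record_id <= 0:
        raise ValueError("record_id must be a positive integer")
    return root / table / str(record_id)
```

The original filename and MIME type would then live in the database row, not in the path.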

Is this a better idea? It depends. There are deployment-simplicity advantages to having only a single data store, and to not having to worry about giving the web user write access to anything. Also, if there might be multiple copies of the app running (e.g. active-active load balancing) then you need to synchronise the storage, which is much easier with a database than it is with a filesystem.

If you do use the filesystem rather than a blob, the question is then, do you get the web server to serve it by pointing an Alias at the folder?

  • + is super fast
  • + caches well
  • - extra server config: virtual directory; needs appropriate file extension to return desired Content-Type
  • - extra server config: need to add Content-Disposition: attachment / X-Content-Type-Options headers to stop IE sniffing for HTML as part of anti-XSS measures

or do you serve the file manually by having a server-side script spit it out, as you would have to serving from a MySQL blob?

  • - is potentially slow
  • - needs a fair bit of manual If-Modified-Since and ETag handling to cache properly
  • + can use application's own access control methods
  • + easy to add correct Content-Type and Content-Disposition headers from the serving script
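To give an idea of what the "manual If-Modified-Since and ETag handling" involves, here is a minimal, framework-independent Python sketch. The function name and the plain-dict header interface are assumptions made for the example; a real script would plug this into whatever request/response objects it has, and would also set Content-Type from the stored metadata.

```python
import hashlib
from datetime import timezone
from email.utils import formatdate, parsedate_to_datetime


def conditional_response(data: bytes, mtime: float, request_headers: dict) -> tuple:
    """Return (status, headers): 304 if the client's cached copy is still valid, else 200."""
    etag = '"' + hashlib.sha256(data).hexdigest()[:32] + '"'
    headers = {
        "ETag": etag,
        "Last-Modified": formatdate(mtime, usegmt=True),
        # Force a download and stop content sniffing, as discussed above.
        "Content-Disposition": "attachment",
        "X-Content-Type-Options": "nosniff",
    }

    # If-None-Match takes precedence over If-Modified-Since when both are sent.
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            return 304, headers
        return 200, headers

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable date: ignore the header
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            # HTTP dates have one-second resolution, so drop fractions.
            if int(mtime) <= since.timestamp():
                return 304, headers

    return 200, headers
```

On a 304 the script sends the headers with an empty body; on a 200 it sends the headers followed by the file contents.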

This is a trade-off for which there is no single globally accepted answer.

If your web server will be serving these uploaded files over the web, the performance will almost certainly be better if they are stored on the filesystem. The web server will then be able to apply HTTP caching hints such as Last-Modified and ETag which will help performance for users accessing the same file multiple times. Additionally, the web server will automatically set the correct Content-Type for the file when serving. If you store blobs in the database, you'll end up implementing the above mentioned features and more when you should be getting them for free from your web server.

Additionally, pulling large blob data out of your database may end up being a performance bottleneck on your database. Also, your database backups will probably be slower because they'll be backing up more data. If you're doing ad-hoc queries during development, it'll be inconvenient seeing large blobs in result sets for select statements. If you want to simply inspect an uploaded file, it will be inconvenient and roundabout to do so because it'll be awkwardly stored in a database column.

I would stick with the common practice of storing the files on the filesystem and the path to the file in the database.

In my experience storing a BLOB in MySQL is OK, as long as you store only the blob in one table, while the other fields are in another (joined) table. Conversely, searching a table that has a few standard fields plus one blob field holding 100 MB of data can slow queries dramatically.

I had to change the data layer of a mailing app because of this issue: emails were stored with their content in the same table as the date sent, email addresses, etc. It was taking 9 seconds to search 10,000 emails. Now it takes what it should take ;-)
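The split described here might look roughly like the sketch below. It uses Python's built-in sqlite3 purely as a stand-in so the example runs anywhere; the table and column names are invented, and it only illustrates the layout, not MySQL's actual storage or performance behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Narrow table holding only the searchable fields.
    CREATE TABLE emails (
        id        INTEGER PRIMARY KEY,
        sent_at   TEXT NOT NULL,
        recipient TEXT NOT NULL
    );
    -- Separate table holding only the blob, joined by key.
    CREATE TABLE email_bodies (
        email_id INTEGER PRIMARY KEY REFERENCES emails(id),
        content  BLOB NOT NULL
    );
""")
conn.execute(
    "INSERT INTO emails (id, sent_at, recipient) VALUES (1, '2024-01-01', 'a@example.com')"
)
conn.execute(
    "INSERT INTO email_bodies (email_id, content) VALUES (1, ?)", (b"x" * 1024,)
)

# Searching touches only the narrow metadata table.
matches = conn.execute(
    "SELECT id FROM emails WHERE recipient = ?", ("a@example.com",)
).fetchall()

# The blob is read only when one specific message is opened.
body = conn.execute(
    "SELECT content FROM email_bodies WHERE email_id = ?", (matches[0][0],)
).fetchone()[0]
```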

Data should be stored in one consistent place: the database. The performance and Content-Type concerns are not an issue at all, because nothing stops you from caching those BLOB fields on the local web server the first time they are requested and serving them from there afterwards. You do not need to access that table on every page view.

This file system cache can be emptied out at any moment, which will only impact performance temporarily while it is refilled automagically. It will also let you use one database and many web servers as your application grows: each server simply has its own local cache on the file system.
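A minimal Python sketch of such a cache, under the assumption that some function can load a blob by id from the database; `load_blob` below is a hypothetical callback standing in for that query, and the cache layout is made up for the example.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def cached_blob_path(file_id: int, cache_dir: Path,
                     load_blob: Callable[[int], bytes]) -> Path:
    """Return a local file holding the blob, hitting the database only on a cache miss."""
    target = cache_dir / str(int(file_id))
    if target.exists():
        return target  # cache hit: no database access

    cache_dir.mkdir(parents=True, exist_ok=True)
    data = load_blob(file_id)  # hypothetical: SELECT the blob for this id

    # Write to a temporary file and rename it into place, so a concurrent
    # reader never sees a half-written cache entry.
    fd, tmp_name = tempfile.mkstemp(dir=cache_dir)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_name, target)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target
```

Deleting everything under the cache directory is safe: the next request for each file just repopulates it. Note that this sketch never invalidates an entry, so it assumes a stored blob does not change once written.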

Large volumes of data will eventually take their toll on performance. MS SQL 2008 has a specialized way of storing binary data in the file system:

http://msdn.microsoft.com/en-us/library/cc949109.aspx

I would employ a similar approach for your project.

You can create a FILES table that keeps information about the files, such as their original names. To store files safely on disk, rename them using, for example, GUIDs. Store the new file names in your FILES table, and when a user needs to download a file you can easily locate it on disk and stream it to the user.
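Sketched in Python, the idea could look like this. The `files_table` dict is only a stand-in for the FILES table so the example is self-contained, and the function names are invented for illustration.

```python
import uuid
from pathlib import Path


def store_upload(data: bytes, original_name: str,
                 storage_dir: Path, files_table: dict) -> str:
    """Save the upload under a random GUID-based name and record the original name."""
    stored_name = uuid.uuid4().hex  # the user-supplied name never touches the disk
    storage_dir.mkdir(parents=True, exist_ok=True)
    (storage_dir / stored_name).write_bytes(data)
    # Stand-in for: INSERT INTO FILES (stored_name, original_name) ...
    files_table[stored_name] = {"original_name": original_name}
    return stored_name


def open_download(stored_name: str, storage_dir: Path, files_table: dict) -> tuple:
    """Look up the original name and read the file back for streaming to the user."""
    row = files_table[stored_name]  # KeyError if the name is not in the table
    return row["original_name"], (storage_dir / stored_name).read_bytes()
```

Because the lookup goes through the table first, a request for a name that was never issued fails before any path is built from it.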

Many people recommend against storing file attachments (usually this applies to images) in blobs in the database. Instead they prefer to store a pathname as a string in the database, and store the file somewhere safe on the filesystem. There are some merits to this:

  • Database and database backups are smaller.
  • It's easier to edit files on the filesystem if you need to work with them ad hoc.
  • Filesystems are good at storing files. Databases are good at storing tuples. Let each one do what it's good at.

There are counter-arguments too, that support putting attachments in a blob:

  • Deleting a row in a database automatically deletes the associated attachment.
  • Rollback and transaction isolation work as expected when data is in a row, but not when some part of the data is on the filesystem.
  • Backups are simpler if all data is in the database. No need to worry about making consistent backups of data that's changing concurrently during the backup procedure.

So the best solution depends on how you're going to be using the data in your application. There's no one-size-fits-all answer.

I know you tagged your question with MySQL, but if folks reading this question use other brands of RDBMS, they might want to look into BFILE when using Oracle, or FILESTREAM when using Microsoft SQL Server 2008. These give you the ability to store files outside the database but access them like they're part of a row in a database table (more or less).

In my opinion, storing files in the database is a bad idea. What you can store there is the id, name, type, possibly an MD5 hash of the file, and the date inserted. Files can be uploaded into a folder outside the public location. You should also be aware that it is not advised to keep more than 1000 files in one folder, so you have to create a new folder each time the file id increases by 1000.
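The "new folder every 1000 ids" rule comes down to integer division. A minimal Python sketch, with a made-up function name and folder layout:

```python
from pathlib import Path

FILES_PER_FOLDER = 1000


def sharded_path(root: Path, file_id: int) -> Path:
    """Place ids 0-999 in folder "0", 1000-1999 in folder "1", and so on."""
    if file_id < 0:
        raise ValueError("file_id must be non-negative")
    bucket = file_id // FILES_PER_FOLDER
    return root / str(bucket) / str(file_id)
```

For example, id 999 lands in folder `0` and id 1000 in folder `1`.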

