Image storage: Database or File system?

From my experience, storing images in a database is a lot easier to manage than storing them in the file system. Here are a few of the benefits that I love the most:

  • Related information is automatically kept in sync.
  • It’s easier to backup your data when it’s all in one place.
  • It’s easier to maintain an independent development environment.
  • You can add additional file servers without having to deal with on-the-fly file replication.

Of course, the classic argument against storing images in a database is that it is slow. Retrieving images from a database takes longer than retrieving an image from the file system. At least, that’s what they say.

I decided to do a benchmark test for myself to see just how much of a difference it makes in performance. The results were surprising.

First, I created three different sized thumbnails of this picture and stored them in a directory on my file system. Then I took the same three images and stored them in a mySQL database. I then measured the total time for Firefox to display each image. I ran each benchmark 10 times for better accuracy.

Here are the results:

File system

  • Large – average of 7.87E-05 seconds
  • Medium – average of 7.77E-05 seconds
  • Small – average of 6.65E-05 seconds

Database

  • Large – average of 6.68E-05 seconds
  • Medium – average of 6.69E-05 seconds
  • Small – average of 5.90E-05 seconds

Wow! The images retrieved from the database were actually displayed faster than those retrieved from the file system!

Of course, we still haven’t proved this is scalable. I only had 3 rows in my table. What would happen if you have 120,000 images in your database? There’s one way to find out. Test it.

I inserted my picture into the database table 120,000 times (yes, that took a while). When I ran my benchmark test again, the average time was 6.07E-05 seconds! So much for being “slow”.

I’m not sure what would happen if my table contained over 120,000 images. For now I’m not too concerned – we’re still a few weeks away from reaching that milestone.

  • (Disclaimer, I usually store images in the database for low volume sites.)

    The reason for the conventional wisdom about "speed" for storing images as files isn't so much about the latency, but about concurrency. Good http servers will handoff serving a file to the kernel level, allowing them to merrily send hundreds of files at a time.

    But when you find you need the concurrency, put a caching layer in front of your DB.

  • Just for fun, you might want to play with using something like http://jakarta.apache.org/jmeter to test many concurrent connections.

  • But of course don't get distracted by learning how to use a tool and slack off from your project. 😛

  • joshfraz

    Daniel,

    Excellent point. Thanks for the link. I'll play with that when I finally have some free time. I think you're right that something like memcache would take care of the concurrency issues. I'll test it and see.

  • By the way, I just mentioned JMeter because it's cross platform and java. However, after playing with it last night – OUCH, it's painful to use.

  • joshfraz

    Daniel, do you know of any other tools that don't hurt so much?

  • Casey

    Absolutely do not store images in the DB if you plan to scale up!!! We have millions and millions of images and if they were in the DB there would be some serious problems. I know photobucket stores their images on NFS and basically everyone has pointers to the images in the FS in the DB with whatever metadata you need to keep track of them. I can tell you more about this in person sometime! We are still on for that hike– i think next weekend is looking good!

  • Josh Fraser

    Casey,

    I knew this would be a controversial post. I love it! It was actually a conversation with the PhotoBucket guys that prompted me to start looking into this more. At PhotoBucket they avoid going to the database at all costs. This makes sense for them because they are a photo sharing site. We're not. All we need to store are some small thumbnails. Our product would be more comparable to Facebook than PhotoBucket. Facebook makes extensive use of mySQL databases and have managed the scalability just fine. I'd love to talk to you more about it in person. A couple things to keep in mind that might make our situation slightly different from yours:

    – we are only storing thumbnails in the DB (average size of 40KB)

    – we are only planning on using this method until it makes sense to add a dedicated image server

    – since we frequently display the same images we can really speeds things up by using caching

  • KPS

    Interesting tests, but it doesn't really give you an acurate idea of what would happen in a 'real word' environment.

    Testing the speed of 1 'user' getting 1 image from a database and/or file system containing 3 images, or 1 'user' getting 1 images from a database/file system containing 120,000 images doesn't really make a difference, because it's still only 1 user makming 1 connection to retirve 1 image.

    Try testing 1,000 users getting 1,000 images from your database / file system, better still try 1,000 users getting the SAME image fro your database / file system. I think then you'll see slightly different results!

    And never use a DB to store your images if you plan on having thousands or millions of images! For each entry in your DB you could be multiplying the space used 10 fold by having images cluttering up the DB, and even though DBs can cope with terabytes of storage, there is the danger that one day you WILL discover the edge of the Universe!

    With file storage you simply slap another 10 Terabytes onto your stroage and off you go again! 😀

  • KPS,

    You're totally right. I really should write a follow-up post on this. The big caveat for me is memcache. It's okay to store profile pics in a DB, but don't do it unless you have a really good caching system up front.

    Josh

  • Harry

    Hello, can you explain on what the differences between file storage and database storage? as in how these two storage format are used….Im kind a confused with these two…Hope you would clear my doubt…
    Thank you in advance….

    • The biggest caveat I would add since writing this is that a good caching system (like memcache) is absolutely essential if you are going to be storing images in the database. If you have good caching, storing smaller images in the database is not a bad option. If you're going to serving up millions of images, you might want to go ahead and use the filesystem to avoid having to shard your database later on to keep it fast. The filesystem is also a better option for storing larger images.

      • Harry

        Thanks for your information. But I have another question, what if I'm storing large text document? file system or database system storage is better?

        • It's a series of trade-offs. How many files do you have? Do you need to be able to search them? Do you have meta-data that needs to be stored with them? My bias is usually towards sticking data in the database because it makes them easier to search, sort and connect meta-data. That said, if you don't need that ability or if you have more than a few thousand files, the filesystem might be the way to go.

          • Harry

            I'm glad that you clear my doubt, really thank you, thank you and thank you again…..you really a great guy…

          • Hendry

            Isn't it easier to search if you chuck the file in the filesystem? You can easily use indexing engine to crawl these files, whereas if you stored them in the database, it wouldnt be that easy

          • On second thoughts, search is pretty irrelevant to this discussion. In either case, you're probably searching the meta data for the image and not the actual image itself. Meta data is easier to search if it's stored in a database, but that's another topic.

  • Shawn

    I need display 1000 small images, in, and only in, a fixed sequence, i.e., image1, image2, image3…,

    It's easy for FS; it's fast for FS because of the sequential access instead of SQL quering.

    It's fast for DB if I have a "good cache at front of DB"; It is fast for DB because there're no many images.

    So… which one has a tiny advantage?

    Thanks in advance!