Background Processing for Image Uploads with Rails on Heroku

Today I pushed my first significant contribution at my new job into production! We had a little bug/inefficiency in our app involving processing for image uploads, and my task was to fix it!

Basically, in this one section of our app, we were handling image uploads with carrierwave and processing/caching/storing all on the same server that handles every other client/api request. As you can imagine, having the server stop what it’s doing to handle an image doesn’t make for a great user experience, especially if it’s a huge image that is taking quite a while to process.

Luckily for me, this problem had already been solved before in a different section of the app! A while back before I joined the company there was a similar but more severe problem with image uploading crashing the server, so another contributor set up a working solution to solve that issue. So, in order to complete my task at hand, I dug way back in the commit history to see what changed in order to set up the original solution…

The first file I examined when I identified the commits behind the solution was the Gemfile. I found that the following gems were added:

gem 'carrierwave_backgrounder'
gem 'redis-rails'
gem 'resque'
gem 'unicorn'

Okay, so at least something looks familiar here. Thanks to my Ruby Project Week project For-Closure, I already had some experience with carrierwave, so I took to the google to find out how carrierwave_backgrounder worked along with it. Thanks to carrierwave_backgrounder’s great documentation, it was pretty easy to figure out. I especially appreciated their opening statement:

“I like CarrierWave. That being said, I don’t like tying up app instances waiting for images to process.

This gem addresses that by offloading processing or storage/processing to a background task. We currently support Delayed Job, Resque, Sidekiq, SuckerPunch, Girl Friday, Qu, and Queue Classic.” – carrierwave_backgrounder docs

Sounds like exactly what I need! And hey, would you look at that, I see there’s another one of those mysterious new gems right in the supported background task list! I guess Resque is a background task of some sort!

After skimming the docs for carrierwave_backgrounder (we’ll get back to how to set it up later), I went back to the google to look up Resque. Upon finding the docs, my first major discovery was that Resque is pronounced like “rescue” …I had been thinking to myself that it was “resk,” lol.

My second major discovery was that I had no idea how to use Resque. Unfortunately the docs were talking all about jobs and queues and workers, none of which I had any experience with in Rails or any framework, period. I knew what a queue was, but my only experience with them was implementing a queue data structure algorithm in JavaScript. I had no idea how to actually use one in a web app, let alone in production. I had hit a wall.

Luckily for me, again, this problem had already been solved once before! I knew that Resque was already in use in production for certain areas of our app, so I figured all I had to do to learn how to use it was look around in the codebase for every mention of resque. Project-wide search to the resque! (rescue, haha, so clever)

I found Resque in use in the following places:

  • Inside the controller methods that receive uploaded images:
Resque.enqueue(ProcessImages, @item.id)
  • In the carrierwave_backgrounder initializer:
CarrierWave::Backgrounder.configure do |c|
  c.backend :resque, queue: :carrierwave
end
  • In the Resque initializer (/config/initializers/resque.rb):
rails_root = ENV['RAILS_ROOT'] || File.dirname(__FILE__) + '/../..'
rails_env = ENV['RAILS_ENV'] || 'development'
Resque.redis = ENV['REDISTOGO_URL'] unless rails_env == 'development'

Plus a couple other places (like in routes.rb to set up admin/resque control panel access), but these three areas are all that were necessary to get my background process queue up and running.

From these three areas, I learned three things. A: Resque needs to be initialized with a connection to redis. B: carrierwave_backgrounder needs to be configured to use Resque. C: Resque.enqueue() is the method that actually puts something into the Resque queue. Great. So what is redis?

I won’t go too deep into this, but redis is a lightweight, in-memory key-value store. It’s a NoSQL database like MongoDB, but there are no collections or documents: you just get keys that map to values (strings, lists, hashes, and so on), and because the data lives in memory it’s generally very fast. Anyways, right here is where I hit another wall because, as you can see from the initializer, our app was only set up to point Resque at redis outside of development… This is a topic for another day, but I spent a good 3-4 hours in a rabbit hole figuring out how to get redis and Resque to play nicely on localhost. I eventually got it to work, but hot dayum it was a pain. The whole redis thing was set up super easily on Heroku with the Redis To Go addon, so I probably could have just replicated the previous solution in the new area, pushed to production, and had it work, but that would have meant pushing a potentially non-functional fix to production without even testing it first. I had to get it working locally so I could be sure I wouldn’t break anything. (A staging environment would have been nice in this situation… I’m still in the process of getting that set up correctly though.)
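Here’s a minimal sketch of how the "which redis do I talk to?" decision from the initializer could be made explicit. Fair warning: resque_redis_target and DEFAULT_LOCAL_REDIS are hypothetical names I’m using for illustration, not something from our codebase, and I’m assuming a local redis server listening on redis’s default port, 6379.

```ruby
# Hypothetical helper (not from the real codebase): pick the redis location
# that Resque should use for a given Rails environment.

# Assumption: a local redis server on the default port.
DEFAULT_LOCAL_REDIS = 'localhost:6379'

def resque_redis_target(rails_env, env = ENV)
  # In development, talk to the redis server running on this machine.
  return DEFAULT_LOCAL_REDIS if rails_env == 'development'

  # Everywhere else, use the URL the Redis To Go addon exposes.
  # fetch raises KeyError if the variable is missing, so a misconfigured
  # environment fails loudly instead of silently using the wrong server.
  env.fetch('REDISTOGO_URL')
end
```

In the initializer this could be used as Resque.redis = resque_redis_target(rails_env). Resque accepts a 'hostname:port' string or a redis URL there, so both return values should work, but treat this as a sketch rather than the setup I actually ended up with.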

Okay, back to Resque. I know what redis is and how to set it up. I should probably go and finish setting up carrierwave_backgrounder, seeing as I’ve already configured it! The following steps are from carrierwave_backgrounder’s awesome documentation.

  • First, include carrierwave_backgrounder in the existing carrierwave uploader class:
class ImageUploader < CarrierWave::Uploader::Base
  include ::CarrierWave::Backgrounder::Delay
  #etc...
end
  • Next, add the process_in_background method to the image model after the uploader attachment:
mount_uploader :image, ImageUploader
process_in_background :image

And… that’s it! Carrierwave and Carrierwave_backgrounder are both super easy to use.

Now that our carrierwave_backgrounder is set up, there’s still one question remaining. What is that ProcessImages object that’s getting passed in to Resque.enqueue()? Well, turns out ProcessImages is a Resque “job.” After digging around in the documentation and on stackoverflow, I’d say in my own words the definition of a “job” is just an object that gets put in a queue and has a callback that gets executed when dequeued. Behold, the definition of the ProcessImages job class:

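class ProcessImages
  # Resque reads @queue to decide which queue this job is placed on.
  @queue = :process_images

  # Called by a Resque worker with the arguments given to Resque.enqueue.
  def self.perform(item_id)
    item = Item.find_by(id: item_id)
    # find_by returns nil if the item was deleted before the job ran.
    return if item.nil?

    item.item_images.each do |item_image|
      image = item_image.image
      image.cache!
      image.store!
    end
  end
end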

According to the Resque docs, a job class must have a queue and a perform method. The queue name can be whatever you like since queues are created on the fly; what matters is that a worker is watching that queue. The .perform() method is the callback that gets executed when a worker picks the job off the queue. In this case, the ProcessImages job takes an item_id, finds the item, and then caches and stores each image associated with that item. Background processing jobs aren’t too scary after all.
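Here’s a minimal sketch of the enqueue/perform relationship in plain Ruby, no redis required. To be clear, ToyQueue and RecordItem are made-up names for illustration, and this is not Resque’s actual implementation. It’s just a toy with roughly the same shape: enqueue stores the job’s class name and arguments as JSON, and a "worker" later pops that payload and calls .perform with those arguments.

```ruby
require 'json'

# Toy stand-in for Resque (hypothetical, for illustration only).
module ToyQueue
  # Queues are created on the fly the first time a name is used.
  QUEUES = Hash.new { |hash, name| hash[name] = [] }

  # Same call shape as Resque.enqueue(JobClass, *args).
  def self.enqueue(klass, *args)
    queue = klass.instance_variable_get(:@queue)
    raise ArgumentError, "#{klass} does not define @queue" if queue.nil?

    # Only the class name and the arguments are stored, serialized as JSON.
    QUEUES[queue] << JSON.generate('class' => klass.name, 'args' => args)
  end

  # What a worker does for a single job: pop, look up the class, perform.
  # Returns false when the queue is empty.
  def self.work_one(queue)
    payload = QUEUES[queue].shift
    return false if payload.nil?

    job = JSON.parse(payload)
    Object.const_get(job['class']).perform(*job['args'])
    true
  end
end

# A job class with the two required pieces: a queue and a perform method.
class RecordItem
  @queue = :process_images

  # Remembers which ids were processed so the effect is observable.
  def self.performed
    @performed ||= []
  end

  def self.perform(item_id)
    performed << item_id
  end
end

ToyQueue.enqueue(RecordItem, 42)    # nothing runs yet, the job just waits
ToyQueue.work_one(:process_images)  # now perform(42) is called
p RecordItem.performed
```

Because the arguments take a round trip through JSON, they need to be simple values. That’s why the controller passes @item.id rather than the @item object itself, and why the job looks the item up again inside perform.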

At this point I was able to replicate everything I needed in order to get my new background processing job up and running along with the existing process. For local testing, however, I was still missing an important part. This was where my excursion down the long rabbit hole of redis and resque began… long story short, I still needed a worker to watch the queue and process jobs. I had no idea what this even meant, so it was a frustrating process to find answers to my questions on stackoverflow. In the process I learned how to set up a local redis server, and a local worker. I had three tabs going in my vagrant box: One for the rails server, one for the redis server, and one for the resque worker. I still don’t really understand how rake works, but the command that I eventually discovered that did the trick was this:

QUEUE=* rake environment resque:work

Yeah, it’s still pretty magical to me too. I’ll have to come back to this later and explain how it really works once I actually know… I also don’t even know what unicorn is! I didn’t need it for running locally, and it was already set up in production, so I didn’t touch it! Maybe I’ll talk more about unicorn eventually too.
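As far as I can tell, the command is less magical than it looks: rake runs the tasks named on the command line from left to right, so environment (the Rails task that loads the app) runs first, then resque:work starts a worker, and the QUEUE environment variable tells that worker which queues to watch (* meaning all of them). Here’s a small sketch of that ordering using rake itself. The two tasks below are stand-ins I wrote for illustration, not the real Rails or Resque tasks, and RUN_LOG is just a made-up constant to record what ran.

```ruby
require 'rake'

# Records what ran, in order, so the sequence is visible.
RUN_LOG = []

# Stand-in for Rails' `environment` task, which loads the application.
Rake::Task.define_task('environment') do
  RUN_LOG << 'environment loaded'
end

# Stand-in for Resque's worker task. The real one starts a long-running
# loop that keeps pulling jobs; this one only notes which queues it got.
Rake::Task.define_task('resque:work') do
  queues = ENV.fetch('QUEUE', '').split(',')
  RUN_LOG << "worker watching #{queues.inspect}"
end

# Rough equivalent of typing: QUEUE=* rake environment resque:work
ENV['QUEUE'] = '*'
%w[environment resque:work].each { |name| Rake::Task[name].invoke }

puts RUN_LOG
```

The ordering is the important part: the worker needs the app loaded before it can find classes like ProcessImages and Item.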

Anyways, in case it’s hard to tell from this post, I’m really enjoying my new job. I’m learning a lot, and quickly. Plus I’m getting paid!
