What is a thread in Ruby?
Threads make your Ruby programs do multiple things at the same time.
Things like:
- Reading multiple files
- Handling multiple web requests
- Making multiple API connections
As a result of using threads, you’ll have a multi-threaded Ruby program, which is able to get things done faster.
But one warning…
In MRI (Matz’s Ruby Interpreter), the default way to run Ruby applications, you will only benefit from threads when running I/O bound applications, that is, applications that interact with a network or a local file.
This limitation exists because of the GIL (Global Interpreter Lock).
Note that this isn’t a limitation of the language itself, but of the interpreter which actually reads, parses & executes your code. Ruby was designed many years ago (1995), when multi-threaded & multi-core CPUs weren’t the norm. As a result, MRI only lets one thread execute Ruby code at any given time, even on a multi-core machine.
Alternative Ruby interpreters like JRuby or Rubinius take full advantage of multi-threading.
Rubinius has been deprecated since 2020, but JRuby is very much under active development & used in big production apps.
So, what are threads?
Threads are workers, or units of execution.
Every process has at least one thread & you can create more on demand.
I know you want to see a code example.
But first, we need to talk about the difference between CPU-bound & I/O bound applications.
I/O Bound Applications
An I/O bound app is one that needs to wait for an external resource:
- an API request
- database (query results)
- a disk read
A thread can decide to stop while it waits for a resource to be available. This means that another thread can run and do its thing and not waste time waiting.
One example of an I/O bound app is a web crawler.
For every request, the crawler has to wait for the server to respond, and it can’t do anything while waiting.
But if you are using threads…
You could make 4 requests at a time, let each request wait for its response & handle the responses as they come back. This will let you fetch pages faster than if you made the same requests without threads.
Now it’s time for your code example.
Creating Ruby Threads
You can create a new Ruby thread by calling Thread.new.
Make sure to pass in a block with the code this thread needs to be running.
Thread.new { puts "hello from thread" }
Pretty easy, right?
However.
If you run the following code, you may notice that there is no output from the thread:
t = Thread.new { puts 10**10 }
puts "hello"
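The problem is that Ruby doesn’t wait for threads to finish. When the main thread reaches the end of the program, Ruby exits and kills any threads that are still running.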
You need to call the join method on your thread to fix the code above:
t = Thread.new { puts 10**10 }
puts "hello"
t.join
If you want to create multiple threads, you can put them inside an array & call join on every thread.
Example:
threads = []
10.times { threads << Thread.new { puts 1 } }
threads.each(&:join)
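To tie this back to I/O bound work, here is a minimal sketch that fakes a slow request with sleep. The helper names fake_request and timed_requests are made up for this example and aren't part of any library. A sleeping thread doesn't hold the GIL, so the four waits should overlap and the total should land close to 0.2 seconds rather than 0.8.

```ruby
# Hypothetical helper: pretends to be a slow network call by sleeping.
def fake_request(id)
  sleep 0.2
  "response #{id}"
end

# Hypothetical helper: runs `count` fake requests, one thread each,
# waits for all of them, and returns the elapsed time in seconds.
def timed_requests(count)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  threads = Array.new(count) { |i| Thread.new { fake_request(i) } }
  threads.each(&:join)
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

elapsed = timed_requests(4)
puts "4 fake requests took #{elapsed.round(2)}s"
```

If you swap the sleep for a CPU-heavy loop, you should not expect the same speedup on MRI, because only one thread can run Ruby code at a time.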
We still need to look at a few important concepts, but during our exploration of Ruby threads you may find the documentation useful:
https://ruby-doc.org/3.2.2/Thread.html
Threads and Exceptions
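If an exception happens inside a thread, the thread dies without stopping the rest of your program. Since Ruby 2.5 a report is printed to standard error by default (see Thread.report_on_exception), but on older versions the thread dies silently. Either way, the main thread keeps running.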
Here is an example:
Thread.new { raise 'hell' }
For debugging purposes, you may want your program to stop when something bad happens. To do that, you can set the following flag on Thread to true:
Thread.abort_on_exception = true
Make sure to set this flag before you create your threads 🙂
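Another way to notice a failed thread is to join it, because join re-raises in the caller any exception that killed the thread. Here is a minimal sketch of that. The helper name failure_message is made up for this example, and the sketch turns off Thread.report_on_exception (which makes recent Ruby versions print a report to standard error when a thread dies from an exception) only to keep the demo output clean.

```ruby
# Demo only: silence the automatic stderr report for dying threads.
Thread.report_on_exception = false

# Hypothetical helper: joins the thread and returns the message of the
# exception that killed it, or nil if the thread finished normally.
def failure_message(thread)
  thread.join
  nil
rescue StandardError => e
  e.message
end

failed = Thread.new { raise 'hell' }
puts failure_message(failed).inspect

fine = Thread.new { 1 + 1 }
puts failure_message(fine).inspect
```

This lets you handle the error where you join, instead of stopping the whole program.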
Thread Pools
Let’s say you have hundreds of items to process. Starting a thread for each of them is going to destroy your system resources.
It would look something like this:
pages_to_crawl = %w( index about contact ... )
pages_to_crawl.each do |page|
  Thread.new { puts page }
end
If you do this you would be launching hundreds of connections against the server, so that’s probably not a good idea.
One solution is to use a thread pool.
Thread pools allow you to control the number of active threads at any given time.
You could build your own pool, but I wouldn’t recommend it. In the following example we are using the concurrent-ruby gem to do this for you.
If you’re coming back to Ruby after a long hiatus you may remember Celluloid. It is no longer maintained, & concurrent-ruby is now the recommended concurrency gem.
require 'concurrent'
require 'excon'

base_url = "http://example.com/"
pages_to_crawl = %w( index about contact products )

def process_page(url)
  p Excon.get(url).status
end

pool = Concurrent::FixedThreadPool.new(5)

# Post one job per page so the pool can spread them across its 5 threads
pages_to_crawl.each do |page|
  pool.post { process_page(base_url + page) }
end

# Stop accepting new jobs, then wait for the queued ones to finish
pool.shutdown
pool.wait_for_termination
This time only 5 threads will be running, and as they finish they will pick the next item.
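If you're curious how "as they finish they will pick the next item" works, here is a minimal sketch of the idea using only Ruby's built-in Queue. It is an illustration rather than a replacement for the gem, and process_all is a made-up helper name. It also assumes that no item is nil or false, since the workers treat a nil from the queue as "no more work".

```ruby
# Hypothetical helper: runs the given block on every item using at most
# `size` worker threads, and returns the collected results.
def process_all(items, size: 5)
  jobs = Queue.new
  items.each { |item| jobs << item }
  jobs.close # once closed and empty, pop returns nil instead of blocking

  results = Queue.new # Queue is thread-safe, so workers can push freely

  workers = Array.new(size) do
    Thread.new do
      # Each worker keeps picking the next item until the queue runs dry.
      while (item = jobs.pop)
        results << yield(item)
      end
    end
  end
  workers.each(&:join)

  # Results arrive in completion order, not necessarily input order.
  Array.new(results.size) { results.pop }
end

pages = %w[index about contact products]
p process_all(pages, size: 2) { |page| page.upcase }.sort
```

A real pool also has to deal with errors inside jobs, shutdown and timeouts, which is why using a well-tested gem is still the safer choice.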
Race Conditions and Other Hazards
This may all sound very cool, but before you go out sprinkling threads all over your code, you must know that there are some problems associated with concurrent code.
For example, threads are prone to race conditions.
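A race condition is when two or more threads read and change the same data at the same time, so the result depends on the order in which they happen to run. Things happen out of order and make a mess.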
Another problem that can happen is a deadlock. This is when one thread holds exclusive access (using a locking system like a mutex) to some resource and never releases it, which makes it inaccessible to all the other threads.
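As a minimal sketch of both ideas, here is a shared counter updated from several threads and protected by Ruby's built-in Mutex. The helper name safe_count is made up for this example. Using synchronize with a block, rather than locking and unlocking by hand, means the lock is released even if the block raises, which helps you avoid the "never releases it" situation.

```ruby
# Hypothetical helper: increments one shared counter from several threads
# and returns the final value.
def safe_count(threads: 10, increments: 1_000)
  counter = 0
  lock = Mutex.new

  workers = Array.new(threads) do
    Thread.new do
      increments.times do
        # Only one thread at a time can be inside this block, so the
        # read-add-write steps of `counter += 1` can't interleave.
        lock.synchronize { counter += 1 }
      end
    end
  end
  workers.each(&:join)

  counter
end

puts safe_count
```

Without the mutex, the final count could come out lower than expected, depending on the interpreter and on how the threads get scheduled.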
To avoid these issues, it’s best to avoid raw threads and stick with some gem that already takes care of the details for you.
More Threading gems
We already used concurrent-ruby for our thread pool, but there are other concurrency-focused gems that you should check out:
- https://github.com/grosser/parallel
- https://github.com/chadrem/workers
- https://github.com/ruby-concurrency/concurrent-ruby
Ok that’s it, hopefully you learned a thing or two about Ruby threads!
If you found this article useful please share it with your friends so they can learn too 🙂