What is a data structure?
A data structure is a specific way to organize & access data.
Examples include:
- Arrays
- Binary trees
- Hashes
Different data structures excel at different tasks.
For example, hashes are great if you’re looking to store data that looks like a dictionary (word & definition), or a phone book (person name & number).
Knowing what data structures are available, and the characteristics of each of them, will make you a better Ruby developer.
That’s what you’ll learn in this article!
Understanding Arrays
The array is the first data structure that you learn about when you start reading about programming.
Arrays use a contiguous chunk of memory where objects are stored one after another without gaps between them.
Unlike in lower-level programming languages, like C, Ruby does all the hard work of managing your memory, expanding the maximum array size & compacting it when you delete elements.
Uses:
- As a base for more advanced data structures
- To gather results from running a loop
- Collecting items
You’ll find arrays everywhere, like the split
& chars
methods, which break down a string into an array of characters.
Example:
out = [] 10.times { |i| out << i } out # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
The following table shows you how the different array operations behave as the size of the array increases.
If you aren't familiar with time complexity notation read this article.
Time complexity for arrays:
Operation | Complexity |
---|---|
push | O(1) |
pop | O(1) |
access | O(1) |
find | O(n) |
delete | O(n) |
Why is this helpful?
Because it tells you the performance characteristics of arrays.
If you're doing a lot of find
operations on a HUGE array that's going to be slow...
But if you know what indexes to use, then an array is going to be fast because of the O(1)
time complexity.
Choose your data structure with this criteria:
- Performance characteristics => What are you doing with the data? How big is your dataset?
- Shape & form of your data => What kind of data are you working with? Could you re-organize your data so it fits a better data structure?
The Hash Data Structure
Do you have a mapping between country codes & country names?
Or maybe you just want to count stuff.
That's exactly what hashes are helpful for!
A hash is a data structure where every value has a key & this key can be anything, like a string, an integer, a symbol, etc.
How does it work?
A hash converts your key into a number (using the hash
method in Ruby) & then uses that number as the index. But you don't need to understand that to be able to use hashes in your Ruby programs.
Uses:
- Counting characters in a string
- Mapping words to definitions, names to phone numbers, etc.
- Find duplicates inside an array
Example:
"aaabcd" .each_char .with_object(Hash.new(0)) { |ch, hash| hash[ch] += 1 } # {"a"=>3, "b"=>1, "c"=>1, "d"=>1}
Time complexity:
Operation | Complexity |
---|---|
store | O(1) |
access | O(1) |
delete | O(1) |
find (value) | O(n) |
Hashes are one of the most helpful data structures when it comes to performance because of the constant O(1)
store, delete & access time.
Find in the context of a hash means that you're looking for a specific value.
Stacks
A stack is like a stack of plates, you put one plate on top of another & you can only remove the plate on top.
This is more useful than it sounds at first!
Uses:
- Replaces recursive methods with a regular loop
- Keep track of work left to do, leaving the most recent on top
- Reverse an array
Example:
stack = [1,2,3,4,5] (1..stack.size).map { stack.pop } # [5, 4, 3, 2, 1]
Yes, you can use reverse
instead.
This is only an example to show you this particular characteristic of stacks.
Time complexity:
Operation | Complexity |
---|---|
push | O(1) |
pop | O(1) |
find | --- |
access | --- |
Notice that stacks (and queues) only have two operations, insert
& delete
, or push
& pop
.
While it's possible to search inside a stack, it's very rare.
How to Use Binary Trees
Most Ruby developers have probably heard about binary trees but never used one.
Why is that?
First, we don't have a built-in binary tree implementation.
Second, a binary tree is not that helpful for everyday programming challenges, unlike arrays & hashes which you use ALL the time.
But binary trees are a very interesting data structure.
In fact, there are many variations, like the Trie (covered in next section), multiway trees like the B-Tree (used in databases) & the Heap.
Uses:
- Data compression
- Routing tables
- Abstract syntax trees (AST)
Example:
# https://github.com/jamesconant/bstree require 'bstree' root = Bstree::Node.new(5) root.insert(2) root.insert(7) root.search(3) # nil
Time complexity:
Operation | Complexity |
---|---|
insert | O(log n) |
delete | O(log n) |
find | O(log n) |
access | --- |
A balanced binary tree is when all nodes have two children & all leaves have the same level
If a tree becomes unbalanced, the performance degrades to O(n)
.
In a self-balanced binary tree (like the Red-Black Tree, or AVL tree), every operation takes time proportional to the height (or level) of the tree.
Notice how there is no access time because to access a node you first have to find it...
In that case, you'll have O(log n)
for access.
But if keep a reference (as a variable) to a specific node, that would be O(1)
access time.
The Trie Data Structure
A trie is a specialized tree-like data structure.
It's helpful for working with words, and then quickly searching for words that start with a prefix, or search for the full word.
Uses:
- Word games
- Spelling checker
- Autocomplete suggestions
Example:
# https://github.com/gonzedge/rambling-trie require 'rambling-trie' trie = Rambling::Trie.create('words.txt') trie.include?('chocolate') # true trie.include?('salmon') # true
Time complexity:
Operation | Complexity |
---|---|
add | O(k) |
include? | O(k) |
words | O(k) |
In this table, I use k
to denote the size of the input string, while n
is used to denote the size of the data structure itself.
So for the word apple
, k
would be 5.
Summary
You have learned about common data structures, their main uses & characteristics, and how to use them in Ruby.
When you apply this new knowledge you'll be able to solve problems faster!
Could you share this post for me if you found it helpful?
Thank you 🙂
There is an error in hash find complexity. Its O(1) on average.
Hi, thanks for your comment!
I just clarified what I mean by
find
.It’s the find method in Ruby, it iterates over all the hash keys, so that’s
O(n)
.WRT array popping, it’s important to note that what Ruby calls pop() is removing the last element, so it’s essentially acting as a stack. Removing the first element, so that it would be treated as a queue, would be O(n), since it would have to shift all the other elements down by 1. (Unless it did something fancy under the hood like maintaining an offset, but the generic abstract data type array doesn’t do that.)
Thanks for your comment 🙂