A prefix tree (also known as a trie) is a data structure that helps you organize a word list & quickly find words that start with a specific prefix.
For example, you can find all the words in your dictionary that start with the letters “ca”, such as “cat” or “cape”.
Look at this picture:
This is a prefix tree.
You can follow from the root (*
) to a marked node (like e
and t
) to find words.
In this article you will learn how to implement your own prefix tree in Ruby & how to use it to solve problems!
Prefix Tree Implementation
To implement this in Ruby I decided to use a Node
class with a few attributes:
- The value (one character)
- The “word” variable. A true / false value which tells you if this is a valid word
- The “next” array. This stores all the characters (as
Node
objects) that come after this one in the tree
Here’s the code:
class Node attr_reader :value, :next attr_accessor :word def initialize(value) @value = value @word = false @next = [] end end
Now we need a class to hold the root node & the methods for working with these nodes.
Let’s look at the Trie
class:
class Trie def initialize @root = Node.new("*") end end
Inside this class we have the following methods:
def add_word(word) letters = word.chars base = @root letters.each { |letter| base = add_character(letter, base.next) } base.word = true end def find_word(word) letters = word.chars base = @root word_found = letters.all? { |letter| base = find_character(letter, base.next) } yield word_found, base if block_given? base end
Both methods break down a given word into an array of characters (using the chars
method).
Then we navigate the tree starting at the root & either find a character or add it.
Here are the supporting methods (also inside the Trie
class):
def add_character(character, trie) trie.find { |n| n.value == character } || add_node(character, trie) end def find_character(character, trie) trie.find { |n| n.value == character } end def add_node(character, trie) Node.new(character).tap { |new_node| trie << new_node } end
To add a character we check if it already exists (using the find
method). If it does then we return the node.
Otherwise we create it & return the new node.
Then we also have the include?
method:
def include?(word) find_word(word) { |found, base| return found && base.word } end
Now we are ready to start using our new data structure and see what we can do with it 🙂
Trie Uses & Examples
Let's start by adding some words to our tree:
trie = Trie.new trie.add_word("cat") trie.add_word("cap") trie.add_word("cape") trie.add_word("camp")
You can check if a word is included in the tree like this:
p trie.include?("cape") # true p trie.include?("ca") # false
So what are some uses for this data structure?
- Solving word games
- Spell checking
- Autocomplete
You will need a good dictionary to load into your tree.
I found these which may be useful to you:
- https://raw.githubusercontent.com/first20hours/google-10000-english/master/20k.txt
- https://raw.githubusercontent.com/dwyl/english-words/master/words_alpha.txt
Finding Prefixed Words
So in the code example I showed you before we have implemented the add
& find
operations.
But we also want a find_words_starting_with
method.
We can do this using the "Depth First Search" (DFS) algorithm. We also need a way to keep track of the word we are looking at.
Remember that our nodes only have one character each, so we need to reconstruct the actual strings by walking over the tree.
Here's a method that does all of that:
def find_words_starting_with(prefix) stack = [] words = [] prefix_stack = [] stack << find_word(prefix) prefix_stack << prefix.chars.take(prefix.size-1) return [] unless stack.first until stack.empty? node = stack.pop prefix_stack.pop and next if node == :guard_node prefix_stack << node.value stack << :guard_node words << prefix_stack.join if node.word node.next.each { |n| stack << n } end words end
We use two stacks here, one for keeping track of unvisited nodes (stack
) & another to keep track of the current string (prefix_stack
).
We loop until we have visited all the nodes & add the value of the node to the prefix_stack
. Each node holds the value for only one character, so we need to collect these characters to form a word.
The :guard_node
symbol is added to the prefix_stack
so we know when we are backtracking. We need this to remove characters from our string buffer (prefix_stack
) at the right time.
Then if node.word
is true it means we found a full word & we add it to our list of words.
Example of using this method:
t.find_words_starting_with("cap") # ["cap", "cape"]
If no words are found you will get an empty array:
t.find_words_starting_with("b") # []
This method can be used to implement an autocomplete feature.
Summary
You learned about prefix trees (also known as tries), a data structure used to organize a list of words into a tree. You can quickly query this tree to check if a word is valid & to find words that share the same prefix.
Don't forget to share this post so more people can learn!
return statement inside a block?
Yes, it works like in a regular method.
Example:
Nice article, and a handy use for trees. I actually did something very much like this, though I did it by shoving everything into a Hash (including functionality), but it aims to accomplish the same thing. This also has the added benefit of searching for “scrabble chains”, where you can take a word on a scrabble board, add a single letter to the front or the back, and it’ll still be a complete word.
https://pastebin.com/YJ6LiBdv – the script itself
https://pastebin.com/raw/ejKQvqMa – the dictionary it’ll pull from (there are some extra strings in there right at the top to explicitly demonstrate the capabilities of the chain searching beyond what the actual dictionary would have yielded)