Ruby regular expressions (Ruby regex for short) help you find specific patterns inside strings, usually with the intent of extracting data for further processing.
Two common use cases for regular expressions include validation & parsing.
For example:
Think about an email address. With a Ruby regex you can define what a valid email address looks like, so your program can tell the difference between a valid & an invalid email address.
Ruby regular expressions are defined between two forward slashes to differentiate them from other language syntax. The simplest expressions match a word or even a single letter.
For example:
    # Find the word 'like'
    "Do you like cats?" =~ /like/
This returns the index of the first occurrence of the word if it was found (successful match) or nil otherwise. If we don’t care about the index we could use the String#include? method.
Another way to check if a string matches a regex is to use the match method:
    if "Do you like cats?".match(/like/)
      puts "Match found!"
    end
Now:
You are going to learn how to build more advanced patterns so you can match, capture & replace things like dates, phone numbers, emails, URLs, etc.
Character Classes
A character class lets you define a range or a list of characters to match. For example, [aeiou] matches any vowel.
Example: Does the string contain a vowel?
    def contains_vowel(str)
      str =~ /[aeiou]/
    end

    contains_vowel("test") # returns 1
    contains_vowel("sky") # returns nil
This will not take into account the number of characters; we will see how to do that soon.
Ranges
We can use ranges to match multiple letters or numbers without having to type them all out. In other words, a range like [2-5] is the same as [2345].
Some useful ranges:
- [0-9] matches any digit from 0 to 9
- [a-z] matches any letter from a to z (no caps)
- [^a-z] negated range: matches any character that is not a lowercase letter from a to z
Example: Does this string contain any numbers?
    def contains_number(str)
      str =~ /[0-9]/
    end

    contains_number("The year is 2015") # returns 12
    contains_number("The cat is black") # returns nil
Remember: the return value when using `=~` is either the string index or `nil`.
There is a nice shorthand syntax for specifying character ranges:
- \w is equivalent to [0-9a-zA-Z_]
- \d is the same as [0-9]
- \s matches white space (tabs, regular space, newline)
There is also the negative form of these:
- \W anything that’s not in [0-9a-zA-Z_]
- \D anything that’s not a digit
- \S anything that’s not a space
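To get a feel for these shorthands, here is a small sketch that runs several of them against one made-up sample string (the string and the variable names are only for illustration):

```ruby
sample = "Room 42, floor_3\tEast" # made-up sample string

digits    = sample.scan(/\d+/) # runs of digits
words     = sample.scan(/\w+/) # runs of letters, digits and underscores
spaces    = sample.scan(/\s/)  # each whitespace character (spaces and the tab)
non_words = sample.scan(/\W/)  # every character that \w does not match
chunks    = sample.scan(/\S+/) # runs of non-whitespace characters

p digits    # ["42", "3"]
p words     # ["Room", "42", "floor_3", "East"]
p spaces    # [" ", " ", "\t"]
p non_words # [" ", ",", " ", "\t"]
p chunks    # ["Room", "42,", "floor_3", "East"]
```

Notice how the underscore counts as a word character, which is why "floor_3" stays in one piece.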
The dot character . matches everything but new lines. If you need to use a literal . then you will have to escape it.
Example: Escaping special characters
    # If we don't escape, the letter will match
    "5a5".match(/\d.\d/)
    # In this case only the literal dot matches
    "5a5".match(/\d\.\d/) # nil
    "5.5".match(/\d\.\d/) # match
Modifiers
Up until now we have only been able to match a single character at a time. To match multiple characters we can use pattern modifiers (also known as quantifiers).
Modifier | Description |
---|---|
+ | 1 or more |
* | 0 or more |
? | 0 or 1 |
{3,5} | between 3 and 5 |
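Here is a quick sketch of each modifier from the table in action. The sample strings and variable names are made up; String#[] with a regex simply returns the matched text (or nil).

```ruby
one_or_more  = "abc123"[/\d+/]     # "123"
zero_or_more = "abc" =~ /\d*/      # 0, because an empty match at the start still counts
optional     = %w[color colour].all? { |w| w.match?(/\Acolou?r\z/) } # true
bounded      = "aaaaaaa"[/a{3,5}/] # "aaaaa", the match is greedy but stops at 5
too_short    = "aa"[/a{3,5}/]      # nil, fewer than 3 repetitions

p one_or_more, zero_or_more, optional, bounded, too_short
```

The `*` case is worth remembering: a pattern that can match nothing will succeed on any string.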
We can combine everything we learned so far to create more complex regular expressions.
Example: Does this look like an IP address?
    # Note that this will also match some invalid IP addresses
    # like 999.999.999.999, but in this case we just care about the format.
    def ip_address?(str)
      # We use !! to convert the return value to a boolean
      !!(str =~ /^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/)
    end

    ip_address?("192.168.1.1") # returns true
    ip_address?("0000.0000") # returns false
Exact String Matching
If you need exact matches you will need another type of modifier, known as an anchor. Let’s see an example so you can see what I’m talking about:
    # We want to find if this string is exactly four letters long, this will
    # still match because it has more than four, but it's not what we want.
    "Regex are cool".match(/\w{4}/)

    # Instead we will use the 'beginning of line' and 'end of line' modifiers
    "Regex are cool".match(/^\w{4}$/)

    # This time it won't match. This is a rather contrived example, since we could just
    # have used .size to find the length, but I think it gets the idea across.
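To match strictly at the start and end of a string, and not just at the start and end of every line (after a \n), you need to use \A and \z instead of ^ and $. Be careful with the uppercase \Z: it still allows a single trailing newline before the end of the string.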
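The difference between line anchors and string anchors is easiest to see with a multi-line string. This is a small sketch with made-up input; it also compares \Z with the stricter lowercase \z.

```ruby
input = "first line\nadmin" # made-up input with two lines

line_anchored   = input.match?(/^admin$/)   # true: ^ and $ work on every line
string_anchored = input.match?(/\Aadmin\z/) # false: the whole string must be "admin"

trailing    = "admin\n"
lenient_end = trailing.match?(/\Aadmin\Z/)  # true: \Z tolerates one final newline
strict_end  = trailing.match?(/\Aadmin\z/)  # false: \z is the very end of the string

p line_anchored, string_anchored, lenient_end, strict_end
```

This matters most for validation, where ^ and $ can let unexpected extra lines slip through.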
Capture Groups
With capture groups, we can capture part of a match and reuse it later. To capture a match we enclose the part we want to capture in parentheses.
Example: Parsing a log file
    Line = Struct.new(:time, :type, :msg)
    LOG_FORMAT = /(\d{2}:\d{2}) (\w+) (.*)/

    def parse_line(line)
      line.match(LOG_FORMAT) { |m| Line.new(*m.captures) }
    end

    parse_line("12:41 INFO User has logged in.")
    # This produces objects like this:
    # #<struct Line time="12:41", type="INFO", msg="User has logged in.">
In this example, we are using .match instead of =~.
This method returns a MatchData object if there is a match, nil otherwise. The MatchData class has many useful methods; check out the documentation to learn more!
If you want just a boolean value (true / false) then you can use the match? method, which is available since Ruby 2.4. This is also faster than match since Ruby doesn’t need to create a MatchData object.
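As a quick illustration of the two return types (the variable names here are just for the example):

```ruby
text = "Do you like cats?"

with_match  = text.match(/like/)  # a MatchData object, or nil when nothing matches
with_matchp = text.match?(/like/) # a plain true or false
no_match    = text.match?(/dogs/)

puts with_match.class # MatchData
puts with_matchp      # true
puts no_match         # false
```

If all you need is a yes or no answer, match? keeps the code shorter because there is no need for `!!`.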
You can access the captured data using the .captures method or by treating the MatchData object like an array. The zero index will have the full match and the subsequent indexes will contain the matched groups.
If you want the first capture group you can do this:
    m = "John 31".match(/\w+ (\d+)/)
    m[1] # => "31"
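Here is a slightly extended sketch of the same idea. It uses a variation of the pattern with two capture groups (an assumption made for illustration) so that the index and .captures styles can be compared side by side.

```ruby
m = "John 31".match(/(\w+) (\d+)/)

full     = m[0]       # "John 31", the whole match
first    = m[1]       # "John", the first group
second   = m[2]       # "31", the second group (still a String)
captures = m.captures # ["John", "31"], only the groups
as_array = m.to_a     # ["John 31", "John", "31"], full match plus groups

# Captures are strings, so convert them when you need a number
name, age = m.captures
age = age.to_i

p full, captures, as_array, name, age
```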
You can also have non-capturing groups. They let you group expressions together without storing the matched text, so there is no capture overhead. You may also find named groups useful for making complex expressions easier to read.
Syntax | Description |
---|---|
(?:...) | Non-capturing group |
(?<foo>...) | Named group |
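To see what a non-capturing group changes, compare the captures of two patterns that match the same date string. This is only a sketch with a made-up date; the point is that (?:...) groups for the {2} repetition without adding anything to .captures.

```ruby
date = "2015-06-01" # made-up sample date

capturing     = date.match(/(\d{4})-(\d{2})-(\d{2})/)
non_capturing = date.match(/(\d{4})(?:-\d{2}){2}/)

p capturing.captures     # ["2015", "06", "01"]
p non_capturing.captures # ["2015"]
p non_capturing[0]       # "2015-06-01", the full match is the same
```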
Example: Named Groups
    m = "David 30".match(/(?<name>\w+) (?<age>\d+)/)
    m[:age] # => "30"
    m[:name] # => "David"
With named groups, match still returns a MatchData object, and you can read each group by its name, using either a symbol or a string as the key.
Look Ahead & Look Behind
This is a more advanced technique that might not be available in all regex implementations. Ruby’s regular expression engine is able to do this, so let’s see how to take advantage of that.
Look ahead lets us peek at what comes right after the current position, and look behind at what comes right before it, without making that text part of the match.
Name | Description |
---|---|
(?=pat) | Positive lookahead |
(?<=pat) | Positive lookbehind |
(?!pat) | Negative lookahead |
(?<!pat) | Negative lookbehind |
Example: Is there a number preceded by at least one letter?
    def number_after_word?(str)
      !!(str =~ /(?<=\w) (\d+)/)
    end

    number_after_word?("Grade 99") # returns true
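The four forms from the table can be tried out with .scan. This is a rough sketch on made-up sample data; notice that the text inside the look ahead or look behind never shows up in the results.

```ruby
prices = 'apple $5, pear 7, kiwi $12' # made-up sample data

after_dollar   = prices.scan(/(?<=\$)\d+/)     # positive lookbehind: digits right after "$"
without_dollar = prices.scan(/(?<![\$\d])\d+/) # negative lookbehind: no "$" or digit before
before_dollar  = prices.scan(/\w+(?= \$)/)     # positive lookahead: words followed by " $"
not_bar        = "foobar foobaz".scan(/foo(?!bar)\w+/) # negative lookahead: "foo" not followed by "bar"

p after_dollar   # ["5", "12"]
p without_dollar # ["7"]
p before_dollar  # ["apple", "kiwi"]
p not_bar        # ["foobaz"]
```

The `\d` inside the negative look behind is there so that the "2" in "$12" is not picked up on its own.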
Ruby’s Regex Class
Ruby regular expressions are instances of the Regexp class. Most of the time you won’t be using this class directly, but it is good to know 🙂
    puts /a/.class # Regexp
One possible use is to create a regex from a string:
    regexp = Regexp.new("a")
Another way to create a regexp:
    regexp = %r{\w+}
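When the string comes from somewhere else (user input, for instance), special characters in it keep their regex meaning. Here is a small sketch, assuming a hypothetical search term, that uses Regexp.escape to treat the string literally and passes an option constant as the second argument to Regexp.new:

```ruby
user_input = "1.5" # hypothetical search term typed by a user

raw     = Regexp.new(user_input)                # the dot stays a wildcard
escaped = Regexp.new(Regexp.escape(user_input)) # the dot becomes a literal dot

p raw.match?("125")      # true, because . matches the "2"
p escaped.match?("125")  # false
p escaped.match?("v1.5") # true

# Options can be passed as a second argument
insensitive = Regexp.new("ruby", Regexp::IGNORECASE)
p insensitive.match?("RUBY") # true
```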
Regex Options
You can set some options on your regular expression to make it behave differently.
Options | Description |
---|---|
i | case insensitive |
m | dot matches newline |
x | ignore whitespace and allow comments |
To use these options you add the letter at the end of the regex, after the closing /.
Like this:
    "abc".match?(/[A-Z]/i)
Formatting Long Regular Expressions
Complex Ruby regular expressions can get pretty hard to read, so it will be helpful if we break them into multiple lines. We can do this by using the ‘x’ modifier. This format also allows you to use comments inside your regex.
Example:
    LOG_FORMAT = %r{
      (\d{2}:\d{2}) # Time
      \s(\w+)       # Event type
      \s(.*)        # Message
    }x
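Options can also be combined by listing several letters after the closing delimiter. The sketch below uses a hypothetical LEVEL_LINE pattern (not part of the log example above) with x, i and m together, so you can see what each one contributes.

```ruby
# Hypothetical pattern for lines such as "error: disk full"
LEVEL_LINE = %r{
  \A(error|warn) # Level keyword, matched in any letter case thanks to i
  :\s*           # Colon and optional spaces
  (.*)           # Message, the dot also crosses newlines thanks to m
}xim

single = LEVEL_LINE.match("ERROR: disk full")
multi  = LEVEL_LINE.match("warn: line one\nline two")

p single[1] # "ERROR"
p single[2] # "disk full"
p multi[2]  # "line one\nline two"
```

Without the m option the last capture would stop at the end of the first line.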
Ruby regex: Putting It All Together
Regular expressions can be used with many Ruby methods.
- .split
- .scan
- .gsub
- and many more…
Example: Match all words from a string using .scan
    "this is some string".scan(/\w+/) # => ["this", "is", "some", "string"]
Example: Extract all the numbers from a string
    "The year was 1492.".scan(/\d+/) # => ["1492"]
Example: Capitalize all words in a string
    str = "lord of the rings"
    str.gsub(/\w+/) { |w| w.capitalize } # => "Lord Of The Rings"
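The .split method from the list above works in a similar way, and .gsub can also reuse capture groups in its replacement string. A short sketch with made-up inputs:

```ruby
# Split on any run of commas, semicolons or whitespace
fields = "a, b;c  d".split(/[,;\s]+/)

# Reorder a date by referring to the capture groups as \1, \2 and \3.
# Single quotes keep the backslashes intact.
swapped = "2015-06-01".gsub(/(\d{4})-(\d{2})-(\d{2})/, '\3/\2/\1')

p fields  # ["a", "b", "c", "d"]
p swapped # "01/06/2015"
```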
Example: Validate an email address
    email = "test@example.com"
    !!email.match(/\A[\w.+-]+@\w+\.\w+\z/) # true
This last example uses !! to convert the result into a boolean value (true / false). Alternatively, you can use the match? method in Ruby 2.4+, which already does this for you & is also faster.
Conclusion
Regular expressions are amazing but sometimes they can be a bit tricky. Using a tool like rubular.com can help you build your ruby regex in a more interactive way. Rubular also includes a Ruby regular expression cheat sheet that you will find very useful. Now it’s your turn to crack open that editor and start coding!
Oh, and don’t forget to share this with your friends if you enjoyed it, so more people can learn 🙂
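Hi, nice post! It’s worth pointing out, I think, that you can also get named capture groups with the =~ notation: when the regex literal is on the left-hand side, Ruby assigns each named group to a local variable with the same name.
As far as I know it only works this way around, and not with the string on the left-hand side.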
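A minimal sketch of that behaviour follows. The method names (parse_release, string_on_left) and the sample text are made up for illustration and are not the commenter's original code.

```ruby
# Regex literal on the LEFT of =~: Ruby creates the local variables year and month
def parse_release(text)
  if /(?<year>\d{4})-(?<month>\d{2})/ =~ text
    [year, month]
  end
end

# String on the LEFT: no local variable is created, but $~ still holds the MatchData
def string_on_left(text)
  text =~ /(?<yr>\d{4})/
  [defined?(yr), $~[:yr]]
end

p parse_release("Released 2015-06") # ["2015", "06"]
p parse_release("no date here")     # nil
p string_on_left("Released 2015")   # [nil, "2015"]
```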
It’s always better to make your regular expression more specific. For example, searching for a word at the start of a line instead of anywhere in the line can make a big performance difference if the majority of the input does not match.