Have you ever built your own web server with Ruby?
We already have many servers, like:
- Puma
- Thin
- Unicorn
But I think this is a great learning exercise if you want to know how a simple web server works.
In this article, you will learn how to do this.
Step-by-step!
Step 1: Listening For Connections
Where do we start?
The first thing that we need is to listen for new connections on TCP port 80.
I already wrote a post about network programming in Ruby, so I’m not going to explain how that works here.
I’m just going to give you the code:
```ruby
require 'socket'

server = TCPServer.new('localhost', 80)

loop {
  client  = server.accept            # wait for a new connection
  request = client.readpartial(2048) # read up to 2048 bytes of the request

  puts request
}
```
When you run this code you will have a server that accepts connections on port 80. It doesn’t do much yet, but it will allow you to see what an incoming request looks like.
Note: To use port 80 on a Linux/Mac system, you will usually need root privileges. As an alternative, you can use any port from 1024 up. I like 8080 🙂
An easy way to generate a request is to just use your browser or something like curl.
When you do that you will see this printed in your server:
```
GET / HTTP/1.1
Host: localhost
User-Agent: curl/7.49.1
Accept: */*
```
This is an HTTP request. HTTP is a plain-text protocol used for communication between web browsers and web servers.
The official protocol specification can be found here: https://tools.ietf.org/html/rfc7230 (it has since been superseded by RFC 9110 and RFC 9112).
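If you want to see a request arrive without root privileges or a browser, here is a minimal sketch that plays both sides in one Ruby process. It listens on port 0 (which asks the operating system for any free port) and then sends itself a hand-written request with TCPSocket. The request text is an assumption modeled on the curl output above; a real client would send more headers.

```ruby
require 'socket'

# Port 0 asks the OS for any free port, so no root privileges are needed
server = TCPServer.new('127.0.0.1', 0)
port   = server.addr[1]

# Accept a single connection in a background thread and capture what arrives
received = nil
listener = Thread.new do
  client   = server.accept
  received = client.readpartial(2048)
  client.close
end

# Send a minimal hand-written HTTP request: every line ends with \r\n
# and an empty line marks the end of the headers
socket = TCPSocket.new('127.0.0.1', port)
socket.write("GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
listener.join
socket.close
server.close

puts received
```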
Step 2: Parsing The Request
Now we need to break down the request into smaller components that our server can understand.
To do that we can build our own parser or use one that already exists. We are going to build our own so we need to understand what the different parts of the request mean.
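The first line is the request line, which holds the HTTP method (GET), the path (/) and the protocol version (HTTP/1.1). Every line after that is a header in Name: value form, and an empty line marks the end of the headers.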
The headers are used for things like browser caching, virtual hosting and data compression, but for a basic implementation we can ignore them & still have a functional server.
To build a simple HTTP parser, we can take advantage of the fact that the request data is separated by line breaks (\r\n). To keep things simple, we are not going to do any error or validity checking.
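Here is a small sketch of that idea on a made-up sample request. It separates the header section from the body at the first blank line, splits lines on \r\n, and splits each header on its first colon only, so values that contain spaces (like most User-Agent strings) survive intact. Array#to_h with a block needs Ruby 2.6 or newer.

```ruby
# A made-up sample request, written out with explicit \r\n line endings
raw = "GET /index.html HTTP/1.1\r\n" \
      "Host: localhost\r\n" \
      "User-Agent: Mozilla/5.0 (X11; Linux x86_64)\r\n" \
      "\r\n"

# The blank line (\r\n\r\n) separates the header section from the body
head, _body = raw.split("\r\n\r\n", 2)
request_line, *header_lines = head.split("\r\n")

method, path, version = request_line.split(" ")

headers = header_lines.to_h do |line|
  # Split only on the first colon, so values may contain spaces or colons
  name, value = line.split(":", 2)
  [name.downcase.to_sym, value.strip]
end

p method                 # => "GET"
p headers[:host]         # => "localhost"
p headers[:"user-agent"] # => "Mozilla/5.0 (X11; Linux x86_64)"
```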
Here is the code I came up with:
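```ruby
class RequestParser
  def parse(request)
    # Request line: method, path and HTTP version
    method, path, version = request.lines[0].split

    { path: path, method: method, headers: parse_headers(request) }
  end

  def parse_headers(request)
    headers = {}

    request.lines[1..-1].each do |line|
      # An empty line marks the end of the headers
      return headers if line == "\r\n"

      # Split on the first colon only, so values with spaces stay whole
      header, value = line.split(":", 2)
      headers[normalize(header)] = value.to_s.strip
    end

    headers
  end

  # "User-Agent" becomes :"user-agent"
  def normalize(header)
    header.strip.downcase.to_sym
  end
end
```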
This will return a hash with the parsed request data. Now that we have our request in a usable format we can build our response for the client.
Step 3: Preparing & Sending The Response
To build the response we need to see if the requested resource is available. In other words, we need to check if the file exists.
Here is the code I wrote for doing that:
```ruby
SERVER_ROOT = "/tmp/web-server/"

class ResponsePreparer
  def prepare(request)
    if request.fetch(:path) == "/"
      respond_with(SERVER_ROOT + "index.html")
    else
      respond_with(SERVER_ROOT + request.fetch(:path))
    end
  end

  def respond_with(path)
    # File.file? is false for missing paths and for directories
    # (File.exists? was removed in Ruby 3.2)
    if File.file?(path)
      send_ok_response(File.binread(path))
    else
      send_file_not_found
    end
  end
end
```
There are two things happening here:
- First, if the path is set to /, we assume that the file we want is index.html.
- Second, if the requested file is found, we send the file contents with an OK response.
But if the file is not found, we send the typical 404 Not Found response.
Table Of Most Common HTTP Response Codes
For reference.
| Code | Description |
|---|---|
| 200 | OK |
| 301 | Moved Permanently |
| 302 | Found |
| 304 | Not Modified |
| 400 | Bad Request |
| 401 | Unauthorized |
| 403 | Forbidden |
| 404 | Not Found |
| 500 | Internal Server Error |
| 502 | Bad Gateway |
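The table maps naturally onto a Ruby hash. As a sketch, the status line (the first line of every HTTP response) could be built from it like this; REASON_PHRASES and status_line are hypothetical names, not part of the server's code. The reason phrase is informational only, since clients act on the numeric code.

```ruby
# Standard reason phrases for the codes in the table
REASON_PHRASES = {
  200 => "OK",
  301 => "Moved Permanently",
  302 => "Found",
  304 => "Not Modified",
  400 => "Bad Request",
  401 => "Unauthorized",
  403 => "Forbidden",
  404 => "Not Found",
  500 => "Internal Server Error",
  502 => "Bad Gateway"
}.freeze

# Builds an HTTP/1.1 status line; raises KeyError for codes not in the table
def status_line(code)
  "HTTP/1.1 #{code} #{REASON_PHRASES.fetch(code)}\r\n"
end

p status_line(404) # => "HTTP/1.1 404 Not Found\r\n"
```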
Response Class & Methods
Here are the “send” methods that are used in the last example:
```ruby
def send_ok_response(data)
  Response.new(code: 200, data: data)
end

def send_file_not_found
  Response.new(code: 404)
end
```
And here is the Response class:
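```ruby
class Response
  attr_reader :code

  REASON_PHRASES = { 200 => "OK", 404 => "Not Found" }.freeze

  def initialize(code:, data: "")
    # Status line, Content-Length (in bytes), blank line, then the body.
    # Nothing may follow the body, or it would not match Content-Length.
    @response =
      "HTTP/1.1 #{code} #{REASON_PHRASES.fetch(code, '')}\r\n" +
      "Content-Length: #{data.bytesize}\r\n" +
      "\r\n" +
      data
    @code = code
  end

  # Note: this overrides Ruby's built-in Object#send for Response objects
  def send(client)
    client.write(@response)
  end
end
```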
The response is built from a template & some string interpolation: a status line, a Content-Length header, an empty line, and then the body.
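One detail worth checking in that template: Content-Length counts bytes, not characters, and the body after the blank line should be exactly that many bytes. For data read with File.binread the two counts match, but for a UTF-8 string they can differ. A sketch with a hypothetical build_response helper:

```ruby
# Hypothetical helper: status line, one header, blank line, body
def build_response(code, reason, body)
  "HTTP/1.1 #{code} #{reason}\r\n" +
    "Content-Length: #{body.bytesize}\r\n" +
    "\r\n" +
    body
end

body = "héllo"
p body.size     # => 5 (characters)
p body.bytesize # => 6 ("é" takes 2 bytes in UTF-8)
puts build_response(200, "OK", body)
```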
At this point we just need to tie everything together in our connection-accepting loop, and then we should have a functional server.
```ruby
loop {
  client  = server.accept
  request = client.readpartial(2048)

  request  = RequestParser.new.parse(request)
  response = ResponsePreparer.new.prepare(request)

  puts "#{client.peeraddr[3]} #{request.fetch(:path)} - #{response.code}"

  response.send(client)
  client.close
}
```
Try adding some HTML files under the SERVER_ROOT directory and you should be able to load them from your browser. This will also serve any other static assets, including images.
Of course a real web-server has many more features that we didn’t cover here.
Here is a list of some of the missing features, so you can implement them on your own as an exercise (practice is the mother of skill!):
- Virtual hosting
- MIME types
- Data compression
- Access control
- Multi-threading
- Request validation
- Query string parsing
- POST body parsing
- Browser caching (response code 304)
- Redirects
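As a starting point for two of these exercises, here is a sketch of MIME type lookup and query string parsing. The extension table is a small assumption (a real server would know many more types), and content_type_for and split_query are hypothetical names; the returned type would go into a Content-Type response header.

```ruby
require 'uri'

# A deliberately small extension-to-MIME-type table (an assumption)
MIME_TYPES = {
  ".html" => "text/html",
  ".css"  => "text/css",
  ".js"   => "text/javascript",
  ".png"  => "image/png",
  ".jpg"  => "image/jpeg",
  ".txt"  => "text/plain"
}.freeze

# Falls back to a generic binary type for unknown extensions
def content_type_for(path)
  MIME_TYPES.fetch(File.extname(path).downcase, "application/octet-stream")
end

# Splits "/search?q=ruby+server&page=2" into the path and a params hash.
# URI.decode_www_form turns "+" into a space; repeated keys keep the last value.
def split_query(target)
  path, query = target.split("?", 2)
  params = query ? URI.decode_www_form(query).to_h : {}
  [path, params]
end

p content_type_for("/images/logo.PNG") # => "image/png"
p split_query("/search?q=ruby+server&page=2")
```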
A Lesson on Security
Taking input from a user & doing something with it is always dangerous. In our little web server project, the user input is the HTTP request.
We have introduced a little vulnerability known as “path traversal”. People will be able to read any file that the user running our web server has access to, even if it is outside of our SERVER_ROOT directory.
This is the line responsible for this issue:
```ruby
File.binread(path)
```
You can try to exploit this issue yourself to see it in action. You will need to make a “manual” HTTP request, because most HTTP clients (including curl, unless you pass its --path-as-is option) will pre-process your URL and remove the part that triggers the vulnerability.
One tool you can use is called netcat.
Here is a possible exploit (type the GET line once nc has connected, then press Enter):
```
$ nc localhost 8080
GET ../../etc/passwd HTTP/1.1
```
This will return the contents of the /etc/passwd file if you are on a Unix-based system. This works because a double dot (..) moves you one directory up, so you are “escaping” the SERVER_ROOT directory.
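You can watch the escape happen without sending any request: for an absolute path, Ruby's File.expand_path resolves .. segments without touching the filesystem. The same call also enables a different defense from the dot-compression fix described next: resolve the full path and refuse it unless it is still inside SERVER_ROOT. This is a swapped-in technique, not the post's fix, and safe_path is a hypothetical helper; it does not follow symlinks, so a symlink inside SERVER_ROOT could still point elsewhere.

```ruby
SERVER_ROOT = "/tmp/web-server/"

# Where the exploit path really points (on a Unix-like system)
p File.expand_path(SERVER_ROOT + "../../etc/passwd") # => "/etc/passwd"

# Hypothetical helper: returns the resolved path only if it stays inside
# SERVER_ROOT, otherwise nil
def safe_path(request_path)
  root = File.expand_path(SERVER_ROOT) # trailing slash is dropped here
  full = File.expand_path(File.join(root, request_path))

  # Comparing against root + "/" keeps "/tmp/web-server-secret" from
  # passing as a prefix match of "/tmp/web-server"
  (full == root || full.start_with?(root + File::SEPARATOR)) ? full : nil
end

p safe_path("/index.html")      # => "/tmp/web-server/index.html"
p safe_path("../../etc/passwd") # => nil
```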
One possible solution is to “compress” multiple dots in the request path into one:
```ruby
path.gsub!(/\.+/, ".")
```
When thinking about security, always put your “hacker hat” on & try to find ways to break your solution. For example, if you just did path.gsub!("..", "."), you could bypass that by using triple dots (...).
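Here is a quick check of both approaches against a triple-dot version of the exploit path (the path is made up for illustration):

```ruby
exploit = "/.../.../etc/passwd"

# Naive fix: replacing ".." with "." turns "..." back into ".."
naive = exploit.gsub("..", ".")
p naive # => "/../../etc/passwd"

# Collapsing every run of dots into a single dot leaves no ".." at all
collapsed = exploit.gsub(/\.+/, ".")
p collapsed # => "/././etc/passwd"

# Where each result ends up once joined with the server root (Unix paths)
p File.expand_path("/tmp/web-server" + naive)     # => "/etc/passwd"
p File.expand_path("/tmp/web-server" + collapsed) # => "/tmp/web-server/etc/passwd"
```

The regex approach has a cost: it also rewrites legitimate names, so a file called archive..tar would be looked up as archive.tar.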
Finished & Working Code
I know the code is all over the place in this post, so if you’re looking for the finished, working code…
Here’s the link:
https://gist.github.com/matugm/efe0a1c4fc53310f7ac93dcd1f041f6c#file-web-server-rb
Enjoy!
Summary
In this post, you learned how to listen for new connections, what an HTTP request looks like & how to parse it. You also learned how to build the response using a response code and the contents of the requested file (if available).
And finally you learned about the “path traversal” vulnerability & how to avoid it.
I hope you enjoyed this post & learned something new!