Scaling Laravel Using AWS Elastic Beanstalk Part 1: Setting up Laravel
Our new product Mergebot is nearly ready to put in the hands of our beta testers, so I recently went through the process of setting up the Laravel app on our production architecture. We decided early on to host the app on Amazon Web Services (AWS), specifically Elastic Beanstalk, as it has a solid reputation and offers everything we need in terms of infrastructure (e.g. load balancers, auto scaling, relational databases, custom VPCs etc.).
When I started out I was surprised to find very little public information about setting up Laravel on Elastic Beanstalk. I had assumed that there must be plenty of companies running large scale apps using Laravel and, given the popularity of AWS and Elastic Beanstalk, I expected there to be quite a bit written about the process of setting it up. For once Google didn’t give me much concrete advice, and certainly nothing that was up to date (i.e. setting up Laravel 5.x apps on the current Elastic Beanstalk platform).
So in this short series I’m going to go through the process of setting up a Laravel 5.3 app on AWS Elastic Beanstalk. I’m also going to cover setting up some supporting services that large scale Laravel apps would require: setting up an external RDS MySQL database, setting up Redis on ElastiCache and configuring everything securely on a Virtual Private Cloud (VPC). The rough outline for this series is as follows:
- Set up Laravel for a horizontally auto scaling architecture
- Set up supporting services such as VPC, RDS and ElastiCache
- Set up Elastic Beanstalk app and worker environments
The first thing we need to do is get our Laravel app ready to work on an architecture designed to scale horizontally. The idea with horizontal scaling is that when your server is struggling to cope with the load you add a new duplicate of the server to your cluster of servers, and spread the load evenly over the whole cluster of servers using a Load Balancer. This type of architecture is common for high demand apps and although scaling vertically (increasing the size/power of your server) may be fine early on, if demand continues to grow you will have no choice but to scale horizontally. Some of the advantages of horizontal auto scaling include:
- The flexibility to increase/decrease the load capacity of your app without any downtime (as opposed to vertical scaling)
- Reasonable affordability (i.e. you only pay for what you need, automatically increasing/decreasing as required)
- Protection against a single point of failure (e.g. health checks can automatically remove degraded servers and add new ones)
- Rolling updates and deployments (e.g. upgrade your servers in batches to avoid downtime)
- Potentially unlimited room to scale out
To gain all of the benefits of horizontal scaling, your app needs to be thoroughly decoupled so that resources can be shared across all of the servers in your cluster, whether you have 1 or 100 servers. What does it mean to decouple your app?
As an example, consider an app that allows you to log in and upload photos. With a single server, the user session and any photos the user uploads can be stored on the server file system. However, with a horizontally scaled app you might have many app servers that need shared access to the user session and uploaded photos. So you need to decouple the file system from the app server. This is normally done by either using a shared storage volume or storing the data in an external service that can be accessed by all the servers in the cluster (e.g. a database or Redis cache).
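To make the session problem concrete, here is a minimal plain-PHP sketch. The AppServer class and its methods are hypothetical and are not part of Laravel; the ArrayObject simply stands in for an external store such as a database or Redis.

```php
<?php
// Hypothetical stand-in for one app server in the cluster; not part of Laravel.
final class AppServer
{
    /** @var array sessions kept on this server only */
    private $localSessions = [];

    /** @var ArrayObject store shared by every server (stands in for a database or Redis) */
    private $sharedSessions;

    public function __construct(ArrayObject $sharedSessions)
    {
        $this->sharedSessions = $sharedSessions;
    }

    // Coupled: the session lives on whichever server handled the login.
    public function loginLocal(string $sessionId, string $user): void
    {
        $this->localSessions[$sessionId] = $user;
    }

    public function userLocal(string $sessionId): ?string
    {
        return $this->localSessions[$sessionId] ?? null;
    }

    // Decoupled: the session lives in the store that all servers can reach.
    public function loginShared(string $sessionId, string $user): void
    {
        $this->sharedSessions[$sessionId] = $user;
    }

    public function userShared(string $sessionId): ?string
    {
        return $this->sharedSessions[$sessionId] ?? null;
    }
}

$shared = new ArrayObject();
$serverA = new AppServer($shared);
$serverB = new AppServer($shared);

$serverA->loginLocal('abc', 'alice');
var_dump($serverB->userLocal('abc'));  // NULL: server B never saw the login

$serverA->loginShared('abc', 'alice');
var_dump($serverB->userShared('abc')); // string(5) "alice"
```

If the load balancer sends the next request to server B, only the shared version still knows who the user is.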
We are going to make some tweaks to our Laravel app to make sure the app is properly decoupled and ready for horizontal auto scaling.
Thankfully Laravel comes with built-in features that make it relatively easy to decouple. These include:
- Using external file systems (e.g. AWS S3)
- Being able to store session data in Redis or a database
- Being able to store cache data in an external service (e.g. Redis, Memcached)
- Being able to offload queue handling to an external service (e.g. AWS Simple Queue Service, Beanstalkd)
In this series we are going to copy the architecture that we currently employ in Mergebot, namely storing files in S3 instead of the local file system (except for temporary or development files), storing session data in the database, using Redis for cache data, and using Elastic Beanstalk’s worker environment (which uses SQS) to handle queues (more on this later).
External File System
In Mergebot we only use the local file system for temporary files and for files we need in our development environment. In our production environments we store files on S3, which is simple to do in Laravel. As per the Laravel filesystem docs we need to install the league/flysystem-aws-s3-v3 package, then configure our S3 settings in the filesystems config:

    'disks' => [
        // ...
        's3' => [
            'driver' => 's3',
            'key' => env('AWS_S3_KEY'),
            'secret' => env('AWS_S3_SECRET'),
            'region' => env('AWS_S3_REGION', 'us-east-1'),
            'bucket' => env('AWS_S3_BUCKET', 'mergebot-' . env('APP_ENV')),
        ],
    ],
We can then input the correct environment variables in our .env file when developing locally, or set the environment variables in Elastic Beanstalk (we’ll deal with this later in the series).
This allows us to use an “s3” disk in Laravel. For example:
    $s3 = Storage::disk('s3');
    $s3->put('example-file.sql.gz', $fileContents);
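The 'bucket' line in the S3 config above falls back to a name built from APP_ENV when AWS_S3_BUCKET is not set. The sketch below shows how that default resolves. Both envOr() and resolveBucket() are hypothetical helpers written for illustration (envOr() only approximates Laravel's env() helper by reading from an array), so treat this as an assumption-laden sketch rather than how Laravel implements it.

```php
<?php
// Hypothetical stand-in for Laravel's env() helper: return the value if the
// key is set in the given environment, otherwise the supplied default.
function envOr(array $environment, string $key, $default = null)
{
    return array_key_exists($key, $environment) ? $environment[$key] : $default;
}

// Hypothetical helper mirroring the 'bucket' line of the S3 disk config.
function resolveBucket(array $environment): string
{
    return envOr($environment, 'AWS_S3_BUCKET', 'mergebot-' . envOr($environment, 'APP_ENV'));
}

// No explicit bucket: the name follows the environment.
echo resolveBucket(['APP_ENV' => 'staging']), PHP_EOL; // mergebot-staging

// An explicit bucket always wins.
echo resolveBucket(['APP_ENV' => 'production', 'AWS_S3_BUCKET' => 'custom-bucket']), PHP_EOL; // custom-bucket
```

The upshot is that each environment gets its own bucket by default, and you only need to set AWS_S3_BUCKET when you want to override that.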
Storing sessions in the database is also well documented in the Laravel session docs. Simply run an artisan command to generate the table migration needed for the database, then run the migration:
    php artisan session:table
    php artisan migrate
Then we can change the default session driver in config/session.php or in your .env/environment variables. Because all of our environments use the same config I have just changed the default in config/session.php:

    'driver' => env('SESSION_DRIVER', 'database'),
Changing the cache to use Redis is a very similar process to changing the session driver. As per the Laravel cache docs we need to install the predis/predis package then change the default driver in the cache config:

    'default' => env('CACHE_DRIVER', 'redis'),
The Redis connection info can be configured in config/database.php and we need to tweak it a bit so that the default values can be overridden by environment variables:

    'redis' => [
        'cluster' => env('REDIS_CLUSTER', false),
        'default' => [
            'host' => env('REDIS_HOST', 'localhost'),
            'password' => env('REDIS_PASSWORD', null),
            'port' => env('REDIS_PORT', 6379),
            'database' => 0,
        ],
    ],
For local development I recommend using the Homestead VM as it comes with Redis pre-installed and means you don’t need to install it on your local machine.
The final part of decoupling that needs to be done is a bit more complex. Setting up Laravel to use SQS queues via the docs is simple enough. Install the aws/aws-sdk-php package and set up environment variables in the config/queue.php file similar to what we’ve done previously:

    'default' => env('QUEUE_DRIVER', 'sqs'),

    'connections' => [
        'sqs' => [
            'driver' => 'sqs',
            'key' => env('AWS_SQS_KEY'),
            'secret' => env('AWS_SQS_SECRET'),
            'prefix' => env('AWS_SQS_PREFIX'),
            'queue' => env('AWS_SQS_QUEUE'),
            'region' => env('AWS_SQS_REGION', 'us-east-1'),
        ],
    ],
Normally at this point you would run a queue listener (php artisan queue:work) in a process manager like Supervisor to make sure the queue listener is always running. However, Elastic Beanstalk does things a bit differently.
Elastic Beanstalk has the concept of worker environments which are separate from the normal web environments and designed specifically for dealing with jobs that have been added to an SQS queue. Servers in worker environments come with Amazon’s own SQS daemon to deal with incoming messages from an SQS queue, which is nice because it means we don’t have to set up our own daemon to deal with Laravel’s built-in queue listener. However, Laravel is not built to work with the SQS daemon so we need to install an extra package to get things working.
While we could manually implement an interface for the SQS daemon, I found that this laravel-aws-worker package from Denis Mysenko seems to work well. The SQS daemon is designed to POST an incoming message from the SQS queue to our app (rather than the listener fetching a message from the queue as Laravel’s built-in listener does). The laravel-aws-worker package adds an extra /worker/queue path to our app to handle these incoming messages from the SQS daemon, and hands them off to Laravel to be processed. By default the SQS daemon will POST messages to the base path / but we will change that in the configuration later when we set up the environment in Elastic Beanstalk.
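To illustrate the push model described above, here is a rough plain-PHP sketch of what an endpoint receiving the daemon's POST has to do. It is not how laravel-aws-worker is implemented: handleDaemonPost() and the $process callback are hypothetical, and the payload is assumed to be JSON. The daemon identifies its requests with headers such as X-Aws-Sqsd-Msgid, and it treats a 200 response as success and anything else as a failure to retry.

```php
<?php
/**
 * Hypothetical handler for a request POSTed by the SQS daemon.
 * Returns the HTTP status code the app should respond with.
 *
 * @param array    $headers request headers, e.g. ['X-Aws-Sqsd-Msgid' => '...']
 * @param string   $body    raw request body (the SQS message body)
 * @param callable $process hypothetical callback that actually runs the job
 */
function handleDaemonPost(array $headers, string $body, callable $process): int
{
    if (!isset($headers['X-Aws-Sqsd-Msgid'])) {
        return 400; // the request did not come from the daemon
    }

    $payload = json_decode($body, true);
    if (!is_array($payload)) {
        return 400; // the body is not the JSON payload we assumed
    }

    try {
        $process($payload);
    } catch (Throwable $e) {
        return 500; // any non-200 response makes the daemon retry the message later
    }

    return 200; // success: the daemon deletes the message from the queue
}

$status = handleDaemonPost(
    ['X-Aws-Sqsd-Msgid' => 'example-id'],
    '{"job":"SendWelcomeEmail"}',
    function (array $payload) {
        // Run the job described by $payload here.
    }
);
echo $status, PHP_EOL; // 200
```

The important design point is that the HTTP status code replaces the explicit delete/release calls a pull-based listener would make.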
Elastic Beanstalk Quirks
Now that our Laravel app is decoupled there are a few final tweaks we need to make to make sure our app works properly on Elastic Beanstalk.
Another aspect of Elastic Beanstalk that the laravel-aws-worker package handles is cron jobs and task scheduling in Laravel. We can’t have every server in our cluster configured with the same cron because each cron job would then be triggered multiple times when we only want it triggered once. Elastic Beanstalk deals with this by allowing us to specify a cron.yaml file in our worker environment which handles all of the cron complexities for us (again by POSTing a request to the specified endpoint).
The laravel-aws-worker package adds a second /worker/schedule path to our app so that we can configure our cron.yaml like this:

    version: 1
    cron:
      - name: "schedule"
        url: "/worker/schedule"
        schedule: "* * * * *"
Now the worker environment will POST to /worker/schedule every minute and the laravel-aws-worker package will offload this to the Laravel scheduler.
Have a read of this blog post by Denis if you want to know more about how the laravel-aws-worker package works.
One other issue you will find is that when you are behind a load balancer (or any kind of reverse proxy) things like sessions, redirects, URL generation etc. might break because the server isn’t receiving the traffic directly but instead receiving traffic from the load balancer. The load balancer might know your site is at the domain example.com but the server receives traffic from the load balancer which might have some random hostname or IP. Thankfully AWS helps us out here by sending the correct info via X-Forwarded headers. However, Laravel is not configured to handle these X-Forwarded headers so we need to install a package to tell Laravel what to do here.
This TrustedProxy package by Chris Fidao seems to work well. You simply need to install the fideloper/proxy package, run through the setup and tweak the generated config/trustedproxy.php config to allow the correct proxies:

    return [
        'proxies' => '*',
        'headers' => [
            \Illuminate\Http\Request::HEADER_CLIENT_IP => 'X_FORWARDED_FOR',
            \Illuminate\Http\Request::HEADER_CLIENT_HOST => 'X_FORWARDED_HOST',
            \Illuminate\Http\Request::HEADER_CLIENT_PROTO => 'X_FORWARDED_PROTO',
            \Illuminate\Http\Request::HEADER_CLIENT_PORT => 'X_FORWARDED_PORT',
        ],
    ];
AWS does not document their load balancer IP addresses (as they are constantly changing), and so we must trust all possible addresses (*). The default headers will work fine here.
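For a feel of what trusting the proxy changes, here is a simplified plain-PHP sketch of the idea. It is not the TrustedProxy package's code: originalRequestInfo() is a hypothetical function, and the $server array is assumed to be shaped like PHP's $_SERVER, where X-Forwarded-* request headers show up as HTTP_X_FORWARDED_* keys.

```php
<?php
/**
 * Hypothetical sketch: work out the scheme and client IP the visitor really used.
 *
 * @param array $server  values shaped like PHP's $_SERVER
 * @param bool  $trusted whether the immediate peer (the load balancer) is trusted
 */
function originalRequestInfo(array $server, bool $trusted): array
{
    // What the app server sees directly: the connection from the load balancer.
    $scheme = (!empty($server['HTTPS']) && $server['HTTPS'] !== 'off') ? 'https' : 'http';
    $clientIp = $server['REMOTE_ADDR'] ?? null;

    if ($trusted) {
        // Only believe the forwarded headers when the sender is a trusted proxy.
        $scheme = $server['HTTP_X_FORWARDED_PROTO'] ?? $scheme;

        if (!empty($server['HTTP_X_FORWARDED_FOR'])) {
            // The header can hold a comma-separated chain; the first entry is the original client.
            $clientIp = trim(explode(',', $server['HTTP_X_FORWARDED_FOR'])[0]);
        }
    }

    return ['scheme' => $scheme, 'clientIp' => $clientIp];
}

$request = [
    'REMOTE_ADDR' => '10.0.1.25', // the load balancer's private address
    'HTTP_X_FORWARDED_PROTO' => 'https',
    'HTTP_X_FORWARDED_FOR' => '203.0.113.7',
];

echo implode(' ', originalRequestInfo($request, false)), PHP_EOL; // http 10.0.1.25
echo implode(' ', originalRequestInfo($request, true)), PHP_EOL;  // https 203.0.113.7
```

Without trust the app would generate http:// URLs and log the load balancer's IP, which is exactly the kind of breakage described above.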
At this stage our Laravel app should be suitable to be deployed on AWS Elastic Beanstalk and auto scaled horizontally. In the next article we will look at setting up supporting services in AWS such as a VPC, a Redis ElastiCache and our RDS MySQL database.
Have you ever hosted Laravel on Elastic Beanstalk? Do you have any experience scaling Laravel? Have you ever used Elastic Beanstalk or auto scaling on a different project? Let us know in the comments.