Thursday, April 8, 2010

Using puppet in UEC/EC2: Improving performance with Phusion Passenger

Now that we have an efficient process to start instances within UEC/EC2 and get them configured for their task by puppet, we'll dive into improving the performance of the puppetmaster with Phusion Passenger.

Why?


The default configuration used by puppetmasterd is based on webrick, which doesn't really scale well. One popular choice to improve puppetmasterd performance is to use mod_passenger from the libapache2-mod-passenger package.



Apache2 setup


The configuration is based on the Puppet passenger documentation. It is available from the bzr branch, as we'll use puppet to actually configure the instance running puppetmasterd.

The puppet module has been updated to make sure the apache2 and libapache2-mod-passenger packages are installed. It also creates the relevant files and directories required to run puppetmasterd as a rack application.
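
A rough sketch of the kind of resources such a module manages is shown below. This is illustrative only: the resource names and the rack directory follow the conventional layout from the Puppet passenger documentation rather than being copied from the branch.

# Sketch only - see the puppet module in the bzr branch for the real resources.
package { ["apache2", "libapache2-mod-passenger"]:
    ensure => installed,
}

file { "/usr/share/puppet/rack/puppetmasterd":
    ensure => directory,
}

file { "/usr/share/puppet/rack/puppetmasterd/config.ru":
    ensure => present,
    owner  => "puppet",
    group  => "puppet",
    source => "puppet:///modules/puppet/config.ru",
}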

Passenger and SSL modules are enabled in the apache2 configuration. All of their configuration is done inside a virtual host definition. Note that the SSL options related to certificate and private key files point directly to /var/lib/puppet/ssl/.
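
For reference, a trimmed-down virtual host along the lines of the Puppet passenger documentation looks like this. The certificate file names are placeholders (they are named after the puppetmaster's fqdn), and the actual file in the branch may differ slightly:

# Sketch based on the Puppet passenger documentation - adjust the certificate
# file names to match your puppetmaster's fqdn.
<VirtualHost *:8140>
    SSLEngine on
    SSLCertificateFile      /var/lib/puppet/ssl/certs/puppetmaster-fqdn.pem
    SSLCertificateKeyFile   /var/lib/puppet/ssl/private_keys/puppetmaster-fqdn.pem
    SSLCertificateChainFile /var/lib/puppet/ssl/ca/ca_crt.pem
    SSLCACertificateFile    /var/lib/puppet/ssl/ca/ca_crt.pem
    SSLCARevocationFile     /var/lib/puppet/ssl/ca/ca_crl.pem
    SSLVerifyClient optional
    SSLVerifyDepth  1
    SSLOptions +StdEnvVars

    DocumentRoot /usr/share/puppet/rack/puppetmasterd/public/
    RackBaseURI /
</VirtualHost>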

Apache2 is also configured to listen only on the default puppetmaster port by replacing apache2's default ports.conf and disabling the default virtual host.
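
In practice this boils down to a ports.conf that only contains the puppetmaster port, for example:

# /etc/apache2/ports.conf - listen only on the puppetmaster port
Listen 8140

and to disabling the stock virtual host, the equivalent of running (assuming the default site name on Lucid):
sudo a2dissite default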

Finally the configuration of puppetmasterd has been updated so that it can correctly process client certificates while running under passenger.
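
For 0.25-era passenger setups this usually means pointing puppetmasterd at the SSL environment variables exported by Apache. A sketch of the relevant puppet.conf section (the file in the branch may differ):

[puppetmasterd]
    # Read the client certificate information from the environment
    # variables set by mod_ssl (SSLOptions +StdEnvVars).
    ssl_client_header = SSL_CLIENT_S_DN_CN
    ssl_client_verify_header = SSL_CLIENT_VERIFY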

Note that puppetmasterd needs to be run once in order to generate its SSL configuration. This happens automatically when the puppetmaster package is installed, since puppetmasterd is started during the package installation.



Deploying an improved puppetmaster


Log on to the puppetmaster instance and update the puppet configuration using the bzr branch:
bzr pull --remember lp:~mathiaz/+junk/uec-ec2-puppet-config-passenger /etc/puppet/

Update the configuration:
sudo puppet --node_terminus=plain /etc/puppet/manifests/puppetmaster.pp

On the Cloud Conductor start a new instance with start_instance.py. If you're starting from scratch, remember to update the start_instance.yaml file with the puppetmaster CA and internal IP:
./start_instance.py -c start_instance.yaml AMI_NUMBER

Following /var/log/syslog on the puppetmaster, you should see the new instance requesting a certificate:
Apr 8 00:40:08 ip-10-195-93-129 puppetmasterd[3353]: Starting Puppet server version 0.25.4
Apr 8 00:40:08 ip-10-195-93-129 puppetmasterd[3353]: 7d6b61a7-3772-4c41-a23d-471b417d9c47 has a waiting certificate request

Now that the puppetmasterd process is run by apache2 and mod_passenger, you can check /var/log/apache2/other_vhosts_access.log for the HTTP requests made by the puppet client to get its certificate signed:
ip-10-195-93-129.ec2.internal:8140 10.195.94.224 - - [08/Apr/2010:00:40:06 +0000] "GET /production/certificate/7d6b61a7-3772-4c41-a23d-471b417d9c47 HTTP/1.1" 404 2178 "-" "-"
ip-10-195-93-129.ec2.internal:8140 10.195.94.224 - - [08/Apr/2010:00:40:08 +0000] "GET /production/certificate_request/7d6b61a7-3772-4c41-a23d-471b417d9c47 HTTP/1.1" 404 2178 "-" "-"
ip-10-195-93-129.ec2.internal:8140 10.195.94.224 - - [08/Apr/2010:00:40:08 +0000] "PUT /production/certificate_request/7d6b61a7-3772-4c41-a23d-471b417d9c47 HTTP/1.1" 200 2082 "-" "-"
ip-10-195-93-129.ec2.internal:8140 10.195.94.224 - - [08/Apr/2010:00:40:08 +0000] "GET /production/certificate/7d6b61a7-3772-4c41-a23d-471b417d9c47 HTTP/1.1" 404 2178 "-" "-"
ip-10-195-93-129.ec2.internal:8140 10.195.94.224 - - [08/Apr/2010:00:40:08 +0000] "GET /production/certificate/7d6b61a7-3772-4c41-a23d-471b417d9c47 HTTP/1.1" 404 2178 "-" "-"

Once check_csr is run by cron, the certificate is signed and the puppet client is able to retrieve it:
ip-10-195-93-129.ec2.internal:8140 10.195.94.224 - - [08/Apr/2010:00:42:08 +0000] "GET /production/certificate/7d6b61a7-3772-4c41-a23d-471b417d9c47 HTTP/1.1" 200 2962 "-" "-"
ip-10-195-93-129.ec2.internal:8140 10.195.94.224 - - [08/Apr/2010:00:42:08 +0000] "GET /production/certificate_revocation_list/ca HTTP/1.1" 200 2450 "-" "-"

The puppet client finally requests its catalog:
ip-10-195-93-129.ec2.internal:8140 10.195.94.224 - - [08/Apr/2010:00:42:09 +0000] "GET /production/catalog/7d6b61a7-3772-4c41-a23d-471b417d9c47?facts_format=b64_zlib_yaml&facts=eNp [....] HTTP/1.1" 200 2354 "-" "-"



Conclusion


I've just outlined how to configure mod_passenger to run puppetmasterd, which is a much more efficient setup than using the default webrick server. Most of the configuration is detailed in the files available in the bzr branch.

Wednesday, April 7, 2010

Using puppet in UEC/EC2: Node classification

In a previous article I discussed how to set up an automated registration process for puppet instances. We'll now have a look at how we can tell these instances what they should be doing.

Going back to the overall architecture, the Cloud conductor is the component responsible for starting new instances. Of the three components it is the one that knows the most about what an instance should be: it is, after all, the one that starts it.

Using S3 to store node definitions


We'll use the puppet external node feature to connect the Cloud conductor with the puppetmaster. The external node script, node_classifier.py, will be responsible for telling the puppetmaster which classes each instance is supposed to have. Whenever a puppet client connects to the master, the node_classifier.py script is called with the certificate name. It is responsible for printing a description of the classes, environments and parameters for the client on its standard output in YAML format.
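
For example, for an instance that should only get the login-allowed class described below, the script would print something like:

# output of node_classifier.py for one instance (sketch)
classes:
  - login-allowed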

Given that the Cloud conductor creates a file with the certificate name for each instance it spawns, we'll extend the start_instance.py script to store the node classification in the content of the file created in the S3 bucket.

You may have noticed that instances started by start_instance.py don't have an ssh public key associated with them, so we're going to create a login-allowed class that will install the authorized key for the ubuntu user.



Setup the puppetmaster to use the node classifier


We'll use the Ubuntu Lucid Beta2 image as the base image on which to build our Puppet infrastructure.

Start an instance of the Lucid Beta2 AMI using an ssh key. Once it's running, write down its public and private DNS addresses. The public DNS address will be used to set up the puppetmaster via ssh. The private DNS address will be used as the puppetmaster hostname given out to puppet clients.
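
With euca2ools this boils down to something like the following, where AMI_NUMBER is whatever the Lucid Beta2 image is registered as in your cloud and ec2_key is the name of your ssh keypair:
euca-run-instances -k ec2_key AMI_NUMBER
euca-describe-instances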

Log on to the started instance via ssh to install and set up the puppet master:


  1. Update apt files:



    sudo apt-get update



  2. Install the puppet and bzr packages:



    sudo apt-get install puppet bzr



  3. Change the ownership of the puppet directory so that the ubuntu user can directly edit the puppet configuration files:



    sudo chown -R ubuntu:ubuntu /etc/puppet/



  4. On the puppetmaster check out the tutorial3 bzr branch:



    bzr branch --use-existing-dir lp:~mathiaz/+junk/uec-ec2-puppet-config-tut3 /etc/puppet/

    You'll get a conflict for the puppet.conf file. You can ignore the conflict as the puppet.conf file from the branch is the one that supports an external node classifier:
    bzr resolve /etc/puppet/puppet.conf



Edit the node classifier script scripts/node_classifier.py to set the correct location of your S3 bucket.

Note that the script is set to return 1 if the certificate name doesn't have a corresponding file in the S3 bucket. You may want to change the return code to 0 if you want to fall back to the normal node definitions. See the puppet external node documentation for more information.

The puppetmaster configuration in puppet.conf has been updated to use the external node script.

There is also the login-allowed class defined in the manifests/site.pp file. It sets the authorized key file for the ubuntu user.
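
A minimal sketch of what such a class can look like is shown below; the real definition lives in manifests/site.pp in the branch and may be written differently:

class login-allowed {
    # Sketch only - installs your EC2 public key for the ubuntu user.
    ssh_authorized_key { "ec2-key":
        ensure => present,
        user   => "ubuntu",
        type   => "ssh-rsa",
        key    => "AAAAB3NzaC...",   # replace with your EC2 public key
    }
}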

On the puppetmaster edit manifests/site.pp to update the public key with your EC2 public key. You can get the public key from ~ubuntu/.ssh/authorized_keys on the puppetmaster.

To bootstrap the new puppetmaster configuration run the puppet client:
sudo puppet --node_terminus=plain /etc/puppet/manifests/puppetmaster.pp

Note that you'll have to set node_terminus to plain to avoid calling the node classifier script when configuring the puppetmaster itself. Otherwise the puppet run would fail, since the puppetmaster certificate name (which defaults to the fqdn of the instance) doesn't have a corresponding file in the S3 bucket.

Our puppetmaster is now configured to look up the node classification for each puppet client.



Update start_instance.py to provide a node definition


It's time to update the Cloud conductor to provide the relevant node classification information whenever it starts a new instance.

Update the bzr branch on the Cloud conductor system:
bzr pull --remember lp:~mathiaz/uec-puppet-config-tut3

The start_instance.py script has been updated to write the node classification information when it creates the instance file in the S3 bucket. All of the node classification information expected by the puppetmaster from the external node classifier script is set under the node key in start_instance.yaml. See the puppet external node documentation for more information on what the external node script can provide.
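
The node section is simply the YAML that node_classifier.py will later hand back to the puppetmaster. A minimal sketch, leaving out the other keys already present in start_instance.yaml:

node:
  classes:
    - login-allowed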

Review the start_instance.yaml file to make sure the S3 bucket name, the puppetmaster server IP and CA certificate are still valid for your own setup.

Start an instance:
./start_instance.py -c start_instance.yaml AMI_NUMBER

Following /var/log/syslog you should see something similar to this:
Apr 7 19:15:37 domU-12-31-39-07-D6-52 puppetmasterd[1644]: 77ad2a3c-5d52-4ca7-9fea-b99b767b09d0 has a waiting certificate request

The instance has booted and registered with the puppetmaster.
Apr 7 19:16:01 domU-12-31-39-07-D6-52 CRON[2188]: (root) CMD (/usr/local/bin/check_csr --log-level=debug https://mathiaz-puppet-nodes-1.s3.amazonaws.com)
Apr 7 19:16:02 domU-12-31-39-07-D6-52 check_csr[2189]: DEBUG: List of waiting csr: 77ad2a3c-5d52-4ca7-9fea-b99b767b09d0
Apr 7 19:16:02 domU-12-31-39-07-D6-52 check_csr[2189]: DEBUG: Checking 77ad2a3c-5d52-4ca7-9fea-b99b767b09d0
Apr 7 19:16:02 domU-12-31-39-07-D6-52 check_csr[2189]: DEBUG: Checking url https://mathiaz-puppet-nodes-1.s3.amazonaws.com/77ad2a3c-5d52-4ca7-9fea-b99b767b09d0
Apr 7 19:16:03 domU-12-31-39-07-D6-52 check_csr[2189]: INFO: Signing request: 77ad2a3c-5d52-4ca7-9fea-b99b767b09d0

The puppetmaster checks whether the client request is expected and signs it.
Apr 7 19:17:39 domU-12-31-39-07-D6-52 node_classifier[2240]: DEBUG: Checking url https://mathiaz-puppet-nodes-1.s3.amazonaws.com/77ad2a3c-5d52-4ca7-9fea-b99b767b09d0
Apr 7 19:17:39 domU-12-31-39-07-D6-52 node_classifier[2240]: INFO: Getting node configuration: 77ad2a3c-5d52-4ca7-9fea-b99b767b09d0
Apr 7 19:17:39 domU-12-31-39-07-D6-52 node_classifier[2240]: DEBUG: Node configuration (77ad2a3c-5d52-4ca7-9fea-b99b767b09d0): classes: [login-allowed]
Apr 7 19:17:39 domU-12-31-39-07-D6-52 puppetmasterd[1644]: Compiled catalog for 77ad2a3c-5d52-4ca7-9fea-b99b767b09d0 in 0.01 seconds

The puppetmaster compiled a catalog for the client according to the information provided by the node classifier script.

Make sure that the instance that has been started doesn't have any ssh key associated with it:
euca-describe-instances

Make a note of the instance ID and its public DNS name.

Log in to the instance:


  1. Run euca-get-console-output instance_ID to get the ssh host key fingerprints. You may need to scroll back to find them.




  2. Log in to the instance using your EC2 keypair:



    ssh -i ~/.ssh/ec2_key ubuntu@public_dns





Conclusion


The start_instance.py script is currently very simple and should be considered a proof of concept.

Storing the node classification information in an S3 bucket also makes it easy to edit the content of each node's file. It also provides an easy way to get a list of the nodes that have been started by the Cloud Conductor as well as their classification.

If you look at the start_instance.py script you'll notice that the ACL on the S3 bucket is 'public-read'. That means anyone can read the list of your nodes as well as the list of classes and other node classification information for each of them. You may want to use private, signed S3 URLs instead.
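
As a hint of what that could look like with boto (the bucket and key names are taken from the logs above; this is a sketch, not something check_csr or node_classifier.py currently do):

# Sketch: generate a time-limited signed URL instead of relying on a
# public-read ACL. Requires AWS credentials in the environment.
from boto.s3.connection import S3Connection

conn = S3Connection()
bucket = conn.get_bucket('mathiaz-puppet-nodes-1')
key = bucket.get_key('77ad2a3c-5d52-4ca7-9fea-b99b767b09d0')
print key.generate_url(expires_in=3600)   # URL valid for one hour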

We now have a puppet infrastructure where instances are started by a Cloud conductor in order to achieve a specific task. These instances automatically connect to the puppetmaster to get configured for the task they've been created for. All of the instances' configuration is stored in a reliable and scalable system: S3.

With instances being created on demand our puppet infrastructure can grow quickly. The puppetmaster can easily be responsible for managing hundreds of instances. Next we'll have a look at how to improve the performance of the puppetmaster.