Bootstrapping MapR Cluster Nodes in EC2
There’s a bunch of ways to spin up MapR in AWS. Primarily, I use Ansible to do it. Why spin up instances and install them myself? Mainly it’s because I’m not always using AWS, and I need a repeatable way to install clusters under any circumstance, whether on-prem or cloud.
The lowest common denominator in this case is to think of this as getting handed a bunch of machines and installing MapR on them. Originally I built Ansible roles to install MapR as a way of avoiding what was at the time a fussy procedure. Over time, the roles have become complete enough that I can use them to deploy clusters of any size in any environment (sometimes with some tweaking).
The playbooks and roles do a few useful things, and they can be used together or separately. This post is intended to help you make sense of the repo should you find your way to it on my GitHub.
About the Roles
They can set up AWS EC2 instances, execute a minimal pre-install cluster validation, install MapR with or without security (with security is the default), configure some ecosystem projects to work together, and run some smoke tests of the cluster after the installation and configuration complete.
Extensions and fixes are welcomed!
At the moment, I’m waffling between using Ansible 2.x’s `json` callback plugin and the default `skippy`. I really like the idea of getting JSON output that I can store and query later with Apache Drill, but I don’t love the lack of feedback as the plays execute - the `json` plugin doesn’t say anything at all until all the plays complete or they fail. So you might not see any output for minutes at a time if the `json` plugin is enabled. You can change this by swapping `skippy` in for `json` in the provided `ansible.cfg`.
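For reference, the stdout callback is selected in the `[defaults]` section of `ansible.cfg`; a minimal sketch of the relevant setting looks like this:

[defaults]
# Quiet until the run completes, but produces output you can store and query:
stdout_callback = json
# Or, for live per-task feedback while plays execute:
# stdout_callback = skippy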
Bootstrapping
The bootstrap playbook is simple, as its only job is to spin up some Amazon EC2 instances. There’s nothing special about this process or these instances - it’s just the method I’ve settled on as the quickest way to get instances up to install MapR on. The instances usually run either Amazon Linux (not officially supported by MapR, but in my experience it works great) or CentOS. I don’t use Ubuntu much, so the playbooks don’t work with Ubuntu. I’d like to change that at some point, but I haven’t had the need to date.
I could use the Marketplace option for deploying MapR AMIs, but that costs more, and we end up in the odd situation where MapR is paying MapR to run MapR. Also, by spinning up generic OS AMIs, I get to test the installation process on a pretty frequent basis.
So that’s why this spins up vanilla CentOS or RHEL or Amazon Linux.
The bootstrap playbook needs configuration. You copy `group_vars/all.example` to `group_vars/all` and edit it. You need to make sure you set your security groups, keypair, regions, and so on correctly. You should also have installed `awscli` and run `aws configure` to establish your credentials.
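In shell form, that setup boils down to something like this (install `awscli` however you prefer; `pip` is just one option):

cp group_vars/all.example group_vars/all
# edit group_vars/all: security groups, keypair, region, etc.
pip install awscli
aws configure    # prompts for your access key, secret key, and default region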
Also note that you can decide how many instances to create, and whether to assign them public IPs. Public IPs are not required as long as you have some other way to reach the instances’ private IPs, such as via a VPN or an SSH tunnel.
Usually, if you get some of these configuration parameters wrong, AWS will yell at you with an error message that helps you figure out what’s wrong.
To bootstrap some nodes:
ansible-playbook mapr_aws_bootstrap.yml > ansible-mapr_bootstrap_result.json
If this succeeds, you’ll have some new instances running, and you’ll have a new inventory file called `inventory.py`. This is a Python script that generates JSON, which Ansible uses to populate the inventory. The inventory will have a number of groups that contain the hosts that will run services.
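Like any Ansible dynamic inventory script, `inventory.py` emits its JSON when invoked with `--list`. The shape is roughly the following; the group names here are purely illustrative, since the real ones depend on the services the playbooks lay out:

./inventory.py --list
{
  "zookeepers": { "hosts": ["172.16.2.151", "172.16.2.152", "172.16.2.153"] },
  "cldbs":      { "hosts": ["172.16.2.151"] },
  "_meta":      { "hostvars": {} }
}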
You can confirm that the instances are up and reachable by Ansible after a few minutes (change the username as needed - Amazon Linux uses `ec2-user`, CentOS uses `centos`):
ansible -i inventory.py -u centos -m ping all
Hopefully you have all green output from this.
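For each reachable host, the `ping` module reports something like this (exact formatting varies a bit by Ansible version):

172.16.2.151 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}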
Prerequisites
MapR requires some things to be in place before you install and run the cluster. They’re well documented, and actually automated in the latest installer.
You can run these prereqs by executing the tasks tagged with `preinstall` in the `mapr_install.yml` playbook:
ansible-playbook -i inventory.py -u centos --tags preinstall mapr_install.yml >\
ansible-mapr_preinstall_result.json
Pre-Install validation
My colleague John Benninghoff has a set of scripts that are useful for validating cluster hardware and performance. The tasks under the `validation` tag will execute a subset of these:
- Triad memory test (memory throughput)
- fio for sequential disk throughput
- rpctest for MapR RPC throughput
ansible-playbook -i inventory.py -u centos --tags validation mapr_install.yml >\
ansible-mapr_validation_result.json
MapR Installation and Configuration
Now for the main event - installation. The same `mapr_install.yml` playbook handles the whole installation; this time we invoke `ansible-playbook` with options that skip the steps above and carry on with the install:
ansible-playbook -i inventory.py -u centos \
--skip-tags preinstall,validation mapr_install.yml >\
ansible-mapr_install_result.json
Once that’s done, you can print out some of the URLs for your cluster using the included `printurls.sh` script, which is a simple wrapper around `jq` that reads the inventory output:
$ ./printurls.sh
[]
[
"MCS: https://172.16.2.153:8443",
"MCS: https://172.16.2.151:8443"
]
[
"Solr Admin: http://172.16.2.152:8983/solr",
"Solr Admin: http://172.16.2.153:8983/solr",
"Solr Admin: http://172.16.2.151:8983/solr"
]
[
"OpenTSDB: http://172.16.2.151:4242"
]
[
"Hue: http://172.16.2.153:8888"
]
[
"Mesos Master: http://172.16.2.153:8080",
"Mesos Master: http://172.16.2.151:8080"
]
[
"Resource Manager: https://172.16.2.153:8090",
"Resource Manager: https://172.16.2.151:8090"
]
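The script itself isn’t reproduced here, but a wrapper in this style is just a few `jq` filters over the inventory JSON. A hypothetical sketch, assuming the standard dynamic-inventory `--list` convention and made-up group names:

#!/bin/bash
# Sketch of a printurls.sh-style script. The group names (webserver, hue)
# and ports are placeholders; the real ones come from the repo's inventory.
INV=$(./inventory.py --list)
echo "$INV" | jq '[.webserver.hosts[]? | "MCS: https://\(.):8443"]'
echo "$INV" | jq '[.hue.hosts[]? | "Hue: http://\(.):8888"]'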
Powering Off
If you want to power off the instances when they’re not needed, you can use the following invocation:
ansible-playbook -i inventory.py mapr_aws_poweroff.yml
Powering On
When you want them back on, use:
ansible-playbook -i inventory.py mapr_aws_poweron.yml
Instance Termination
If you’re going to be using this cluster, go right ahead! It should be ready to go.
When you want to get rid of the cluster, you can terminate the instances as follows:
ansible-playbook -i inventory.py mapr_aws_terminate.yml
Be aware that if you’ve modified the volume configuration and left out the `delete_on_termination` flag, the volumes will remain even after the instances are terminated. In that case, you’ll have to clean them up yourself. You’ll continue to be charged for the storage whether the volumes are attached to instances or not, so this is something to remain vigilant about.
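If you need to hunt down leftover volumes, the stock AWS CLI can list any that are no longer attached to an instance:

# List volumes not attached to anything:
aws ec2 describe-volumes --filters Name=status,Values=available
# Then, once you've confirmed a volume is safe to remove:
aws ec2 delete-volume --volume-id <volume-id>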
Reading the JSON output
I’ll write a post at some point where I’ll use Drill to query the output.
But for now, the main thing to be looking for in the output is that there are no failures.
If you have `jq` installed and you’re using the JSON output, you can do the following to check for failures:
jq '.stats[].failures' < ansible-mapr_preinstall_result.json
You should see a bunch of zeros - one for each host in the inventory.
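To go a step further and name any host that did fail, something like this works against the same file - in the `json` callback’s output, `stats` is an object keyed by hostname:

jq '.stats | to_entries[] | select(.value.failures > 0) | .key' < ansible-mapr_preinstall_result.json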