Extending Altimeter

Altimeter is designed to be extensible. Adding new fields to resource types that are already scanned and graphed is usually straightforward. Adding resource types that are not yet scanned and graphed involves a bit more work but is also usually simple.

Background

When the Altimeter scan runs (bin/aws2json.py) it uses a set of ResourceSpec classes to gather data via AWS APIs (using boto3) and persist it into an intermediate JSON format which can be converted to RDF.

Each of these classes must define attributes and methods that specify how data is collected (which boto3 calls to make and how to massage the response data), how the collected data relates in a graph (a Schema), and taxonomic information (the resource type name, e.g. “instance”, and the resource service name, e.g. “ec2”).

Resource Specs

Each AWS resource type is represented by a subclass of altimeter.aws.resource.resource_spec.AWSResourceSpec.

Subclasses of AWSResourceSpec must implement:

  • An attribute service_name which should match the name of the resource’s associated service as represented in an ARN for this resource type. For instance, for EC2 instances this is “ec2”, matching the service portion of EC2 ARNs: “arn:${Partition}:ec2:${Region}:${Account}:instance/${InstanceId}”.

  • An attribute type_name which should match the type name as represented in an ARN for this resource type. For instance, in altimeter.aws.resource.ec2.instance.EC2InstanceResourceSpec type_name is set to “instance” to match the ARN format “arn:${Partition}:ec2:${Region}:${Account}:instance/${InstanceId}”.

  • An attribute schema of type altimeter.core.graph.schema.Schema. The schema attribute describes how to map the output of the boto3 list/describe api call for this resource into fields which define how the resource is graphed.

  • A method list_from_aws() which performs the describe/list boto3 calls for this resource, compiles a dictionary with resource ARNs as keys and resource description dicts as values and wraps them in an altimeter.aws.resource.resource_spec.ListFromAWSResult object.

For each AWSResourceSpec class Altimeter gathers data using list_from_aws() and uses the provided Schema object to convert it to intermediate JSON.

At this point it’s instructive to look at an example. Below is an abbreviated version of altimeter.aws.resource.ec2.instance.EC2InstanceResourceSpec:

class EC2InstanceResourceSpec(EC2ResourceSpec):
    type_name = "instance"
    schema = Schema(
        TransientResourceLinkField("ImageId", EC2ImageResourceSpec),
        ScalarField("InstanceType"),
        AnonymousDictField("State", ScalarField("Name", "state")),
        ScalarField("PublicIpAddress", optional=True),
        ResourceLinkField("VpcId", VPCResourceSpec, optional=True),
        TagsField(),
    )

    @classmethod
    def list_from_aws(
        cls: Type[T], client: BaseClient, account_id: str, region: str
    ) -> ListFromAWSResult:
        paginator = client.get_paginator("describe_instances")
        instances = {}
        for resp in paginator.paginate():
            for reservation in resp.get("Reservations", []):
                for instance in reservation.get("Instances", []):
                    resource_arn = cls.generate_arn(account_id, region, instance["InstanceId"])
                    instances[resource_arn] = instance
        return ListFromAWSResult(resources=instances)

As mentioned above, the type_name is set to “instance” to match the ARN format of EC2 instances:

    type_name = "instance"

Note that the service_name attribute mentioned above is implemented in the parent class EC2ResourceSpec which is a direct subclass of AWSResourceSpec.
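To make the relationship between service_name, type_name and the ARN concrete, here is a minimal sketch. The build_arn helper is hypothetical and only fills in the ARN template quoted above; it is not Altimeter's generate_arn() classmethod, whose signature and behavior may differ.

```python
def build_arn(
    service_name: str,
    type_name: str,
    account_id: str,
    region: str,
    resource_id: str,
    partition: str = "aws",
) -> str:
    """Hypothetical helper: fill in the template
    arn:${Partition}:${Service}:${Region}:${Account}:${Type}/${Id}."""
    return f"arn:{partition}:{service_name}:{region}:{account_id}:{type_name}/{resource_id}"


# service_name comes from the service-level class, type_name from the resource class.
arn = build_arn("ec2", "instance", "012345678901", "us-east-1", "i-0123456789abcdef0")
print(arn)
```

Both attributes appear verbatim in the result, which is why they should match the names used in real ARNs for the resource type.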

Next let’s look at the list_from_aws() method. In general this method should generate a dictionary representing all resources of this type, where keys are ARNs and values are dictionaries containing information about the resource:

{'resource_1_arn': resource_1_dict,
 'resource_2_arn': resource_2_dict,
 ...}

Here is the implementation for our sample EC2InstanceResourceSpec class:

    @classmethod
    def list_from_aws(
        cls: Type[T], client: BaseClient, account_id: str, region: str
    ) -> ListFromAWSResult:
        paginator = client.get_paginator("describe_instances")
        instances = {}
        for resp in paginator.paginate():
            for reservation in resp.get("Reservations", []):
                for instance in reservation.get("Instances", []):
                    resource_arn = cls.generate_arn(account_id, region, instance["InstanceId"])
                    instances[resource_arn] = instance
        return ListFromAWSResult(resources=instances)

This is a fairly typical implementation - it uses the appropriate list/describe boto3 function for this resource type and massages the data into a dictionary of ARNs to resource dictionaries. It then returns a ListFromAWSResult object which wraps this dictionary.

The dictionary in this case will look something like the below (abbreviated):

{ 'ec2-instance-1-arn':
  { 'AmiLaunchIndex': 123,
     'ImageId': 'string',
     'InstanceId': 'string',
     'InstanceType': 't1.micro|t2.nano|t2.micro|t2.small|t2.medium|...',
     'LaunchTime': datetime(2015, 1, 1),
     'PublicIpAddress': 'string',
     'State': {
         'Code': 123,
         'Name': 'pending|running|shutting-down|terminated|stopping|stopped'
     },
     'VpcId': 'string',
     'Tags': [
         {
             'Key': 'string',
             'Value': 'string'
         },
     ],
   },
  'ec2-instance-2-arn':
    { ...
    },
   ...
}
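The paginate-and-collect pattern can be tried without AWS credentials. The sketch below is an illustration only: FakePaginator and collect_instances are hypothetical stand-ins, the canned pages are made up, and the ARN is assembled by hand rather than through cls.generate_arn().

```python
from typing import Any, Dict, Iterator, List


class FakePaginator:
    """Stand-in for a boto3 paginator; yields canned describe_instances pages."""

    def __init__(self, pages: List[Dict[str, Any]]) -> None:
        self._pages = pages

    def paginate(self) -> Iterator[Dict[str, Any]]:
        return iter(self._pages)


def collect_instances(
    paginator: FakePaginator, account_id: str, region: str
) -> Dict[str, Dict[str, Any]]:
    """Flatten pages -> reservations -> instances into an ARN-keyed dict."""
    instances: Dict[str, Dict[str, Any]] = {}
    for resp in paginator.paginate():
        for reservation in resp.get("Reservations", []):
            for instance in reservation.get("Instances", []):
                # Assumption: ARN built by hand for illustration only.
                arn = f"arn:aws:ec2:{region}:{account_id}:instance/{instance['InstanceId']}"
                instances[arn] = instance
    return instances


pages = [
    {"Reservations": [{"Instances": [{"InstanceId": "i-aaa", "InstanceType": "t2.micro"}]}]},
    {"Reservations": [{"Instances": [{"InstanceId": "i-bbb", "InstanceType": "t2.small"}]}]},
    {},  # a page without reservations is tolerated by the .get() defaults
]
resources = collect_instances(FakePaginator(pages), "012345678901", "us-east-1")
for resource_arn in resources:
    print(resource_arn)
```

Each instance dictionary is stored unchanged; selecting which keys end up in the graph is left to the schema.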

Finally, let’s examine the schema:

    schema = Schema(
        TransientResourceLinkField("ImageId", EC2ImageResourceSpec),
        ScalarField("InstanceType"),
        AnonymousDictField("State", ScalarField("Name", "state")),
        ScalarField("PublicIpAddress", optional=True),
        ResourceLinkField("VpcId", VPCResourceSpec, optional=True),
        TagsField(),
    )

Altimeter uses this schema definition to translate the output of list_from_aws() into intermediate JSON. The intermediate JSON contains graph relational information about the object and its attributes.

A Schema consists of a list of Field objects. In this case we are using a few different kinds of fields. For all but one (TagsField) the first argument is a key as returned by list_from_aws(). This argument is known as the source_key.

For example, the second field in the schema is a ScalarField which uses the source_key “InstanceType”. ScalarField is the simplest kind of field; it is used when a value is a simple string with no relational data:

        ScalarField("InstanceType"),

The fifth field in the schema is a slightly less simple field type, ResourceLinkField:

        ResourceLinkField("VpcId", VPCResourceSpec, optional=True),

A ResourceLinkField is a field where the value contains the ID of another resource which is also in the graph. Since we also graph VPCs using the VPCResourceSpec type, we can represent a link between an EC2 instance and its corresponding VPC using a ResourceLinkField on the parent (EC2InstanceResourceSpec) referencing the child (VPCResourceSpec) as the second argument of the ResourceLinkField. Note that there is an additional keyword argument, optional, which indicates that this field does not always appear.

The various types of fields and how they are used are documented in the API documentation, see altimeter.core.graph.field. Numerous examples of their use are also provided in the AWSResourceSpec subclasses in altimeter/aws/resource.
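As a rough mental model of what a field does with a source_key, consider the sketch below. ToyScalarField and apply_schema are simplified stand-ins written for this illustration, not Altimeter's Field or Schema classes. The snake_case default for the predicate name is an assumption inferred from the sample scan output in this document, where “InstanceType” appears as “instance_type”.

```python
import re
from typing import Any, Dict, List, Optional


def to_snake_case(name: str) -> str:
    """Convert CamelCase to snake_case, e.g. InstanceType -> instance_type."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


class ToyScalarField:
    """Simplified stand-in for a scalar field (illustration only)."""

    def __init__(
        self, source_key: str, pred: Optional[str] = None, optional: bool = False
    ) -> None:
        self.source_key = source_key
        # Assumption: the predicate defaults to the snake_case source key.
        self.pred = pred or to_snake_case(source_key)
        self.optional = optional

    def parse(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        if self.source_key not in data:
            if self.optional:
                return []  # optional fields are skipped when the key is absent
            raise KeyError(f"required key {self.source_key!r} is missing")
        return [{"pred": self.pred, "obj": data[self.source_key]}]


def apply_schema(fields: List[ToyScalarField], data: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Run every field over one resource dict and collect the resulting links."""
    links: List[Dict[str, Any]] = []
    for field in fields:
        links.extend(field.parse(data))
    return links


instance = {"InstanceId": "i-aaa", "InstanceType": "t2.2xlarge"}
toy_schema = [
    ToyScalarField("InstanceType"),
    ToyScalarField("PublicIpAddress", optional=True),  # absent in this instance
]
links = apply_schema(toy_schema, instance)
print(links)
```

Keys that no field names, such as InstanceId here, do not appear in the result, and an absent optional key produces no link rather than an error.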

Modifying an Existing Resource Type

Note that the Schema of an AWSResourceSpec subclass does not necessarily include all the data that is in the output of list_from_aws(). Some keys are just not all that interesting to us at this time and are not worth graphing. In general we add new fields in the graph as we need them.

Adding a new field is as simple as adding a new Field object to the AWSResourceSpec’s schema attribute. For example, if we would additionally like to graph the subnet of an EC2 instance, we could add a field for the SubnetId key, which is included in the describe_instances response gathered in list_from_aws(). Since we have an AWSResourceSpec class for subnets (SubnetResourceSpec), we can use a ResourceLinkField. The schema now looks like:

    schema = Schema(
        TransientResourceLinkField("ImageId", EC2ImageResourceSpec),
        ScalarField("InstanceType"),
        AnonymousDictField("State", ScalarField("Name", "state")),
        ScalarField("PublicIpAddress", optional=True),
        ResourceLinkField("VpcId", VPCResourceSpec, optional=True),
        ResourceLinkField("SubnetId", SubnetResourceSpec, optional=True),
        TagsField(),
    )

As with the VPC field, the new subnet field has the optional flag set, as not all EC2 instances have subnets.

Testing Changes

The quickest way to test a change is to use the provided bin/scan_resource.py script. This script runs a scan of a single AWSResourceSpec class against a single specified AWS region using the currently configured AWS credential profile and writes the scan output to STDOUT. For example, after the subnet addition above, running

bin/scan_resource.py EC2InstanceResourceSpec us-east-1

in an account 012345678901 with a single ec2 instance could return something like:

{
  "resources": [
    {
      "type": "aws:ec2:instance",
      "link_collection": {
        "simple_links": [
          {
            "pred": "instance_type",
            "obj": "t2.2xlarge"
          },
          {
            "pred": "state",
            "obj": "stopped"
          }
        ],
        "multi_links": null,
        "tag_links": [
          {
            "pred": "Name",
            "obj": "my-instance"
          }
        ],
        "resource_links": [
          {
            "pred": "vpc",
            "obj": "arn:aws:ec2:us-east-1:012345678901:vpc/vpc-01234567890abcdef"
          },
          {
            "pred": "subnet",
            "obj": "arn:aws:ec2:us-east-1:012345678901:subnet/subnet-abcdef01234567890"
          }
        ],
        "transient_resource_links": [
          {
            "pred": "image",
            "obj": "arn:aws:ec2:us-east-1:012345678901:image/ami-abcd1234"
          }
        ]
      }
    }
  ]
}
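When verifying a schema change such as the subnet addition, it can help to list the predicates in each link category instead of reading the raw JSON. The sketch below assumes the scan output has been captured as a string with the structure shown above (trimmed here for brevity); summarize_links is a hypothetical helper and not part of Altimeter.

```python
import json
from typing import Dict, List

# Trimmed copy of the sample output; in practice this would be the captured STDOUT.
scan_output = """
{
  "resources": [
    {
      "type": "aws:ec2:instance",
      "link_collection": {
        "simple_links": [{"pred": "instance_type", "obj": "t2.2xlarge"}],
        "multi_links": null,
        "resource_links": [
          {"pred": "vpc", "obj": "arn:aws:ec2:us-east-1:012345678901:vpc/vpc-01234567890abcdef"},
          {"pred": "subnet", "obj": "arn:aws:ec2:us-east-1:012345678901:subnet/subnet-abcdef01234567890"}
        ]
      }
    }
  ]
}
"""


def summarize_links(raw: str) -> List[Dict[str, List[str]]]:
    """Return, per resource, the predicate names found in each link category."""
    summaries = []
    for resource in json.loads(raw)["resources"]:
        collection = resource.get("link_collection") or {}
        # A category can be null (None) when the resource has no links of that kind.
        summaries.append(
            {
                category: [link["pred"] for link in (category_links or [])]
                for category, category_links in collection.items()
            }
        )
    return summaries


summary = summarize_links(scan_output)
print(summary[0])
```

If the new field is wired up correctly, "subnet" should be listed under resource_links for instances that have a subnet.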

Adding a New AWSResourceSpec

Adding a new resource type to the graph consists of adding a new class which is a subclass of AWSResourceSpec.

These classes live in altimeter.aws.resource. They are organized by AWS service (e.g. “ec2”, “rds”, “s3”).

The service packages (for instance altimeter.aws.resource.ec2) define a subclass of AWSResourceSpec which sets the service_name attribute. In the case of ec2 this is altimeter.aws.resource.ec2.EC2ResourceSpec:

"""AWSResourceSpec subclass for ec2 resources."""

from altimeter.aws.resource.resource_spec import AWSResourceSpec


class EC2ResourceSpec(AWSResourceSpec):
    """AWSResourceSpec subclass for ec2 resources."""

    service_name = "ec2"

If the resource you wish to add is not within a service already defined under altimeter.aws.resource you will need to create a class similar to the EC2ResourceSpec class above.

When adding a new AWSResourceSpec subclass there are a few things you will need to do:

  • As alluded to above, your class should subclass an existing service-level AWSResourceSpec class, for example EC2ResourceSpec.

  • Set the type_name attribute using the name as presented in an ARN of the specific resource type (e.g. instance).

  • Implement list_from_aws() to gather data and produce a ListFromAWSResult.

  • Set the schema attribute to a Schema object with the appropriate fields.

  • Add your class to altimeter.aws.scan.settings.RESOURCE_SPEC_CLASSES. This contains a list of enabled scan resource types.

  • Test using bin/scan_resource.py.

  • Add tests for the new type in tests.unit.altimeter.aws.resource, preferably using moto. See tests.unit.altimeter.aws.resource.ec2 for examples of such tests.