# AWS CDK Change Analyzer (C2A) - Engine

`@aws-c2a/engine` is a package that the toolkit consumes to extract the difference between two
CloudFormation templates and produce a report of changes, customizable with a rules language.

## Table of Contents
1. [Overview](#overview)
2. [Platform Mapping](#platform-mapping)
3. [Model Diffing](#model-diffing)
4. [Aggregations](#aggregations)
5. [Rules Processing](#rules-processing)

## Overview

The C2A architecture revolves around four main axes.

1. [Platform mapping](#platform-mapping) defines the relationship between a platform and the
[InfraModel](../models/README.md#InfraModel).
2. [Model diffing](#model-diffing) then takes the normalized InfraModels and diffs them
through a similarity algorithm.
3. [Aggregations](#aggregations) are then applied to the output of the model diffing
to generalize operations and components.
4. A [rule set](#rules-processing) is parsed and compared against the aggregations to isolate
the specific behavior a user wants to target.

![c2a - architecture](https://user-images.githubusercontent.com/26902818/124084162-9e19f800-da46-11eb-9c22-42b8f1cf1882.png)

## Platform Mapping

The `platform-mapping` directory holds parsers that transform an artifact into an
[InfraModel](../models/README.md#InfraModel).

### CloudFormation Parser

[template-anatomy]: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/template-anatomy.html

The CloudFormation parser takes any CloudFormation template and generates an
[InfraModel](../models/README.md#InfraModel).

The type of CloudFormation entity ([e.g. Resource, Parameter, Output][template-anatomy])
gets mapped to the type of _Component_. In the case of CloudFormation resources, in particular,
their type gets mapped to the _Component_'s subtype (e.g. an AWS Lambda Function resource
generates a _Component_ with type `Resource` and subtype `AWS::Lambda::Function`).
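
The mapping described above can be sketched as follows. This is a minimal illustration, not the parser's actual code: the `Component` and `Template` shapes and the `toComponents` function are hypothetical simplifications, and only three template sections are handled.

```typescript
// Hypothetical, simplified shapes; the real InfraModel types live in the models package.
interface Component {
  name: string;
  type: string;     // kind of CloudFormation entity, e.g. "Resource"
  subtype?: string; // for resources, the CloudFormation resource type
}

interface Template {
  Resources?: Record<string, { Type: string }>;
  Parameters?: Record<string, unknown>;
  Outputs?: Record<string, unknown>;
}

// Map each template entity to a component: the section decides the type,
// and for resources the declared `Type` becomes the subtype.
function toComponents(template: Template): Component[] {
  const components: Component[] = [];
  for (const [name, resource] of Object.entries(template.Resources ?? {})) {
    components.push({ name, type: 'Resource', subtype: resource.Type });
  }
  for (const name of Object.keys(template.Parameters ?? {})) {
    components.push({ name, type: 'Parameter' });
  }
  for (const name of Object.keys(template.Outputs ?? {})) {
    components.push({ name, type: 'Output' });
  }
  return components;
}

const mappedComponents = toComponents({
  Resources: { MyFunction: { Type: 'AWS::Lambda::Function' } },
  Parameters: { Stage: { Type: 'String' } },
});
```

Here `MyFunction` becomes a component with type `Resource` and subtype `AWS::Lambda::Function`, while `Stage` becomes a component with type `Parameter` and no subtype.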

The CloudFormation parser builds instances of **CFEntity**'s subclasses, which have the
responsibility of properly building the respective _Components_, _Property Values_, and
outgoing _Dependency Relationships_.

![CFParser Component Diagram](https://user-images.githubusercontent.com/26902818/124102721-85b2d900-da58-11eb-92ac-9f7c579e9861.png)

The **CFRef** class extracts the references that an entity's declaration makes to other
entities, both through intrinsic functions and through a resource's _DependsOn_ field.

The following image is an example of the created relationships:

![CFN Parser](https://user-images.githubusercontent.com/26902818/124098679-aaa54d00-da54-11eb-959a-82266d746428.png)

- References in intrinsic functions and in _DependsOn_ fields are transformed into Dependency Relationships
- Structural Relationships connect resources to their stack
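
As a rough illustration of how such references might be collected, the sketch below walks a resource declaration and gathers logical IDs from `Ref`, from the array form of `Fn::GetAtt`, and from the top-level `DependsOn` field. The `collectReferences` helper is hypothetical and is not the **CFRef** implementation; it ignores other intrinsic functions such as `Fn::Sub` and does not filter out pseudo parameters.

```typescript
// Hypothetical helper: returns the logical IDs a resource declaration refers to.
function collectReferences(resource: Record<string, unknown>): string[] {
  const refs = new Set<string>();

  // DependsOn is only meaningful at the top level of a resource;
  // it may be a single logical ID or a list of them.
  const dependsOn = resource['DependsOn'];
  const ids = Array.isArray(dependsOn) ? dependsOn : [dependsOn];
  for (const id of ids) {
    if (typeof id === 'string') refs.add(id);
  }

  // Recursively look for intrinsic functions anywhere in the declaration.
  const visit = (node: unknown): void => {
    if (Array.isArray(node)) {
      node.forEach(visit);
      return;
    }
    if (node === null || typeof node !== 'object') return;
    for (const [key, value] of Object.entries(node as Record<string, unknown>)) {
      if (key === 'Ref' && typeof value === 'string') {
        refs.add(value);
      } else if (key === 'Fn::GetAtt' && Array.isArray(value) && typeof value[0] === 'string') {
        refs.add(value[0]); // the first element is the logical ID
      } else {
        visit(value);
      }
    }
  };
  visit(resource);

  return Array.from(refs);
}

const functionRefs = collectReferences({
  Type: 'AWS::Lambda::Function',
  DependsOn: 'LogGroup',
  Properties: {
    Role: { 'Fn::GetAtt': ['ExecutionRole', 'Arn'] },
    Environment: { Variables: { TABLE: { Ref: 'Table' } } },
  },
});
```

Each collected ID would then become the target of a Dependency Relationship from the declaring entity.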

### AWS CDK Parser

Parsing CDK-generated CloudFormation templates begins by using the [CloudFormation parser](#CloudFormation-Parser)
and adding a _Component_ for each CDK Construct (extracted from the CloudFormation resources metadata).
Afterwards, the stack _Component_ and its _Structural Relationships_ are removed and the CDK Construct
Components are connected to the corresponding CloudFormation resource Components, as seen here:

![CDK Parser](https://user-images.githubusercontent.com/26902818/124098672-aa0cb680-da54-11eb-9051-253934faaf34.png)

## Model Diffing

The process of diffing InfraModels is contained in the `model-diffing` directory.

In the context of AWS CDK/CloudFormation, this is where we extract the operations
(changes) that occurred between the old CloudFormation template and the new one.

The basic diff is created in `model-diffing/diff-creator.ts`. It groups components
of the same type and subtype and matches them based on their name and similarity.
This similarity is calculated by comparing the properties of each component,
in `model-diffing/property-diff.ts`.
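
One plausible reading of this matching step is sketched below: components are only compared within the same type and subtype, exact name matches are taken first, and the remaining components are paired with their most similar candidate. The `matchComponents` function, the two-pass order, and the `threshold` value are assumptions made for illustration; `diff-creator.ts` may work differently.

```typescript
// Hypothetical, simplified component shape.
interface Comp {
  name: string;
  type: string;
  subtype?: string;
}

type Similarity = (a: Comp, b: Comp) => number; // expected range: 0 to 1

function matchComponents(
  oldComps: Comp[],
  newComps: Comp[],
  similarity: Similarity,
  threshold = 0.5, // assumed cut-off, not taken from the engine
): Array<[Comp, Comp]> {
  const matches: Array<[Comp, Comp]> = [];
  const unmatched = newComps.slice();
  const sameGroup = (a: Comp, b: Comp): boolean =>
    a.type === b.type && a.subtype === b.subtype;

  const take = (index: number, oldComp: Comp): void => {
    matches.push([oldComp, unmatched[index]]);
    unmatched.splice(index, 1);
  };

  // Pass 1: same group and same name.
  const leftovers: Comp[] = [];
  for (const oldComp of oldComps) {
    const index = unmatched.findIndex((n) => sameGroup(oldComp, n) && n.name === oldComp.name);
    if (index >= 0) take(index, oldComp);
    else leftovers.push(oldComp);
  }

  // Pass 2: same group, most similar remaining candidate at or above the threshold.
  for (const oldComp of leftovers) {
    let best = -1;
    let bestScore = threshold;
    for (let i = 0; i < unmatched.length; i++) {
      const score = sameGroup(oldComp, unmatched[i]) ? similarity(oldComp, unmatched[i]) : 0;
      if (score >= bestScore) {
        best = i;
        bestScore = score;
      }
    }
    if (best >= 0) take(best, oldComp);
  }
  return matches;
}

const componentMatches = matchComponents(
  [
    { name: 'Handler', type: 'Resource', subtype: 'AWS::Lambda::Function' },
    { name: 'Logs', type: 'Resource', subtype: 'AWS::S3::Bucket' },
  ],
  [
    { name: 'Handler', type: 'Resource', subtype: 'AWS::Lambda::Function' },
    { name: 'AccessLogs', type: 'Resource', subtype: 'AWS::S3::Bucket' },
  ],
  () => 0.8, // stand-in for the property-based similarity
);
```

In this example `Handler` is matched by name, and `Logs` is paired with `AccessLogs` because they share a type and subtype and the stand-in similarity exceeds the assumed threshold.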

Since detecting property operations and determining their similarity require the same
underlying logic, they are both done simultaneously in `model-diffing/property-diff.ts`.
A few notes on how this property diffing currently works:

* When calculating similarity, there is currently no distinction between arrays and sets,
so property array order is not considered. In other words, moving elements in an array
has no effect on similarity. However, _Move_ operations are still created if an element
at index 0 is matched with an element at index 1, for example.

* Each similarity value carries a weight: the number of primitive values in the structure
it applies to. Consider the following:
  ```json
  // BEFORE
  {
    "a": { "b": "string", "c": "string" },
    "d": "string"
  }

  // AFTER
  {
    "a": { "b": "string", "c": "string" },
    "d": "str"
  }
  ```
  In this example, we see that the only difference between the two states is the value
  of key `d`. For simplicity, let's define the similarity between the new and old value
  for key `d` to be `0.5`. The value of key `a` has not changed, thus has a similarity of
  `1`.
  
  We can calculate the similarity of the full properties by doing a weighted average.
  * `a` will have a weight of 4 (two keys and two values with similarity 1)
  * `d` will have a weight of 1 (because it has only 1 primitive value).
  
  The similarity for this example is `1 * (4/5) + 0.5 * (1/5) = 0.9`.
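
The weighted average from the example can be written out as a small helper. This is only a sketch of the arithmetic: the `Scored` shape and the `combine` function are hypothetical names, and treating an empty structure as fully similar is an assumption.

```typescript
// A similarity value together with the weight it carries.
interface Scored {
  similarity: number; // between 0 and 1
  weight: number;
}

// Combine the scores of nested values into one score for the enclosing structure.
function combine(parts: Scored[]): Scored {
  const weight = parts.reduce((sum, part) => sum + part.weight, 0);
  if (weight === 0) {
    return { similarity: 1, weight: 0 }; // assumption: nothing to compare means no difference
  }
  const weightedSum = parts.reduce((sum, part) => sum + part.similarity * part.weight, 0);
  return { similarity: weightedSum / weight, weight };
}

const combined = combine([
  { similarity: 1, weight: 4 },   // key `a`, unchanged
  { similarity: 0.5, weight: 1 }, // key `d`, partially changed
]);
// combined.similarity is 0.9 and combined.weight is 5
```

Returning the total weight alongside the similarity lets the result be combined again at the next level up.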

### Change Propagation

`change-propagator.ts` is responsible for taking the observed changes and propagating them:

* Modified properties with _componentUpdateType_ of `REPLACEMENT` or `POSSIBLE_REPLACEMENT`
generate an operation (change) of type _Replace_ for their component.

* Renamed _Components_ receive a new _Replace_ operation.

* _Replace_ operations in _Components_ with incoming _Dependency Relationships_ generate an
Update _Operation_ to the source property of such relationships, indicating that a referenced
value may have changed.
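
The three propagation rules above could be sketched roughly as follows. All type and function names here are hypothetical, the `NONE` update type is an assumed placeholder for "no replacement", and the sketch does a single pass without cascading the generated _Update_ operations any further.

```typescript
type UpdateType = 'NONE' | 'POSSIBLE_REPLACEMENT' | 'REPLACEMENT'; // 'NONE' is assumed

interface PropertyChange { component: string; property: string; componentUpdateType: UpdateType; }
interface Rename { oldName: string; newName: string; }
interface Dependency { source: string; sourceProperty: string; target: string; }

type PropagatedOperation =
  | { kind: 'Replace'; component: string }
  | { kind: 'Update'; component: string; property: string };

function propagate(
  changes: PropertyChange[],
  renames: Rename[],
  dependencies: Dependency[],
): PropagatedOperation[] {
  const replaced = new Set<string>();

  // Rule 1: property changes that (possibly) require replacement replace the component.
  for (const change of changes) {
    if (change.componentUpdateType !== 'NONE') replaced.add(change.component);
  }
  // Rule 2: renamed components are replaced.
  for (const rename of renames) replaced.add(rename.newName);

  const operations: PropagatedOperation[] = Array.from(replaced).map(
    (component): PropagatedOperation => ({ kind: 'Replace', component }),
  );

  // Rule 3: properties that reference a replaced component may see a new value.
  for (const dependency of dependencies) {
    if (replaced.has(dependency.target)) {
      operations.push({
        kind: 'Update',
        component: dependency.source,
        property: dependency.sourceProperty,
      });
    }
  }
  return operations;
}

const propagated = propagate(
  [{ component: 'Table', property: 'TableName', componentUpdateType: 'REPLACEMENT' }],
  [],
  [{ source: 'Handler', sourceProperty: 'Environment', target: 'Table' }],
);
```

In this example the `TableName` change replaces `Table`, which in turn produces an _Update_ on the `Environment` property of `Handler`.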

## Aggregations

Aggregations group Operations (changes) into a tree-like structure based on their
characteristics. They are used to collapse changes when presenting them in an interface.
Take the following example:

![Aggregations Example](https://user-images.githubusercontent.com/26902818/124138218-54e59a80-da7e-11eb-8e8f-036af63da1f5.png)

The resulting aggregations narrow down operations by:
* type and subtype of the affected Component
* type of the operation
* target of the operation: full component or a property

The characteristics that should be grouped at each level, and how, are described in 
`aggregations/component-operation/module-tree.ts`. Aggregation modules define how to split
a group of operations and a module tree is a configuration of these modules that is used
to generate the aggregations.
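
To make the idea concrete, the sketch below treats a module as a function that assigns each operation to a group, and applies a list of modules level by level. This is a simplification under stated assumptions: the `AggOperation`, `Aggregation`, and `AggModule` names are hypothetical, and a flat list of modules stands in for the module tree.

```typescript
// Hypothetical, flattened view of an operation.
interface AggOperation {
  componentType: string;
  componentSubtype?: string;
  operationType: string;
  property?: string; // set when the operation targets a property
}

interface Aggregation {
  label: string;
  operations: AggOperation[];
  children: Aggregation[];
}

// A module decides which group an operation belongs to at one level.
type AggModule = (operation: AggOperation) => string;

function aggregate(label: string, operations: AggOperation[], modules: AggModule[]): Aggregation {
  const split = modules[0];
  if (split === undefined) return { label, operations, children: [] };

  const groups = new Map<string, AggOperation[]>();
  for (const operation of operations) {
    const key = split(operation);
    const group = groups.get(key) ?? [];
    group.push(operation);
    groups.set(key, group);
  }
  const children = Array.from(groups.entries()).map(
    ([key, grouped]) => aggregate(key, grouped, modules.slice(1)),
  );
  return { label, operations, children };
}

const aggregationRoot = aggregate(
  'All changes',
  [
    { componentType: 'Resource', componentSubtype: 'AWS::Lambda::Function', operationType: 'Update', property: 'Timeout' },
    { componentType: 'Resource', componentSubtype: 'AWS::Lambda::Function', operationType: 'Update', property: 'MemorySize' },
    { componentType: 'Resource', componentSubtype: 'AWS::S3::Bucket', operationType: 'Replace' },
  ],
  [
    (op) => `${op.componentType} ${op.componentSubtype ?? ''}`.trim(), // type and subtype
    (op) => op.operationType,                                          // type of the operation
    (op) => (op.property === undefined ? 'component' : 'property'),    // target of the operation
  ],
);
```

The root aggregation has two children, one per resource type, and each of those is split further by operation type and then by target.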

## Rules Processing

Rules Processing is a core part of the engine, as it is what enables C2A to make decisions
on aggregations and behaviors that arise in the diff. The rules processing can be broken 
down into three main stages.

1. Defining scope. Scope definition is the most complex part of rules processing, and it
acts to define the candidates for all identifiers defined in the `let` bindings. These candidates
are determined through traversing the diff tree and obtaining matches to the query provided as the
value to an identifier. To learn more about `let` bindings, see [`@aws-c2a/rules`](../rules/README.md).

2. Verification. Verification happens after scope definition and mainly deals with conditions
specified in the `where` binding. Every condition has an operator with a corresponding
handler in the `rules/operator-handlers` directory. Verification is crucial for specificity and
for drilling down to any type of behavior.

3. Extracting effect. Finally, in order to produce a meaningful change report, we attach any
of our verified candidates for a targeted component to a specific effect (high risk, auto 
approve, etc.).
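
The three stages can be illustrated with a deliberately simplified, in-memory rule. Nothing below reflects the actual rules syntax or the engine's data structures: the `Rule` shape, the flat `DiffEntry` list standing in for the diff tree, and the `==` and `!=` operators are all assumptions made for the sketch.

```typescript
// Hypothetical flat stand-in for an entry in the diff tree.
interface DiffEntry { component: string; subtype: string; operation: string; }

type Query = Partial<DiffEntry>;
interface Condition { identifier: string; field: keyof DiffEntry; operator: string; value: string; }
interface Rule { let: Record<string, Query>; where: Condition[]; effect: string; }

// One handler per operator, mirroring the idea of an operator-handlers directory.
const operatorHandlers: Record<string, (left: string, right: string) => boolean> = {
  '==': (left, right) => left === right,
  '!=': (left, right) => left !== right,
};

function applyRule(rule: Rule, diff: DiffEntry[]): Array<{ entry: DiffEntry; effect: string }> {
  const results: Array<{ entry: DiffEntry; effect: string }> = [];
  for (const [identifier, query] of Object.entries(rule.let)) {
    // 1. Defining scope: candidates are the entries that match the identifier's query.
    const fields = Object.keys(query) as Array<keyof DiffEntry>;
    const candidates = diff.filter((entry) => fields.every((field) => entry[field] === query[field]));

    // 2. Verification: keep candidates that satisfy every condition on this identifier.
    const conditions = rule.where.filter((condition) => condition.identifier === identifier);
    const verified = candidates.filter((entry) =>
      conditions.every((condition) => {
        const handler = operatorHandlers[condition.operator];
        if (handler === undefined) throw new Error(`Unknown operator: ${condition.operator}`);
        return handler(entry[condition.field], condition.value);
      }),
    );

    // 3. Extracting effect: attach the rule's effect to each verified candidate.
    for (const entry of verified) results.push({ entry, effect: rule.effect });
  }
  return results;
}

const ruleResults = applyRule(
  {
    let: { fn: { subtype: 'AWS::Lambda::Function' } },
    where: [{ identifier: 'fn', field: 'operation', operator: '==', value: 'Replace' }],
    effect: 'high risk',
  },
  [
    { component: 'Handler', subtype: 'AWS::Lambda::Function', operation: 'Replace' },
    { component: 'Worker', subtype: 'AWS::Lambda::Function', operation: 'Update' },
    { component: 'Logs', subtype: 'AWS::S3::Bucket', operation: 'Replace' },
  ],
);
```

Only `Handler` survives both scope definition and verification here, so it is the only entry reported with the `high risk` effect.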