Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: Apache-2.0

# Visualization - Grouping and Appearance Customization

In this notebook we'll examine how to use the `--group-by` feature of the `%%gremlin` magic to group vertices and how to customize the appearance of these groups. 

In addition, we will look at how to use the `--display-property`, `--edge-display-property`, and `--label-max-length` options to modify the appearances of node and edge labels.

**Note** This notebook builds on the visualization patterns explained in the [Air-Routes-Gremlin](Air-Routes-Gremlin.ipynb) notebook, so if you are not familiar with how the visualization feature works, we recommend reviewing that notebook before beginning this one.

This notebook uses the air-routes dataset to demonstrate some of the different options available. If your cluster does not already contain this data, run the cell below to load it into your Neptune cluster; otherwise, you can skip the next cell.

In [None]:
%seed --model Property_Graph --language gremlin --dataset airports --run

With the air-routes data now loaded we are ready to begin looking at how to group and customize our result visualization. 

## Query requirements for customizable visualizations

The functionality of the `--display-property` and `--edge-display-property` options is dependent on accessing the values corresponding to the vertex and/or edge property name keys specified. As such, we must enable them to find the property key/value pairs required by ensuring that they are included in the query's results set.

First, we need to ensure that our query is set up as a graph traversal. This must always start with the graph traversal source, `g`, which here is followed by the `V()` start step. The traversal continues with the `has` step, which retrieves our set of starting vertices, followed by the `outE` step, which finds all of the edges originating from those vertices.

In addition, we will need an `elementMap` step to retrieve the edge properties in a map format that our visualizer can understand. `elementMap` representations of edges also return basic label and id information about the vertices that the edges connect, so we do not need to explicitly return vertices at this time.
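
To make the shape of such a result concrete, the Python sketch below models the `elementMap` entry for a single edge as a plain dictionary. The string keys `"id"`, `"label"`, `"OUT"`, and `"IN"` are simplified stand-ins for Gremlin's `T.id`, `T.label`, `Direction.OUT`, and `Direction.IN` tokens, and every id and the distance value are made up for illustration.

```python
# Simplified stand-in for one edge as returned by elementMap().
# Keys and values are illustrative, not real air-routes data.
edge = {
    "id": "e-1",
    "label": "route",
    "OUT": {"id": "v-1", "label": "airport"},  # vertex the edge leaves
    "IN": {"id": "v-2", "label": "airport"},   # vertex the edge enters
    "dist": 100,
}


def endpoints(edge_map):
    """Return (source id, target id) using only the edge's own map."""
    return edge_map["OUT"]["id"], edge_map["IN"]["id"]


print(endpoints(edge))
```

Because both endpoints travel with the edge, a consumer of this map can connect the edge to its vertices without any extra hint.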

Putting all of these steps together, we might get a query like:

In [None]:
%%gremlin -de dist
g.V().has('code','SAF').outE().elementMap()

If we click on the nodes and edges and open the Details view (bulleted list icon in the top right of the visualization), we can see that `outE` and `elementMap` have retrieved full sets of properties, but only for the edges in the graph. If we want to retrieve the same for vertices, we need to also include an `inV` traversal step to retrieve the set of the vertices that the edges are directed into.

The traversal is getting a little more complicated now, so we also need the `path` step to help keep track of the traversal's history. To ensure that we are getting the full property sets for all of the edges and vertices, `path` will be accompanied by a `by` modulator, which will post process the path elements per a specified return format.

There are a few options for what we can put in the `by` modulator:
- `path().by(valueMap(true))`
- `path().by(valueMap().with(WithOptions.tokens))`
- `path().by(elementMap())`

When choosing between the `valueMap` and `elementMap` steps, note that the requirements to properly visualize queries will differ slightly.

If using `valueMap(true)`, we will need to specify the `-p` query hint, followed by a comma-separated sequence of the same Gremlin traversal steps used in the query itself.
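
As a rough illustration of how a `-p` value lines up with a traversal, the sketch below builds the hint string from a list of step names. `path_pattern` is a hypothetical helper written for this example and is not part of the notebook tooling. It assumes the hint is simply the lowercase step names joined by commas, and that only steps contributing an element to the path are listed, which is the pattern the examples in this notebook follow.

```python
def path_pattern(steps):
    """Join Gremlin step names into a lowercase, comma-separated -p value."""
    return ",".join(step.lower() for step in steps)


# One hop: g.V()...outE().inV()
one_hop = path_pattern(["V", "outE", "inV"])

# Three hops of outE().inV(), as a repeat() traversal might produce
three_hops = path_pattern(["V"] + ["outE", "inV"] * 3)

print(one_hop)
print(three_hops)
```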

Run the three cells below to observe the effects of `-p` being missing and present for a `valueMap` return, alongside some of our customization options.

In [None]:
my_node_labels = '{"airport":"city"}'
my_edge_labels = '{"route":"dist"}'

In [None]:
%%gremlin -d $my_node_labels -l 20 -g country -de $my_edge_labels
g.V().has('code','LHR').outE().inV().path().by(valueMap(true)).limit(5)

In [None]:
%%gremlin -p v,oute,inv -d $my_node_labels -l 20 -g country -de $my_edge_labels
g.V().has('code','LHR').outE().inV().path().by(valueMap(true)).limit(5)

If we elect to use `elementMap` instead, we can forgo the `-p` option; `elementMap` already returns enough information about edges for path patterns to be determined automatically. Run the query below, and observe that the visualization returned is the same as the one with `-p` and `valueMap` above.

In [None]:
%%gremlin -d $my_node_labels -l 20 -g country -de $my_edge_labels
g.V().has('code','LHR').outE().inV().path().by(elementMap()).limit(5)

`elementMap()` is clearly simpler to use than `-p` and `valueMap(true)`, and should be used wherever possible. But the two are not always interchangeable, and `-p` is often needed to specify path patterns where `elementMap()` is insufficient. The choice will always depend on the individual query's requirements.

## Node Property Options

Visualizing the results of a query will result in a graph where each vertex contains an identifying property. By default, the property used is generated automatically from the label property. If desired, it can instead abide by the property (or set of label properties) specified using the `--display-property` or `-d` parameter, followed by the property name. 

Additionally, labels are truncated after exceeding a default maximum length. This maximum length value can be modified by using the `--label-max-length` or `-l` parameter, followed by the desired length in characters.


### Default Node Properties

By default, the property used is the value of each vertex's label property. Run the query below to observe the default labeling for the results set of all vertices connected to Cozumel.

In [None]:
%%gremlin -p v,inv
g.V().hasLabel('airport').has('code','CZM').both().path()

The results show us only three distinct labels, corresponding to each of the label properties `airport`, `country`, and `continent`.

In some cases, a label property may not be present in some or all of the vertices returned in the results set (i.e. `T.label` key/value pairs have not been returned in the results).

This will result in a concatenated list of all of the vertex's properties being displayed instead, as we can observe with the following query.

In [None]:
%%gremlin -p v,inv
g.V().hasLabel('airport').has('code','CZM').both().path().by(valueMap())

### Specifying a Single Node Property for all Vertices

There may also be cases where it is desired to show a specific vertex property as the label on each graph node. This can be done by using the `--display-property` or `-d` parameter on the `%%gremlin` cell magic line. The property name needs to be a case-sensitive match for the name in the vertex.

**Note** Finding the property name can be accomplished using the Details View and clicking on a vertex. This includes the `T.id` and `T.label` properties.

Let's run the following query to see the results of displaying the results set of all vertices connected to Cozumel, while also specifying that we want to display the `code` property on every vertex.

In [None]:
%%gremlin -p v,inv -d code
g.V().hasLabel('airport').has('code','CZM').both().path().by(valueMap(true))

Looking at the resulting visualized graphs, each individual node can now be identified by its distinct code.


### Using Different Node Properties for each Label

Instead of displaying the values of a single property, we can also choose to specify different properties to display for each type of label. This feature can be useful if different labels in the graph have different property sets, or if you want only a subset of vertices under certain labels to have the displayed property modified.

We will first need to define a JSON-format string variable in the following format, containing each label and its corresponding properties to be displayed:

`display_var = '{"label_1":"property_1","label_2":"property_2"}'`

Let's try using this to define different display properties for the `airport`, `country`, and `continent` labels.

In [None]:
display_var = '{"airport":"code","country":"desc","continent":"desc"}'

Now, we can take the previous query and pass `display_var` into the displayed properties parameter via the notebook's line variable injection functionality.

In [None]:
%%gremlin -p v,inv -d $display_var
g.V().hasLabel('airport').has('code','CZM').both().path().by(valueMap(true))

In the resulting visualization, we can see that the `airport` vertices retain the `code` property values, while the `country` and `continent` nodes now display their more descriptive `desc` property values, as we desired.
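
Hand-writing the JSON string works for short mappings, but it is easy to mistype a quote or comma. One possible alternative, sketched below, is to keep the mapping as an ordinary Python dictionary and serialize it with the standard `json` module. The name `display_map` is an assumption made for this example; the compact separators simply reproduce the space-free format used above.

```python
import json

# Label -> property mapping, written as a normal Python dict.
display_map = {"airport": "code", "country": "desc", "continent": "desc"}

# Compact separators reproduce the space-free format used above.
display_var = json.dumps(display_map, separators=(",", ":"))

print(display_var)
```

The resulting `display_var` can then be passed as `-d $display_var` in the same way as the hand-written string.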

### Specifying the Maximum Label Length

All node labels in the graph abide by a maximum length value, after which they are visually truncated. By default, this value is 10.

You may encounter scenarios where you want to modify the truncation length to either be shorter or longer than the default value. This can be done by adding the `--label-max-length` or `-l` parameter to your Gremlin query, followed by the desired length in characters.

Let's try to visualize a graph of all airports connected to Cozumel, while indicating that we want to display the full airport names on the nodes. Note that the label truncation length is left as default.

In [None]:
%%gremlin -p v,inv -d desc
g.V().hasLabel('airport').has('code','CZM').both().path().by(valueMap(true))

We can see that most of the labels are cut off too early to be adequately descriptive!

Let's try the same query again, this time incorporating the `-l` parameter, followed by a maximum length value of `60`.

In [None]:
%%gremlin -p v,inv -d desc -l 60
g.V().hasLabel('airport').has('code','CZM').both().path().by(valueMap(true))

Now, we can see the full name of every airport on the graph.
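
Rather than guessing a value for `-l`, the needed length can be estimated from the data. The sketch below is a minimal illustration: `needed_label_length` is a hypothetical helper and the sample names are illustrative stand-ins for `desc` values. This notebook does not state whether the cutoff reserves characters for a truncation marker, so adding a small margin to the result may be prudent.

```python
def needed_label_length(labels):
    """Return the length of the longest label, or 0 for an empty list."""
    return max((len(label) for label in labels), default=0)


# Illustrative sample of values that might appear in a `desc` property.
sample_names = [
    "Cozumel International Airport",
    "Dallas/Fort Worth International Airport",
]

print(needed_label_length(sample_names))
```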

## Edge Property Options

Some graph visualizations of Gremlin magic queries will contain edges that are also accompanied by a visual label. The property displayed on each edge defaults to the edge's label property. 

We can specify what property values to display on the visualized edges in a similar fashion to how we customized node labels in the previous section. This is done by using the `--edge-display-property` or `-de` parameter, followed by the property name, or a json formatted string containing label-value:property-name pairs. 

### Specifying a single property for all edge labels

Using a single property name as the parameter value for `--edge-display-property` will result in every edge label being changed to the value of that property. Again, note that you can find the properties available by using the Details view and clicking on any edge in the graph visualization.

Let's observe the results of appending `-de dist` to a query that retrieves all routes from Austin, USA to Wellington, NZ. Note that we use `by(valueMap(true))` in the query to ensure that we are retrieving all of the edge properties.

In [None]:
%%gremlin -p v,oute,inv,oute,inv,oute,inv -de dist
g.V().has('airport','code','AUS').
 repeat(outE().inV().simplePath()).
 until(has('code','WLG')).
 limit(5).
 path().
 by(valueMap(true)).
 by(valueMap(true))

In the resulting graph, we can see that every edge is now labeled with the value of its `dist` property.

### Specifying edge label properties in JSON format

Making use of the cell variable injection feature, we can also define a JSON-format string variable containing our desired label-value:property-name pairs, and pass it into `-de`. This follows the format used when injecting JSON variables into `-d` and `-g`:

`display_edge_var = '{"label_1":"property_1","label_2":"property_2"}'`

Let's re-define the same label-value:property-name pair used in the last query, this time in the JSON string variable format.

In [None]:
display_edge_var = '{"route":"dist"}'

Then, we can pass `$display_edge_var` into `-de`, and observe that we get the same result.

In [None]:
%%gremlin -p v,oute,inv,oute,inv,oute,inv -de $display_edge_var
g.V().has('airport','code','AUS').
 repeat(outE().inV().simplePath()).
 until(has('code','WLG')).
 limit(5).
 path().
 by(valueMap(true)).
 by(valueMap(true))

This is also an example of a query where we can interchange `elementMap()` for `valueMap(true)`, removing the need for `-p` path pattern hints.

In [None]:
%%gremlin -de $display_edge_var
g.V().has('airport','code','AUS').
 repeat(outE().inV().simplePath()).
 until(has('code','WLG')).
 limit(5).
 path().
 by(elementMap()).
 by(elementMap())

## Grouping Options

When visualizing the results of a query the grouping of the vertices follows a couple of rules:

* If no grouping property is specified and the label for a vertex is returned then the vertices are grouped by label.
 * If the label does not exist in the results then all vertices are grouped into a single group
* To group vertices by a specific property use the `--group-by` or `-g` switch on the `%%gremlin` cell magic. 
 * The name provided must be a case-sensitive match to the property name of the vertex (i.e. `Name` will not group by the `name` property)
 * If the name specified does not match any property of a vertex then that vertex is grouped in a default group
* To disable grouping entirely, add the `--ignore-groups` flag
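
The rules above can be sketched as a small Python function. This is only an approximation written for illustration, not the notebook's actual implementation: `choose_group` and the `"default"` group name are hypothetical, the `"label"` key stands in for `T.label`, and property values are modeled as lists because that is how `valueMap` returns them.

```python
DEFAULT_GROUP = "default"  # hypothetical name for the fallback group


def choose_group(vertex, group_by=None, ignore_groups=False):
    """Pick a group name for one vertex, following the rules listed above.

    vertex: dict of property name -> value; may contain a "label" key.
    group_by: None, a property name, or a dict of label -> property name.
    """
    if ignore_groups:
        return None  # no grouping at all
    if group_by is None:
        # No property specified: use the label, or one shared group.
        return str(vertex.get("label", DEFAULT_GROUP))
    if isinstance(group_by, dict):
        # Per-label grouping: look up the property for this vertex's label.
        prop = group_by.get(vertex.get("label"))
    else:
        prop = group_by
    if prop is None or prop not in vertex:
        return DEFAULT_GROUP  # property missing, or name in the wrong case
    return str(vertex[prop])


airport = {"label": "airport", "code": ["CZM"], "country": ["MX"]}
country = {"label": "country", "code": ["MX"], "desc": ["Mexico"]}

print(choose_group(airport))                       # grouped by label
print(choose_group(airport, group_by="country"))   # grouped by property
print(choose_group(airport, group_by="Country"))   # wrong case -> default
print(choose_group(country, group_by="country"))   # no such property -> default
```

Note that the lookup is an exact key match, which is why a property name in the wrong case lands in the default group.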

### Default Grouping

Let's take a look at what our air-routes graph looks like if we run a query and use the default grouping. Try running the query below and clicking on the Graph tab to see all the vertices connected to Cozumel.

In [None]:
%%gremlin

g.V().has('airport','code','CZM').both().path()

From the results we see three groups, each represented by a different color. Each of the vertices is added to a group based on its label.

Grouping also occurs when we use the `by()` modulators to specify additional return fields. Run the query below to see how the items are grouped when we find all connected vertices for Anchorage (`ANC`) and return all their properties and tokens (`T.id` and `T.label`).

In [None]:
%%gremlin

g.V().has('code', 'ANC').both().path().by(valueMap().with(WithOptions.tokens))

From the resultant visualization we see that we still have three groups of vertices, grouped by the label. 

However, what would happen if we don't return the label in the results? Run the query below to see how the items are grouped when we find all connected vertices for Anchorage (`ANC`) and return just their properties, not the label.

In [None]:
%%gremlin

g.V().has('code', 'ANC').both().path().by(valueMap())

From the results we see that all our vertices are in a single group now and colored the same. When our results do not contain the label for the vertex we must then specify the property we want to group by. 

### Specifying the property to group by

To specify the property to group by we use either the `--group-by` or `-g` switch followed by the property name. The property name needs to be a case sensitive match for the name in the vertex. 

**Note** Finding the property name can be accomplished using the Details View and clicking on a vertex. This includes the `T.id` and `T.label` properties.

Let's run the query below and see what our grouping looks like if we find all routes from Cozumel and group them by their `country`.

In [None]:
%%gremlin -p v,inv -g country
g.V().hasLabel('airport').has('code','CZM').out('route').path().by(valueMap())

As we can see, our results are now split into three groups (MX/US/CA) based on the `country` property.

Let's run the query below and see what our grouping looks like if we find all connections to Cozumel and group them by their `country`.

In [None]:
%%gremlin -p v,inv -g country
g.V().hasLabel('airport').has('code','CZM').both().path().by(valueMap())

The resultant graph looks very similar to the previous one, except that there are now four groups instead of three. Unlike the previous example, where we only returned `route` connections, this query returned all connected vertices, so our result set came back with a `continent` and a `country` vertex in addition to the `airport` vertices. Neither the `country` nor the `continent` vertex has a `country` property. When a vertex does not contain the property specified to group by, that vertex is put into a default group. This same behavior occurs when you use the incorrect casing for the property name, as shown in the query below.

In [None]:
%%gremlin -p v,inv -g Country
g.V().hasLabel('airport').has('code','CZM').out('route').path().by(valueMap())

### Grouping on different properties for each label

While the `continent` and `country` vertices cannot be grouped on properties exclusive to `airport` vertices, we do have the ability to add additional properties with which to form new groups. To do this, we will make use of Jupyter's built-in line magic variable injection to pass in a variable containing a JSON-format string, where we can specify what property we want to use to group each of the individual vertex labels. 

The group-by variable must be defined in the following format: 

`groups_var = '{"label_1":"property_1","label_2":"property_2"}'`

Let's try defining group-by with individual properties for `airport`, `continent`, and `country`.

In [None]:
groups_var = '{"airport":"country","country":"desc","continent":"code"}'

Then, we can rerun our previous query, this time with `$groups_var`, and see that the `continent` and `country` vertices now belong to their own groups based on the specified properties. 

**Note** we also need to use `valueMap(true)` so that the label names and relevant properties can be accessed for group matching.

In [None]:
%%gremlin -p v,inv -g $groups_var
g.V().hasLabel('airport').has('code','CZM').both().path().by(valueMap(true))

### Ignoring Grouping

In certain situations you may want your visualization to contain no groups. This can be accomplished by adding the `--ignore-groups` flag to your query as shown by running the next cell.

In [None]:
%%gremlin -p v,inv --ignore-groups
g.V().hasLabel('airport').has('code','CZM').out('route').path().by(valueMap())

Now that we know how to group our vertices together let's take a look at how to customize the appearance of these groups, and other aspects of the graph.

## Appearance Customization

The Amazon Neptune Notebooks use an open source library called [Vis.js](https://github.com/visjs) to assist with drawing the graph diagrams. Vis.js provides a rich set of customizable settings. The documentation for most of the visualization settings used in this notebook can be found [here](https://visjs.org/) and in particular the graph network drawing documentation can be found [here](https://visjs.github.io/vis-network/docs/network/). 

To see the current settings used by your notebook you can use the `%graph_notebook_vis_options` line magic command. Try running it in a cell on its own.

To change any of these settings create a new cell and use `%%graph_notebook_vis_options` to change them (note the two percent signs indicating a cell magic).

To customize the appearance of node groups, we want to use the [groups](https://visjs.github.io/vis-network/docs/network/groups.html#) options. There is a nearly endless amount of customization you can make to the groups using the options provided, but we will demonstrate some of the most common ones in the next few sections.

### Specifying Group Colors

Specifying the colors of groups is probably one of the most common customizations performed. To accomplish this we specify the options using the `%%graph_notebook_vis_options` magic as shown below. For each of the associated group names we use the exact property value followed by the options you would like to use for that group.

**Note** Finding the exact property value for the group name can be accomplished by looking at the data returned in the Console tab.

Run the next two cells to set the colors for our three groups to red for the airports in Canada, green for the airports in the US, and blue for the airports in Mexico. In the case of color, the values can be specified by name, RGBA value, or Hex value.

In [None]:
%%graph_notebook_vis_options
{
 "groups": {
 "['CA']": {"color": "red"},
 "['MX']": {"color": "rgba(9, 104, 178, 1)"}, 
 "['US']": {"color": "#00FF00"}
 }
}

In [None]:
%%gremlin -p v,inv -g country
g.V().hasLabel('airport').has('code','CZM').out('route').path().by(valueMap())
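
The group names in the options cell above, such as `['CA']`, look the way a one-element Python list prints, which is consistent with `valueMap` returning each property value wrapped in a list. Assuming that is how the names are formed, the sketch below generates the same options from plain country codes. `group_key` and `colors` are hypothetical names introduced for this example.

```python
import json


def group_key(value):
    """Format a single property value the way a one-element list prints."""
    return str([value])


# Raw country codes mapped to vis.js color values (same colors as above).
colors = {"CA": "red", "MX": "rgba(9, 104, 178, 1)", "US": "#00FF00"}

options = {"groups": {group_key(code): {"color": c} for code, c in colors.items()}}

# The printed JSON can be pasted under %%graph_notebook_vis_options.
print(json.dumps(options, indent=2))
```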

### Specifying Group Shapes

In addition to specifying the color of the groups you are also able to customize the shape of the vertex using one of the following options. 

The types with the label inside of it are: `ellipse`, `circle`, `database`, `box`, `text`.

The ones with the label outside of it are: `diamond`, `dot`, `star`, `triangle`, `triangleDown`, `hexagon`, and `square`
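
Since a mistyped shape name is easy to miss, it can help to check names against the two lists above before writing them into the options. The following is a minimal sketch of such a check. `label_is_inside` is a hypothetical helper, and the `icon`, `image`, and `circularImage` shapes covered in the next section are deliberately left out.

```python
# Shape names taken from the two lists above.
LABEL_INSIDE = {"ellipse", "circle", "database", "box", "text"}
LABEL_OUTSIDE = {
    "diamond", "dot", "star", "triangle", "triangleDown", "hexagon", "square",
}


def label_is_inside(shape):
    """Return True if the shape draws its label inside, False if outside."""
    if shape in LABEL_INSIDE:
        return True
    if shape in LABEL_OUTSIDE:
        return False
    raise ValueError(f"unknown shape: {shape!r}")


print(label_is_inside("box"))
print(label_is_inside("star"))
```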

Run the cells below to see what our visualization looks like with shapes specified.

In [None]:
%%graph_notebook_vis_options
{
 "groups": {
 "['CA']": {"color": "red", "shape": "box"},
 "['MX']": {"color": "rgba(9, 104, 178, 1)", "shape": "ellipse"}, 
 "['US']": {"color": "#00FF00", "shape": "star"}
 }
}

In [None]:
%%gremlin -p v,inv -g country
g.V().hasLabel('airport').has('code','CZM').out('route').path().by(valueMap())

### Specifying Group Icons and Images

In addition to specifying shapes, icons and images can also be specified to represent our groups. 

Icons can either be from `Ionicons` or `FontAwesome` version 4 or 5. Icons are specified by setting the `shape` to `icon` and then specifying the `code` for the icon.

Images must be either local to the notebook or publicly available on the internet. Images are specified by setting the `shape` to `image` or `circularImage` and then setting the `image` property to the address of the image to display.

Running the two cells below will display the Mexico group using an image of the Mexican flag and the US group using an airplane icon from FontAwesome.

In [None]:
%%graph_notebook_vis_options
{
 "groups": {
 "['CA']": {"color": "red"},
 "['MX']": {"shape": "image", 
 "image":"https://cdn.countryflags.com/thumbs/mexico/flag-round-250.png"},
 
 "['US']": {
 "shape": "icon",
 "icon": {
 "face": "FontAwesome",
 "code": "\uf072",
 "color": "#00FF00"
 }
 }
 }
}

In [None]:
%%gremlin -p v,inv -g country
g.V().hasLabel('airport').has('code','CZM').out('route').path().by(valueMap())

### Specifying Group Sizes

We can also specify a custom value for a group's node size. This is accomplished by setting the `size` property of the group.

**Note** Only shapes that do not have the label inside them are impacted by this property.

In [None]:
%%graph_notebook_vis_options
{
 "groups": {
 "['CA']": {"color": "red", "size": 3},
 "['MX']": {"shape": "image", 
 "image":"https://cdn.countryflags.com/thumbs/mexico/flag-round-250.png"
 , "size": 50},
 
 "['US']": {
 "shape": "icon",
 "icon": {
 "face": "FontAwesome",
 "code": "\uf072",
 "color": "#00FF00"
 }, "size": 37
 }
 }
}

In [None]:
%%gremlin -p v,inv -g country
g.V().hasLabel('airport').has('code','CZM').out('route').path().by(valueMap())

### Specifying Parallel Same-Direction Edge Behavior

There is one more customization option, which allows for parallel, same-direction edges to either be drawn on top of each other (default), or spaced apart.

Let's start by running the following cell to create such a scenario in the air-routes graph.

In [None]:
%%gremlin
g.addV("airport").property(id,"10000").property("type","airport").property("code","FOO").property("country", "BAR").
 addE("route").property(id,"1").from(V("365")).to(V("10000")).property("dist",1000).
 addE("route").property(id,"2").from(V("365")).to(V("10000")).property("dist",2000).
 addE("route").property(id,"3").from(V("365")).to(V("10000")).property("dist",3000)

Now, run the next cell to visualize the new node and edges.

In [None]:
%%gremlin -g country -d code -de dist
g.V().hasLabel('airport').has('code','CZM').outE().inV().has('code','FOO').path().by(elementMap())

Notice that three different edges are returned in the results, but the visualized graph only appears to have one edge.

This is a result of the default value `straightCross` used for the `edges`->`smooth`->`type` vis setting. All of the parallel edges technically exist in the graph visualization, but the `straightCross` setting draws the edges on top of each other, obscuring all but one.

To fix this, we can set the edge smoothing type to `dynamic`. Run the two cells below to observe the result.

In [None]:
%%graph_notebook_vis_options
{
 "edges": {
 "color": {
 "inherit": false
 },
 "smooth": {
 "enabled": true,
 "type": "dynamic"
 },
 "arrows": {
 "to": {
 "enabled": true,
 "type": "arrow"
 }
 },
 "font": {
 "face": "courier new"
 }
 }
}

In [None]:
%%gremlin -g country -d code -de dist
g.V().hasLabel('airport').has('code','CZM').outE().inV().has('code','FOO').path().by(elementMap())

With the `dynamic` setting, all of the edges are now spaced out and clearly visible.
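
When only one nested setting needs to change, it can be convenient to derive the new options from the current ones instead of retyping the whole block. This notebook does not state whether `%%graph_notebook_vis_options` merges partial settings or replaces them, so the sketch below takes the cautious route and builds a complete `edges` payload. `deep_merge` is a hypothetical helper, and `current` is a trimmed-down stand-in for the real settings.

```python
import copy
import json


def deep_merge(base, override):
    """Return a new dict with `override` merged into `base`, key by key."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Trimmed-down stand-in for the settings shown by %graph_notebook_vis_options.
current = {
    "edges": {
        "smooth": {"enabled": True, "type": "straightCross"},
        "arrows": {"to": {"enabled": True, "type": "arrow"}},
    }
}

updated = deep_merge(current, {"edges": {"smooth": {"type": "dynamic"}}})
print(json.dumps(updated, indent=2))
```

The original `current` dictionary is left untouched, so the previous settings remain available if the change needs to be reverted.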

Lastly, clean up the dummy data we inserted for this example.

In [None]:
%%gremlin
g.V().hasId('10000').drop()

## Conclusion

What we have demonstrated in this notebook is only a small set of the options and combinations available for customizing the appearance of groups within the notebook. Please refer to the [Vis.js groups](https://visjs.github.io/vis-network/docs/network/groups.html#) documentation for a complete list of the options available.