# SQL Support and Limitations The SQL-Gremlin compiler turns your SQL queries into Gremlin traversals and runs them against your favorite TinkerPop 3 enabled graph database. ## Basic Query Form The driver current support `SELECT` statements of the general form: SELECT [ DISTINCT ] { * | [, ]* } FROM [ WHERE ] [ GROUP BY { [, ]* } ] [ HAVING ] [ ORDER BY { [ DESC ] [, [ DESC ] ]* } ] [ LIMIT limitNumber ] projectItem: [ agg ]* [ . ]* [ [ AS ] columnAlias ] Queries without a `FROM` clause or only using `VALUES` in the `FROM` clause are not supported. A `tableExpression` must specify 1 or more tables as a comma separated list or using `JOIN` keywords. See the [Joins](#joins) section for more information on supported join operations. A `projectItem` in `SELECT` can be a reference to a column, or aggregation expression using the supported aggregation functions listed in [Operators and Functions](#operators-and-functions). A `booleanExpression` is the same but must resolve to a `boolean` value. To order by a value, it must be part of the `SELECT` list. Group by and order by using column aliases is currently not supported. Type Conversion is not supported, and thus `CAST` is not supported. Set operations `UNION`, `INTERSECT` and `EXCEPT` are not supported. Grouping operations using `CUBE`, `ROLLUP` or `GROUPING SETS` are not supported. Ordering using `NULLS FIRST` and `NULLS LAST` or by referencing column ordinals is not supported. ## Identifiers Identifiers are the names of tables, columns, and column aliases in an SQL query. Quoting is optional but unquoted identifiers must start with a letter and can only contain letters, digits, and underscores. Quoted identifiers start and end with double quotes. They may contain virtually any character. To include a double quote in an identifier, use another double quote to escape it. The maximum identifier length, quoted or unquoted, is 128 characters. Identifier matching is case-sensitive and identifiers that match a reserved SQL keyword must be quoted or use fully qualified names. ## Joins The driver only support `INNER JOIN` on two vertices that are connected by an edge. When looking at vertices you will see `_IN_ID` or `_OUT_ID`. Basic literal comparisons are supported in `WHERE` and `HAVING` clauses in conjunction with the `INNER JOIN` clause. Foreign keys are not generated and not exposed in JDBC metadata at this time. ## Data Types The driver recognizes the following SQL data types: - `BOOLEAN` - Boolean literals must be `TRUE` or `FALSE` - `TINYINT` - `SMALLINT` - `INTEGER` or `INT` - `BIGINT` - `DECIMAL` - `REAL` or `FLOAT` - `CHAR` and `VARCHAR` - String literal must be enclosed in single-quotes. - `DATE` ## Operators and Functions Supported operators are listed below. Arithmetic, string, conditional and date operators and functions are currently not support. ### Comparison Operators - `value1 value2` where `op` is one of : `=`, `<>`, `<`, `>`, `<=` or `>=` - `NOT` in conjunction with a comparison operator is not currently supported ### Logical Operators - `boolean1` OR `boolean2` - `boolean1` AND `boolean2` - NOT `boolean2` is supported only when used with a boolean valued column in `WHERE`/`HAVING` filters - NOT `boolean2` is supported with expressions and boolean values in `SELECT` clause ### Aggregate Functions - `AVG(numeric)` - `COUNT(*)` - `MAX(value)` - `MIN(value)` - `SUM(numeric)` Currently, `COUNT( [DISTINCT] numeric)`, `COUNT( [DISTINCT] value)`, and `SUM( [DISTINCT] numeric)` are not supported. # Schema Collection Property graph databases are schemaless, however SQL requires that the connected database has schema. Because of this, sql-gremlin collects what is effectively the schema of the graph. This is done by sampling the graph, collecting all vertex and edge labels. The vertex and edge labels become the tables in the graph. The properties of the vertices and edges become the columns in the graph. In addition to the properties of the vertices and edges, connections between vertices and edges are collected and become columns. This means that a vertex with an edge labelled as *MY\_EDGE* that goes into it will have `MY_EDGE_IN_ID` as a column, and a vertex with an edge labelled as *MY\_OTHER\_EDGE* that goes out of it will have `MY_OTHER_EDGE_OUT_ID` as a column. `MY_EDGE` will have the vertex is goes into as `_IN_ID` as a column and `MY_OTHER_VERTEX` will have the vertex it goes out of as `_OUT_ID` as a column. Edges and columns also have their own id as a column with the name `