# 1. Unicode Support

The DocumentDB ODBC Driver supports the Unicode ODBC interface. It does not support the non-Unicode interface.

## 1.1. Unicode encoding at ODBC Layer

The definition of the Unicode at the ODBC layer is defined as SQLWCHAR. There are platform specific implementations.

- *Windows*: The SQLWCHAR is defined as wchar_t (2-byte)
- *MacOS (iODBC)*: The SQLWCHAR is defined as wchar_t (4-byte)
- *Linux (unixODBC)*: The SQLWCHAR is defined as unsigned short (2-byte)

In terms of our driver, the entry point is the entry_point.cpp/h. For each API entry point, it calls the equivalent in the `odbc` namespace.

Here we pass the SQLWCHAR pointer to/from entry_point and odbc. 

However, from the `odbc` namespace to the rest of the driver, we encode data in UTF-8 encoding. The reason being is that all the backend API (JNI and MongoCXX) use UTF-8 encoding. So to minimize the number of conversions, we choose to use UTF-8 encoding at this layer.

```mermaid
graph TD
    A(BI Tool) -- SQLWCHAR --> B(ODBC Driver Adapter)
    subgraph Driver [ODBC Driver]
    B -- SQLWCHAR --> C(entry_point.cpp)
    C -- SQLWCHAR --> D(odbc.cpp)
    D -- char* UTF-8 --> E(JNI API)
    D -- char* UTF-8 --> F(MongoCXX API)
    end
    E -- char* UTF-8 --> G[(DocumentDB Server)]
    F -- char* UTF-8 --> G
```

## 1.2. Logging

Log files are encoded in UTF-8 format with BOM (byte-order-mark).

## 1.3. Literals

When creating and comparing literals with the C/C++ data type of `[const] char*`, ensure to use the literal prefix: `u8`. For example : `u8"你好"`.

When creating use the `NewSqlWchar(L"你好")` function. When comparing literals in SQLWCHAR , first convert to string (`utility::SqlWcharToString()`) and then compare as UTF-8 literal `u8"你好"`.