NAME
        neptune-export.sh export-pg-from-queries - Export a property graph to CSV or JSON from Gremlin queries.

SYNOPSIS
        neptune-export.sh export-pg-from-queries
                [ --alb-endpoint ] [ {-b | --batch-size} ] [ --clone-cluster ]
                [ --clone-cluster-correlation-id ] [ --clone-cluster-instance-type ]
                [ --clone-cluster-replica-count ] [ {--cluster-id | --cluster | --clusterid} ]
                [ {-cn | --concurrency} ] {-d | --dir} [ --disable-ssl ]
                [ {-e | --endpoint} ... ] [ --export-id ] [ {-f | --queries-file} ]
                [ --format ] [ --include-type-definitions ] [ --janus ]
                [ --lb-port ] [ --log-level ] [ --max-content-length ]
                [ --merge-files ] [ --nlb-endpoint ] [ {-o | --output} ]
                [ {-p | --port} ] [ --partition-directories ] [ --per-label-directories ]
                [ --profile ... ] [ {-q | --queries | --query | --gremlin} ... ]
                [ {--region | --stream-region} ] [ --serializer ]
                [ --stream-large-record-strategy ] [ --stream-name ]
                [ {-t | --tag} ] [ --timeout-millis ] [ --two-pass-analysis ]
                [ --use-iam-auth ] [ --use-ssl ]

OPTIONS
        --alb-endpoint
            Application load balancer endpoint. Optional: use only if connecting to an IAM DB auth enabled Neptune cluster through an application load balancer (ALB) - see https://github.com/aws-samples/aws-dbs-refarch-graph/tree/master/src/connecting-using-a-load-balancer#connecting-to-amazon-neptune-from-clients-outside-the-neptune-vpc-using-aws-application-load-balancer.
            May occur at most once. Belongs to the 'load-balancer' option group, from which only one option may be specified.

        -b , --batch-size
            Batch size (optional, default 64). Reduce this number if your queries trigger CorruptedFrameExceptions.
            May occur at most once.

        --clone-cluster
            Clone an Amazon Neptune cluster.
            May occur at most once.

        --clone-cluster-correlation-id
            Correlation ID to be added as a correlation-id tag on the cloned cluster.
            May occur at most once.

        --clone-cluster-instance-type
            Instance type for the cloned cluster (by default, neptune-export uses the same instance type as the source cluster).
            Allowed values: db.r4.large, db.r4.xlarge, db.r4.2xlarge, db.r4.4xlarge, db.r4.8xlarge, db.r5.large, db.r5.xlarge, db.r5.2xlarge, db.r5.4xlarge, db.r5.8xlarge, db.r5.12xlarge, db.t3.medium, r4.large, r4.xlarge, r4.2xlarge, r4.4xlarge, r4.8xlarge, r5.large, r5.xlarge, r5.2xlarge, r5.4xlarge, r5.8xlarge, r5.12xlarge, t3.medium.
            May occur at most once.

        --clone-cluster-replica-count
            Number of read replicas to add to the cloned cluster (optional, default 0).
            May occur at most once. The value must fall in the range 0 <= value <= 15.

        --cluster-id , --cluster , --clusterid
            ID of an Amazon Neptune cluster. If you specify a cluster ID, neptune-export uses all of the instance endpoints in the cluster in addition to any endpoints you have specified using the endpoint options.
            May occur at most once. Belongs to the 'endpoint or clusterId' option group, from which at least one option must be specified.

        -cn , --concurrency
            Concurrency: the number of parallel queries used to run the export (optional, default 4).
            May occur at most once.

        -d , --dir
            Root directory for output.
            May occur at most once. The value must be a path to a readable and writable directory.

        --disable-ssl
            Disables connectivity over SSL.
            May occur at most once.

        -e , --endpoint
            Neptune endpoint(s). Supply multiple instance endpoints if you want to load balance requests across a cluster.
            Belongs to the 'endpoint or clusterId' option group, from which at least one option must be specified.

        --export-id
            Export ID.
            May occur at most once.

        -f , --queries-file
            Path to a JSON queries file (file path, or 'https' or 's3' URI).
            May occur at most once.

        --format
            Output format (optional, default 'csv').
            Allowed values: json, csv, csvNoHeaders, neptuneStreamsJson, neptuneStreamsSimpleJson.
            May occur at most once.

        --include-type-definitions
            Include type definitions from column headers (optional, default 'false').
            May occur at most once.

        --janus
            Use the JanusGraph serializer.
            May occur at most once.

        --lb-port
            Load balancer port (optional, default 80).
            May occur at most once. The value must be a port number in the range 1-49151.

        --log-level
            Log level (optional, default 'error').
            Allowed values: trace, debug, info, warn, error.
            May occur at most once.

        --max-content-length
            Maximum content length (optional, default 50000000).
            May occur at most once.

        --merge-files
            Merge the files for each vertex or edge label (currently only supports CSV files for export-pg).
            May occur at most once.

        --nlb-endpoint
            Network load balancer endpoint. Optional: use only if connecting to an IAM DB auth enabled Neptune cluster through a network load balancer (NLB) - see https://github.com/aws-samples/aws-dbs-refarch-graph/tree/master/src/connecting-using-a-load-balancer#connecting-to-amazon-neptune-from-clients-outside-the-neptune-vpc-using-aws-network-load-balancer.
            May occur at most once. Belongs to the 'load-balancer' option group, from which only one option may be specified.

        -o , --output
            Output target (optional, default 'files').
            Allowed values: files, stdout, devnull, stream.
            May occur at most once.

        -p , --port
            Neptune port (optional, default 8182).
            May occur at most once. The value must be a port number in the range 1-49151.

        --partition-directories
            Partition directory path (e.g. 'year=2021/month=07/day=21').
            May occur at most once.

        --per-label-directories
            Create a subdirectory for each distinct vertex or edge label.
            May occur at most once.

        --profile
            Name of an export profile.

        -q , --queries , --query , --gremlin
            Gremlin queries (format: name="semicolon-separated list of queries" OR "semicolon-separated list of queries").

        --region , --stream-region
            AWS Region in which your Amazon Kinesis data stream is located.
            May occur at most once.

        --serializer
            Message serializer (optional, default 'GRAPHBINARY_V1D0').
            Allowed values: GRAPHSON, GRAPHSON_V1D0, GRAPHSON_V2D0, GRAPHSON_V3D0, GRAPHBINARY_V1D0, GRYO_V1D0, GRYO_V3D0, GRYO_LITE_V1D0.
            May occur at most once.

        --stream-large-record-strategy
            Strategy for dealing with records to be sent to Amazon Kinesis that are larger than 1 MB.
            Allowed values: dropAll, splitAndDrop, splitAndShred.
            May occur at most once.

        --stream-name
            Name of an Amazon Kinesis data stream.
            May occur at most once.

        -t , --tag
            Directory prefix (optional).
            May occur at most once.

        --timeout-millis
            Query timeout in milliseconds (optional).
            May occur at most once.

        --two-pass-analysis
            Perform two-pass analysis of query results (optional, default 'false').
            May occur at most once.

        --use-iam-auth
            Use IAM database authentication to authenticate to Neptune (remember to set the SERVICE_REGION environment variable).
            May occur at most once.

        --use-ssl
            Enables connectivity over SSL.
            This option is deprecated: neptune-export always connects via SSL unless you use --disable-ssl to explicitly disable connectivity over SSL.
            May occur at most once.

EXAMPLES
        bin/neptune-export.sh export-pg-from-queries \
          -e neptunedbcluster-xxxxxxxxxxxx.cluster-yyyyyyyyyyyy.us-east-1.neptune.amazonaws.com \
          -d /home/ec2-user/output \
          -q person="g.V().hasLabel('Person').has('birthday', lt('1985-01-01')).project('id', 'first_name', 'last_name', 'birthday').by(id).by('firstName').by('lastName').by('birthday');g.V().hasLabel('Person').has('birthday', gte('1985-01-01')).project('id', 'first_name', 'last_name', 'birthday').by(id).by('firstName').by('lastName').by('birthday')" \
          -q post="g.V().hasLabel('Post').has('imageFile').range(0, 250000).project('id', 'image_file', 'creation_date', 'creator_id').by(id).by('imageFile').by('creationDate').by(in('CREATED').id());g.V().hasLabel('Post').has('imageFile').range(250000, 500000).project('id', 'image_file', 'creation_date', 'creator_id').by(id).by('imageFile').by('creationDate').by(in('CREATED').id());g.V().hasLabel('Post').has('imageFile').range(500000, 750000).project('id', 'image_file', 'creation_date', 'creator_id').by(id).by('imageFile').by('creationDate').by(in('CREATED').id());g.V().hasLabel('Post').has('imageFile').range(750000, -1).project('id', 'image_file', 'creation_date', 'creator_id').by(id).by('imageFile').by('creationDate').by(in('CREATED').id())" \
          --concurrency 6

            Parallel export of Person data in 2 shards, sharding on the 'birthday' property, and of Post data in 4 shards, sharding by range, using 6 threads.

        bin/neptune-export.sh export-pg-from-queries \
          -e neptunedbcluster-xxxxxxxxxxxx.cluster-yyyyyyyyyyyy.us-east-1.neptune.amazonaws.com \
          -d /home/ec2-user/output \
          -q person="g.V().hasLabel('Person').has('birthday', lt('1985-01-01')).project('id', 'first_name', 'last_name', 'birthday').by(id).by('firstName').by('lastName').by('birthday');g.V().hasLabel('Person').has('birthday', gte('1985-01-01')).project('id', 'first_name', 'last_name', 'birthday').by(id).by('firstName').by('lastName').by('birthday')" \
          -q post="g.V().hasLabel('Post').has('imageFile').range(0, 250000).project('id', 'image_file', 'creation_date', 'creator_id').by(id).by('imageFile').by('creationDate').by(in('CREATED').id());g.V().hasLabel('Post').has('imageFile').range(250000, 500000).project('id', 'image_file', 'creation_date', 'creator_id').by(id).by('imageFile').by('creationDate').by(in('CREATED').id());g.V().hasLabel('Post').has('imageFile').range(500000, 750000).project('id', 'image_file', 'creation_date', 'creator_id').by(id).by('imageFile').by('creationDate').by(in('CREATED').id());g.V().hasLabel('Post').has('imageFile').range(750000, -1).project('id', 'image_file', 'creation_date', 'creator_id').by(id).by('imageFile').by('creationDate').by(in('CREATED').id())" \
          --concurrency 6 --format json

            Parallel export of Person data and Post data as JSON.
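Long semicolon-separated query lists like the ones in the examples above are tedious to write by hand. A minimal sketch of generating a range-sharded query list in shell (the shard size of 250000 and the Post traversal mirror the examples; adjust both to your data):

```shell
# Build the semicolon-separated, range-sharded query list expected by -q.
# Each loop iteration appends one shard covering 250000 vertices.
queries=""
for start in 0 250000 500000; do
  queries="${queries}g.V().hasLabel('Post').has('imageFile').range(${start}, $((start + 250000)));"
done
# Final open-ended shard (range upper bound -1) picks up any remaining results.
queries="${queries}g.V().hasLabel('Post').has('imageFile').range(750000, -1)"
printf '%s\n' "$queries"
```

The resulting value can then be supplied as `-q post="$queries"`, letting `--concurrency` run the shards in parallel.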
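When exporting from a busy production cluster, the `--clone-cluster` options can isolate the export workload on a temporary copy, addressed via `--cluster-id` rather than an instance endpoint. A hypothetical invocation sketch (the cluster ID, instance type, replica count, output directory, and query are all placeholders, not values from this page):

```shell
# Sketch: export from a temporary clone of the cluster 'my-neptune-cluster'
# (hypothetical ID), sized explicitly instead of matching the source.
bin/neptune-export.sh export-pg-from-queries \
  --cluster-id my-neptune-cluster \
  --clone-cluster \
  --clone-cluster-instance-type db.r5.2xlarge \
  --clone-cluster-replica-count 2 \
  -d /home/ec2-user/output \
  -q person="g.V().hasLabel('Person').values('firstName')"
```

Note that `--cluster-id` satisfies the 'endpoint or clusterId' option group, so no `-e` endpoint is required here.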