CosmosDB graph formats

When working with the CosmosDB Graph API, it helps to understand how the engine stores and manipulates its inputs, so I will walk through the data formats it uses.

Inserting a vertex:

The Gremlin query below inserts a new vertex into the database. The payload that follows is the message this statement issues to the server:

g.addV('person').property('firstName', 'Thomas').property('lastName', 'Andersen').property('age', 44).property('PartitionKey','123')

Payload:

{
  "label": "person",
  "firstName": [{
    "_value": "Thomas",
    "id": "5267ec4b-a39e-4d77-8dea-668cb36307bc"
  }],
  "lastName": [{
    "_value": "Andersen",
    "id": "2e5271a6-ddd8-48b9-8ff6-be41e19f82f8"
  }],
  "age": [{
    "_value": 44,
    "id": "1c9a57cc-3324-4a0c-b4c3-d494fbb3fb81"
  }],
  "PartitionKey": "123",
  "id": "a9b57684-16bf-47d9-8761-570bab43ca7b"
}
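
To make the shape of this payload easier to see, here is a minimal Python sketch that builds a dictionary with the same layout. The function name `build_vertex_document` and its parameters are hypothetical and not part of any SDK, and the ids are generated locally with `uuid4` purely for illustration; in the example above they come from the service.

```python
import uuid


def build_vertex_document(label, properties, partition_key,
                          partition_key_field="PartitionKey"):
    """Build a dict shaped like the vertex insert payload shown above.

    Hypothetical helper for illustration only, not a CosmosDB API.
    """
    doc = {"label": label}
    for name, value in properties.items():
        # Each regular property becomes a list of {"_value", "id"} entries.
        doc[name] = [{"_value": value, "id": str(uuid.uuid4())}]
    # The partition key is the exception: it is stored as a plain value.
    doc[partition_key_field] = partition_key
    # Assumption: the id is generated locally here; the service normally assigns it.
    doc["id"] = str(uuid.uuid4())
    return doc


vertex = build_vertex_document(
    "person",
    {"firstName": "Thomas", "lastName": "Andersen", "age": 44},
    partition_key="123",
)
```

Note how every property except the partition key is wrapped in a list, while the partition key stays a bare string.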

Inserting an edge:

The Gremlin query below inserts a new edge between two pre-existing vertices. The payload that follows is the message this statement issues to the server:

g.V().hasLabel('person').has('firstName', 'Thomas').addE('knows').to(g.V().hasLabel('person').has('firstName','John')).property('since','2017')

Payload:

{
  "label": "knows",
  "since": "2017",
  "id": "b9ae257a-ab49-487b-a165-57bd5330168b",
  "_sink": "1f5c36bd-e2bb-4527-8819-28c3ff9d2316",
  "_sinkLabel": "person",
  "_sinkPartition": "456",
  "_vertexId": "a9b57684-16bf-47d9-8761-570bab43ca7b",
  "_vertexLabel": "person",
  "_isEdge": true,
  "PartitionKey": "123"
}
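
The edge payload can be read as a combination of fields taken from the source vertex and the sink vertex. The sketch below rebuilds it in Python to show where each field comes from. `build_edge_document` is a hypothetical helper, not an SDK function, and the `source` and `sink` dictionaries are simplified stand-ins that carry only the fields needed here.

```python
import uuid


def build_edge_document(label, source, sink, properties,
                        partition_key_field="PartitionKey"):
    """Build a dict shaped like the edge insert payload shown above.

    Hypothetical helper for illustration only, not a CosmosDB API.
    """
    doc = {"label": label}
    # Edge properties are stored as plain values, unlike vertex properties.
    doc.update(properties)
    doc.update({
        # Assumption: the id is generated locally here for illustration.
        "id": str(uuid.uuid4()),
        # Fields describing the target (sink) vertex.
        "_sink": sink["id"],
        "_sinkLabel": sink["label"],
        "_sinkPartition": sink[partition_key_field],
        # Fields describing the source vertex.
        "_vertexId": source["id"],
        "_vertexLabel": source["label"],
        "_isEdge": True,
        # The edge takes the partition key of its source vertex.
        partition_key_field: source[partition_key_field],
    })
    return doc


thomas = {"id": "a9b57684-16bf-47d9-8761-570bab43ca7b",
          "label": "person", "PartitionKey": "123"}
john = {"id": "1f5c36bd-e2bb-4527-8819-28c3ff9d2316",
        "label": "person", "PartitionKey": "456"}

edge = build_edge_document("knows", thomas, john, {"since": "2017"})
```

The edge carries the partition key of the source vertex ("123"), while the sink's partition ("456") appears only in `_sinkPartition`.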

Read a vertex with Gremlin:

Gremlin statement to retrieve a single vertex from the database:

g.V().has('person','firstName','John')

Payload:

{
  "id": "1f5c36bd-e2bb-4527-8819-28c3ff9d2316",
  "label": "person",
  "type": "vertex",
  "properties": {
    "firstName": [
      {
        "id": "c0727504-9fcb-4c0a-b4fb-ba9894393262",
        "value": "John"
      }
    ],
    "lastName": [
      {
        "id": "fb341b08-c033-499f-8344-ca7b1bafc988",
        "value": "Doe"
      }
    ],
    "age": [
      {
        "id": "bf4b713c-8e5c-4b4a-bdbe-f81048d627bc",
        "value": 44
      }
    ],
    "PartitionKey": [
      {
        "id": "1f5c36bd-e2bb-4527-8819-28c3ff9d2316|PartitionKey",
        "value": "456"
      }
    ]
  }
}
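
Because every property in the response is a list of `{"id", "value"}` entries, client code usually needs to unwrap it before use. The following is a minimal sketch of one way to do that in Python. `flatten_vertex` is a hypothetical helper, and it assumes no property is itself named `id` or `label`, since those would collide with the vertex's own fields.

```python
def flatten_vertex(vertex):
    """Collapse a vertex response into a flat dict of property values.

    Hypothetical helper for illustration only. Single-valued properties
    are unwrapped; multi-valued properties are kept as a list of values.
    """
    flat = {"id": vertex["id"], "label": vertex["label"]}
    for name, entries in vertex.get("properties", {}).items():
        values = [entry["value"] for entry in entries]
        flat[name] = values[0] if len(values) == 1 else values
    return flat


response = {
    "id": "1f5c36bd-e2bb-4527-8819-28c3ff9d2316",
    "label": "person",
    "type": "vertex",
    "properties": {
        "firstName": [{"id": "c0727504-9fcb-4c0a-b4fb-ba9894393262", "value": "John"}],
        "lastName": [{"id": "fb341b08-c033-499f-8344-ca7b1bafc988", "value": "Doe"}],
        "age": [{"id": "bf4b713c-8e5c-4b4a-bdbe-f81048d627bc", "value": 44}],
        "PartitionKey": [
            {"id": "1f5c36bd-e2bb-4527-8819-28c3ff9d2316|PartitionKey", "value": "456"}
        ],
    },
}

person = flatten_vertex(response)
```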

Read an edge with Gremlin:

Gremlin statement to retrieve a single edge from the database:

g.E().hasLabel('knows')

Payload:

{
  "id": "b9ae257a-ab49-487b-a165-57bd5330168b",
  "label": "knows",
  "type": "edge",
  "inVLabel": "person",
  "outVLabel": "person",
  "inV": "1f5c36bd-e2bb-4527-8819-28c3ff9d2316",
  "outV": "a9b57684-16bf-47d9-8761-570bab43ca7b",
  "properties": {
    "since": "2017"
  }
}
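
Comparing this response with the edge insert payload suggests a simple field mapping: `_sink` becomes `inV`, `_vertexId` becomes `outV`, and the user-defined properties move under `properties`. The sketch below encodes that mapping in Python. It is inferred only from the two payloads in this post, so treat it as an illustration rather than a description of what the engine does internally. `stored_edge_to_response` and `EDGE_FIELD_MAP` are hypothetical names.

```python
# Mapping inferred from the two edge payloads in this post (an assumption).
EDGE_FIELD_MAP = {
    "_sink": "inV",
    "_sinkLabel": "inVLabel",
    "_vertexId": "outV",
    "_vertexLabel": "outVLabel",
}


def stored_edge_to_response(doc, partition_key_field="PartitionKey"):
    """Reshape an edge insert payload into the shape returned by a read.

    Hypothetical helper for illustration only.
    """
    response = {"id": doc["id"], "label": doc["label"], "type": "edge"}
    for stored_name, response_name in EDGE_FIELD_MAP.items():
        response[response_name] = doc[stored_name]
    # Everything that is not metadata is treated as a user-defined property.
    reserved = {"id", "label", partition_key_field}
    response["properties"] = {
        name: value
        for name, value in doc.items()
        if name not in reserved and not name.startswith("_")
    }
    return response


stored_edge = {
    "label": "knows",
    "since": "2017",
    "id": "b9ae257a-ab49-487b-a165-57bd5330168b",
    "_sink": "1f5c36bd-e2bb-4527-8819-28c3ff9d2316",
    "_sinkLabel": "person",
    "_sinkPartition": "456",
    "_vertexId": "a9b57684-16bf-47d9-8761-570bab43ca7b",
    "_vertexLabel": "person",
    "_isEdge": True,
    "PartitionKey": "123",
}

edge_response = stored_edge_to_response(stored_edge)
```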

Takeaways

The first thing to notice is that the nodes and edges in the graph are actually JSON documents with specific formatting around their properties, plus some predefined properties for metadata.

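There are minor differences in format between what is inserted and what you receive back (for example, vertex property values are written under "_value" but returned under "value", nested inside "properties"), so pay attention to your serialization.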
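
As a rough illustration of those differences for vertices, the Python sketch below reshapes an insert payload into the shape of a read result. The mapping is inferred from the payloads in this post, including the `<vertex id>|PartitionKey` pattern for the partition key's property id, and `stored_vertex_to_response` is a hypothetical helper. It also assumes the document contains only the fields shown here.

```python
def stored_vertex_to_response(doc, partition_key_field="PartitionKey"):
    """Reshape a vertex insert payload into the shape returned by a read.

    Hypothetical helper for illustration only; the mapping is inferred
    from the example payloads, not from a documented specification.
    """
    properties = {}
    for name, value in doc.items():
        if name in ("id", "label"):
            continue
        if name == partition_key_field:
            # The plain partition key value comes back wrapped like any
            # other property, with an id derived from the vertex id.
            properties[name] = [{"id": f"{doc['id']}|{name}", "value": value}]
        else:
            # "_value" on the way in becomes "value" on the way out.
            properties[name] = [
                {"id": entry["id"], "value": entry["_value"]} for entry in value
            ]
    return {
        "id": doc["id"],
        "label": doc["label"],
        "type": "vertex",
        "properties": properties,
    }


stored_vertex = {
    "label": "person",
    "firstName": [{"_value": "Thomas", "id": "5267ec4b-a39e-4d77-8dea-668cb36307bc"}],
    "lastName": [{"_value": "Andersen", "id": "2e5271a6-ddd8-48b9-8ff6-be41e19f82f8"}],
    "age": [{"_value": 44, "id": "1c9a57cc-3324-4a0c-b4c3-d494fbb3fb81"}],
    "PartitionKey": "123",
    "id": "a9b57684-16bf-47d9-8761-570bab43ca7b",
}

vertex_response = stored_vertex_to_response(stored_vertex)
```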

The edge lives in the source vertex's partition. This is a subtle detail that can help when you are designing your graph and need to decide the direction of your edges.
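
To see that co-location concretely, the Python sketch below groups the three documents from this post by their partition key value. `group_by_partition` is a hypothetical helper, and the vertex documents are trimmed to the fields that matter here.

```python
from collections import defaultdict


def group_by_partition(documents, partition_key_field="PartitionKey"):
    """Group documents by partition key value.

    Hypothetical helper for illustration only. Returns a dict mapping
    each partition key to a list of (kind, label, id) tuples.
    """
    partitions = defaultdict(list)
    for doc in documents:
        kind = "edge" if doc.get("_isEdge") else "vertex"
        partitions[doc[partition_key_field]].append((kind, doc["label"], doc["id"]))
    return dict(partitions)


documents = [
    # Thomas (source vertex)
    {"id": "a9b57684-16bf-47d9-8761-570bab43ca7b", "label": "person",
     "PartitionKey": "123"},
    # John (sink vertex)
    {"id": "1f5c36bd-e2bb-4527-8819-28c3ff9d2316", "label": "person",
     "PartitionKey": "456"},
    # The "knows" edge from Thomas to John
    {"id": "b9ae257a-ab49-487b-a165-57bd5330168b", "label": "knows",
     "_isEdge": True,
     "_vertexId": "a9b57684-16bf-47d9-8761-570bab43ca7b",
     "_sink": "1f5c36bd-e2bb-4527-8819-28c3ff9d2316",
     "_sinkPartition": "456",
     "PartitionKey": "123"},
]

layout = group_by_partition(documents)
```

Partition "123" ends up holding both Thomas and the "knows" edge, while partition "456" holds only John. This suggests that following outgoing edges from a vertex can stay within that vertex's partition, whereas finding incoming edges may require looking at other partitions.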
