CosmosDB Graph Read optimization with ReadDocument

As we saw before, you can use the DocDB API and Gremlin interchangeably for inserting data, and the same is true for reads. Complex queries and traversals still need Gremlin, but for simple reads you could (and probably should) use the DocDB API. By simple reads, I mean reads where you know both the PartitionKey and the id of the node you need to retrieve.
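As a minimal sketch of that rule in Python, assume the `azure-cosmos` v4 SDK, whose `ContainerProxy.read_item(item=..., partition_key=...)` performs a point read comparable to the ReadDocument call used in this post. The hypothetical `get_vertex` helper issues a point read when both the id and the PartitionKey are known, and otherwise defers to a caller-supplied Gremlin lookup (also hypothetical). The container is duck-typed, so any object with a `read_item` method works, which makes the helper easy to exercise without a live account.

```python
from typing import Any, Callable, Optional


def get_vertex(
    container: Any,
    vertex_id: Optional[str],
    partition_key: Optional[str],
    gremlin_lookup: Callable[[], Any],
) -> Any:
    """Fetch a vertex document, preferring a point read when possible.

    Assumption: `container` behaves like azure-cosmos' ContainerProxy; only
    its read_item(item=..., partition_key=...) method is used here.
    `gremlin_lookup` is a hypothetical callable that runs a Gremlin query.
    """
    if vertex_id is not None and partition_key is not None:
        # Simple read: id + partition key identify exactly one document,
        # so no query (and no Gremlin translation) is needed.
        return container.read_item(item=vertex_id, partition_key=partition_key)
    # Missing either value (or needing a traversal): fall back to Gremlin.
    return gremlin_lookup()
```

With the real SDK, `container` would be the `ContainerProxy` for the graph's collection; in a test it can be any stub exposing `read_item`.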

Let’s look at a simple example:

g.V().hasLabel('person').has('id','1f5c36bd-e2bb-4527-8819-28c3ff9d2316').has('PartitionKey','456')

This query produces the following result, consuming 2.3 RU in about 0.2s:

{
"id": "1f5c36bd-e2bb-4527-8819-28c3ff9d2316",
"label": "person",
"type": "vertex",
"properties": {
"firstName": [
{
"id": "c0727504-9fcb-4c0a-b4fb-ba9894393262",
"value": "John"
}
],
"lastName": [
{
"id": "fb341b08-c033-499f-8344-ca7b1bafc988",
"value": "Doe"
}
],
"age": [
{
"id": "bf4b713c-8e5c-4b4a-bdbe-f81048d627bc",
"value": 44
}
],
"PartitionKey": [
{
"id": "1f5c36bd-e2bb-4527-8819-28c3ff9d2316|PartitionKey",
"value": "456"
}
]
}
}

We can obtain the same result with the ReadDocument API by providing the id and PartitionKey values. It returns the following result, consuming 1 RU in 0.01s:

{
"label": "person",
"firstName": [
{
"_value": "John",
"id": "c0727504-9fcb-4c0a-b4fb-ba9894393262"
}
],
"lastName": [
{
"_value": "Doe",
"id": "fb341b08-c033-499f-8344-ca7b1bafc988"
}
],
"age": [
{
"_value": 44,
"id": "bf4b713c-8e5c-4b4a-bdbe-f81048d627bc"
}
],
"PartitionKey": "456",
"id": "1f5c36bd-e2bb-4527-8819-28c3ff9d2316",
"_rid": "wiw1AJZQfAD9AgAAAAAAAA==",
"_self": "dbs/wiw1AA==/colls/wiw1AJZQfAA=/docs/wiw1AJZQfAD9AgAAAAAAAA==/",
"_etag": "\"5e00061a-0000-0000-0000-5a90dd460000\"",
"_attachments": "attachments/",
"_ts": 1519443270
}

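Note that the format of the result is slightly different from the one the Graph API returns: property values are stored under `_value`, the PartitionKey is a plain value rather than a property array, and the system properties (`_rid`, `_self`, `_etag`, `_attachments`, `_ts`) are included. With a simple serialization step, you should still be able to get back your object.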
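A minimal sketch of that serialization step in Python: the hypothetical `document_to_vertex` reshapes the ReadDocument output into the shape the Graph API returned above, and `document_to_plain` flattens it into a simple property map. Both assume vertex documents (not edges), single-valued properties, and a partition key property named `PartitionKey`. The `<vertex id>|PartitionKey` property id is copied from the Gremlin output above, not from any documented contract.

```python
from typing import Any, Dict, List

# System properties that appear in the ReadDocument output but have no
# counterpart in the Graph API result.
SYSTEM_KEYS = {"_rid", "_self", "_etag", "_attachments", "_ts"}


def document_to_vertex(doc: Dict[str, Any], pk_name: str = "PartitionKey") -> Dict[str, Any]:
    """Reshape a vertex document from ReadDocument into the Gremlin result shape."""
    if doc.get("_isEdge"):
        raise ValueError("edge documents are not handled by this sketch")
    vertex_id = doc["id"]
    properties: Dict[str, List[Dict[str, Any]]] = {}
    for key, raw in doc.items():
        if key in ("id", "label") or key in SYSTEM_KEYS:
            continue
        if key == pk_name:
            # Stored as a plain value; the Graph API exposes it as a property
            # whose id is "<vertex id>|<key name>" (as observed above).
            properties[key] = [{"id": f"{vertex_id}|{key}", "value": raw}]
        else:
            # Regular properties: a list of {"_value": ..., "id": ...} entries.
            properties[key] = [{"id": p["id"], "value": p["_value"]} for p in raw]
    return {"id": vertex_id, "label": doc["label"], "type": "vertex", "properties": properties}


def document_to_plain(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten to {property: value}, keeping the first value of each property."""
    plain: Dict[str, Any] = {"id": doc["id"], "label": doc["label"]}
    for key, raw in doc.items():
        if key in plain or key in SYSTEM_KEYS:
            continue
        plain[key] = raw[0]["_value"] if isinstance(raw, list) else raw
    return plain
```

For the document above, `document_to_plain` yields `firstName` "John", `lastName` "Doe", `age` 44 and `PartitionKey` "456", which maps directly onto a typical person class.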

Also important is the difference in RU usage and latency: the Graph API query is more than twice as expensive in terms of compute (2.3 RU vs. 1 RU) and takes about 20 times longer (about 0.2s vs. 0.01s). To understand why, we need to look a bit deeper at how Gremlin is executed in CosmosDB.
Using a traffic inspector, you can check the calls made to CosmosDB when issuing a Gremlin query. Here's the request generated for a similar query, one that looks up persons by firstName:

{"query":"SELECT N_1 FROM Node N_1  JOIN DfirstName IN N_1['firstName']  WHERE (N_1.label = 'person' AND DfirstName._value = 'John') AND IS_DEFINED(N_1._isEdge) = false"}

So under the hood, Gremlin gets translated into a SQL API query that is used to find and filter the results, which adds complexity and cost to the request compared with a direct read by id and PartitionKey.
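To make the shape of that translation concrete, here is a toy Python sketch that rebuilds the captured SQL for `g.V().hasLabel(label).has(name, value)` chains. It is only an illustration of the query's shape, not CosmosDB's actual translator: the function name is hypothetical and values are inlined as naively quoted strings. Because each property is stored as an array of `{_value, id}` objects, every `has()` filter in this toy needs its own JOIN over that array, whereas a point read needs no query at all.

```python
from typing import Dict


def toy_vertex_filter_sql(label: str, props: Dict[str, str]) -> str:
    """Rebuild SQL shaped like the captured query (toy illustration only)."""
    # One JOIN per filtered property, since each property is an array.
    joins = "".join(f" JOIN D{name} IN N_1['{name}'] " for name in props)
    # Label check plus one _value comparison per joined property.
    conditions = [f"N_1.label = '{label}'"]
    conditions += [f"D{name}._value = '{value}'" for name, value in props.items()]
    return (
        "SELECT N_1 FROM Node N_1 "
        + joins
        + f" WHERE ({' AND '.join(conditions)}) AND IS_DEFINED(N_1._isEdge) = false"
    )
```

Calling `toy_vertex_filter_sql("person", {"firstName": "John"})` reproduces the request captured above character for character; adding a second property adds a second JOIN.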
