5 things you (probably) didn't know about CosmosDB

2019-06-10

The more I learn about CosmosDB, the more amazing things I discover about it. Today I learned something new (see 2. below), so I wanted to compile and share a list of CosmosDB features that I found interesting and that may not be common knowledge.

1. Partition Key

If you have worked on anything more than a basic CosmosDB implementation, you will know that the way you model your data has a tremendous effect on the performance you can get out of CosmosDB. And of course, one of the biggest challenges is choosing the right PartitionKey. Internally, CosmosDB hashes the value of your PartitionKey property and transparently uses that hash to place each logical partition (all the items that share a key value) on a physical partition. This is essential to the scalability and performance of your database.

What you probably didn’t know is that the original hash function only reads the first 100 bytes of your PartitionKey value. If your partition key values are very large, two values that share the same first 100 bytes produce the same hash, so your data may not be distributed the way you expect. To address this edge case, CosmosDB introduced support for large partition keys, which hashes much longer values. You can opt into this feature when you create a new container.
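To make the edge case concrete, here is a minimal sketch that mimics prefix-only hashing. It assumes SHA-256 from Python's `hashlib` as a stand-in, since CosmosDB's actual hash function is internal; `truncated_hash` and `full_hash` are hypothetical helpers for illustration only.

```python
import hashlib

# Stand-in only: CosmosDB's real partition-key hash function is internal.
PREFIX_BYTES = 100  # bytes considered by the original hashing scheme


def truncated_hash(partition_key: str) -> str:
    """Hypothetical helper: hash only the first PREFIX_BYTES bytes of the key."""
    data = partition_key.encode("utf-8")[:PREFIX_BYTES]
    return hashlib.sha256(data).hexdigest()


def full_hash(partition_key: str) -> str:
    """Hypothetical helper: hash the whole key, as large partition keys allow."""
    return hashlib.sha256(partition_key.encode("utf-8")).hexdigest()


# Two keys that only differ after the first 100 bytes.
common_prefix = "tenant-" + "x" * 100  # 107 bytes, already over the limit
key_a = common_prefix + "-order-1"
key_b = common_prefix + "-order-2"

print(truncated_hash(key_a) == truncated_hash(key_b))  # True: hashes collide
print(full_hash(key_a) == full_hash(key_b))            # False: keys stay distinct
```

For keys of 100 bytes or less, both helpers return the same hash, which is why the problem only shows up with unusually long values.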

2. Multi-API

You have probably already heard about the Multi-API capability of CosmosDB. It’s one of its signature features, as it lets you use CosmosDB for a variety of scenarios and makes it a good candidate for replacing other solutions like MongoDB, Cassandra, or even good old Table Storage.

What is less known is that, in some cases, you can combine these APIs in a single CosmosDB instance. One use case where I’ve used this successfully is CosmosDB as a graph database. Gremlin is the main API for working with the graph, but in some cases it is cumbersome and, frankly, not as performant as the plain SQL API. CosmosDB lets you create a graph collection and use either Gremlin or SQL to work with the same data. In my case, I do all the inserts through the DocumentDB (SQL API) SDK, run simple reads as SQL queries, and use Gremlin for complicated traversals to avoid multiple queries and joins.

3. Built-in TTL

CosmosDB has a built-in data expiry capability through a “ttl” property, expressed in seconds. You can set TTL at the container level or individually at the item level, where an item’s value overrides the container default. For full details you can always refer to the docs.

This feature, combined with CosmosDB’s low-latency guarantees, makes it a great fit as a cache store, a role you might otherwise fill with something like Redis.
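The interaction between the container default and the item-level “ttl” can be summarized as a small function. This is a minimal model of the rules as documented, not CosmosDB code: `is_expired` is a hypothetical helper, and it assumes expiry is measured from the item’s last-modified timestamp (`_ts`) in Unix seconds.

```python
from typing import Optional


def is_expired(container_default_ttl: Optional[int],
               item_ttl: Optional[int],
               last_modified: int,
               now: int) -> bool:
    """Hypothetical model of CosmosDB TTL rules (times in Unix epoch seconds).

    container_default_ttl: None = TTL not enabled on the container,
        -1 = enabled but items never expire unless they set their own ttl,
        n > 0 = items expire n seconds after their last modification.
    item_ttl: the item's "ttl" property (None = not set, -1 = never expire).
    last_modified: the item's last-modified time (CosmosDB's _ts).
    """
    if container_default_ttl is None:
        # With TTL off at the container level, item-level ttl is ignored.
        return False
    # An item-level value overrides the container default.
    effective_ttl = item_ttl if item_ttl is not None else container_default_ttl
    if effective_ttl == -1:
        return False
    return now >= last_modified + effective_ttl


print(is_expired(3600, None, last_modified=0, now=4000))  # True: container default
print(is_expired(3600, -1, last_modified=0, now=4000))    # False: item opts out
print(is_expired(None, 60, last_modified=0, now=4000))    # False: TTL disabled
```

For a cache, a common pattern is a container default of a few minutes, with individual entries overriding it when they need to live longer or shorter.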

4. Billing

Most people know that CosmosDB is a fully managed product whose throughput and cost are measured in Request Units per second (RU/s). Essentially, you provision a certain number of RU/s and you pay for them regardless of how much you actually use.
You might also know that it’s really easy to programmatically increase or decrease your provisioned RU/s for a very elastic and cost-efficient experience.

However, what you might not know is that billing is hourly: for each hour, you are billed for the highest number of RU/s you had provisioned at any point during that hour. In other words, scaling up for five minutes costs the same as staying scaled up for the whole hour.
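A small simulation helps show how the “highest value in the hour” rule plays out. `billed_ru_per_hour` is a hypothetical helper that applies that rule to a list of throughput changes; regions, minimums and actual prices are deliberately not modeled.

```python
from typing import Dict, List, Tuple


def billed_ru_per_hour(initial_ru: int,
                       changes: List[Tuple[int, int]],
                       hours: int) -> Dict[int, int]:
    """Hypothetical helper: RU/s billed per hour under a 'peak in the hour' rule.

    initial_ru: RU/s provisioned at minute 0.
    changes: (minute, new_ru) pairs, minutes counted from the start of hour 0.
    hours: number of whole hours to evaluate.
    """
    billed: Dict[int, int] = {}
    current = initial_ru
    pending = sorted(changes)
    i = 0
    for hour in range(hours):
        hour_end = (hour + 1) * 60
        peak = current  # the value in effect when the hour starts counts too
        # Apply every change that happens before this hour ends.
        while i < len(pending) and pending[i][0] < hour_end:
            current = pending[i][1]
            peak = max(peak, current)
            i += 1
        billed[hour] = peak
    return billed


# Scale from 400 to 10,000 RU/s for five minutes, then back down.
print(billed_ru_per_hour(400, [(30, 10_000), (35, 400)], hours=2))
# {0: 10000, 1: 400}
```

The practical takeaway is to batch scale-ups and scale-downs rather than toggling throughput many times within the same hour.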

5. Try for free

One of the biggest barriers to entry for CosmosDB has been its cost. Even at just $23/month for the smallest possible instance, it has sometimes been hard to get started.
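The $23/month figure can be reproduced with a little arithmetic. This sketch assumes a single-region list price of US$0.008 per 100 RU/s per hour, a 400 RU/s minimum and a 730-hour month; these are assumptions about pricing around the time of writing, and actual prices vary by region and change over time.

```python
# Assumed inputs, not taken from the article:
PRICE_PER_100_RU_PER_HOUR = 0.008  # USD, single-region provisioned throughput
MIN_RU_PER_SECOND = 400            # assumed smallest provisionable container
HOURS_PER_MONTH = 730              # average hours in a month


def monthly_cost(ru_per_second: int) -> float:
    """Estimated monthly cost in USD for a constant provisioned throughput."""
    return ru_per_second / 100 * PRICE_PER_100_RU_PER_HOUR * HOURS_PER_MONTH


print(round(monthly_cost(MIN_RU_PER_SECOND), 2))  # 23.36
```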

Many might know that you can use the CosmosDB emulator for local testing and even run tests in Azure DevOps, but maybe you didn’t know that you can use CosmosDB free of charge for 30 days at a time.

The free offer gives you 30 days of CosmosDB usage with a maximum of 25 containers and 10,000 RU/s of throughput, which should be enough to test even demanding scenarios.
At any time during the 30 days, or once they have passed, you can delete the account and create a new one for another 30 days.

To access this account, you log in with a Microsoft account and get a new directory and subscription that only has access to this particular account, so it’s also really convenient to manage. It just doesn’t get better than this.