The fundamental pricing unit in Cosmos DB is the Request Unit (RU). An RU is a measure of throughput: an abstraction over the compute, memory, and IOPS required to serve a request. The idea behind the model is that the number of RUs an operation requires is deterministic, so the same query over the same data will always cost the same. The cost of every operation is returned to the client via the SDK or the response headers, so you can always track and manage your costs.
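To make the cost-tracking idea concrete, here is a minimal Python sketch that sums the charge reported with each response. It assumes the charge arrives in the `x-ms-request-charge` response header and that you can get at the headers as a plain dictionary; the `ChargeTracker` class and the operation labels are hypothetical names used only for illustration, not part of any SDK.

```python
from dataclasses import dataclass, field

# Response header in which Cosmos DB reports the RU cost of a request.
REQUEST_CHARGE_HEADER = "x-ms-request-charge"


@dataclass
class ChargeTracker:
    """Hypothetical helper that accumulates RU charges per operation label."""

    total_ru: float = 0.0
    per_operation: dict = field(default_factory=dict)

    def record(self, operation: str, headers: dict) -> float:
        # Assumes lower-case header keys; a missing header counts as 0 RU.
        charge = float(headers.get(REQUEST_CHARGE_HEADER, 0.0))
        self.total_ru += charge
        self.per_operation[operation] = (
            self.per_operation.get(operation, 0.0) + charge
        )
        return charge


tracker = ChargeTracker()
tracker.record("point-read", {"x-ms-request-charge": "1.0"})
tracker.record("query", {"x-ms-request-charge": "2.5"})
print(tracker.total_ru)
```

Comparing the running total against your provisioned throughput over a time window gives a rough picture of how close you are to being throttled.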
So, what happens if you try to use more than you have provisioned? Whenever you issue a request that would consume more resources than are currently available, Cosmos rejects the call and tells you how long to wait before issuing the request again for the best chance of success.
If you’re using the Cosmos SDK, you benefit from a built-in automatic retry policy that transparently reruns your query after the wait time Cosmos recommends. If Cosmos still can’t fulfill the request, the SDK eventually gives up (the default policy is to retry 5 times) and you get a “Request rate too large” exception that you need to handle yourself.
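The retry behavior described above can be sketched in a few lines of Python. This is not the SDK's actual implementation, only an illustration of the pattern: wait for the server-suggested interval, try again, and give up after a fixed number of retries. `ThrottledError`, `call_with_retry`, and the `retry_after_ms` attribute are hypothetical names; the default of 5 retries simply mirrors the figure quoted in the text.

```python
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a 'Request rate too large' rejection."""

    def __init__(self, retry_after_ms: float):
        super().__init__("Request rate too large")
        self.retry_after_ms = retry_after_ms


def call_with_retry(operation, max_retries: int = 5, sleep=time.sleep):
    """Run operation(), retrying throttled calls up to max_retries times."""
    retries = 0
    while True:
        try:
            return operation()
        except ThrottledError as err:
            if retries >= max_retries:
                # Retry budget exhausted: surface the error to the caller.
                raise
            retries += 1
            # Honor the wait time suggested by the server (ms to seconds).
            sleep(err.retry_after_ms / 1000.0)
```

Passing `sleep` as a parameter is only there to make the function easy to test without real delays. When the final exception does surface, it is usually a sign that the load is sustained rather than a brief spike, so backing off further or provisioning more throughput are the typical next steps.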
With this functionality, you often don’t even have to worry about exceeding the provisioned resources, as long as you are not seeing sustained or very high spikes in load.