Introducing Targeted Routing
13 Jun 2014
In our last post we discovered what taking over from the mongos would look like. We can pull the routing logic out of the mongos process and into the driver, using a special command to detect chunk changes. This carries a large risk: even a small bug in the driver can cause data to be lost (due to writing to the wrong shard) or return the wrong query results (due to reading from the wrong shard). As driver writers we want to be ultra conservative and minimize those risks to the greatest extent possible.
Taking a Step Back
That puts us right back at square one, or does it? Let’s go back to the first blog post in this series and look at the asynchronous request processing timeline. Remember that this picture shows the stages of a message, and that time moves forward as we look down the image.
The problem we encountered is the amount of time that the red and purple requests wait to be serviced by the mongos. This problem was exacerbated by the fact that the low latency connection was between the application and mongos, since they are on the same host. What if we move the mongos onto the same machine as the shard server? What would that look like?
This is already looking better. Let’s walk through the processing to get into the details.
- First the driver sends the green request. This request is for the first shard so the driver sends the request to the top mongos.
- The next frame shows the top mongos sending the request to the shard. Note that this is a local connection, so we have minimized the amount of time the mongos needs to wait for the request to be sent and received. We also see that the application is sending a red request. In this case the request is for the last shard, so the driver targets the request at the bottom mongos.
- The third frame shows the request coming back from the first shard and another purple request being sent to the top mongos. The purple request won’t have to wait long for the attention of the mongos since the previous request is already making its way back. The red request is also being sent to the last shard.
- The last frame shows the green request coming back to the application and the red request being received by the mongos. The purple request has already been sent to the first shard.
When we compare this to the first picture we see two things that are improving the performance of the system.
- The driver is opening more connections to mongos processes, but in an intelligent way. The driver is now responsible for pipelining requests to a mongos as close to the shard as possible. We already know, based on our YCSB performance benchmarks, that the driver excels at keeping MongoDB processes fed with work even when under heavy contention.
- The movement of the mongos next to the shard has reduced the amount of time that any stalled requests have to wait.
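The driver's side of this arrangement, picking the mongos that sits next to the shard owning a request, can be sketched as a simple lookup. This is a minimal Python sketch rather than the driver's actual code: the chunk boundaries, shard names, host names, and the `target_mongos` helper are all hypothetical, and it assumes a range-sharded key whose chunk table the driver has somehow learned.

```python
from bisect import bisect_right

# Hypothetical chunk table: chunk i covers keys from CHUNK_LOWER_BOUNDS[i]
# up to (but not including) the next lower bound.
CHUNK_LOWER_BOUNDS = [0, 1000, 2000]           # must stay sorted
CHUNK_SHARDS = ["shard0", "shard1", "shard2"]  # owner of each chunk

# Hypothetical mapping from a shard to the mongos running on the same host.
COLOCATED_MONGOS = {
    "shard0": "shard0-host:27017",
    "shard1": "shard1-host:27017",
    "shard2": "shard2-host:27017",
}


def target_mongos(shard_key_value):
    """Return the mongos co-located with the shard that owns the key."""
    index = bisect_right(CHUNK_LOWER_BOUNDS, shard_key_value) - 1
    if index < 0:
        raise ValueError("key is below the lowest chunk bound")
    return COLOCATED_MONGOS[CHUNK_SHARDS[index]]


print(target_mongos(42))    # key in the first chunk
print(target_mongos(2500))  # key in the last chunk
```

The lookup only chooses which mongos receives the request. The mongos still performs the real routing, so a stale or wrong table costs an extra network hop rather than correctness.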
Targeted Routing
We call this ‘targeted’ routing. Targeted routing is designed to maximize the performance of a sharded cluster while ensuring data integrity. Let’s look at some error conditions to see how they can be handled.
Handling Routing Errors
The basic strategy for handling errors in the driver’s targeting logic is to have the mongos silently fix the problem. Let’s look at another picture.
In the above diagram we see that the driver has sent our red request to the wrong mongos. It should have been sent to the bottom mongos for delivery to the last shard, but instead it has been sent to the top mongos. The great thing is that the mongos does not need to know we made a mistake: it simply routes the request to the right shard, as it would for any other request. In practice, as long as we can make the right routing decision for the vast majority of requests, we should still see a performance improvement.
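A toy model makes the point concrete: a mistargeted request still reaches the right shard, and the only cost is a remote hop in place of a local one. The `ToyMongos` class, the `KEY_OWNER` table, and the shard names below are hypothetical stand-ins for illustration, not part of MongoDB or the driver.

```python
# Hypothetical ownership table that every mongos is assumed to know.
KEY_OWNER = {"green": "shard0", "purple": "shard0", "red": "shard2"}


class ToyMongos:
    """Stand-in for a mongos running on the same host as `local_shard`."""

    def __init__(self, local_shard):
        self.local_shard = local_shard

    def route(self, request):
        # The mongos does its own routing and ignores the driver's guess.
        owner = KEY_OWNER[request]
        hop = "local" if owner == self.local_shard else "remote"
        return owner, hop


top = ToyMongos("shard0")
bottom = ToyMongos("shard2")

# Correct targeting: red goes to the bottom mongos, next to the last shard.
print(bottom.route("red"))  # ('shard2', 'local')
# Mistargeted: red goes to the top mongos and still reaches the right shard.
print(top.route("red"))     # ('shard2', 'remote')
```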
We will extend this error handling to cases where the driver cannot target a single shard. In those cases we will fall back to using the closest mongos process to perform the required scatter/gather operation.
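The fallback rule might look something like the following sketch. It assumes that "closest" means the lowest measured round-trip time, which is one plausible reading and not something stated here. The host names, latency figures, and the `choose_mongos` helper are hypothetical.

```python
# Hypothetical mapping from each shard to the mongos on the same host.
COLOCATED_MONGOS = {
    "shard0": "shard0-host:27017",
    "shard2": "shard2-host:27017",
}

# Hypothetical round-trip times in milliseconds, as the driver might
# measure them; "closest" is assumed here to mean lowest latency.
MONGOS_LATENCY_MS = {
    "shard0-host:27017": 0.9,
    "shard2-host:27017": 0.4,
}


def choose_mongos(candidate_shards):
    """Target one shard's mongos, or fall back to the closest mongos."""
    if len(candidate_shards) == 1:
        (shard,) = candidate_shards
        return COLOCATED_MONGOS[shard]
    # Zero or several candidates: let the closest mongos scatter/gather.
    return min(MONGOS_LATENCY_MS, key=MONGOS_LATENCY_MS.get)


print(choose_mongos({"shard0"}))            # single shard: targeted
print(choose_mongos({"shard0", "shard2"}))  # several shards: closest mongos
print(choose_mongos(set()))                 # shard unknown: closest mongos
```

Treating "unknown" the same as "several shards" keeps the driver conservative: whenever it is unsure, it hands the whole decision back to a mongos.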
In our next post we will look at what a targeted routing deployment looks like and how to enable it in the driver’s extensions.