Amazon EC2
Lately, I’ve been spending a lot of time working with Amazon EC2 instances. There’s a lot to like about them, and perhaps I’ll give a longer presentation about them (from a user’s point of view), but one thing that has me a bit nervous is the network latency.
I’ve got a web app on one machine, and two MongoDB instances on two other machines. The MongoDB instances are configured as a master/slave replica set, which means that my web app writes to one of the instances and reads from the other.
Works great!
Latency
But, once in a while, the app takes almost 4 seconds to do a read. What’s up with that? At first I thought there was some strange concurrency problem in my app, but after a bit of instrumentation, it seems clear that the slowdown happens right around the network call to Mongo to get some data.
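For anyone curious what "a bit of instrumentation" can look like, here is a minimal sketch in Python (not the code from my app, which runs on the JVM). The names `timed` and `fetch_data` and the one-second threshold are made up for illustration; `fetch_data` is just a stand-in for the real network call to Mongo.

```python
import time
from contextlib import contextmanager

# Assumption: anything at or above one second counts as "slow".
SLOW_THRESHOLD_SECONDS = 1.0


@contextmanager
def timed(label, samples, clock=time.perf_counter):
    """Record how long the wrapped block takes, in seconds."""
    start = clock()
    try:
        yield
    finally:
        elapsed = clock() - start
        samples.append((label, elapsed))
        if elapsed >= SLOW_THRESHOLD_SECONDS:
            print(f"SLOW {label}: {elapsed:.4f}s")


def fetch_data():
    """Hypothetical stand-in for the network call to Mongo."""
    time.sleep(0.002)
    return {"ok": True}


samples = []
with timed("mongo read", samples):
    result = fetch_data()
```

Wrapping only the database call, rather than the whole request handler, is what lets you tell a slow network round trip apart from a slow app.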
Right now, for this app, the MongoDB database has hardly any data in it. Almost none. It’s also using a lot of indexes. Sometimes the queries take about 0.0020 seconds. Sometimes, 4.137 seconds. Most of the time, I get the sub-second query time. But sometimes I get 4+ seconds, or 3+, or even 2+.
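To show why a pattern like this is easy to miss, here is a rough sketch that summarizes a batch of query timings. The sample list is invented for illustration (only 0.0020 and 4.137 are numbers I actually saw), and `summarize` is a hypothetical helper, not something from my app.

```python
import statistics


def summarize(samples, slow_threshold=1.0):
    """Summarize query times (seconds): typical case versus slow outliers."""
    ordered = sorted(samples)
    slow = [s for s in ordered if s >= slow_threshold]
    return {
        "count": len(ordered),
        "median": statistics.median(ordered),
        "max": ordered[-1],
        "slow_count": len(slow),
        "slow_fraction": len(slow) / len(ordered),
    }


# Illustrative timings only: mostly fast reads plus a few multi-second ones.
samples = [0.0020] * 97 + [2.4, 3.2, 4.137]
summary = summarize(samples)
print(summary)
```

With these made-up numbers the median stays at 0.002 seconds while the maximum is 4.137, so a dashboard showing only the typical case would look perfectly healthy.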
This morning, my boss was getting the 4+ time in the office, while I was getting 0.0020 from home. Except sometimes.
I’m not liking this too much. Then again, I suppose that if we’re going to get serious about this sort of operational architecture, we might think about upgrading to “reserved” instances, though I’m not sure what that buys us in terms of network latency.
So, I’m a bit worried about this.
However
I let Amazon choose which availability zone each of my three instances runs in. Perhaps that’s part of the issue. I’ll try to remedy that and see if things improve.
Edit: Turns out, it was a problem with the JVM Mongo driver and replica sets. Updated driver, problem gone.