Autocomplete response time under 100ms in the cloud

A significant part of the perceived quality of the applications we use is responsiveness. Sadly, even in desktop software, the UI locking up while certain tasks complete is a frequent event. Bad news: our CPUs aren't getting significantly faster at single-threaded tasks (I fondly remember the days when each new machine I bought gave me a 3x speedup in the tasks that mattered to me). Good news: our CPUs are getting wider, though. We can perform more independent tasks at the same time. Bad news: that doesn't matter much for responsiveness.

Autocomplete, as implemented in SQL Studio, requires the following tasks to be done, in order (a rough sketch of the pipeline follows the list):

  • Sending changes from the document in our browser to our server back-end over the Internet
  • Reconstructing the whole document (possibly multi-megabyte) on the server
  • Building lexical tokens from the document text (lexical parsing)
  • Building the abstract syntax tree from the lexical tokens (semantic parsing), taking into account that the current statement is probably broken - the user is typing and the statement is unfinished
  • Performing static analysis and building the list of active variables, objects and items at each part of the statement
  • Providing an intelligent list of autocomplete options for the context at the caret position, taking into account the (possibly qualified) word we are completing
  • Sending those items back to the browser
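
Conceptually, the server-side half of this pipeline can be sketched as below. All types and function names here (loadDocument, lex, parse, analyze, completionsAt) are hypothetical placeholders for the stages listed above, not our actual API.

```typescript
// Hypothetical skeleton of the server-side autocomplete pipeline (not our real API).
// Each stage mirrors one bullet in the list above.

type Change = { seq: number; offset: number; removed: number; inserted: string };
type Token = { kind: string; text: string; start: number };
type Ast = unknown;      // abstract syntax tree
type Scopes = unknown;   // result of static analysis

declare function loadDocument(docId: string): string;                     // last known server copy
declare function applyChanges(text: string, changes: Change[]): string;   // reconstruct full text
declare function lex(text: string): Token[];                              // lexical parsing
declare function parse(tokens: Token[], opts: { tolerant: boolean }): Ast; // error-tolerant parse
declare function analyze(ast: Ast): Scopes;                               // what is visible where
declare function completionsAt(scopes: Scopes, caret: number): string[];  // suggestions at the caret

function completionsForCaret(docId: string, changes: Change[], caret: number): string[] {
  const text = applyChanges(loadDocument(docId), changes);
  const ast = parse(lex(text), { tolerant: true });
  return completionsAt(analyze(ast), caret);
}
```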

If this were a desktop application, a lot of those steps would be unnecessary. In particular, the document text would be available at all times, so we wouldn't need to spend time pushing it through the network or serializing and deserializing the data.

Now, what goes on in our cloud instances is interesting. Initially, we hoped to get rid of most of the complexity and used AWS Lambda. Unfortunately, that doesn't work well for our use case. Lambdas' responsiveness is unpredictable: you might get an instant response, or you might wait a few seconds until a new instance is spun up. Also, Lambdas are essentially stateless. You can keep things in RAM between invocations, kind of - but the more we tried to optimize response time, the hackier everything looked, and it became clear we were trying to fit our problem to the Lambda solution and it didn't fit all that well. Lambdas are great for workloads that need a few seconds or minutes of processing time, and they're optimized for scaling, not for response time.

We ended up using AWS Fargate. With it, we get a fleet of containers and can easily scale it up or down.

Current implementation

Sending changes

Right now, in our early beta stage, we are using the dumb approach - pushing a POST request to our server on each keystroke. It's an expensive operation, requiring a new connection to the server every single time. Very soon we're going to use WebSockets instead.
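
To make the switch concrete, here is a minimal client-side sketch of what the WebSocket version could look like. The endpoint URL and message shape are illustrative, not our real protocol.

```typescript
// Hypothetical client-side sketch: stream keystroke deltas over a single
// WebSocket instead of opening a new HTTP connection per keystroke.

const socket = new WebSocket("wss://example.com/autocomplete");

let seq = 0;

// Called by the editor whenever the user edits the document.
function sendChange(offset: number, removed: number, inserted: string): void {
  const change = { seq: seq++, offset, removed, inserted };
  socket.send(JSON.stringify(change));
}

// Completion suggestions come back asynchronously on the same connection.
socket.onmessage = (event) => {
  const suggestions: string[] = JSON.parse(event.data);
  // ...render the suggestion list in the editor
};
```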

Reconstructing full document text

This part is completely finished. Even though changes are sent over the Internet and we're not using WebSockets yet - so changes can arrive out of order - we can reliably reconstruct the latest document on the server.
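
One straightforward way to achieve this is to number each change on the client and buffer out-of-order arrivals on the server. Below is a minimal sketch of that idea with made-up types; our actual implementation may differ.

```typescript
// Minimal sketch: tag each change with a client-side sequence number and
// buffer anything that arrives early, applying changes strictly in order.

type Change = { seq: number; offset: number; removed: number; inserted: string };

class DocumentState {
  private text = "";
  private nextSeq = 0;
  private pending = new Map<number, Change>();

  receive(change: Change): void {
    this.pending.set(change.seq, change);
    // Apply every change we can, in order, as soon as the gap is filled.
    while (this.pending.has(this.nextSeq)) {
      const c = this.pending.get(this.nextSeq)!;
      this.pending.delete(this.nextSeq);
      this.text =
        this.text.slice(0, c.offset) + c.inserted + this.text.slice(c.offset + c.removed);
      this.nextSeq++;
    }
  }

  current(): string {
    return this.text;
  }
}
```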

Parsing & analysis

This is the core of many of our current and planned features. Our very own parser generator produces a very fast parser, optimized for SQL. We can parse roughly 1 MB of text per second.
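
For a sense of what a figure like that means in practice, here is a hedged sketch of how one might measure parser throughput; parseSql is a stand-in name, not our real entry point.

```typescript
// Hypothetical micro-benchmark for the "roughly 1 MB per second" figure:
// feed a known amount of SQL text through the parser and time it.

declare function parseSql(text: string): unknown; // stand-in for the generated parser

function measureThroughputMBps(sampleSql: string): number {
  const start = performance.now();
  parseSql(sampleSql);
  const seconds = (performance.now() - start) / 1000;
  return sampleSql.length / seconds / 1_000_000; // assumes roughly 1 byte per character
}
```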

Our clients have databases of very differing complexity. Many have just a few dozen tables, but the current record holder has 60 thousand tables (not counting other objects) in their Oracle database. As we want all our clients to have the same instant responsiveness and real-time database analysis feedback regardless of the database complexity, we employ caching. It also reduces the load on complex databases, as all the users of our applications get the metadata from our cache instead of querying the (often slow) database server whenever they connect. Our caching cluster is Amazon ElastiCache (a managed Redis cluster) and it's fast enough.
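
To illustrate the idea (without reproducing our real cache layout), here is a minimal sketch of metadata caching with Redis, assuming the Node ioredis client; the key naming, TTL and fetchFromDatabase helper are made up.

```typescript
// Minimal cache-aside sketch: serve database metadata from Redis when possible,
// and fall back to querying the user's database only on a cache miss.
import Redis from "ioredis";

const redis = new Redis(); // would point at the ElastiCache endpoint in practice

declare function fetchFromDatabase(connectionId: string): Promise<string>; // slow path

async function getMetadata(connectionId: string): Promise<string> {
  const key = `metadata:${connectionId}`;
  const cached = await redis.get(key);
  if (cached !== null) return cached;                 // fast path: served from the cache

  const metadata = await fetchFromDatabase(connectionId); // query the database once
  await redis.set(key, metadata, "EX", 3600);         // share it with every user for an hour
  return metadata;
}
```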

Future optimizations

We are already well under the 100ms threshold with our brute-force approach and simple caching strategy, but there is still room for improvement.

As our user base grows, depending on the plan and the number of users, we are going to have more fine-grained instance allocation: some users will share the same instance, while others will have a dedicated instance (or even a cluster of instances) just for themselves. This will allow us to cache parser and analysis data in RAM and avoid (de)serialization altogether.
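
As a rough illustration of what that could look like, here is a hypothetical in-RAM cache keyed by document, so a keystroke reuses the previous parse and analysis instead of deserializing anything; all names are illustrative.

```typescript
// Hypothetical sketch of keeping parse/analysis results in RAM on a dedicated
// instance, so repeated requests for the same document reuse the previous state.

type AnalyzedDocument = { text: string; ast: unknown; scopes: unknown };

declare function parseAndAnalyze(text: string): AnalyzedDocument;

const hotDocuments = new Map<string, AnalyzedDocument>(); // docId -> last analysis

function analysisFor(docId: string, latestText: string): AnalyzedDocument {
  const cached = hotDocuments.get(docId);
  if (cached && cached.text === latestText) return cached; // nothing changed since last keystroke

  const analyzed = parseAndAnalyze(latestText);
  hotDocuments.set(docId, analyzed);
  return analyzed;
}
```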