Vespa Product Updates, May 2020: Improved Slow Node Tolerance,
Multi-Threaded Rank Profile Compilation, Reduced Peak Memory at Startup, Feed Performance Improvements, & Increased Tensor Performance
<p><a href="https://www.linkedin.com/in/kraune/">Kristian Aune</a>, Tech Product Manager, Verizon Media<br/></p><p>In the <a href="https://yahoodevelopers.tumblr.com/post/617301826343239680/vespa-product-updates-april-2020-improved">April updates</a>, we mentioned Improved Performance for Large Fan-out Applications, Improved Node Auto-fail Handling, CloudWatch Metric Import and CentOS 7 Dev Environment. This month, we’re excited to share the following updates:<b><br/></b></p><p><b>Improved Slow Node Tolerance</b></p><p>To improve query scaling, applications can <a href="https://docs.vespa.ai/documentation/performance/sizing-search.html">group content nodes</a> to balance static and dynamic query cost. The largest Vespa applications use a few hundred nodes. This is a great feature to optimize cost vs performance in high-query applications. Since Vespa-7.225.71, the <a href="https://docs.vespa.ai/documentation/reference/services-content.html#dispatch-policy">adaptive dispatch policy</a> is made default. This balances load to the node groups based on latency rather than just round robin - a slower node will get less load and overall latency is lower.</p><p><b>Multi-Threaded Rank Profile Compilation</b></p><p>Queries are using a <a href="https://docs.vespa.ai/documentation/ranking.html">rank profile</a> to score documents. Rank profiles can be huge, like machine learned models. The models are compiled and validated when deployed to Vespa. Since Vespa-7.225.71, the compilation is multi-threaded, cutting compile time to 10% for large models. This makes content node startup quicker, which is important for rolling upgrades.</p><p><b>Reduced Peak Memory at Startup</b></p><p><a href="https://docs.vespa.ai/documentation/attributes.html">Attributes</a> is a unique Vespa feature used for high feed performance for low-latency applications. It enables writing directly to memory for immediate serving. At restart, these structures are reloaded. Since Vespa-7.225.71, the largest attribute is loaded first, to minimize temporary memory usage. As memory is sized for peak usage, this cuts content node size requirements for applications with large variations in attribute size. Applications should keep memory at less than 80% of AWS EC2 instance size.</p><p><b>Feed Performance Improvements</b></p><p>At times, batches of documents are deleted. This subsequently triggers compaction. Since Vespa-7.227.2, compaction is blocked at high removal rates, reducing overall load. Compaction resumes once the remove rate is low again. </p><p><b>Increased Tensor Performance </b></p><p><a href="https://docs.vespa.ai/documentation/tensor-user-guide.html">Tensor</a> is a field type used in advanced ranking expressions, with heavy CPU usage. Simple tensor joins are now optimized and more optimizations will follow in June.</p><p>…</p><p>About Vespa: Largely developed by Yahoo engineers, <a href="https://github.com/vespa-engine/vespa">Vespa</a> is an open source big data processing and serving engine. It’s in use by many products, such as Yahoo News, Yahoo Sports, Yahoo Finance, and the Verizon Media Ad Platform. Thanks to feedback and contributions from the community, Vespa continues to grow.</p><p>We welcome your contributions and feedback (<a href="https://twitter.com/vespaengine">tweet</a> or <a href="mailto:info@vespa.ai">email</a>) about any of these new features or future improvements you’d like to request.</p>