Vespa Product Updates, May 2019: Deploy Large Machine Learning Models, Multithreaded Disk Index Fusion, Ideal State Optimizations, and Feeding Improvements
<p><a href="https://www.linkedin.com/in/kraune/">Kristian Aune</a>, Tech Product Manager, Verizon Media</p><p>In a <a href="https://yahoodevelopers.tumblr.com/post/183774556468/vespa-product-updates-march-2019-tensor-updates">recent post</a>, we mentioned tensor updates, query tracing, and coverage. Largely developed by Yahoo engineers, <a href="https://github.com/vespa-engine/vespa">Vespa</a> is an open source big data processing and serving engine. It’s in use by many products, such as Yahoo News, Yahoo Sports, Yahoo Finance, and the Verizon Media Ad Platform. Thanks to feedback and contributions from the community, Vespa continues to evolve.</p><p>For May, we’re excited to share the following feature updates with you:</p><p><b>Multithreaded Disk Index Fusion</b></p><p>Content nodes can now sustain a higher feed rate by using multiple threads for disk index fusion. <a href="https://docs.vespa.ai/documentation/proton.html#disk-index-fusion">Read more</a>.</p><p><b>Feeding Improvements</b></p><p>Cluster-internal communication is now multithreaded out of the box for high-throughput feeding operations. This fully utilizes a 10 Gbps network and improves utilization of high-CPU content nodes.</p><p><b>Ideal State Optimizations</b></p><p>Whenever the content cluster state changes, the ideal state is calculated. This calculation is now faster and runs less often, so state transitions such as a node going up or down have less impact on read and write operations. Learn more in <a href="https://docs.vespa.ai/documentation/elastic-vespa.html">the dynamic data distribution documentation</a>.</p><p><b>Download Machine Learning Models During Deploy</b></p><p>One way to import ML models into Vespa is to put them in the <a href="https://docs.vespa.ai/documentation/reference/application-packages-reference.html">models</a> directory of the application package.
Applications whose models are trained frequently in an external system can refer to the model by URL rather than including it in the application package. This use case is now documented in <a href="https://docs.vespa.ai/documentation/deploying-remote-models.html">deploying remote models</a>, and it also solves the challenge of deploying very large models.</p><p>We welcome your <a href="https://github.com/vespa-engine/vespa/blob/master/CONTRIBUTING.md">contributions</a> and feedback (<a href="https://twitter.com/vespaengine">tweet</a> or <a href="mailto:info@vespa.ai">email</a>) about any of these new features or future improvements you’d like to request.</p>