Open-sourcing UltraBrew Metrics, a Java library for instrumenting very large-scale applications
<p><b>By <a href="https://www.linkedin.com/in/ag33k/">Arun Gupta</a></b></p><figure data-orig-width="1489" data-orig-height="313" class="tmblr-full"><img src="https://66.media.tumblr.com/f75a509453083e1cdd4d5dedc29a6f49/tumblr_inline_pnllryR87t1wxhpzr_540.png" alt="image" data-orig-width="1489" data-orig-height="313"/></figure><p>Effective monitoring of applications depends on high-quality instrumentation. By measuring key metrics for your applications, you can identify performance characteristics and bottlenecks, detect failures, and plan for growth.</p><p>Here are some examples of metrics that you might want to collect about your applications:</p><ul><li>How much processing is being done, measured in requests, queries, transactions, records, backend calls, etc.</li><li>How long a particular part of the code takes (i.e., latency), reported as total time spent as well as statistics such as weighted average (based on sum and count), min, max, percentiles, and histograms.</li><li>How many resources are being used, such as memory, entries in a hash map, length of an array, etc.</li></ul><p>Further, you might want to know details about your service, such as:</p><ul><li>Number of users querying the service.</li><li>Latency experienced by users, sliced by device type, country of origin, operating system version, etc.</li><li>Number of errors encountered by users, sliced by error type.</li><li>Sizes of responses returned to users.</li></ul><p>At Verizon Media, we have applications and services that run at a very large scale, and metrics are critical for driving business and operational insights. We set out to find a good metrics library for our Java services that provides lots of features and performs well at scale.
After evaluating the available options, we realized that existing libraries did not meet our requirements:</p><ul><li><a href="https://github.com/ultrabrew/metrics/blob/master/CONCEPTS.md#tag-key-and-tag-value">Support for dynamic dimensions (i.e., tags)</a></li><li><a href="https://github.com/ultrabrew/metrics/blob/master/CONCEPTS.md#monoid">Support for associative operations on metrics</a></li><li>Good performance in very high-traffic applications</li><li>Minimal garbage collection pressure</li><li><a href="https://github.com/ultrabrew/metrics/blob/master/CONCEPTS.md#reporter">Ability to report metrics to multiple monitoring systems</a></li></ul><p>As a result, we built and open-sourced <a href="https://github.com/ultrabrew/metrics">UltraBrew Metrics</a>, a Java library for instrumenting very large-scale applications.</p><p><b>Performance</b></p><p>UltraBrew Metrics can operate at millions of requests per second per JVM without measurably slowing the application down. We currently use the library to instrument multiple applications at Verizon Media, including one that invokes it more than 20 million times per second on a single JVM.</p><p>The following techniques helped us reach our performance target:</p><ul><li>Minimize the need for synchronization by:<ul><li>Using Java’s Unsafe API for atomic operations.</li><li>Aligning data fields to the L1/L2 cache line size.</li><li>Tracking state over two time intervals to prevent contention between writes and reads.</li></ul></li><li>Reduce object creation, including avoiding the use of Java HashMaps.</li><li>Perform writes on the caller thread rather than on dedicated threads, which avoids the need for a buffer between threads.</li></ul><p><b>Questions or Contributions</b></p><p>To learn more about this library, please visit our <a href="https://github.com/ultrabrew/metrics">GitHub repository</a>.
Feel free to <a href="https://twitter.com/ultrabrew">tweet</a> or <a href="https://groups.google.com/forum/#!forum/ultrabrew/">email us</a> with any questions or suggestions.</p><p><b>Acknowledgments</b></p><p>Special thanks to my colleagues who made this possible:</p><ul><li><a href="https://www.linkedin.com/in/mattioikarinen/">Matti Oikarinen</a></li><li><a href="https://www.linkedin.com/in/mikamannermaa/">Mika Mannermaa</a></li><li><a href="https://www.linkedin.com/in/smrutiranjan/">Smruti Ranjan Sahoo</a></li><li><a href="https://www.linkedin.com/in/iruotsal/">Ilpo Ruotsalainen</a></li><li><a href="https://www.linkedin.com/in/christopher-larsen-6724255/">Chris Larsen</a></li><li><a href="https://www.linkedin.com/in/rosaliebartlett">Rosalie Bartlett</a></li><li>The Monitoring Team at Verizon Media</li></ul>
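<p>To illustrate why associative operations appear among the requirements listed earlier, here is a minimal, hypothetical Java sketch. The <code>LatencySummary</code> class and its methods are invented for this example and are not part of the UltraBrew Metrics API. It keeps the sum, count, min, and max described above for latency. Merging two summaries is associative and has an identity element, so partial results from different threads, time intervals, or hosts can be combined in any grouping. Averaging averages does not have this property.</p>

```java
import java.util.List;

/**
 * Hypothetical illustration only (not the UltraBrew Metrics API):
 * a latency summary whose merge operation is associative with an
 * identity element, i.e. a monoid.
 */
public final class LatencySummary {
    // Identity element: merging any summary with EMPTY leaves it unchanged.
    public static final LatencySummary EMPTY =
        new LatencySummary(0L, 0L, Long.MAX_VALUE, Long.MIN_VALUE);

    final long count;
    final long sum;
    final long min;
    final long max;

    LatencySummary(long count, long sum, long min, long max) {
        this.count = count;
        this.sum = sum;
        this.min = min;
        this.max = max;
    }

    /** Summary of a single latency measurement. */
    public static LatencySummary of(long latency) {
        return new LatencySummary(1L, latency, latency, latency);
    }

    /** Associative merge: a.merge(b).merge(c) equals a.merge(b.merge(c)). */
    public LatencySummary merge(LatencySummary other) {
        return new LatencySummary(
            count + other.count,
            sum + other.sum,
            Math.min(min, other.min),
            Math.max(max, other.max));
    }

    /** Weighted average derived from sum and count; NaN when empty. */
    public double mean() {
        return count == 0 ? Double.NaN : (double) sum / count;
    }

    public static void main(String[] args) {
        // Two hypothetical hosts record latencies independently.
        LatencySummary hostA = EMPTY;
        for (long v : List.of(10L, 20L, 30L)) {
            hostA = hostA.merge(of(v));
        }
        LatencySummary hostB = EMPTY.merge(of(100L));

        // Averaging the per-host means would give (20 + 100) / 2 = 60;
        // merging the summaries gives the correct weighted mean 160 / 4 = 40.
        LatencySummary total = hostA.merge(hostB);
        System.out.println("count=" + total.count + " sum=" + total.sum
            + " min=" + total.min + " max=" + total.max
            + " mean=" + total.mean());
        // prints "count=4 sum=160 min=10 max=100 mean=40.0"
    }
}
```

<p>For readability, this sketch allocates a new immutable object on every merge. A library that targets minimal garbage collection pressure, as described above, would instead update preallocated fields in place. The associativity property shown here is the same either way.</p>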