DevOps teams creating an e-commerce site typically start by downloading open source code from GitHub. For example, they might download open source code for NGINX, a popular web server for high-traffic websites. From there, the team can optimize the software for performance and package it for deployment.
But there is an alternate approach for developers using cloud instances with Intel hardware. This approach can significantly accelerate time to value: starting with Intel Optimized Cloud Stacks.
To create Intel Optimized Cloud Stacks, Intel starts with the same open source software available to developers on GitHub. Intel then creates application-specific software optimizations. These optimizations accelerate performance and are tuned for Intel hardware on specific cloud instances.
The optimizations take advantage of Intel’s advanced acceleration capabilities for cryptography, artificial intelligence (AI), and other CPU-intensive workloads. Intel then works with partners like Bitnami to build, compile, and package the software. Using Intel Optimized Cloud Stacks as a starting point for workloads like NGINX, WordPress, and MySQL can save developers a significant amount of time. It can also greatly enhance performance.
Figure 1 illustrates Intel Optimized Cloud Stacks for MySQL, an open source relational database management system (RDBMS). Intel starts with the standard open source image and enhances it with software optimizations, including security. The software is then tuned for an Intel instance on a target public cloud. Users choose either a single-node or a multi-node version, depending on their use case: static or dynamic web hosting, e-commerce, social media, banking applications, or others.
How are Intel Optimized Cloud Stacks different?
Intel Optimized Cloud Stacks differ from other open source packages in several ways. To start, Intel uses unique software components, and it tunes these components to use the advanced instruction set capabilities of Intel platforms. Intel also optimizes the components to use the capabilities of public cloud instances. By upgrading and tuning components, Intel enables organizations to deploy applications with higher performance and in less time than if they used the standard packages available on GitHub.
Tables 1 and 2 display optimized components included in Intel stacks. Table 1 outlines the Intel Optimized Cloud Stacks for WordPress, whereas Table 2 outlines the Intel Optimized Cloud Stacks for NGINX.
To enable higher-speed response times for websites, Intel uses asynchronous NGINX instead of standard NGINX in both the WordPress and NGINX images. Similarly, to help ensure communications security over networks while delivering high performance, Intel uses OpenSSL in asynchronous mode instead of standard OpenSSL.
Both the NGINX and WordPress offerings include Intel Multi-Buffer Crypto, which accelerates the packet processing of IP security (IPsec). IPsec is a framework of related protocols that help secure communications at the packet processing layer. The stacks also include the Intel QAT engine, which accelerates cryptographic and compression workloads by offloading them to hardware optimized for those functions. Both of these components can help improve the performance of the NGINX engine, enabling data center servers to respond efficiently to high traffic levels.
Intel differentiates its Intel Optimized Cloud Stacks in the way it curates the components. They are selected to harness Intel’s advanced hardware-acceleration capabilities for specific cloud instances.
Intel also manages the heavy lifting of building, compiling, and packaging the application software. DevOps teams can therefore launch Intel Optimized Cloud Stacks directly from the public cloud service provider (CSP) site instead of spending memory and CPU cycles on preparing software packages themselves.
Optimized software components
The following sections take a deeper look at some of the optimized software included in Intel Optimized Cloud Stacks and how these components can help increase performance.
The OpenSSL project provides an open source implementation of the Secure Sockets Layer (SSL)/Transport Layer Security (TLS) protocols needed to authenticate and encrypt network connections between computers. SSL/TLS encrypts personal information sent by applications like e-mail and social media. Performance is critical because of ever-increasing amounts of traffic within data centers. Asynchronous OpenSSL is an important contribution to this project that addresses this need for improved performance.
Asynchronous OpenSSL is a non-blocking approach that supports a parallel-processing model at the cryptographic level for SSL/TLS protocols. This, in turn, allows for other types of optimizations. Two major benefits of asynchronous OpenSSL are increased single-flow throughput, leading to high performance, and fewer contexts, thus reducing context management overhead. Because Intel Optimized Cloud Stacks include asynchronous OpenSSL, developer applications can benefit from improved server performance.
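The non-blocking idea described above can be illustrated with a small conceptual sketch. It does not use OpenSSL or its asynchronous job interface. Python's asyncio stands in for the mechanism, and the latency constant and function names are hypothetical. The point is only that a worker that yields while a crypto operation is pending can keep several connections in flight at once.

```python
import asyncio
import time

# Hypothetical latency of one offloaded crypto operation (illustration only).
CRYPTO_LATENCY_S = 0.05


def blocking_crypto_op(conn_id: int) -> str:
    """Synchronous model: the worker stalls until the operation finishes."""
    time.sleep(CRYPTO_LATENCY_S)
    return f"conn-{conn_id}: handshake done"


async def async_crypto_op(conn_id: int) -> str:
    """Asynchronous model: the worker yields while the operation is pending."""
    await asyncio.sleep(CRYPTO_LATENCY_S)
    return f"conn-{conn_id}: handshake done"


def run_blocking(n_conns: int) -> tuple[list[str], float]:
    """Handle connections one after another and return results plus elapsed seconds."""
    start = time.perf_counter()
    results = [blocking_crypto_op(i) for i in range(n_conns)]
    return results, time.perf_counter() - start


def run_async(n_conns: int) -> tuple[list[str], float]:
    """Handle all connections concurrently on a single worker."""
    async def main() -> list[str]:
        return await asyncio.gather(*(async_crypto_op(i) for i in range(n_conns)))

    start = time.perf_counter()
    results = asyncio.run(main())
    return list(results), time.perf_counter() - start


if __name__ == "__main__":
    _, blocking_s = run_blocking(8)
    _, async_s = run_async(8)
    print(f"blocking: {blocking_s:.3f}s  non-blocking: {async_s:.3f}s")
```

In this toy model the blocking run takes roughly the sum of the individual latencies, while the non-blocking run takes roughly one latency, because the waits overlap.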
Intel QuickAssist Technology (Intel QAT) engine for OpenSSL
OpenSSL performance can be further improved with the Intel QAT engine, which is supported by select Intel architecture platforms. Intel QAT has been expanded to provide software-based acceleration of cryptographic operations through instructions in the Intel Advanced Vector Extensions 512 (Intel AVX-512) family. Again, by including Intel QAT in Intel Optimized Cloud Stacks, applications can benefit from enhanced server performance. The Intel QAT engine for OpenSSL is deployable today for on-premises users, and it can provide benefits in the cloud as CSPs roll out instances that expose it.
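Because the software-based acceleration path depends on instructions in the Intel AVX-512 family, it can be useful to check what a given cloud instance exposes before expecting a benefit. The following is a minimal sketch for Linux that reads the feature flags in /proc/cpuinfo. The flag names avx512f and avx512ifma are the names Linux reports for those features, but treating exactly this pair as the requirement is an assumption made for illustration. The Intel QAT engine documentation is the authority on what is actually needed.

```python
from pathlib import Path

# Assumption: flags treated as required for this illustration (Linux flag names).
REQUIRED_FLAGS = frozenset({"avx512f", "avx512ifma"})


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags


def missing_flags(cpuinfo_text: str, required=REQUIRED_FLAGS) -> set[str]:
    """Return the required flags that the CPU does not report."""
    return set(required) - cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        missing = missing_flags(path.read_text())
        if missing:
            print(f"missing flags: {sorted(missing)}")
        else:
            print("all required flags present")
    else:
        print("/proc/cpuinfo is not available on this platform")
```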
Asynchronous mode for NGINX
Open source NGINX is a lightweight, high-performance server that can serve content to a high volume of connections. It is used by high-demand websites because of its asynchronous architecture and its ability to handle heavy loads.
Intel extends NGINX by enabling it to work with asynchronous-mode OpenSSL. Intel also incorporates Intel QAT acceleration.
Running NGINX with asynchronous OpenSSL contributes to better server performance.
Intel Multi-Buffer Crypto for IPsec Library
Intel Optimized Cloud Stacks use the Intel Multi-Buffer Crypto for IPsec Library, which provides crypto acceleration for packet-processing frameworks like IPsec and TLS. Intel also provides algorithmic and software enhancements that enable operation chaining, advanced cryptographic pipelining, multi-buffering, and function-stitching innovations. Including the Intel Multi-Buffer Crypto for IPsec Library in Intel Optimized Cloud Stacks makes improved performance possible.
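Multi-buffering can be pictured as a job queue that gathers several independent buffers and processes them as one batch, so that each parallel lane of the hardware has work to do. The sketch below is a toy model of that submit-and-flush pattern only. The class name, the lane count, and the use of SHA-256 as the per-buffer operation are all hypothetical, and the batch is processed in a plain loop rather than with vector instructions.

```python
import hashlib


class MultiBufferQueue:
    """Toy job manager: collects independent buffers and processes them in batches."""

    def __init__(self, lanes: int = 8) -> None:
        self.lanes = lanes  # hypothetical number of parallel lanes
        self._pending: list[bytes] = []

    def submit(self, buf: bytes) -> list[str]:
        """Queue a buffer; return digests only once a full batch is ready."""
        self._pending.append(buf)
        if len(self._pending) < self.lanes:
            return []
        return self.flush()

    def flush(self) -> list[str]:
        """Process whatever is queued, even if the batch is not full."""
        batch, self._pending = self._pending, []
        # Stand-in for the vectorized step: a real library would work on all
        # lanes together; here each buffer is simply hashed in turn.
        return [hashlib.sha256(buf).hexdigest() for buf in batch]


if __name__ == "__main__":
    queue = MultiBufferQueue(lanes=4)
    done: list[str] = []
    for i in range(6):
        done.extend(queue.submit(f"packet-{i}".encode()))
    done.extend(queue.flush())  # drain the partial batch at the end
    print(f"processed {len(done)} buffers")
```

The design trade-off the sketch shows is that results for a buffer may be delayed until its batch fills, which is why a flush operation is needed when traffic is light.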
Web server performance
You can use the TLS 1.2 web server benchmark to compare crypto performance with and without the Intel QAT engine using IFMA crypto instructions. Intel testing demonstrates the advantage of using the Intel QAT engine. As shown in Figure 3, crypto performance was 7x faster for the NGINX TLS web server when the Intel QAT engine was used.
Intel creates application-specific software optimizations for Intel hardware in use cases beyond web services. For example, the Intel oneAPI AI Analytics Toolkit (AI Kit) uses optimizations to accelerate end-to-end AI analytics workflows in the Python ecosystem.
As an example, Intel used two optimized software components from the AI Kit and trained a model to predict income based on education. The Intel Distribution of Modin was employed for data ingestion and extract, transform, and load (ETL) functions, while the Intel Extension for Scikit-learn was used for model training and prediction.
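A workflow of this shape might look like the sketch below. It is not Intel's test code. The column names, the synthetic data, and the helper functions are hypothetical, and the choice of ridge regression with alpha=1.0 is an assumption. Both Intel packages are designed as drop-in replacements, so the sketch falls back to stock pandas and scikit-learn when they are not installed.

```python
# Optional Intel-optimized packages; fall back to the stock libraries if absent.
try:
    import modin.pandas as pd  # Intel Distribution of Modin (pandas-compatible API)
except ImportError:
    import pandas as pd

try:
    from sklearnex import patch_sklearn  # Intel Extension for Scikit-learn

    patch_sklearn()  # must run before the scikit-learn estimators are imported
except ImportError:
    pass

from sklearn.linear_model import Ridge


def train_income_model(df, feature="education_years", target="income"):
    """Run a minimal ETL step, then fit a ridge model (column names are hypothetical)."""
    clean = df.dropna(subset=[feature, target])
    X = clean[[feature]].to_numpy(dtype=float)
    y = clean[target].to_numpy(dtype=float)
    return Ridge(alpha=1.0).fit(X, y)


def make_synthetic_frame():
    """Made-up data for illustration only: income = 20000 + 1000 * education_years."""
    years = list(range(8, 21))
    return pd.DataFrame(
        {
            "education_years": years,
            "income": [20000 + 1000 * y for y in years],
        }
    )


if __name__ == "__main__":
    model = train_income_model(make_synthetic_frame())
    print(model.predict([[16.0]]))
```

The only lines that differ from a stock pandas and scikit-learn script are the two guarded imports at the top, which is the main appeal of this approach.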
Figure 3 shows that the optimized software components boost ETL by 38x and improve machine-learning (ML) fitting and prediction with ridge regression by 21x compared to using the stock libraries.
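Speedup figures of this kind are ratios of baseline time to optimized time. To reproduce such a comparison on your own workload, a minimal timing harness along the following lines may help. The function names are hypothetical, and using the median of several runs is one reasonable choice rather than the method Intel used.

```python
import statistics
import time
from typing import Callable


def time_runs(fn: Callable[[], object], repeats: int = 5) -> list[float]:
    """Return the wall-clock seconds taken by each of `repeats` calls to fn."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples


def speedup(baseline_s: list[float], optimized_s: list[float]) -> float:
    """Ratio of median baseline time to median optimized time (2.0 means 2x)."""
    return statistics.median(baseline_s) / statistics.median(optimized_s)


if __name__ == "__main__":
    # Placeholder workloads; substitute the stock and optimized pipelines.
    baseline = time_runs(lambda: sorted(range(200_000), reverse=True))
    optimized = time_runs(lambda: sorted(range(100_000), reverse=True))
    print(f"speedup: {speedup(baseline, optimized):.2f}x")
```

Taking the median rather than the mean makes the result less sensitive to a single slow run caused by noisy neighbors on a shared cloud instance.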
Tips for benchmarking performance
When evaluating performance, be aware that benchmark tests can be deceiving. Some companies might claim that their processors run specific workloads in public clouds faster than on Intel processors. But these claims are often cherry-picked, focusing on workloads rarely encountered in a production environment. These “synthetic” benchmarks aren’t meaningful in the real world. These companies might also compare fully optimized Arm binaries against unoptimized x86 images. Consider requesting real-world performance results rather than relying on synthetic benchmarks.
A top-down microarchitecture similarity analysis delivers testing and validation that more closely resembles the real world. In Figure 4, production workloads cluster at the lower left, reflecting both front-end and back-end bound execution that results in higher sensitivity to both microarchitecture and memory latency.
The workload similarity analysis of CPU performance-monitor data shown in Figure 4 also highlights the substantial differences in performance when comparing various real-world workload proxies to benchmark tests mathematically. Standard Performance Evaluation Corporation (SPEC) benchmarks cluster components into microbenchmarks that lack the front-end boundedness of real applications. Even non-SPEC benchmarks must be tweaked so that they can reflect a proxy of the real workload. SPEC benchmarks and similar microbenchmarks primarily use CPU-bound indicators that are gated by clock speed and/or core count. As a result, SPEC benchmarks tend to measure only the frequency product from the simple cores. But other factors, such as cache and memory latencies, non-uniform memory access (NUMA), and input/output (I/O), are far more meaningful in real-world usages. As Figure 4 illustrates, a proxy-based approach is the better way to gauge the benefits of Intel Optimized Cloud Stacks.
Gain higher performance in less time
Intel Optimized Cloud Stacks are designed to help developers and architects get the most from the cloud. They can save development and deployment time. Their built-in software optimizations and tuning enable workloads to harness cutting-edge Intel technologies and public cloud capabilities. As a result, organizations can meet performance demands successfully, whatever they might be.