Bring-Your-Own-Code with Maverick-2
Imagine spending 90% of your development time not advancing science, but simply adapting code to run on modern accelerator hardware. This is the reality for many HPC teams today, with scientists and engineers spending months—sometimes years—retrofitting applications instead of driving innovation. But what if you could eliminate that overhead entirely?
For years, the supercomputing industry has been caught in a balancing act: on one side, the promise of extraordinary speedups through accelerated computing; on the other, the steep barriers of application porting to achieve them. While powerful accelerators like GPUs have opened the door to unprecedented performance, they have also forced developers to navigate this gauntlet of porting, optimizing, and rewriting code. The result? Countless hours are spent retrofitting applications instead of advancing science, engineering, and innovation. This challenge has left many HPC users struggling to harness the full potential of modern hardware.
The Challenge: Accelerating HPC Workloads Without Sacrificing Flexibility
As discussed in my recent article, traditional GPUs have been a game-changer for parallel processing in HPC, yet they come with an Achilles' heel: the reliance on proprietary programming models and domain-specific languages (DSLs). Porting applications to GPUs often requires rewriting code from the ground up, adopting new programming models such as CUDA, oneAPI, or ROCm, and optimizing for GPU-specific memory hierarchies and processing paradigms. This effort is further complicated by the difficulty of debugging highly parallel code and ensuring correct execution across thousands of threads, each potentially following a different execution path. Developers must have a deep understanding of GPU architectural intricacies and hardware-specific optimizations to extract peak performance. Without these specialized skills, even minor inefficiencies can result in significant performance degradation. This process creates steep barriers to entry. Many organizations face months or even years of development effort to adapt and optimize code, and become heavily dependent on highly specialized developers with deep expertise in GPU programming. Additionally, the process often results in vendor lock-in, which limits flexibility and prevents organizations from future-proofing their systems.
Behind these challenges lie substantial real and opportunity costs. Every HPC code rewrite or porting effort carries a cascade of invisible costs. Studies and industry estimates suggest that developers may spend anywhere from 45% to 90% of their total development time on overhead tasks such as context switching, memory management, and optimizing data transfers. Specifically:
- Context Switching: Developing HPC applications often involves juggling multiple frameworks, workflows, and execution models. The mental overhead of shifting between them can siphon off 20% to 40% of a developer’s productive time.
- Memory Management: Manually optimizing memory transfers between hosts and accelerators, allocating buffers, and ensuring efficient data movement can devour 15% to 30% of a developer’s efforts.
- Data Transfer Optimization: Profiling, debugging, and improving data pathways for minimal latency and maximum throughput can command an additional 10% to 20% of valuable engineering time.
These percentages quickly add up, turning HPC acceleration into an uphill battle before a single result is produced. The net effect is longer time-to-science, delayed insights, and higher operational costs. Compounding this challenge is the fact that the teams responsible for developing the scientific models (typically researchers) are often distinct from the teams tasked with optimizing the code for accelerators. This separation adds friction, as it requires extensive coordination between domain experts and highly specialized performance engineers. Transitioning from model and code development to efficient execution on accelerators becomes not only a technical hurdle but also an organizational one, further increasing costs and delaying outcomes. In practical terms, if overhead tasks are consuming half or more of your developers' productive hours, you'd effectively need to double (or even triple) the size of your engineering team just to maintain the same pace, an unsustainable expense that Maverick-2 eliminates out of the box.
BYOC: Breaking Down Barriers with Maverick-2
For too long, organizations have been forced to accept these prohibitive costs and extensive development efforts when adapting their applications to accelerator architectures. Built on NextSilicon's novel hardware and software architecture, the Maverick-2 intelligent compute accelerator (ICA) changes this dynamic entirely. Our patented approach and intelligent algorithms mean that Maverick-2 is designed to self-optimize, adapting to the unique characteristics of your application without forcing you to adopt new languages or rewrite significant amounts of code. The "bring-your-own-code" (BYOC) approach means you can continue using your existing codebase, leaving Maverick-2 to handle the complexity behind the scenes. Once initial results are achieved, users still have the flexibility to manually modify and optimize their code for further fine-tuning or specialized performance improvements, ensuring complete control over the final outcome. By eliminating the need for specialized porting and optimization, your team can focus on maximizing results for insight, innovation, and discovery.
Developers no longer have to rewrite code to fit the hardware. With Maverick-2, the code you already have runs unmodified, allowing your team to reclaim valuable engineering time that would otherwise be lost to hardware-specific adjustments. As performance accelerates, your software retains its portability, avoiding vendor lock-in and proprietary DSLs. This flexibility ensures that your HPC investments maintain long-term value, evolving as your needs and objectives grow.
At the heart of Maverick-2 is a self-optimizing intelligence that continually analyzes runtime behavior and automatically tunes performance as the application executes. Gone are the days of guesswork and manual optimizations required by traditional GPU-based acceleration. By managing memory access patterns, data distribution, and workload parallelism automatically, the ICA empowers developers to concentrate on their core objectives—advancing research, solving complex problems, and driving meaningful innovation—rather than wrestling with technical bottlenecks.
These advantages translate directly into faster results, reduced overhead, and significant cost savings, including delivering four times the performance-per-watt of traditional GPUs. Liberated from extensive code adaptation and reliance on specialized HPC experts, your organization can accelerate R&D pipelines and reach insights sooner. Whether you seek shorter development cycles for rapid time-to-market or strive to pioneer scientific breakthroughs, the time once spent adapting your code now fuels deeper exploration.
Ultimately, the combination of near-zero-code-change portability, self-optimizing intelligence, and efficient resource utilization drives higher productivity and more sustainable innovation. With developers no longer mired in overhead tasks, they can hone algorithms, refine modeling techniques, and envision entirely new solutions to today's most pressing computational problems. Over time, this approach fosters a more iterative, forward-looking HPC landscape, one where you can continually raise the bar with confidence that your accelerator technology will adapt to your evolving applications.
A Future of Unleashed Potential
For too long, HPC users have been forced into a false trade-off: either embrace the complexity of new architectures and languages or miss out on performance gains. With Maverick-2, we’re removing the artificial boundaries between code and acceleration.
By lowering the barriers to HPC acceleration, we will enable the next wave of breakthroughs in climate modeling, drug discovery, and financial risk analysis. Ultimately, this has the potential to reshape industries and create entirely new market opportunities in the process. BYOC is about more than just running your existing applications—it’s about transforming how you think about HPC. By putting your existing code first, you regain control of your time, budget, and strategic direction.
Join the Revolution
As the HPC community embarks on delivering the future, Maverick-2 and BYOC are setting a new standard for accelerating workloads. Free from the burden of code rewrites and vendor lock-in, organizations can confidently chart a path forward. Together, we envision creating an ecosystem where anyone—scientist, engineer, researcher, or business leader—can tap into the full potential of HPC without compromise.
It’s time to rethink what’s possible. With Maverick-2 and BYOC, NextSilicon is giving you the freedom to accelerate your insights, your discoveries, and your innovation—on your terms.
Stay Connected
Stay tuned as we share more details in future blogs and announcements via our Nextletter. Sign up here.
About the author:
Elad Raz is the founder and CEO of NextSilicon, a company pioneering a radically new approach to HPC architecture that drives the industry forward by solving its biggest, most fundamental problems.