Beyond the Von Neumann Wall: Dataflow for Next-Gen Seismic

Recently, at the Rice Energy HPC & AI Conference, conversations around subsurface imaging kept returning to the same fundamental tension: the seismic imaging workloads that guide billions of dollars in exploration are outpacing the architectures that run them. As our team engaged with geophysicists, operators, and HPC teams working on these challenges daily, one message was unmistakable: incremental improvements won't close the gap between the HPC and AI capabilities they need and the hardware delivering them.

The global energy transition is well underway. Renewable sources are expanding, electrification is accelerating, and investment in sustainable infrastructure is reaching new heights. Yet the reality is clear. Oil, natural gas, and their derivatives will remain essential to the global economy for decades to come. Transportation, manufacturing, petrochemicals, and countless industries depend on these resources during the transition period and beyond.

This is precisely why responsible, efficient, and sustainable extraction of hydrocarbon resources matters more than ever. The high-performance systems running seismic simulations that inform billion-dollar decisions must evolve to meet this challenge. The challenge isn't simply to extract more; it's to extract smarter: reducing environmental impact, maximizing reservoir recovery, minimizing operational risk, and making every barrel count. At the heart of this effort lies high-performance computing and the seismic simulations that guide exploration and production decisions with multi-billion-dollar implications.

But the computational infrastructure supporting these critical workflows is approaching a fundamental limit. The scale of this challenge is staggering. A recent high-resolution full-waveform inversion for a North Sea survey, executed on a French national supercomputer, consumed approximately 28 million CPU-core hours (equivalent to running 49,152 processor cores continuously for 24 days). Even on Oak Ridge's previous Summit supercomputer, a single full waveform inversion (FWI) 3D seismic imaging run required approximately 25,000 GPU-hours using 384 NVIDIA GPUs (enough energy to power one average U.S. home for 15 months). For operators running these computations on cloud infrastructure or in dedicated processing centers, this scale translates to substantial operational costs and energy consumption.

Yet operators routinely see their expensive GPU clusters delivering only 25-50% of theoretical performance on seismic workloads, not because the hardware is inadequate, but because seismic applications are notoriously memory-bound: powerful cores are left idling while waiting for data to arrive from memory. This fundamental architectural mismatch requires an entirely new approach to overcome.

The Computational Foundation of Modern Energy Exploration

Seismic imaging represents one of the most computationally demanding applications in all of scientific computing. When geophysicists seek to understand the subsurface (mapping hydrocarbon reservoirs, identifying fault structures, characterizing rock properties), they rely on sophisticated wave propagation algorithms that simulate how acoustic and elastic energy travels through the Earth.

These simulations employ a range of advanced numerical methods to solve partial differential equations (PDEs) across three-dimensional grids. The industry utilizes techniques such as Discontinuous Galerkin and Spectral Element Methods (SEM), but one of its traditional staples is the stencil-based high-order finite-difference (FD) method. These solvers are mathematically complex, creating intense demands on memory bandwidth and computation that push HPC systems to their limits.

The computational demands are immense. While Reverse Time Migration (RTM) is the standard for improving the final image, the process of generating the underlying earth model via Full-Waveform Inversion (FWI) is dramatically more taxing. In fact, FWI can be hundreds or even thousands of times more expensive than RTM. As exploration moves to deeper water, more complex geology, and unconventional reservoirs, the need for high-fidelity models drives these computational requirements to new heights.

These simulations work by dividing the Earth's subsurface into a massive 3D grid of discrete points. To update the wavefield at each point, the simulation needs data from the surrounding points. For the level of accuracy geophysicists require, each point needs information from 4 to 16 neighbors in every direction (left/right, forward/back, up/down), depending on the target accuracy of the simulation. In more advanced scenarios (e.g. dispersion-minimizing schemes), diagonal neighbors may be required as well.

After discretizing the equations, the resulting computational operations themselves are straightforward: mostly additions, multiplications, and divisions. The real challenge is the sheer volume of data movement. For every calculation, the processor or accelerator must load dozens of data points from memory, perform a handful of operations, and then repeat the cycle billions of times.
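To make this concrete, here is a minimal sketch of one timestep of a 1D acoustic wave update with an 8th-order stencil (4 neighbors on each side). The coefficients are the standard central-difference weights for a second derivative; the function and variable names are illustrative, not drawn from any production code.

```python
# Standard 8th-order central-difference weights for d2u/dx2
# (weights for offsets 0, ±1, ±2, ±3, ±4; they sum to zero).
C = [-205.0 / 72.0, 8.0 / 5.0, -1.0 / 5.0, 8.0 / 315.0, -1.0 / 560.0]

def step(prev, curr, v, dt, dx):
    """Advance the wavefield one timestep:
    next = 2*curr - prev + (v*dt/dx)^2 * laplacian(curr)."""
    n = len(curr)
    nxt = [0.0] * n
    for i in range(4, n - 4):
        # Each output point reads 9 values of curr plus 1 of prev:
        # heavy on memory loads, light on arithmetic.
        lap = C[0] * curr[i]
        for k in range(1, 5):
            lap += C[k] * (curr[i - k] + curr[i + k])
        nxt[i] = 2.0 * curr[i] - prev[i] + (v * dt / dx) ** 2 * lap
    return nxt
```

Note the ratio: roughly ten loads feed only about twenty floating-point operations per point, and that imbalance is repeated across billions of grid points per timestep.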

The bottleneck isn't the computation; it's the data movement. As a result, seismic simulations can spend most of their time waiting. Waiting for data to move from memory to the processor, then waiting again for the next batch of data.
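A back-of-envelope roofline estimate shows why. All numbers below are illustrative assumptions (per-point operation and byte counts for an 8th-order 3D stencil with no cache reuse, and rough accelerator specs), not measurements of any particular chip:

```python
# Roofline sketch: compare the stencil's arithmetic intensity
# against the machine balance of an assumed accelerator.

def arithmetic_intensity(flops_per_point, bytes_per_point):
    """Floating-point operations performed per byte moved to/from memory."""
    return flops_per_point / bytes_per_point

# Assumed stencil cost per grid point: ~25 loads + 1 store of
# 8-byte doubles (no cache reuse), ~50 floating-point operations.
ai = arithmetic_intensity(flops_per_point=50, bytes_per_point=26 * 8)

# Assumed accelerator: 30 TFLOP/s FP64 peak, 3 TB/s memory bandwidth.
peak_flops = 30e12
bandwidth = 3e12
machine_balance = peak_flops / bandwidth  # FLOPs per byte needed to hit peak

attainable = min(peak_flops, ai * bandwidth)
print(f"intensity: {ai:.2f} FLOP/byte vs. balance: {machine_balance:.0f} FLOP/byte")
print(f"attainable: {100 * attainable / peak_flops:.1f}% of FP64 peak")
```

Under these assumptions the stencil can use only a few percent of peak FP64 throughput; on-chip reuse via cache blocking raises the effective intensity, which is how heavily tuned codes climb back toward the 25-50% range.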

For the past decade, the industry has turned to GPU acceleration. And for good reason. GPUs have offered a path to dramatically higher floating-point throughput at attractive performance-per-dollar ratios. Today, the largest seismic processing centers run on thousands of GPUs, processing petabytes of survey data to generate the subsurface images that drive exploration decisions.

The GPU Paradox in Seismic Computing

Despite massive GPU deployments, the oil and gas industry faces an uncomfortable reality. As AI investment by large vendors increases, support for high-precision FP64 FLOPs has plateaued in favor of mixed- and lower-precision compute for AI. As a result, the industry is paying an AI tax in both watts and dollars. Compounding this, once GPUs are deployed, they often achieve only a fraction of theoretical peak performance, while operational costs and complexity continue to climb.

The root cause lies in a fundamental mismatch between GPU architecture and the nature of stencil computations.

Memory Bandwidth Saturation: Stencil codes are inherently memory-bound, not compute-bound. The ratio of floating-point operations to memory operations is low; each computed value requires loading data from multiple neighboring points. Modern datacenter GPUs are designed for compute-intensive workloads where each piece of data loaded from memory is reused in many calculations (though many AI operations, such as batch normalization, RNNs, and transformer inference, are also memory-bound). Modern GPUs deliver substantial FP64 floating-point throughput, the high numerical precision geophysics demands, but their memory systems can't feed the compute units fast enough. The result is expensive silicon starving: sitting idle, waiting for data.

Code Portability and Developer Burden: Achieving acceptable performance on GPUs requires extensive optimization: rewriting algorithms in CUDA or HIP, hand-tuning memory layouts, managing complex hierarchies of shared memory, and optimizing for specific GPU generations. Each new GPU architecture demands a new optimization cycle. Seismic processing companies maintain armies of HPC specialists whose primary job is wrestling with GPU complexity and overhead rather than advancing geophysical science.

Power and Cooling Challenges: The latest generation of data center GPUs consumes 1,000-1,400 watts per chip, with requirements expected to double in the near future. When these GPUs achieve only 25-50% utilization on seismic workloads, the energy cost per useful computation rises proportionally. For operators running thousands of GPU-based systems around the clock, power consumption has become a primary cost driver, with power bills that can rival the cost of the hardware itself. This is an increasingly problematic metric for an industry under scrutiny for its environmental footprint.

Vendor Lock-In and Software Dependencies: The CUDA ecosystem's dominance creates strategic risk. Seismic processing workflows depend on proprietary frameworks, limiting flexibility and negotiating leverage. When algorithms must be ported to new platforms, the effort can consume years and millions of dollars in engineering time.

Why Seismic Computing Hits the Von Neumann Wall

These GPU challenges are symptoms of a deeper architectural problem, one that dates back 80 years to the foundations of digital computing itself.

The Von Neumann architecture, which underlies virtually all modern processors including GPUs, enforces a rigid separation between memory and processing. Instructions must be fetched from memory and decoded, data loaded, operations executed by processing units, and results written back. This sequential, repetitive cycle creates an inherent bottleneck.

Traditional processors and GPUs organize silicon around instruction control. Fetching instructions from memory, decoding them, predicting branches, managing cache coherency, and scheduling execution. In a processor optimized for general-purpose computing, this overhead makes sense. But for seismic codes that execute the same stencil operation billions of times in a predictable pattern, this control machinery becomes pure overhead.

The result is architectural inefficiency: in a typical GPU, only a small fraction of the silicon area and power budget goes toward the actual arithmetic operations (the floating-point units performing the wave equation calculations). The majority manages instruction flow and data movement. This is the overhead that provides no value when the computation pattern is known in advance and invariant across billions of grid points.

For stencil-based seismic codes, this architecture is particularly punishing. Each timestep requires the same fundamental operations applied across billions of grid points, yet the hardware spends the vast majority of its energy and cycles managing the process rather than executing it. Data makes countless round-trips between processing cores and memory hierarchies, consuming power and creating latency at every step.

Simply building bigger GPUs or faster memory doesn't solve this fundamental problem; it just moves the bottleneck. The Von Neumann architecture has reached its scaling limits for memory-bound, stencil-dominated scientific workloads.

A New Paradigm: Dataflow Computing

What if computing architecture could be fundamentally reimagined, not as a faster version of 1940s design principles, but as an entirely new approach optimized for how modern scientific codes actually execute?

Dataflow computing inverts the traditional paradigm. Rather than fetching instructions and moving data to processing units, data itself drives computation. Operations execute the moment their input data becomes available, with results flowing directly to downstream operations without returning to main memory. Think of it as a factory assembly line for computation: data flows through a series of specialized units, each performing its operation and immediately passing results to the next stage. No waiting for a central controller to issue instructions, no round-trips to a central memory store between operations. Data moves once, computation happens where the data resides, and the silicon is overwhelmingly dedicated to useful mathematical work.

This architectural shift addresses the core inefficiencies that plague seismic computing on traditional processors by prioritizing a more fluid execution model. By eliminating instruction overhead, the system removes the need for constant instruction fetch and decode cycles because the computation pattern is already known and invariant. This is further enhanced by a direct data flow where values computed at one grid point move immediately to neighboring calculations instead of making redundant round-trips through memory hierarchies.
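The assembly-line analogy can be sketched in software with Python generators: each stage fires as soon as a value arrives from the previous stage and streams its result directly downstream, with no round-trip through a shared store. This is purely conceptual; real dataflow hardware maps each stage to a physical compute unit rather than a coroutine, and the stage names here are invented for illustration.

```python
def load(values):
    for v in values:
        yield v                  # source stage: emits grid values

def scale(stream, a):
    for v in stream:
        yield a * v              # fires the moment an input is available

def shift(stream, b):
    for v in stream:
        yield v + b              # consumes scale's output directly

# Values flow through the pipeline one at a time; no stage waits for a
# central controller, and intermediate results never revisit "main memory".
pipeline = shift(scale(load([1, 2, 3]), a=10), b=5)
print(list(pipeline))            # -> [15, 25, 35]
```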

Consequently, the architecture achieves a high degree of silicon efficiency since the vast majority of chip area is dedicated to actual computation rather than the control machinery required by standard von Neumann designs. While the concept of dataflow computing has been explored in academic research for decades, the modern breakthrough lies in our ability to implement it at scale through the compiler sophistication and runtime intelligence necessary to make it practical for real-world scientific workloads.

The Path Forward

The computational challenges facing seismic imaging aren't insurmountable, but they can't be solved by incremental improvements to existing architectures. The von Neumann bottleneck is fundamental; more cache, faster memory, and bigger GPUs merely move the problem around rather than solving it. What's needed is architectural innovation: a ground-up rethinking of how computation and data movement interact, designed specifically for the memory-intensive patterns that dominate scientific computing. Dataflow computing offers this path.

By inverting the relationship between data and computation, it directly addresses the inefficiencies that leave expensive GPU compute units starving for data. The question isn't whether this approach works in theory; it's how to implement it in practice, at scale, for production workloads.

In our next post, we'll explore exactly that: how NextSilicon's Maverick-2 brings dataflow computing to seismic workflows, and what this means for the future of efficient energy exploration.

Stay Connected

Stay tuned as we share more details in future blogs and announcements via our Nextletter.

About the Author:

Elad Raz is the founder and CEO of NextSilicon, a company pioneering a radically new approach to HPC architecture that drives the industry forward by solving its biggest, most fundamental problems.
