Documentation

Reference guide for FrameSight reports and metrics.

Runners

A Runner is a dedicated Windows GPU machine operated by Exploding Frame. Each runner replays your RenderDoc captures and extracts per-draw performance counters. You do not install or manage runners. They are part of the FrameSight infrastructure and are always available to your benchmarks.

When you create a benchmark you select which runner(s) to target. Each selected runner becomes one column in the benchmark matrix, so you can compare the same capture across multiple GPU configurations side by side.

Runner status

The public fleet page shows every runner with its current status (online, offline, or maintenance) and hardware specs (GPU model, VRAM, driver version). Check this page if a job is stuck in the queued state: the target runner is likely offline or in maintenance.

Need a specific GPU configuration that is not in the fleet? Contact us and we will look into adding it.

Captures (.rdc files)

A capture is a .rdc file recorded with RenderDoc. It contains a complete recording of one frame of GPU work: every draw call, every state change, every buffer binding. FrameSight replays the capture on the runner's GPU to extract timing, resource usage, and pipeline counters.

Recording a capture in 5 steps

1. Download RenderDoc. Get a Windows or Linux build from renderdoc.org/builds and install it (~150 MB). FrameSight is tested against RenderDoc 1.40 and newer.
2. Launch your application under RenderDoc. Open RenderDoc, go to the Launch Application tab, point Executable path at your game's .exe, fill in the working directory and command-line arguments as you normally would, then click Launch. Your game starts with a small overlay in the corner.
3. Reach the frame you want to capture. Play through to your worst-case scenario: heaviest fight, densest scene, particle-heavy moment. Take your time; the overlay only matters at the exact moment of step 4.
4. Press F12. RenderDoc captures the next frame. The overlay flashes and the captured frame appears in RenderDoc's captures list. The hotkey is configurable in Tools → Settings if F12 conflicts with another binding.
5. Save the .rdc and upload it. In RenderDoc, right-click the captured frame → Save → pick a location. You get a single .rdc file (typically 50 MB to 2 GB depending on resources). Drop it into a FrameSight benchmark scene.

Capture what matters. Pick the frame where your game is heaviest: highest draw count, peak VRAM, heavy shader work. A capture of a menu or a loading screen generates a report but the numbers won't reflect your real performance ceiling.

Resolution matters. Capture at the resolution you ship at (or higher). Pixel work scales with pixel count, so doubling both width and height quadruples it; a 1080p capture of a 4K target under-counts pixel-pipeline pressure.

Avoid in-engine debug menus on screen. ImGui and other debug overlays add extra draw calls and state changes to the captured frame and can mask real bottlenecks. Hide them before pressing F12.

One frame is enough. FrameSight extracts per-event GPU counters; a single representative frame holds everything the analysis needs. You don't need to capture a sequence.
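
To gauge how much pixel work a lower-resolution capture leaves out, compare pixel counts. The following is a minimal Python sketch; the function name is illustrative and not part of FrameSight.

```python
def pixel_ratio(target: tuple[int, int], captured: tuple[int, int]) -> float:
    """How many times more pixels the target resolution has than the capture."""
    target_px = target[0] * target[1]
    captured_px = captured[0] * captured[1]
    return target_px / captured_px


# A 4K target against a 1080p capture and a 1440p capture.
print(pixel_ratio((3840, 2160), (1920, 1080)))  # 4.0
print(pixel_ratio((3840, 2160), (2560, 1440)))  # 2.25
```

A ratio above 1 means the capture shades fewer pixels than the shipping resolution would.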

Scenes

A scene is a logical group of captures of the same in-game moment. Use one scene per gameplay scenario: boss fight, menu, world map, cinematic. The matrix view then compares captures of that scene across hardware configs and software versions, so a perf regression is one click away.

Tip: a benchmark can hold multiple scenes. When you upload a new capture into an existing scene, the old one is kept as a previous version — useful for tracking perf evolution between game builds. The header letter (A, B, C, …) on each cell maps to the upload order.

Frame Stats

The Frame Stats section shows aggregate counts for the captured frame.

Total GPU time: End-to-end GPU frame time in milliseconds. This is the headline performance number, mapped directly to frame rate (1000 / ms = FPS).
Caveat: RenderDoc replays a captured frame outside the engine's normal scheduling, so replay GPU time is usually higher than what the same frame costs at runtime (no command-buffer overlap, cold caches, replay bookkeeping). Useful corollary: if a replay runs cleanly on a target GPU, the live game is almost guaranteed to do better. Treat replay numbers as a conservative ceiling, not a verdict.
Draw call count: Total number of DrawIndexed / Draw calls. High counts point to CPU overhead or missing instancing.
Dispatch count: Number of compute dispatches. High counts point to heavy post-processing or physics on the GPU.
Vertex count: Total vertices submitted across all draw calls. Dominated by high-poly meshes or unculled geometry.
Primitive count: Triangles rendered. The ratio of primitives to pixels is a quick geometry-vs-fill proxy.

Frame Budget

FrameSight evaluates your frame against a fixed 30 fps budget (33.33 ms). The bottleneck donut in the report shows your current corrected GPU time as a percentage of that budget, with target markers at 60 fps (50% of the 30 fps budget) and 120 fps (25%).

The headroom indicator below the chart is the slack you have before missing 30 fps on that hardware. Negative headroom = you're already over budget on this GPU. Positive but small = sensitive to any drift in worst-case frames (particles, foliage, weather).

The corrected total is your raw GPU time minus the per-event RenderDoc replay overhead (which doesn't exist in a real run). See the "raw / overhead removed" line under the Frame GPU time card in any report.
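
The budget arithmetic described in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not FrameSight's implementation: the function names and the example numbers are hypothetical, and the replay overhead is taken as a single value already summed over all events.

```python
BUDGET_MS = 1000.0 / 30.0  # fixed 30 fps budget, about 33.33 ms


def frame_budget(raw_gpu_ms: float, replay_overhead_ms: float) -> dict:
    """Corrected GPU time, share of the 30 fps budget, headroom and FPS."""
    corrected_ms = raw_gpu_ms - replay_overhead_ms
    return {
        "corrected_ms": corrected_ms,
        "budget_pct": 100.0 * corrected_ms / BUDGET_MS,
        "headroom_ms": BUDGET_MS - corrected_ms,  # negative means over budget
        "fps": 1000.0 / corrected_ms,
    }


def marker_pct(target_fps: float) -> float:
    """Position of a target-FPS marker as a percentage of the 30 fps budget."""
    return 100.0 * (1000.0 / target_fps) / BUDGET_MS


# Hypothetical frame: 22.0 ms raw, of which 2.0 ms is replay overhead.
stats = frame_budget(22.0, 2.0)
print(stats)
print(round(marker_pct(60), 1), round(marker_pct(120), 1))
```

With these inputs the corrected time is 20 ms, which is about 60% of the budget and leaves roughly 13.3 ms of headroom.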

GPU Counters

Hardware performance counters are read directly from the GPU via RenderDoc's counter API (D3D12 / Vulkan backend). Availability depends on GPU vendor and driver version.

Occupancy (%): Ratio of active wavefronts to the hardware maximum. Low occupancy (< 50%) points to register pressure or large LDS allocation limiting parallelism.
IPC (instructions per clock): ALU throughput proxy. A low IPC relative to the shader's theoretical peak signals stalls (memory, texture, or dependency).
Memory bandwidth (GB/s): Read + write bandwidth to VRAM. A value close to the card's peak bandwidth is a strong signal of a memory-bound workload.
L1 / L2 cache hit rate: Higher is better. Low hit rates on texture or buffer reads amplify memory bandwidth pressure.
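
As a worked example of the occupancy definition above, the ratio and the 50% threshold can be written as follows. The names and the sample values are hypothetical; real counter names depend on the GPU vendor.

```python
LOW_OCCUPANCY_PCT = 50.0  # threshold quoted in the Occupancy entry


def occupancy_pct(active_wavefronts: float, max_wavefronts: int) -> float:
    """Occupancy as a percentage of the hardware wavefront limit."""
    if max_wavefronts <= 0:
        raise ValueError("max_wavefronts must be positive")
    return 100.0 * active_wavefronts / max_wavefronts


def is_low_occupancy(pct: float) -> bool:
    """True when occupancy falls under the 50% guideline."""
    return pct < LOW_OCCUPANCY_PCT


# Hypothetical shader: 6 of 16 wavefront slots active on average.
pct = occupancy_pct(6, 16)
print(pct, is_low_occupancy(pct))  # 37.5 True
```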

Bottleneck & Throughput

The Throughput Analysis section identifies which pipeline stage is the primary bottleneck for the captured frame. FrameSight computes a utilisation score (0 to 1) for each stage:

Vertex (VS): geometry throughput. Vertex count against peak triangles/s.
Rasterise / early-Z: pixel throughput. Rendered pixels against peak pixels/s.
Pixel (PS): shader ALU utilisation in the fragment stage.
Texture (TP): texture unit utilisation.
Memory (MEM): VRAM bandwidth against peak.
Compute (CS): compute shader throughput.

The stage with the highest utilisation score is reported as the bottleneck. A balanced workload shows similar scores across stages. An extreme outlier is an optimisation target.
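
The selection rule above (highest utilisation score wins) is easy to sketch in Python. The stage keys and the scores below are hypothetical sample data, and how FrameSight derives each score is not shown here.

```python
def find_bottleneck(scores: dict[str, float]) -> tuple[str, float]:
    """Return the stage with the highest utilisation score (0 to 1)."""
    stage = max(scores, key=scores.get)
    return stage, scores[stage]


# Hypothetical utilisation scores for one captured frame.
scores = {"VS": 0.21, "RASTER": 0.35, "PS": 0.48, "TP": 0.40, "MEM": 0.87, "CS": 0.15}
stage, score = find_bottleneck(scores)
print(stage, score)  # MEM 0.87
```

Here memory stands well clear of the other stages, which is the "extreme outlier" case worth optimising first.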

Pass Treemap

The Pass Treemap visualises GPU time by render pass as a proportional area chart. Each block represents one render pass. Its area is proportional to the GPU time that pass consumed.

Cells are coloured by call kind (draw / instanced / indirect / dispatch) so the visual mix reads at a glance: a sea of indirect cells looks different from a mostly-draw scene without needing to check the legend.

Click a tile to drill into the pass and see its individual draw calls, shader programs, and resource bindings.

Density view

The header above the treemap carries a Colour by toggle. Switching to Density recolours every cell by its draw-call density (calls per millisecond). A pale large cell is time-heavy but call-efficient. A small dark cell packs many calls into little time and usually means CPU-side submission overhead. Switch back to Kind to return to the default colouring.
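
Draw-call density is calls divided by GPU time. A minimal Python sketch, with hypothetical pass names and numbers:

```python
def call_density(call_count: int, gpu_ms: float) -> float:
    """Draw-call density of a pass in calls per millisecond."""
    if gpu_ms <= 0:
        raise ValueError("pass GPU time must be positive")
    return call_count / gpu_ms


# Hypothetical passes: (name, draw calls, GPU time in ms).
passes = [("GBuffer", 1200, 4.0), ("Shadows", 900, 0.5), ("PostFX", 12, 3.0)]
for name, calls, ms in passes:
    print(f"{name}: {call_density(calls, ms):.1f} calls/ms")
```

In this sample, Shadows is the small dark cell (many calls in little time) and PostFX is the pale large one.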

VRAM Breakdown

Shows how GPU memory is distributed across resource types. The pie chart at the top of the section gives the high-level shape; the resources treemap below it lets you drill into individual allocations sized by megabyte.

Textures: Sampled images (diffuse, normal, shadow maps, render targets). Usually the dominant category in content-heavy titles.
Mesh: Vertex and index buffers used as geometry input by the IA stage, plus mesh-shader / GPUScene structured buffers (Nanite-style pipelines). Classified authoritatively when a draw binds the buffer through the Input Assembler; falls back to a name heuristic for mesh-shader pipelines that bypass IA. The 📏 / ◆ badges in the resources treemap mark per-resource hits.
Buffers: Constant buffers, structured / storage buffers, indirect argument buffers, and any other buffer that is not used as geometry input.
Render targets: Framebuffer attachments not shared with the texture category.
Other: Catches the gap between the sum of named allocations above and the GPU's reported total VRAM. Includes pool / queue metadata, internal allocator slack, and anything Vulkan-internal the extractor cannot name. A large “Other” slice on an otherwise small frame points to allocator overhead.
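
The "Other" slice is a remainder: the reported total minus everything that could be named. A short Python sketch with hypothetical sizes in megabytes:

```python
def other_mb(reported_total_mb: float, named_mb: dict[str, float]) -> float:
    """Gap between the GPU's reported total and the sum of named categories."""
    return reported_total_mb - sum(named_mb.values())


# Hypothetical breakdown in megabytes.
named = {"Textures": 2048.0, "Mesh": 512.0, "Buffers": 256.0, "Render targets": 384.0}
gap = other_mb(3500.0, named)
print(gap, f"{100.0 * gap / 3500.0:.1f}% of total")
```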

Memory Efficiency

The Memory efficiency card under the Efficiency section breaks down where every memory access landed in the cache hierarchy. The pie chart has three slices, and the card also reports DRAM bandwidth.

Hit L1 texture (sage): Share of memory accesses served by the on-chip L1 texture cache. Fastest tier. A bigger sage slice means less round-trip traffic to VRAM, so the GPU spends more time computing and less time waiting on memory.
Hit L2 (sand): Accesses that missed L1 but were served by the larger L2 cache. Still on-chip. Cheaper than DRAM but slower than L1.
Hit DRAM (peach): True cache miss. The access went all the way out to VRAM. A large peach slice combined with a high DRAM bandwidth value usually correlates with the bottleneck banner reading memory-bound.
DRAM bandwidth: Total bytes the frame moved through VRAM, optionally split into read / write components. Compare against the GPU's theoretical peak (printed in the GPU Counters section) to see how close the workload is to memory saturation.

Practical rule: a sage-heavy pie with low DRAM bandwidth means the working set fits in cache and the workload is well laid out. A peach-heavy pie with high DRAM bandwidth points at large textures, missing mips, or scattered access patterns; tackle those before optimising shaders.
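
The pie slices and the bandwidth comparison can be sketched as below. This is an assumption-laden illustration: the function names and inputs are hypothetical, and it assumes bandwidth in GB/s is the bytes moved divided by the frame's GPU time.

```python
def cache_shares(l1_hits: int, l2_hits: int, dram_accesses: int) -> dict[str, float]:
    """Split memory accesses into the three pie slices, as fractions of the total."""
    total = l1_hits + l2_hits + dram_accesses
    if total == 0:
        raise ValueError("no memory accesses recorded")
    return {"L1": l1_hits / total, "L2": l2_hits / total, "DRAM": dram_accesses / total}


def bandwidth_utilisation(bytes_moved: float, gpu_ms: float, peak_gb_s: float) -> float:
    """Achieved DRAM bandwidth as a fraction of the GPU's theoretical peak."""
    achieved_gb_s = (bytes_moved / 1e9) / (gpu_ms / 1000.0)
    return achieved_gb_s / peak_gb_s


# Hypothetical frame: 4 GB moved in 10 ms on a card with a 500 GB/s peak.
shares = cache_shares(700, 200, 100)
util = bandwidth_utilisation(4e9, 10.0, 500.0)
print(shares, round(util, 2))
```

A large DRAM share combined with a utilisation near 1 is the pattern the practical rule describes as memory-bound.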

Texture Analysis

Per-texture statistics for every texture sampled during the frame.

Format: DXGI / Vulkan format (e.g. BC7_UNORM, R8G8B8A8_SRGB). Block-compressed formats (BC1 to BC7, ASTC) cut bandwidth dramatically against uncompressed formats.
Compression ratio: Ratio of uncompressed size to actual on-disk size. BC7 typically achieves 4× over RGBA8. RGBA8 shows 1× (no compression).
Bandwidth saved: Estimated bytes saved per frame relative to RGBA8 at the same resolution. High values mean the compression is paying off.
Recommendations: FrameSight flags uncompressed textures above a size threshold and suggests a suitable BC/ASTC format based on the texture's alpha channel and perceptual quality requirements.
Oversize (📏 badge): A texture is flagged oversize when its pixel count exceeds 4× the screen footprint of every draw that sampled it. Common cause: shipping a 4K asset that only ever appears at a 1080p (or smaller) on-screen size. Fix with a smaller authoring resolution or a mip LOD.
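
Two of the checks above lend themselves to a short Python sketch. The function names are hypothetical, the ratio considers only the top mip level, and the oversize rule is written directly from the definition in the Oversize entry.

```python
def compression_ratio(width: int, height: int, actual_bytes: int) -> float:
    """Ratio of the RGBA8 size (4 bytes per pixel, top mip only) to the actual size."""
    return (width * height * 4) / actual_bytes


def is_oversize(width: int, height: int, footprints_px: list[int]) -> bool:
    """True when the pixel count exceeds 4x the footprint of every sampling draw."""
    texels = width * height
    # A texture that no draw sampled has no footprint to compare against.
    return bool(footprints_px) and all(texels > 4 * fp for fp in footprints_px)


# BC7 stores 16 bytes per 4x4 block, i.e. 1 byte per pixel.
print(compression_ratio(2048, 2048, 2048 * 2048))  # 4.0

# A 4096x4096 texture that only ever covers a 1080p-sized area on screen.
print(is_oversize(4096, 4096, [1920 * 1080]))  # True
```

If even one draw samples the texture across a large enough footprint, the texture is not flagged.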

Pixel Pipeline

Overdraw and fill-rate analysis.

Overdraw factor: Average number of times each output pixel was written (shaded). An overdraw of 2× means each pixel was shaded twice on average. Values above 3 to 4× in a non-transparent pass point to missing early-Z or poor front-to-back sorting.
Fill rate (Gpixels/s): Pixels shaded per second across all draw calls. Compared against the GPU's theoretical peak to compute pixel-pipeline utilisation.
Depth complexity: Distribution of per-pixel depth-write count. A bimodal distribution (many pixels at 1×, some at 10×+) points to transparent or particle overdraw.
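
Given a per-pixel write count, the overdraw factor and the depth-complexity distribution follow directly. A minimal Python sketch; the input list is hypothetical sample data, and averaging over every listed pixel is an assumption about how the factor is normalised.

```python
from collections import Counter


def overdraw_factor(write_counts: list[int]) -> float:
    """Average number of writes per output pixel."""
    return sum(write_counts) / len(write_counts)


def depth_complexity(write_counts: list[int]) -> dict[int, int]:
    """Histogram: how many pixels were written exactly n times."""
    return dict(sorted(Counter(write_counts).items()))


# Hypothetical 8-pixel frame: mostly 1x, with two particle-heavy pixels.
counts = [1, 1, 1, 1, 2, 2, 10, 14]
print(overdraw_factor(counts))   # 4.0
print(depth_complexity(counts))  # {1: 4, 2: 2, 10: 1, 14: 1}
```

The histogram shows the bimodal shape described above: most pixels at 1×, a few at 10× or more.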

Draws (Hot Draw Calls)

The top-N most expensive draw calls ranked by GPU time. For each:

GPU time (µs)
Vertex shader + pixel shader identifiers
Primitive count and instance count
Render target format + dimensions
Whether the draw is indirect (ExecuteIndirect / MultiDraw)

Indirect draws (shown as Execute events) are GPU-driven: the draw arguments live in a buffer. FrameSight resolves the buffer at capture time to show real counts.
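
Ranking draws by GPU time is a plain top-N selection. The sketch below uses a hypothetical Draw record with only a few of the listed fields; it does not reflect FrameSight's actual data model.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Draw:
    event_id: int
    gpu_us: float           # GPU time in microseconds
    indirect: bool = False  # True for ExecuteIndirect / MultiDraw


def hot_draws(draws: list[Draw], n: int = 10) -> list[Draw]:
    """Return the n most expensive draws, most expensive first."""
    return heapq.nlargest(n, draws, key=lambda d: d.gpu_us)


# Hypothetical frame with four draws.
frame = [Draw(101, 120.0), Draw(102, 950.5, indirect=True), Draw(103, 40.2), Draw(104, 610.0)]
for d in hot_draws(frame, n=2):
    print(d.event_id, d.gpu_us, "indirect" if d.indirect else "direct")
```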

Grade

The Grade at the top of the report (A / B / C / D / F) summarises how clean the captured frame is across every analysed dimension. The grade is the visible signal; a 0 to 100 numeric score backs it and is shown in the Grade card's tooltip and at the very bottom of the report.

The grade is derived from the warnings list. Each warning carries a severity (info / medium / high / critical); the grade letter falls out of the worst severities present. A frame with no warnings grades A; a frame with several criticals grades F.

A high grade is not a free pass. It means “no analysed signal is alarming”, not “this frame is fast”. Always cross-read with the Total GPU time, the bottleneck stage, and the pass treemap before drawing performance conclusions.