nelvp is a cross-platform assembly language for vertex programs in the NeL 3D engine. The CVPParser class parses nelvp source text into an internal representation that drivers translate to hardware-specific formats: ARB_vertex_program for OpenGL, vs_1_1 for Direct3D, or EXT_vertex_shader for legacy ATI hardware. The OpenGL 3.3+ driver includes a nelvp-to-GLSL converter (driver_opengl3_nelvp.cpp) that translates nelvp programs to GLSL 330 at compile time, packing the used constant registers into a per-VP UBO (see GLSL 330 Translation and GL3 Interoperability).
nelvp is based on the ARB_vertex_program / NV_vertex_program instruction set.
Source files:

- nel/include/nel/3d/vertex_program_parse.h - Parser and instruction structures
- nel/src/3d/vertex_program_parse.cpp - Parser implementation

```
!!VP1.0
# Instructions go here
END
```

- A program begins with !!VP1.0 (or !!VP1.1; both are accepted, with no behavioral difference).
- A program ends with END.
- Comments start with # and run to the end of the line.
- Each instruction is terminated by ;.

All instructions operate on 4-component (xyzw) vectors unless noted as scalar.
| Opcode | Syntax | Description |
|---|---|---|
| MOV | MOV dest, src; | Copy source to destination. |
| ARL | ARL A0.x, src; | Address register load. Source must be scalar. |
| LIT | LIT dest, src; | Lighting computation. |
| RCP | RCP dest, src; | Reciprocal (1/src). Source must be scalar; result is broadcast to all masked components. |
| RSQ | RSQ dest, src; | Reciprocal square root (1/sqrt(abs(src))). Source must be scalar. |
| EXP | EXP dest, src; | Base-2 exponential (partial precision). Source must be scalar. |
| EXPP | EXPP dest, src; | Same as EXP; both spellings are accepted. |
| LOG | LOG dest, src; | Base-2 logarithm. Source must be scalar. |
| Opcode | Syntax | Description |
|---|---|---|
| ADD | ADD dest, src1, src2; | Component-wise addition. |
| MUL | MUL dest, src1, src2; | Component-wise multiplication. |
| DP3 | DP3 dest, src1, src2; | 3-component dot product (xyz). Result is written to all masked components. |
| DP4 | DP4 dest, src1, src2; | 4-component dot product (xyzw). Result is written to all masked components. |
| DST | DST dest, src1, src2; | Distance vector: dest = (1, src1.y*src2.y, src1.z, src2.w). |
| MIN | MIN dest, src1, src2; | Component-wise minimum. |
| MAX | MAX dest, src1, src2; | Component-wise maximum. |
| SLT | SLT dest, src1, src2; | Set on less-than: dest = (src1 < src2) ? 1.0 : 0.0 per component. |
| SGE | SGE dest, src1, src2; | Set on greater-or-equal: dest = (src1 >= src2) ? 1.0 : 0.0 per component. |
| Opcode | Syntax | Description |
|---|---|---|
| MAD | MAD dest, src1, src2, src3; | Multiply-add: dest = src1 * src2 + src3. |
Temporary registers (R0--R11): 12 read/write registers for intermediate calculations. A temporary register must be written before it can be read; the parser tracks which components have been written per register and will reject a program that reads an uninitialized component.
Important: The parser checks initialization against the components selected by each source operand's swizzle, not against the components the instruction actually uses. For example, DP3 R0.w, R1, R1; reads R1 with the default .xyzw swizzle, so the parser requires all four components of R1 to have been written, even though DP3 only uses .xyz mathematically. A 3-component .xyz swizzle would express the narrower read, but nelvp accepts only 1- or 4-component swizzles. The practical solution is to initialize the full register early with one maskless write (e.g. DP3 R1, ...; with no write mask writes all four components) before narrower writes refine individual components.
R0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 R11
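The component-tracking rule above can be sketched in C++, the engine's language. This is a minimal model under stated assumptions, not NeL's parser: the class name TempInitTracker and the x=1, y=2, z=4, w=8 bit layout are hypothetical. A read is accepted only when every component selected by the source swizzle has already been written.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of per-component initialization tracking for R0--R11.
// Bit layout (assumed for illustration): x = 1, y = 2, z = 4, w = 8.
class TempInitTracker {
public:
    static constexpr int kNumTemps = 12;

    // Record a write to temporary `reg` through `writeMask` (0xF = no mask).
    void recordWrite(int reg, std::uint8_t writeMask) {
        written_[reg] |= writeMask;
    }

    // A read is legal only if every component the swizzle selects was written.
    // A default .xyzw swizzle selects all four, even for DP3.
    bool canRead(int reg, std::uint8_t readMask) const {
        return (written_[reg] & readMask) == readMask;
    }

private:
    std::array<std::uint8_t, kNumTemps> written_{}; // all components start unwritten
};
```

With this model, DP3 R0.w, R1, R1; following only a MOV R1.xyz, ...; is rejected, because the default .xyzw swizzle selects R1.w.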
Input registers (v[0]--v[15]): 16 read-only registers carrying vertex attributes, accessed by hardware slot index. Each slot corresponds to a fixed vertex buffer attribute and maps directly to a GLSL layout(location = N) attribute in the GL3 driver.
Input registers are addressed by slot number, not by semantic. The named aliases (OPOS, NRML, etc.) are syntactic conveniences that resolve to fixed indices at parse time: v[OPOS] is identical to v[0], v[TEX0] is identical to v[8], and so on. The hardware does not distinguish between "position" and "slot 0"; it simply reads from whichever slot the vertex declaration has populated.
| Slot | Name | Attribute | GLSL 330 attribute |
|---|---|---|---|
| 0 | OPOS | Object-space position | layout(location = 0) in vec4 vposition |
| 1 | WGHT | Blend weight | layout(location = 1) in vec4 vweight |
| 2 | NRML | Normal | layout(location = 2) in vec4 vnormal |
| 3 | COL0 | Primary color (0.0--1.0 float; byte colors in the VB are converted by hardware) | layout(location = 3) in vec4 vprimaryColor |
| 4 | COL1 | Secondary color (0.0--1.0 float; same conversion) | layout(location = 4) in vec4 vsecondaryColor |
| 5 | FOGC | Fog coordinate | layout(location = 5) in vec4 vfog |
| 6 | -- | Palette skin (no named alias; use v[6]) | layout(location = 6) in vec4 vpaletteSkin |
| 7 | -- | Reserved/empty in nelvp (no named alias; use v[7]). The GL3 driver assigns this slot to tangent. | layout(location = 7) in vec4 vtangent |
| 8 | TEX0 | Texture coordinate 0 | layout(location = 8) in vec4 vtexCoord0 |
| 9 | TEX1 | Texture coordinate 1 | layout(location = 9) in vec4 vtexCoord1 |
| 10 | TEX2 | Texture coordinate 2 | layout(location = 10) in vec4 vtexCoord2 |
| 11 | TEX3 | Texture coordinate 3 | layout(location = 11) in vec4 vtexCoord3 |
| 12 | TEX4 | Texture coordinate 4 | layout(location = 12) in vec4 vtexCoord4 |
| 13 | TEX5 | Texture coordinate 5 | layout(location = 13) in vec4 vtexCoord5 |
| 14 | TEX6 | Texture coordinate 6 | layout(location = 14) in vec4 vtexCoord6 |
| 15 | TEX7 | Texture coordinate 7 | layout(location = 15) in vec4 vtexCoord7 |
Syntax: v[0], v[15], v[OPOS], v[TEX0], etc.
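Because aliases resolve to fixed indices at parse time, the name lookup reduces to a constant table. A sketch in C++; inputSlotForAlias is a hypothetical helper, and slots 6 and 7 are deliberately absent because they have no named alias (numeric v[N] forms would be handled separately).

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical lookup mirroring the input slot table: each named alias is a
// fixed slot index. Slots 6 and 7 have no alias and must be written as v[6]/v[7].
std::optional<int> inputSlotForAlias(std::string_view name) {
    struct Entry { std::string_view alias; int slot; };
    static constexpr Entry kAliases[] = {
        {"OPOS", 0}, {"WGHT", 1}, {"NRML", 2}, {"COL0", 3},
        {"COL1", 4}, {"FOGC", 5},
        {"TEX0", 8}, {"TEX1", 9}, {"TEX2", 10}, {"TEX3", 11},
        {"TEX4", 12}, {"TEX5", 13}, {"TEX6", 14}, {"TEX7", 15},
    };
    for (const Entry& e : kAliases)
        if (e.alias == name) return e.slot;
    return std::nullopt; // not a known input alias
}
```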
The input slot assignments are identical between nelvp and the GL3 TAttribOffset enum, with the single exception of slot 7: nelvp leaves it as a reserved gap (IEmpty), while the GL3 driver assigns it to tangent data. A nelvp program can read v[7], but the slot carries no conventional attribute in the legacy drivers. See the historical note below for why slots 6 and 7 exist as gaps.
Output registers (o[...]): write-only registers for vertex program results. Although output registers are addressed by semantic name (o[HPOS], o[COL0], etc.), each name maps to a fixed hardware slot. The semantic names are not dynamically bound; they are fixed aliases for slot indices, just like the input register names. The distinction from input registers is purely syntactic: output registers must be addressed by name (the parser does not accept o[0]), whereas input registers accept both forms.
The 16-slot output bank has the same 8+8 structure as the input bank: slots 0--7 hold position, colors, fog, and point size; slots 8--15 hold texture coordinates. Slot 7 is not exposed in nelvp (nor in ARB_vertex_program or D3D vs_1_1), creating a gap. The nelvp output enum numbers its 15 named registers 0--14 sequentially, hiding this gap; the hardware slot column below shows the actual slot indices.
| Name | HW Slot | Output | Notes |
|---|---|---|---|
| HPOS | 0 | Clip-space (homogeneous) position | Must be written by every program. Undergoes perspective division and viewport transform after the VP. Maps to gl_Position in GLSL. |
| COL0 | 1 | Primary color | Float values, unclamped inside the VP. The rasterizer clamps to [0, 1] after interpolation. |
| COL1 | 2 | Secondary color | Same clamping behavior as COL0. |
| BFC0 | 3 | Back-face primary color | Not supported on all implementations. Used for two-sided lighting. |
| BFC1 | 4 | Back-face secondary color | Not supported on all implementations. |
| FOGC | 5 | Fog coordinate | Only the x component is used. D3D driver strips write masks on this register. |
| PSIZ | 6 | Point size | Only the x component is used. Maps to gl_PointSize in GLSL. |
| (none) | 7 | (not exposed) | Reserved in nelvp/ARB/D3D. The GL3 builtin VP uses this slot for tangent. |
| TEX0 | 8 | Texture coordinate 0 | General-purpose vec4 output. |
| TEX1 | 9 | Texture coordinate 1 | |
| TEX2 | 10 | Texture coordinate 2 | |
| TEX3 | 11 | Texture coordinate 3 | |
| TEX4 | 12 | Texture coordinate 4 | |
| TEX5 | 13 | Texture coordinate 5 | |
| TEX6 | 14 | Texture coordinate 6 | |
| TEX7 | 15 | Texture coordinate 7 | |
Syntax: o[HPOS], o[COL0], o[TEX3], etc.
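The gap between the sequential 0--14 enum numbering and the hardware slots can be expressed as a one-line mapping. This sketch assumes the enum follows the table order (HPOS through PSIZ, then TEX0 through TEX7); the enum and function names are illustrative, not NeL's actual declarations.

```cpp
#include <cassert>

// Illustrative enum in table order: HPOS..PSIZ = 0..6, TEX0..TEX7 = 7..14.
enum class NelvpOutput {
    HPOS, COL0, COL1, BFC0, BFC1, FOGC, PSIZ,
    TEX0, TEX1, TEX2, TEX3, TEX4, TEX5, TEX6, TEX7
};

// Hardware slot 7 is not exposed, so enum values from 7 upward shift by one.
constexpr int hardwareSlot(NelvpOutput out) {
    const int index = static_cast<int>(out);
    return index < 7 ? index : index + 1;
}

static_assert(hardwareSlot(NelvpOutput::PSIZ) == 6, "PSIZ is slot 6");
static_assert(hardwareSlot(NelvpOutput::TEX0) == 8, "TEX0 is slot 8");
static_assert(hardwareSlot(NelvpOutput::TEX7) == 15, "TEX7 is slot 15");
```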
Of the 15 named outputs, two (o[HPOS] and o[PSIZ]) become GLSL built-in outputs (gl_Position and gl_PointSize) rather than interpolated varyings. The remaining 13 (11 without back-face colors) are user varyings that the nelvp→GLSL converter remaps to the GL3 builtin pixel program's varying locations (see GLSL 330 Translation and GL3 Interoperability).
Constant registers (c[0]--c[95]): 96 read-only registers set by the application before the program runs.
Direct access: c[0], c[42], c[95].
Indexed access via the address register: c[A0.x], c[A0.x + 16], c[A0.x - 8]. The offset must be in the range -64 to +63.
Address register (A0.x): a single scalar register used for dynamic indexing into the constant register file. It can only be loaded with the ARL instruction and can only be read as part of a constant register index expression (c[A0.x + offset]).
The nelvp/ARB register layout has 16 input slots and 16 output slots, but two input slots (6--7) have no conventional attribute mapping, one output slot (7) is not exposed, and the nelvp output enum hides the gap by numbering its 15 named registers sequentially. The reason is a hardware design choice by NVIDIA.
NVIDIA's NV_vertex_program extension (GeForce 3 / NV20, 2001) aligned the 8 texture coordinates to the upper half of the 16-register bank: slots 8--15 for both inputs and outputs. This gives a clean 8+8 hardware partition. The lower bank (0--7) holds position, colors, fog, point size, and similar per-vertex attributes, but only 6 of those slots (0--5) had well-defined uses, leaving slots 6 and 7 as gaps.
D3D8 used a completely different, densely packed numbering that had no such gap:
| Slot | D3D8 (D3DVSDE_*) | NV/ARB |
|---|---|---|
| 0 | Position | Position |
| 1 | BlendWeight | Weight |
| 2 | BlendIndices | Normal |
| 3 | Normal | PrimaryColor |
| 4 | PSize | SecondaryColor |
| 5 | Diffuse | FogCoord |
| 6 | Specular | (no mapping) |
| 7 | TexCoord0 | (no mapping) |
| 8 | TexCoord1 | TexCoord0 |
| 9 | TexCoord2 | TexCoord1 |
| 10 | TexCoord3 | TexCoord2 |
| 11 | TexCoord4 | TexCoord3 |
| 12 | TexCoord5 | TexCoord4 |
| 13 | TexCoord6 | TexCoord5 |
| 14 | TexCoord7 | TexCoord6 |
| 15 | Position2 | TexCoord7 |
| 16 | Normal2 | (beyond 16 slots) |
The D3DVSDE_* values are fixed-function semantic identifiers, not hardware register indices. D3D8 packed texcoords starting at slot 7 with no gap, and defined D3DVSDE_POSITION2 = 15 and D3DVSDE_NORMAL2 = 16 for vertex tweening; NORMAL2 falls outside the 16-register hardware range entirely. With programmable shaders, the application's vertex declaration explicitly maps elements to v0--v15. NVIDIA's NV/ARB layout, by contrast, is the hardware register layout (16 slots, 8+8 bank split). NeL's nelvp follows the NVIDIA/ARB layout, and the D3D driver remaps register indices when translating nelvp to vs_1_1 assembly.
The GL3 driver's TAttribOffset enum reclaims the empty slots: slot 6 is used by NeL for PaletteSkin (blend indices), and slot 7 is assigned to tangent, filling in the hardware gaps that NVIDIA's original design left open.
The general form of an operand:

```
[-] register [.swizzle_or_mask]
```

Source operands may be prefixed with - to negate the value. Negation is not allowed on destination operands.

```
ADD R2, -R0, R1;    # valid: R2 = -R0 + R1
```
A swizzle reorders or replicates components of a source operand. Append . followed by exactly 1 or exactly 4 component letters (x, y, z, w). Two or three letters are not accepted.
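- A single letter is replicated to all four components: .x means .xxxx, .w means .wwww, etc.
- Four letters reorder or replicate components explicitly: .yzxw, .xxzz, etc.
- No suffix is the identity swizzle (.xyzw).

Instructions that require a scalar source (ARL, RCP, RSQ, EXP/EXPP, LOG) enforce that the swizzle is scalar (all four components the same).

```
R0.x      # broadcast x to all 4 components
R0.yzxw   # reorder: y->x, z->y, x->z, w->w
c[4].w    # broadcast w
```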
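The swizzle rules above (1 or 4 letters, a single letter replicated, scalar check for ARL/RCP/RSQ/EXP/LOG) can be sketched as a small C++ parser. parseSwizzle and isScalar are hypothetical names; the real parser in vertex_program_parse.cpp may structure this differently.

```cpp
#include <array>
#include <cassert>
#include <optional>
#include <string_view>

// Component indices per destination lane: x = 0, y = 1, z = 2, w = 3.
using Swizzle = std::array<int, 4>;

std::optional<int> componentIndex(char c) {
    switch (c) {
        case 'x': return 0;
        case 'y': return 1;
        case 'z': return 2;
        case 'w': return 3;
        default:  return std::nullopt;
    }
}

std::optional<Swizzle> parseSwizzle(std::string_view s) {
    if (s.empty()) return Swizzle{0, 1, 2, 3};               // no suffix == .xyzw
    if (s.size() != 1 && s.size() != 4) return std::nullopt; // 2 or 3 letters rejected
    Swizzle out{};
    for (int i = 0; i < 4; ++i) {
        // A single letter is replicated into all four lanes.
        const auto c = componentIndex(s[s.size() == 1 ? 0 : i]);
        if (!c) return std::nullopt;
        out[i] = *c;
    }
    return out;
}

// Scalar-source instructions require all four selected components to match.
bool isScalar(const Swizzle& sw) {
    return sw[0] == sw[1] && sw[1] == sw[2] && sw[2] == sw[3];
}
```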
A write mask selects which components of the destination are updated. Append . followed by 1 to 4 component letters (x, y, z, w), in order, with no duplicates.
No suffix writes all four components (same as .xyzw).

```
o[HPOS].x     # write x only
R1.xy         # write x and y
o[TEX0].xyz   # write x, y, and z
R0.xyzw       # write all (same as no suffix)
```
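A write-mask validator following the rules above (1 to 4 letters, in xyzw order, no duplicates) fits in a few lines. The bitmask encoding (x=1, y=2, z=4, w=8) and the function name are assumptions for illustration.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Returns a bitmask (x=1, y=2, z=4, w=8), or nullopt for an invalid mask.
// No suffix means all four components (0xF).
std::optional<unsigned> parseWriteMask(std::string_view s) {
    if (s.empty()) return 0xFu;
    if (s.size() > 4) return std::nullopt;
    constexpr std::string_view kOrder = "xyzw";
    unsigned mask = 0;
    int last = -1;
    for (char c : s) {
        const auto pos = kOrder.find(c);
        if (pos == std::string_view::npos) return std::nullopt;  // not a component letter
        if (static_cast<int>(pos) <= last) return std::nullopt;   // out of order or duplicate
        last = static_cast<int>(pos);
        mask |= 1u << pos;
    }
    return mask;
}
```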
A single instruction may not read from two different constant registers, and may not read from two different input registers. Reading the same constant or input register in multiple source operands is fine.
```
# OK: same constant register in both sources
MUL R0, c[4], c[4];
# OK: same input register with different swizzles
MUL R0, v[0].x, v[0].y;
# ERROR: two different constants
ADD R0, c[0], c[1];
# ERROR: two different input registers
ADD R0, v[0], v[2];
```

Workaround: use a temporary register to hold the value of one operand.

```
MOV R0, c[1];
ADD R1, c[0], R0;
```
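The read restriction can be checked by remembering the first constant index and the first input index seen among an instruction's sources. A sketch with hypothetical types (RegFile, SrcOperand); relative c[A0.x + offset] operands are not modeled.

```cpp
#include <cassert>
#include <vector>

enum class RegFile { Temp, Const, Input };

struct SrcOperand {
    RegFile file;
    int index; // register number, e.g. 4 for c[4]
};

// True if the sources read at most one distinct constant register and at
// most one distinct input register. Temporaries are unrestricted.
bool obeysReadPortRule(const std::vector<SrcOperand>& srcs) {
    int constIdx = -1, inputIdx = -1; // -1 = none seen yet
    for (const SrcOperand& s : srcs) {
        int* seen = s.file == RegFile::Const ? &constIdx
                  : s.file == RegFile::Input ? &inputIdx
                  : nullptr;
        if (!seen) continue;
        if (*seen != -1 && *seen != s.index) return false; // second distinct register
        *seen = s.index;
    }
    return true;
}
```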
- MOV, MUL, ADD, MAD, DP3, DP4, DST, LIT, MIN, MAX, SLT, SGE, RSQ, EXP/EXPP, LOG, RCP: destination must be a temporary register (R0--R11) or an output register (o[...]).
- ARL: destination must be A0.x.
- Writing to constant registers (c[...]) or input registers (v[...]) is never allowed.
Source operands can be: temporary registers, constant registers, or input registers. Output registers and the address register (except as part of c[A0.x + offset]) cannot be used as sources.
This section documents the precise mathematical behavior of each nelvp instruction, its backend mappings, edge-case handling, and equivalent GLSL code for porting.
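Backend mapping key: "ARB" names the ARB_vertex_program opcode emitted by the OpenGL driver, and "D3D" names the vs_1_1 opcode emitted by the Direct3D driver.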
In the GLSL examples below, src refers to a vec4 source value (after swizzle and negation are applied), and dest is the vec4 destination. Write masks are handled separately by only assigning to the masked components.
MOV semantics:

```
dest = src
```

Backends: ARB MOV, D3D mov.
Special cases: None. Straightforward component-wise copy.
GLSL:

```glsl
dest = src;
```
ARL semantics:

```
A0.x = floor(src.s)   // s = the scalar component selected by swizzle
```

Backends: ARB ARL, D3D mov (D3D uses mov for address register loads; the driver rewrites ARL to mov).
Special cases: The result is always the floor (round toward negative infinity). The loaded value is used as an integer index for c[A0.x + offset] constant access. Reads outside the valid constant range (0--95) return (0, 0, 0, 0).
GLSL:

```glsl
int A0x = int(floor(src.x)); // assuming .x swizzle
// Then use A0x as: c[A0x + offset]
```
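ARL and indexed constant access can be modeled as below, assuming the behaviors stated in this document: floor rounding, offsets limited to -64..+63, and (0, 0, 0, 0) for out-of-range reads. Vec4, ConstantFile and the function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Vec4 { float x, y, z, w; };

constexpr int kNumConstants = 96;                 // c[0]..c[95]
using ConstantFile = std::array<Vec4, kNumConstants>;

// ARL: floor of the selected scalar component.
int arl(float scalarSrc) {
    return static_cast<int>(std::floor(scalarSrc)); // round toward -infinity
}

// The parser accepts relative offsets in [-64, 63].
bool isValidOffset(int offset) { return offset >= -64 && offset <= 63; }

// c[A0.x + offset]; out-of-range reads yield (0, 0, 0, 0).
Vec4 fetchIndexed(const ConstantFile& c, int a0x, int offset) {
    const int i = a0x + offset;
    if (i < 0 || i >= kNumConstants) return Vec4{0.0f, 0.0f, 0.0f, 0.0f};
    return c[i];
}
```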
ADD semantics:

```
dest.c = src1.c + src2.c   // for each component c in {x, y, z, w}
```

Backends: ARB ADD, D3D add.
Special cases: Standard IEEE-754 float addition. (+Inf) + (-Inf) = NaN.
GLSL:

```glsl
dest = src1 + src2;
```
MUL semantics:

```
dest.c = src1.c * src2.c   // for each component c
```

Backends: ARB MUL, D3D mul.
Special cases: Standard IEEE-754 float multiplication. 0.0 * Inf = NaN (not guaranteed to be 0).
GLSL:

```glsl
dest = src1 * src2;
```
MAD semantics:

```
dest.c = src1.c * src2.c + src3.c   // for each component c
```

Backends: ARB MAD, D3D mad.
Special cases: Same as MUL followed by ADD. No fused multiply-add guarantee (intermediate rounding may occur).
GLSL:

```glsl
dest = src1 * src2 + src3;
// or, if fused precision is desired: dest = fma(src1, src2, src3);
// fma() requires GLSL 4.00 or GL_ARB_gpu_shader5, so it is not
// available in plain GLSL 330.
```
DP3 semantics:

```
float t = src1.x * src2.x + src1.y * src2.y + src1.z * src2.z;
dest.x = dest.y = dest.z = dest.w = t;   // replicated to all components
```

Backends: ARB DP3, D3D dp3.
Special cases: The w components of the sources are ignored. The scalar result is replicated to all four destination components (the write mask selects which are actually written).
GLSL:

```glsl
dest = vec4(dot(src1.xyz, src2.xyz));
```
DP4 semantics:

```
float t = src1.x * src2.x + src1.y * src2.y + src1.z * src2.z + src1.w * src2.w;
dest.x = dest.y = dest.z = dest.w = t;   // replicated
```

Backends: ARB DP4, D3D dp4.
Special cases: Scalar result replicated to all four components.
GLSL:

```glsl
dest = vec4(dot(src1, src2));
```
DST semantics:

```
dest.x = 1.0;
dest.y = src1.y * src2.y;
dest.z = src1.z;
dest.w = src2.w;
```

Backends: ARB DST, D3D dst.
Special cases: None beyond normal float arithmetic. The instruction is designed to be used with src1 = (_, d*d, d*d, _) and src2 = (_, 1/d, _, 1/d), producing (1, d, d*d, 1/d) for distance attenuation polynomial evaluation via a subsequent DP4 against attenuation coefficients.
GLSL:

```glsl
dest = vec4(1.0, src1.y * src2.y, src1.z, src2.w);
```
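A worked instance of the distance-attenuation idiom, in C++ for testability. With d = 2, the inputs (_, 4, 4, _) and (_, 0.5, _, 0.5) produce (1, 2, 4, 0.5); a DP4 against hypothetical coefficients (k0, k1, k2, 0) = (1, 0.5, 0.25, 0) then gives k0 + k1*d + k2*d*d = 3. Vec4, dst and dp4 are illustrative helpers, not engine functions.

```cpp
#include <cassert>

struct Vec4 { float x, y, z, w; };

// DST as defined above: (1, a.y * b.y, a.z, b.w).
Vec4 dst(const Vec4& a, const Vec4& b) {
    return Vec4{1.0f, a.y * b.y, a.z, b.w};
}

// DP4: full 4-component dot product.
float dp4(const Vec4& a, const Vec4& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}
```

A subsequent RCP of that sum would give the attenuation factor 1/3.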
MIN semantics:

```
dest.c = min(src1.c, src2.c)   // for each component c
```

Backends: ARB MIN, D3D min.
Special cases: NaN behavior is implementation-dependent; GLSL min may propagate or suppress NaN depending on the GPU.
GLSL:

```glsl
dest = min(src1, src2);
```
MAX semantics:

```
dest.c = max(src1.c, src2.c)   // for each component c
```

Backends: ARB MAX, D3D max.
Special cases: Same NaN caveat as MIN.
GLSL:

```glsl
dest = max(src1, src2);
```
SLT semantics:

```
dest.c = (src1.c < src2.c) ? 1.0 : 0.0   // for each component c
```

Backends: ARB SLT, D3D slt.
Special cases: If either operand is NaN, the comparison returns false (0.0) on D3D. ARB behavior with NaN is undefined.
GLSL:

```glsl
dest = vec4(lessThan(src1, src2));
// or component-wise:
dest.x = (src1.x < src2.x) ? 1.0 : 0.0;
dest.y = (src1.y < src2.y) ? 1.0 : 0.0;
dest.z = (src1.z < src2.z) ? 1.0 : 0.0;
dest.w = (src1.w < src2.w) ? 1.0 : 0.0;
// or using step (note reversed argument order):
dest = vec4(1.0) - step(src2, src1);
```
SGE semantics:

```
dest.c = (src1.c >= src2.c) ? 1.0 : 0.0   // for each component c
```

Backends: ARB SGE, D3D sge.
Special cases: Same NaN caveat as SLT.
GLSL:

```glsl
dest = vec4(greaterThanEqual(src1, src2));
// or component-wise:
dest.x = (src1.x >= src2.x) ? 1.0 : 0.0;
// ...
// or using step:
dest = step(src2, src1);
```
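The step()-based rewrites can be checked with scalar reference functions. This sketch re-implements GLSL's step(edge, x) rule (0.0 when x < edge, else 1.0) in C++. The identities hold for ordinary values but not necessarily for NaN: under this rule sge(NaN, b) is 0.0 while step(b, NaN) is 1.0.

```cpp
#include <cassert>

// Scalar references for one component of SLT / SGE.
float slt(float a, float b) { return a < b ? 1.0f : 0.0f; }
float sge(float a, float b) { return a >= b ? 1.0f : 0.0f; }

// GLSL step(edge, x): 0.0 if x < edge, otherwise 1.0.
float step(float edge, float x) { return x < edge ? 0.0f : 1.0f; }

// Identities used above (non-NaN inputs):
//   SLT(a, b) == 1.0 - step(b, a)
//   SGE(a, b) == step(b, a)
```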
RCP semantics:

```
float s = src.s;                   // scalar component selected by swizzle
float t;
if (s == 1.0)      t = 1.0;        // exactly 1.0, no rounding error
else if (s == 0.0) t = +Inf;       // positive infinity
else               t = 1.0 / s;
dest.x = dest.y = dest.z = dest.w = t;   // replicated
```

Backends: ARB RCP, D3D rcp.
Special cases: As the semantics show, an input of exactly 1.0 yields exactly 1.0 and an input of 0.0 yields +Inf. The scalar result is replicated to all four components.
GLSL:

```glsl
// Direct translation (GLSL division handles 0 -> Inf on most GPUs):
dest = vec4(1.0 / src.x);
// Note: GLSL does not guarantee 1.0/1.0 == exactly 1.0 the way
// ARB_vertex_program does, but in practice modern GPUs honor this.
```
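A scalar C++ reference for RCP can help when validating a GLSL port against the ARB special cases. rcpRef is a hypothetical name; it follows the semantics pseudocode, including +Inf for a zero input.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Reference RCP on the selected scalar component (result is then replicated).
float rcpRef(float s) {
    if (s == 1.0f) return 1.0f;                                  // exact
    if (s == 0.0f) return std::numeric_limits<float>::infinity(); // +Inf
    return 1.0f / s;
}
```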
RSQ semantics:

```
float s = abs(src.s);              // absolute value of scalar component
float t;
if (s == 1.0)      t = 1.0;        // exactly 1.0
else if (s == 0.0) t = +Inf;       // positive infinity
else               t = 1.0 / sqrt(s);
dest.x = dest.y = dest.z = dest.w = t;   // replicated
```

Backends: ARB RSQ, D3D rsq.
Special cases: The absolute value is taken first, so negative inputs never produce NaN. An input whose magnitude is exactly 1.0 yields exactly 1.0, and 0.0 yields +Inf.
GLSL:

```glsl
dest = vec4(inversesqrt(abs(src.x)));
// Caution: GLSL inversesqrt(0.0) is undefined behavior per the spec,
// though most GPUs return +Inf. GLSL inversesqrt does NOT take
// the absolute value automatically -- you must use abs() explicitly
// to match the nelvp/ARB behavior with negative inputs.
```
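The matching RSQ reference takes the absolute value first, which is the step a direct inversesqrt() port most easily misses. rsqRef is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Reference RSQ: 1/sqrt(|s|), exact for |s| == 1, +Inf for zero.
float rsqRef(float src) {
    const float s = std::fabs(src); // nelvp/ARB take |src| first
    if (s == 1.0f) return 1.0f;
    if (s == 0.0f) return std::numeric_limits<float>::infinity();
    return 1.0f / std::sqrt(s);
}
```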
EXP/EXPP semantics (vs_1_1 / ARB_vertex_program):

```
float s = src.s;                   // scalar component
dest.x = pow(2.0, floor(s));       // 2 raised to the integer part
dest.y = s - floor(s);             // fractional part (== fract(s))
dest.z = pow(2.0, s);              // 2^s, approximate (~10-bit precision)
dest.w = 1.0;
```

Backends: ARB EXP, D3D expp.
Special cases:

- EXP is not a drop-in replacement for GLSL exp2(): it writes four different values.
- The .z component is the actual exponential, but only at partial precision (~10 bits, i.e. 2^-11 maximum relative error).
- The .x and .y components together reconstruct the value: dest.x * exp2(dest.y) ≈ exp2(s).

Cross-platform divergence: In D3D vs_2_0+, expp changes behavior and replicates 2^s to all four components. The multi-component behavior only applies to vs_1_1.
Typical usage in nelvp: The fractional part (.y) is commonly used for LUT indexing: EXP R0.y, R0.x; extracts fract(R0.x) into R0.y (as seen in the vegetation wind animation example).
GLSL (full multi-component equivalent):

```glsl
float s = src.x;                   // scalar swizzle
dest.x = exp2(floor(s));
dest.y = fract(s);                 // == s - floor(s)
dest.z = exp2(s);                  // full precision in GLSL
dest.w = 1.0;
```

GLSL (if only using .y for fract, as is common):

```glsl
dest.y = fract(src.x);
```

GLSL (if only using .z for exp2):

```glsl
dest.z = exp2(src.x);
```
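A full-precision C++ reference for the four EXP outputs follows. It uses std::ldexp for the exact power of two in .x. Real hardware computes .z at only ~10-bit precision, so comparisons against GPU output need a tolerance. expRef is a hypothetical helper.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Returns {2^floor(s), fract(s), 2^s, 1} at full float precision.
std::array<float, 4> expRef(float s) {
    const float fl = std::floor(s);
    return {std::ldexp(1.0f, static_cast<int>(fl)), // exact power of two
            s - fl,                                  // fractional part
            std::exp2(s),                            // hardware: ~10-bit approx
            1.0f};
}
```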
LOG semantics (ARB_vertex_program):

```
float s = abs(src.s);              // absolute value of scalar component
if (s != 0.0) {
    float e = floor(log2(s));
    dest.x = e;                    // exponent (integer part of log2)
    dest.y = s / exp2(e);          // mantissa, in range [1.0, 2.0)
    dest.z = log2(s);              // approximate log2 (~10-bit precision)
    dest.w = 1.0;
} else {
    dest.x = -Inf;
    dest.y = 1.0;                  // implementation-dependent
    dest.z = -Inf;
    dest.w = 1.0;
}
```

Backends: ARB LOG, D3D log.
Special cases: A zero input takes the else branch above: .x and .z become -Inf, and the .y value is implementation-dependent.
Cross-platform divergence: The D3D log instruction (full precision, 10 instruction slots) replicates log2(abs(src)) to all four components; it does NOT produce the multi-component (exponent, mantissa, log2, 1.0) split. The nelvp header warns: "You can only expect to have dest.z = log2(abs(src.w))" across all backends.
GLSL (full multi-component equivalent, matching ARB):

```glsl
float s = abs(src.x);              // scalar swizzle, absolute value
if (s != 0.0) {
    float e = floor(log2(s));
    dest.x = e;
    dest.y = s / exp2(e);          // == s * exp2(-e)
    dest.z = log2(s);
    dest.w = 1.0;
} else {
    // -Inf built from its bit pattern; -1.0 / 0.0 is undefined in GLSL
    float negInf = uintBitsToFloat(0xFF800000u);
    dest = vec4(negInf, 1.0, negInf, 1.0);
}
```

GLSL (portable; only dest.z is reliable across backends):

```glsl
dest.z = log2(abs(src.x));
```
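A C++ reference for the ARB-style LOG outputs is shown below. Instead of floor(log2(s)), this sketch obtains the exponent and mantissa from std::frexp, which splits the value exactly and avoids rounding trouble just below a power of two; the two formulations agree mathematically. logRef is a hypothetical name, and only .z should be compared against D3D output.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <limits>

// Returns {exponent, mantissa in [1, 2), log2|s|, 1}, or the zero-input case.
std::array<float, 4> logRef(float src) {
    const float s = std::fabs(src);
    const float inf = std::numeric_limits<float>::infinity();
    if (s == 0.0f) return {-inf, 1.0f, -inf, 1.0f};
    int exponent = 0;
    const float m = std::frexp(s, &exponent); // s == m * 2^exponent, m in [0.5, 1)
    return {static_cast<float>(exponent - 1), // floor(log2(s))
            2.0f * m,                         // mantissa in [1, 2)
            std::log2(s),
            1.0f};
}
```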
LIT semantics:

```
// Input: src.x = N·L (diffuse dot product)
//        src.y = N·H (specular dot product)
//        src.z = (ignored)
//        src.w = specular exponent
dest.x = 1.0;                        // ambient term
dest.w = 1.0;                        // constant
float power = clamp(src.w, -128.0 + epsilon, 128.0 - epsilon);
if (src.x > 0.0) {
    dest.y = src.x;                  // diffuse term
    if (src.y > 0.0)
        dest.z = pow(src.y, power);  // specular term
    else
        dest.z = 0.0;
} else {
    dest.y = 0.0;
    dest.z = 0.0;
}
```

Backends: ARB LIT, D3D lit.
Special cases:

- The specular term (dest.z) is forced to 0.0 whenever src.x <= 0.0, regardless of src.y. This models the physical constraint that a surface facing away from the light has no specular highlight.
- A negative N·H produces no highlight: src.y is effectively max(src.y, 0.0) before the pow().
- If both src.y and power are 0, the result is implementation-dependent (could be 1.0 per IEEE pow or 0.0 per clamping).
- The pow() is allowed to be computed as exp2(power * log2(src.y)), introducing reduced precision (roughly one bit of error per 8-bit color channel; i.e. the result may be off by ~1/256 in the 0.0--1.0 range).

GLSL:

```glsl
float NdotL = src.x;
float NdotH = src.y;
float power = clamp(src.w, -128.0, 128.0);
dest.x = 1.0;
dest.y = max(NdotL, 0.0);
// Test NdotH > 0.0 before pow(): GLSL pow(x, y) is undefined for x < 0,
// and for x == 0 with y <= 0.
dest.z = (NdotL > 0.0 && NdotH > 0.0) ? pow(NdotH, power) : 0.0;
dest.w = 1.0;
```
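A scalar C++ reference for LIT, following the semantics pseudocode above, is given here. The epsilon margin on the exponent clamp is omitted, and pow(0, 0) resolves to 0.0 because the N·H > 0 test runs first; both are choices within the implementation-dependent range, not guaranteed hardware behavior. litRef is a hypothetical name.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

// Returns {1, diffuse, specular, 1} from N·L, N·H and the specular exponent.
std::array<float, 4> litRef(float nDotL, float nDotH, float power) {
    const float p = std::clamp(power, -128.0f, 128.0f); // epsilon margin omitted
    float diffuse = 0.0f, specular = 0.0f;
    if (nDotL > 0.0f) {
        diffuse = nDotL;
        specular = nDotH > 0.0f ? std::pow(nDotH, p) : 0.0f;
    }
    return {1.0f, diffuse, specular, 1.0f};
}
```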
| nelvp | ARB assembly | D3D vs_1_1 | Operands | Scalar input | Replicated output |
|---|---|---|---|---|---|
| MOV | MOV | mov | 1 | no | no |
| ARL | ARL | mov | 1 | yes | -- (writes A0.x) |
| ADD | ADD | add | 2 | no | no |
| MUL | MUL | mul | 2 | no | no |
| MAD | MAD | mad | 3 | no | no |
| DP3 | DP3 | dp3 | 2 | no | yes |
| DP4 | DP4 | dp4 | 2 | no | yes |
| DST | DST | dst | 2 | no | no (per-component) |
| MIN | MIN | min | 2 | no | no |
| MAX | MAX | max | 2 | no | no |
| SLT | SLT | slt | 2 | no | no |
| SGE | SGE | sge | 2 | no | no |
| RCP | RCP | rcp | 1 | yes | yes |
| RSQ | RSQ | rsq | 1 | yes | yes |
| EXP/EXPP | EXP | expp | 1 | yes | no (4 different outputs) |
| LOG | LOG | log | 1 | yes | divergent (see above) |
| LIT | LIT | lit | 1 | no | no (4 different outputs) |
For LOG, only dest.z = log2(abs(src)) is guaranteed across all implementations. The ARB backend produces multi-component output (exponent, mantissa, log2, 1.0); the D3D backend replicates log2 to all four components.

The nelvp→GLSL converter translates nelvp programs into GLSL 330 vertex shaders for the OpenGL 3.3+ driver. This section documents how nelvp registers map to GLSL constructs, the varying layout used by converted programs, and the resulting limitations relative to the GL3 builtin vertex program.
Input register mapping is straightforward: each nelvp slot index becomes a GLSL layout(location = N) attribute at the same index. The hardware slot is the location; the semantic names are irrelevant.
| nelvp | Slot | GLSL declaration | Notes |
|---|---|---|---|
| v[OPOS] / v[0] | 0 | layout(location = 0) in vec4 vposition; | |
| v[WGHT] / v[1] | 1 | layout(location = 1) in vec4 vweight; | |
| v[NRML] / v[2] | 2 | layout(location = 2) in vec4 vnormal; | |
| v[COL0] / v[3] | 3 | layout(location = 3) in vec4 vprimaryColor; | 0.0--1.0 float |
| v[COL1] / v[4] | 4 | layout(location = 4) in vec4 vsecondaryColor; | |
| v[FOGC] / v[5] | 5 | layout(location = 5) in vec4 vfog; | |
| v[6] | 6 | layout(location = 6) in vec4 vpaletteSkin; | |
| v[7] | 7 | layout(location = 7) in vec4 vtangent; | Reserved/IEmpty in nelvp; tangent in GL3 |
| v[TEX0]--v[TEX7] | 8--15 | layout(location = 8..15) in vec4 vtexCoordN; | |
These assignments are shared with the GL3 TAttribOffset enum and the GL3 builtin VP. There is no conflict on the input side: all vertex programs (nelvp-converted and builtin) read from the same attribute locations.
The one oddity is slot 7. In nelvp it is IEmpty, readable but carrying no conventional data. The GL3 driver assigns tangent data to slot 7 via TAttribOffset. A nelvp program that reads v[7] will receive tangent data on the GL3 driver, which may or may not be what the original program expected; in practice, no existing nelvp program reads v[7].
The converter remaps nelvp output registers to match the GL3 builtin pixel program's varying layout. Rather than emitting each varying at its nelvp hardware slot index, the converter maps outputs to the locations the builtin PP expects, so that converted programs can pair with the builtin mega PP directly. Two outputs become GLSL built-ins instead of varyings.
| nelvp output | HW Slot | GLSL output | Notes |
|---|---|---|---|
| o[HPOS] | 0 | gl_Position | Built-in; no varying location consumed. |
| o[COL0] | 1 | layout(location = 3) smooth out vec4 diffuseColor; | Remapped from slot 1 to builtin PP's diffuse location. |
| o[COL1] | 2 | layout(location = 4) smooth out vec4 specularColor; | Remapped from slot 2 to builtin PP's specular location. |
| o[BFC0] | 3 | (not emitted) | Not supported on GL3. |
| o[BFC1] | 4 | (not emitted) | Not supported on GL3. |
| o[FOGC] | 5 | layout(location = 5) smooth out vec4 fog; | Passed through if written. Not consumed by builtin PP; see Fog Handling. |
| o[PSIZ] | 6 | gl_PointSize | Built-in; no varying location consumed. |
| (none) | 7 | (not accessible from nelvp) | Slot 7 is never written by nelvp programs. |
| o[TEX0]--o[TEX7] | 8--15 | layout(location = 8..15) smooth out vec4 texCoordN; | Direct 1:1 mapping. Matches builtin PP. |
In addition to the nelvp outputs, the converter synthesizes an eye-space position varying (ecPos) at location 0. After the translated program body, the converter appends ecPos = inverseProjectionBasis * gl_Position, deriving NeL-space eye position from the final clip-space output. This correctly reflects any position modifications the VP made (wind displacement, skinning, geomorphing, etc.). The inverseProjectionBasis matrix (inv(Projection * ChangeBasis)) is a field in the NlCamera UBO, computed by the driver in stageCameraUBO() whenever the projection or view matrix changes (see Camera UBO and ecPos Synthesis).
A converted nelvp program's GLSL preamble looks like:
```glsl
#version 330
#extension GL_ARB_separate_shader_objects : enable
out gl_PerVertex { vec4 gl_Position; float gl_PointSize; };
layout(location = 0) smooth out vec4 ecPos;          // synthesized eye-space position for fog
layout(location = 3) smooth out vec4 diffuseColor;   // o[COL0]
layout(location = 4) smooth out vec4 specularColor;  // o[COL1]
layout(location = 5) smooth out vec4 fog;            // o[FOGC], if written
layout(location = 8) smooth out vec4 texCoord0;      // o[TEX0]
layout(location = 9) smooth out vec4 texCoord1;      // o[TEX1]
// ... etc., only for outputs actually written by the program
// Converter epilogue (appended after translated instructions):
// ecPos = inverseProjectionBasis * gl_Position;
```
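How a converter might emit those declarations can be sketched as a table-driven function. This is a hypothetical illustration of the remapping described above (COL0 to location 3, COL1 to location 4, FOGC to 5, TEXn to 8+n, HPOS and PSIZ as built-ins, BFC0/BFC1 dropped); the names are not taken from driver_opengl3_nelvp.cpp.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative output enum in nelvp table order.
enum class NelvpOutput { HPOS, COL0, COL1, BFC0, BFC1, FOGC, PSIZ,
                         TEX0, TEX1, TEX2, TEX3, TEX4, TEX5, TEX6, TEX7 };

// Declaration for one written output, or "" if it needs no varying.
std::string declareVarying(NelvpOutput out) {
    switch (out) {
        case NelvpOutput::COL0: return "layout(location = 3) smooth out vec4 diffuseColor;";
        case NelvpOutput::COL1: return "layout(location = 4) smooth out vec4 specularColor;";
        case NelvpOutput::FOGC: return "layout(location = 5) smooth out vec4 fog;";
        case NelvpOutput::HPOS: case NelvpOutput::PSIZ: // gl_Position / gl_PointSize
        case NelvpOutput::BFC0: case NelvpOutput::BFC1: // not supported on GL3
            return {};
        default: { // TEX0..TEX7 keep their hardware slots 8..15
            const int n = static_cast<int>(out) - static_cast<int>(NelvpOutput::TEX0);
            return "layout(location = " + std::to_string(8 + n) +
                   ") smooth out vec4 texCoord" + std::to_string(n) + ";";
        }
    }
}

// Preamble: fixed header, synthesized ecPos, then one line per written varying.
std::string emitPreamble(const std::vector<NelvpOutput>& written) {
    std::string s = "#version 330\n"
                    "#extension GL_ARB_separate_shader_objects : enable\n"
                    "out gl_PerVertex { vec4 gl_Position; float gl_PointSize; };\n"
                    "layout(location = 0) smooth out vec4 ecPos;\n";
    for (NelvpOutput out : written) {
        const std::string decl = declareVarying(out);
        if (!decl.empty()) s += decl + "\n";
    }
    return s;
}
```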
Because the converter remaps nelvp outputs to the builtin PP's varying locations, a converted nelvp vertex program can be paired directly with the GL3 builtin mega pixel program for all standard rendering modes (unlit, vertex-lit, lightmapped, textured).
nelvp vertex programs that need custom fragment processing (such as the tangent-space per-pixel lighting example below) are paired with application-specific pixel programs, which must declare matching in varyings at the same locations.
The GL3 builtin VP uses a different varying layout from the nelvp hardware slot assignments. The builtin VP computes lighting in the vertex shader and outputs semantic results (lit diffuse, lit specular, eye-space position, world-space position, raw vertex color) rather than passing through raw register values. The converter bridges this gap by remapping nelvp outputs to the builtin PP's expected locations. The following table shows the underlying difference:
| Location | GL3 builtin VP varying | nelvp output (original slot) | Converter mapping |
|---|---|---|---|
| 0 | ecPos (eye-space position, for fog and clip planes) | o[HPOS] → gl_Position (slot 0) | Synthesized: inverseProjectionBasis * gl_Position via camera UBO |
| 1 | vertexColor (raw vertex color for PPL modulation) | o[COL0] (primary color, slot 1) | Remapped to location 3 |
| 2 | normal (world-space normal) | o[COL1] (secondary color, slot 2) | Remapped to location 4 |
| 3 | diffuseColor (lit diffuse result) | o[BFC0] (back-face; not supported) | ← receives o[COL0] |
| 4 | specularColor (lit specular result) | o[BFC1] (back-face; not supported) | ← receives o[COL1] |
| 5 | worldPos (world-space position, for PPL) | o[FOGC] (slot 5) | Passed through as fog; builtin PP computes fog from ecPos, PPL from worldPos |
| 6 | paletteSkin | o[PSIZ] → gl_PointSize (slot 6) | (not a varying in either system) |
| 7 | tangent (world-space) | (not exposed in nelvp) | Not emitted |
| 8--15 | texCoord0--texCoord7 | o[TEX0]--o[TEX7] | Direct match |
The converter bridges the layout gap at locations 0--5 by remapping nelvp outputs and synthesizing ecPos. Locations 8--15 match directly.
The nelvp output model predates certain GL3 builtin features. While the converter handles the varying layout, the following features are not available through converted nelvp programs:
Per-pixel lighting. The builtin PP's PPL mode reads worldPos (location 5, world-space position), normal (location 2), and vertexColor (location 1) for per-pixel Blinn-Phong lighting. The nelvp model has no equivalent outputs: o[COL0] carries the final (typically pre-lit or unlit) color, and there is no world-space position output. Since the legacy ARB and D3D drivers do not support builtin PPL either, this is not a regression. nelvp programs that need per-pixel lighting implement it entirely within their own VP/PP pair (see the tangent-space example below).
Builtin tangent-space normal mapping. The builtin VP outputs a world-space tangent at location 7 for the builtin PP's normal mapping. The nelvp model does not expose slot 7. nelvp programs that perform tangent-space normal mapping do so by transforming the light vector into tangent space in the VP and passing the result via texture coordinate registers, a self-contained approach that does not depend on a dedicated tangent varying.
Back-face colors. o[BFC0] and o[BFC1] (hardware slots 3--4) are not supported on the GL3 driver. Programs using two-sided lighting via back-face color outputs must be rewritten.
These are inherent to the nelvp register model's design era. Programs requiring these features must provide native GLSL 330 sources via IProgram::glsl330v.
The builtin PP computes fog from ecPos (eye-space position, location 0), not from a fog varying. The builtin VP always outputs ecPos = modelView * vposition (eye-space). For nelvp-converted programs, the converter synthesizes ecPos by appending ecPos = inverseProjectionBasis * gl_Position after the translated program body. This derives NeL-space eye position from the VP's final clip-space output, correctly capturing any position modifications the VP made (wind displacement, skinning, geomorphing, etc.). The inverseProjectionBasis matrix (inv(_GLProjMat * _ChangeBasis)) is a field in the NlCamera UBO, computed by the driver in stageCameraUBO() whenever the projection or view matrix changes. The converted program declares UsesCameraUBO = true, so the camera UBO header is automatically prepended during compilation.
The ecPos must be in NeL eye space (not GL eye space). The builtin PP fog function uses abs(ecPos.y / ecPos.w) as the forward depth, relying on NeL's Y-forward convention. Using inv(_GLProjMat) alone would produce GL eye space (Y = up), causing fog to be computed along the vertical axis instead of the forward axis — resulting in no visible fog.
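The effect of the NeL-versus-GL eye-space distinction on fog can be shown numerically. This sketch assumes a classic linear fog factor, (end - d) / (end - start) clamped to [0, 1]; the builtin PP's actual equation and uniform names may differ, and nelForwardDepth and linearFogFactor are hypothetical helpers. A point 50 units ahead in NeL eye space gets half fog over a 0..100 range, while the same point expressed in GL eye space (forward along -Z) yields zero forward depth and therefore no fog.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec4 { float x, y, z, w; };

// NeL eye space is Y-forward, so forward depth is |ecPos.y / ecPos.w|.
float nelForwardDepth(const Vec4& ecPos) {
    return std::fabs(ecPos.y / ecPos.w);
}

// Assumed linear fog: 1 = no fog, 0 = fully fogged.
float linearFogFactor(float depth, float fogStart, float fogEnd) {
    const float f = (fogEnd - depth) / (fogEnd - fogStart);
    return std::clamp(f, 0.0f, 1.0f);
}
```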
Deriving ecPos from gl_Position (the final clip-space result) is critical for correctness with VP-modified positions: using ModelView * v[0] (the raw input position) would be wrong for programs that modify the position (landscape geomorphing, vegetation wind, etc.), and would also fail with the _PZBCameraPos precision trick since v[0] hasn't been adjusted.
The fog equation (linear, exponential, or exponential squared), fog parameters (start, end, density), and fog color are applied by the builtin PP using uniforms from the same buffer. No fog-specific logic is needed in the converted VP.
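For reference, the three fog equations are sketched below using the classic OpenGL fixed-function formulas. These formulas are an assumption about the math, not a copy of the builtin PP's source. Here `d` stands for the forward depth abs(ecPos.y / ecPos.w), and `fogFactor` and `applyFog` are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>

enum class FogMode { Linear, Exp, Exp2 };

// Fraction of the fragment color kept after fog (1 = no fog, 0 = fully fogged).
// d is the forward eye-space depth; formulas follow classic GL fixed-function fog.
double fogFactor(FogMode mode, double d, double start, double end, double density) {
    double f = 1.0;
    switch (mode) {
    case FogMode::Linear: f = (end - d) / (end - start); break;
    case FogMode::Exp:    f = std::exp(-density * d); break;
    case FogMode::Exp2:   f = std::exp(-(density * d) * (density * d)); break;
    }
    return std::min(1.0, std::max(0.0, f));  // clamp to [0, 1]
}

// Blend one color channel toward the fog color.
double applyFog(double color, double fogColor, double factor) {
    return fogColor + (color - fogColor) * factor;
}
```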
The nelvp o[FOGC] output is passed through at location 5 if the program writes it, but is not consumed by the builtin PP. On the legacy ARB and D3D drivers, o[FOGC] fed the fixed-function fog pipeline; the GL3 builtin PP replaces this with the ecPos-based computation. Note that the builtin VP uses location 5 for worldPos (world-space position for PPL), but there is no conflict: nelvp programs are never paired with PPL-enabled pixel programs, so the fog pass-through and worldPos varying are in mutually exclusive shader variants.
| Aspect | Compatible? | Notes |
|---|---|---|
| Input attributes (all 16 slots) | Yes | 1:1 mapping; location = slot index. |
| Texture coordinate outputs (slots 8--15) | Yes | 1:1 mapping for both nelvp and builtin VP. |
| Fog | Yes | Converter synthesizes ecPos from gl_Position; builtin PP computes fog from it. o[FOGC] passed through but not consumed. |
| Color outputs | Yes | Converter remaps o[COL0] → diffuseColor (location 3), o[COL1] → specularColor (location 4). |
| Position/point size | Yes | Both map to gl_Position / gl_PointSize. |
| Back-face colors | No | o[BFC0]/o[BFC1] not supported on GL3. |
| Tangent (slot 7) | N/A | Not exposed in nelvp; nelvp programs use texcoord registers for tangent-space data. |
| Per-pixel lighting (worldPos, vertexColor) | N/A | Not available through nelvp; not supported on legacy drivers either. |
This section documents non-obvious issues encountered when converting nelvp programs to GLSL 330 for the GL3 driver. These affect both the automated converter and anyone hand-porting nelvp programs to GLSL.
The old OpenGL driver applied the NeL→GL basis change (changeBasis: X→X, Y→-Z, Z→Y) inside setupViewMatrix, baking it into _ViewMtx. As a result, _ModelViewMatrix = _ViewMtx * model was already in GL eye space (Y up, Z back).
The GL3 driver keeps the basis change in a separate _ChangeBasis matrix and does not bake it into _ViewMtx. This means _ModelViewMatrix stays in NeL world space (X right, Y forward, Z up). The basis change is only applied explicitly when constructing projection-related matrices:
| Matrix | Old GL driver | GL3 driver |
|---|---|---|
| ModelView | changeBasis * view * model (GL eye space) | view * model (NeL space) |
| Projection | _GLProjMat (raw) | _GLProjMat * _ChangeBasis |
| MVP | _GLProjMat * _ModelViewMatrix | _GLProjMat * _ChangeBasis * _ModelViewMatrix |
For regular GLSL programs this is transparent — MVP produces the same clip-space result either way. But nelvp programs that use ModelView separately (for eye-space lighting, normal transformation, fog direction, etc.) will get values in the wrong coordinate system if ModelView is passed without the basis change.
The converter handles this by prepending _ChangeBasis to ModelView in the nelvp setUniformMatrix path, and stripping it from Projection (since ModelView already includes it):
```
nelvp ModelView = _ChangeBasis * _ModelViewMatrix
nelvp Projection = _GLProjMat (no basis; MV has it)
nelvp MVP = _GLProjMat * _ChangeBasis * _ModelViewMatrix (same as GLSL path)
```
Symptom if missed: The geometry appears mostly clipped out or invisible, with only edges from the back side visible. Depth and face winding are computed along the wrong axis.
nelvp programs transform positions with four consecutive DP4 instructions:
```
DP4 o[HPOS].x, c[0], R4; # dot(row0, pos)
DP4 o[HPOS].y, c[1], R4; # dot(row1, pos)
DP4 o[HPOS].z, c[2], R4; # dot(row2, pos)
DP4 o[HPOS].w, c[3], R4; # dot(row3, pos)
```
Each constant register c[i] holds one row of the matrix. But CMatrix is column-major: M[0..3] is the first column, not the first row. If a CMatrix is stored directly into the constant UBO, the DP4 instructions will dot against columns instead of rows, producing a transposed transform.
The old GL driver transposes the matrix before storing it to ARB VP constant registers (line 304 of driver_opengl_uniform.cpp). The GL3 converter does the same: mat.transpose() before writing to the UBO.
Note that the inverseProjectionBasis field in the camera UBO is stored in CMatrix's native column-major order (via memcpy from CMatrix::get()), matching GLSL's column-major mat4 storage directly — so it is NOT transposed. Only matrices written to the nelvp constant register UBO for user consumption (where DP4 reads rows from consecutive registers) are transposed.
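The effect of the transpose on a DP4 sequence can be reproduced on the CPU. In this sketch, `ColMajor` assumes the CMatrix::get() layout: element (row, col) at m[col * 4 + row], with the translation in elements 12 to 14. The functions `toRowRegisters`, `toRegistersVerbatim`, and `transformDP4` are hypothetical stand-ins for the UBO upload and the four DP4 instructions.

```cpp
#include <array>

using Vec4 = std::array<float, 4>;

// Column-major 4x4 storage (assumed CMatrix::get() layout): (row, col) at m[col * 4 + row].
using ColMajor = std::array<float, 16>;

float dp4(const Vec4 &a, const Vec4 &b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// Correct upload: register c[row] holds matrix row `row` (the transpose of the storage).
std::array<Vec4, 4> toRowRegisters(const ColMajor &m) {
    std::array<Vec4, 4> c{};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            c[row][col] = m[col * 4 + row];
    return c;
}

// Verbatim copy: register c[i] ends up holding column i instead of row i.
std::array<Vec4, 4> toRegistersVerbatim(const ColMajor &m) {
    std::array<Vec4, 4> c{};
    for (int i = 0; i < 4; ++i)
        for (int k = 0; k < 4; ++k)
            c[i][k] = m[i * 4 + k];
    return c;
}

// DP4 o[HPOS].x..w, c[0..3], v;
Vec4 transformDP4(const std::array<Vec4, 4> &c, const Vec4 &v) {
    return {dp4(c[0], v), dp4(c[1], v), dp4(c[2], v), dp4(c[3], v)};
}
```

Take a translation by (5, 6, 7). With the transposed upload, (1, 2, 3, 1) maps to (6, 8, 10, 1). With the storage copied verbatim, the translation ends up in the w row and w becomes 39 instead of 1.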
Symptom if missed: Geometry is wildly distorted — a spinning plane instead of the expected shape, with stretching toward infinity.
The EXP/EXPP instruction writes a different value to each destination component:
```glsl
dest.x = exp2(floor(src)); // 2^(integer part)
dest.y = fract(src);       // fractional part
dest.z = exp2(src);        // actual exponential
dest.w = 1.0;
```
This is frequently exploited: .y gives fract() for wrapping angles into [0,1), .x gives the integer power for range reduction. The Taylor series cosine approximation in the wobble example (and in vegetation wind animation) relies on extracting .y from EXP to wrap angles before computing the polynomial.
A naive GLSL translation that broadcasts exp2(src) to all components will produce correct .z values but wrong .y values, causing unbounded inputs to any subsequent polynomial or LUT indexing.
The same applies to LOG, which produces (floor(log2(|src|)), |src|/2^floor(log2(|src|)), log2(|src|), 1.0).
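A scalar CPU model makes the difference concrete. This sketch uses exact exp2/log2, whereas hardware computes .z only approximately. The names `nelvpExp`, `nelvpLog`, and `naiveExp` are hypothetical, and the LOG sketch assumes a nonzero source.

```cpp
#include <array>
#include <cmath>

using Vec4 = std::array<float, 4>;

// Per-component results of nelvp EXP/EXPP (exact exp2 used for .z).
Vec4 nelvpExp(float s) {
    const float fl = std::floor(s);
    return {std::exp2(fl), s - fl, std::exp2(s), 1.0f};
}

// Per-component results of nelvp LOG: exponent, mantissa in [1, 2), log2, 1 (s != 0).
Vec4 nelvpLog(float s) {
    const float a = std::fabs(s);
    const float e = std::floor(std::log2(a));
    return {e, a / std::exp2(e), std::log2(a), 1.0f};
}

// The naive translation: exp2 broadcast to every component (wrong .x and .y).
Vec4 naiveExp(float s) {
    const float z = std::exp2(s);
    return {z, z, z, z};
}
```

For src = 2.75, nelvpExp gives (4, 0.75, 2^2.75, 1). The broadcast version puts roughly 6.73 in .y, well outside [0, 1).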
Symptom if missed: Geometry stretches to infinity as Taylor series or LUT inputs grow without bounds instead of being wrapped.
In nelvp, a 4-component source swizzle is valid regardless of the destination write mask width:
```
MUL R0.x, R0.xxxx, c[12].zzzz; # valid nelvp: scalar dest, 4-component sources
```
A literal GLSL translation (R0.x = R0.xxxx * c[12].zzzz;) does not compile. The right-hand side is a vec4, and GLSL has no implicit conversion from vec4 to float, so it cannot be assigned to the single component R0.x. The source swizzles must be truncated to match the write mask width:
```glsl
R0.x = R0.x * c[12].z; // correct GLSL: scalar sources for scalar dest
```
For component-wise instructions (MOV, ADD, MUL, MAD, MIN, MAX), the converter truncates source swizzles to the number of components in the write mask.
For comparison instructions (SLT, SGE), single-component masks use scalar comparison with a ternary ((a < b) ? 1.0 : 0.0), while multi-component masks use GLSL's lessThan/greaterThanEqual with a vector cast.
For instructions with fixed per-component semantics (DST, LIT, EXP, LOG), the converter computes the full vec4 result into a temporary, then extracts through the write mask.
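The swizzle reduction can be sketched as a tiny text emitter. One detail matters here. Under ARB_vertex_program semantics, each enabled destination component reads the source swizzle entry at the same position, so MOV R0.y, R1.xyzw reads R1.y. Keeping the entries at the write-mask positions, as below, gives that behavior. For replicated swizzles such as .xxxx, it coincides with simple truncation.

These functions are hypothetical and do not reproduce the converter's source. Operands are assumed to be already-parsed register names and swizzle strings.

```cpp
#include <string>

// Component letter to index: x=0, y=1, z=2, w=3.
int componentIndex(char c) {
    return static_cast<int>(std::string("xyzw").find(c));
}

// For each enabled destination component, keep the source swizzle entry at the
// same position (ARB semantics): reduceSwizzle("y", "xyzw") == "y".
std::string reduceSwizzle(const std::string &mask, const std::string &swizzle) {
    std::string out;
    for (char m : mask) out += swizzle[componentIndex(m)];
    return out;
}

// Component-wise binary op (ADD, MUL, ...): dest.mask = a.swA <op> b.swB;
std::string emitBinary(const std::string &dest, const std::string &mask, const std::string &op,
                       const std::string &a, const std::string &swA,
                       const std::string &b, const std::string &swB) {
    return dest + "." + mask + " = " + a + "." + reduceSwizzle(mask, swA) + " " + op + " " +
           b + "." + reduceSwizzle(mask, swB) + ";";
}

// SLT with already-reduced operands: ternary for one component, lessThan otherwise.
std::string emitSlt(const std::string &dest, const std::string &mask,
                    const std::string &a, const std::string &b) {
    if (mask.size() == 1)
        return dest + "." + mask + " = (" + a + " < " + b + ") ? 1.0 : 0.0;";
    return dest + "." + mask + " = vec" + std::to_string(mask.size()) +
           "(lessThan(" + a + ", " + b + "));";
}
```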
The nelvp constant registers are packed into a per-VP UBO as a vec4 c[N] array (std140 layout, UBO block name NlNelvpConstants), where N is the number of registers actually needed by the program. The UBO size is determined by NelvpRegisterCount in CProgramFeatures, which is set by one of two mechanisms:
Auto-detection (default): The converter scans every parsed instruction and tracks the highest constant register index referenced. Direct references like c[42] set the count to at least 43. Indexed access (c[A0.x + offset]) conservatively assumes the full 96 registers, since A0.x could address any register at runtime. The result is rounded up to a multiple of 4 for std140 alignment, with a minimum of 4.
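The auto-detection rule can be sketched as follows. `ConstRef` is a hypothetical summary of one constant operand, not a CVPParser type. The 96-register ceiling and the rounding follow the description above.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical summary of one constant-register operand in the parsed program.
struct ConstRef {
    int index;      // c[index], or the offset in c[A0.x + offset]
    bool relative;  // true for A0-relative (indexed) access
};

const int kMaxNelvpRegisters = 96;

// Highest direct index + 1, or all 96 if any indexed access exists;
// rounded up to a multiple of 4, with a minimum of 4.
int autoRegisterCount(const std::vector<ConstRef> &refs) {
    int count = 0;
    for (const ConstRef &r : refs) {
        if (r.relative) { count = kMaxNelvpRegisters; break; }
        count = std::max(count, r.index + 1);
    }
    count = (count + 3) / 4 * 4;
    return std::max(count, 4);
}

// UBO size in bytes: one vec4 (16 bytes) per register.
int uboBytes(int registerCount) { return registerCount * 16; }
```

For example, a program whose highest direct reference is c[42] gets 44 registers (704 bytes). Any A0-relative access yields 96 registers (1536 bytes).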
Explicit override: The nelvp source can pre-set source->Features.NelvpRegisterCount before compilation. For example, meshvp_wind_tree.cpp sets it to 40 (covering c[0..23] base + c[24..38] lighting), reducing the UBO from 1536 bytes to 640 bytes. This is useful when the auto-detector would over-allocate (e.g. due to indexed access that only touches a known subrange).
The UBO is bound to UBBindingVertexProgram (GL binding point NL_USER_VERTEX_PROGRAM_BINDING, binding 4). The setUniform* family of functions intercepts writes when an nelvp-converted program is active: the index parameter is treated as a constant register index, and the value is written to the UBO at byte offset index * 16 instead of calling glProgramUniform*.
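On the CPU side, the interception amounts to addressing a byte buffer by register number. The class below is a hypothetical shadow buffer for illustration. The real driver's storage and upload path may differ.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical CPU-side shadow of one nelvp constant UBO (vec4 c[N], std140):
// register i lives at byte offset i * 16.
class NelvpConstantBuffer {
public:
    explicit NelvpConstantBuffer(std::size_t registerCount)
        : m_bytes(registerCount * kRegisterBytes, 0) {}

    // For an nelvp program, `index` is a constant register number,
    // not a GLSL uniform location. Returns false if out of range.
    bool setUniform4f(std::size_t index, float x, float y, float z, float w) {
        if ((index + 1) * kRegisterBytes > m_bytes.size()) return false;
        const float v[4] = {x, y, z, w};
        std::memcpy(m_bytes.data() + index * kRegisterBytes, v, sizeof(v));
        return true;
    }

    // Read back one float component of register `index`.
    float read(std::size_t index, std::size_t component) const {
        float f = 0.0f;
        std::memcpy(&f, m_bytes.data() + index * kRegisterBytes + component * sizeof(float),
                    sizeof(float));
        return f;
    }

    std::size_t size() const { return m_bytes.size(); }

private:
    static constexpr std::size_t kRegisterBytes = 16;  // sizeof(vec4) under std140
    std::vector<unsigned char> m_bytes;
};
```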
The inverseProjectionBasis matrix used by the ecPos synthesis epilogue is not stored in the nelvp constant register UBO. Instead, it is a field in the NlCamera UBO (binding point 0), shared by all programs that declare UsesCameraUBO = true. The converter sets this flag on all nelvp-converted programs, and the camera UBO header is automatically prepended to the generated GLSL source during compilation.
The driver computes inverseProjectionBasis = inv(_GLProjMat * _ChangeBasis) in stageCameraUBO() whenever the projection matrix, view matrix, or frustum changes. The camera UBO is uploaded to the GPU in uploadCameraUBO(), called from setupUniforms() when any active program uses it. This design avoids wasting nelvp constant register space on the inverse projection matrix and ensures the matrix is always in sync with the current camera state without requiring per-VP updates.
The getUniformIndex override supports two name resolution mechanisms:
ParamIndices (checked first): nelvp sources can declare named parameter mappings (e.g., "viewCenter" → 10, "fog" → 5) via CSource::ParamIndices. The converter copies these from the nelvp source to the generated GLSL source, and the driver stores them on CProgramDrvInfosGL3::NelvpParamIndices. When getUniformIndex("viewCenter") is called, it returns 10 — the constant register index.
"constantN" names (fallback): Returns the register index directly for names matching "constant0" through "constant95", since the UBO members are not individually queryable via glGetUniformLocation.
This two-tier lookup is critical for landscape, vegetation, water, and other engine systems that set uniforms by name rather than by raw register index. Without ParamIndices support, getUniformIndex returns ~0 for these names, causing assertion failures in buildInfo() and silent rendering failures.
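The two-tier lookup can be sketched as below. `nelvpUniformIndex` and `kInvalidIndex` are hypothetical names. The map stands in for NelvpParamIndices, and ~0u plays the role of the ~0 failure value.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

const unsigned kInvalidIndex = ~0u;

// Hypothetical sketch of getUniformIndex for an nelvp-converted program.
unsigned nelvpUniformIndex(const std::map<std::string, unsigned> &paramIndices,
                           const std::string &name) {
    // 1. Named parameters declared by the nelvp source (checked first).
    auto it = paramIndices.find(name);
    if (it != paramIndices.end()) return it->second;

    // 2. Fallback: "constantN" with N in [0, 95] maps directly to register N.
    const std::string prefix = "constant";
    if (name.size() <= prefix.size() || name.compare(0, prefix.size(), prefix) != 0)
        return kInvalidIndex;
    unsigned n = 0;
    for (std::size_t i = prefix.size(); i < name.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(name[i]))) return kInvalidIndex;
        n = n * 10 + static_cast<unsigned>(name[i] - '0');
        if (n > 95) return kInvalidIndex;
    }
    return n;
}
```

With "viewCenter" mapped to 10, both "viewCenter" and "constant10" resolve to register 10. "constant96" and unknown names return ~0u.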
Each nelvp-converted vertex program has its own UBO. This means uniforms are per-program state: setting a uniform on one VP does not affect another VP's UBO, even if they share the same constant register layout.
The GL3 driver returns true from isUniformProgramState() (unlike the legacy ARB VP and D3D drivers which return false). Engine code that manages multiple vertex programs must check this flag and set uniforms on each VP separately when it returns true.
The landscape system is the primary consumer: it maintains separate VPs for near tiles (standard and lightmap variants), Far0, and Far1 LOD levels. When isUniformProgramState() is false (legacy drivers), setting uniforms on the Tile VP suffices — all VPs inherit the same driver state. When true (GL3), each VP's UBO must be populated independently.
Symptom if missed: Far landscape disappears entirely (MVP matrix is zero → all geometry projects to origin). Near landscape lightmaps are missing (lightmap VP has empty UBO).
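The difference between the two modes can be modeled with plain structs. Everything here is hypothetical: `ProgramConstants`, `ConstantState`, `setForAll`, and `readConstant` are not NeL types. A legacy driver is modeled as one shared register file. A driver reporting isUniformProgramState() == true is modeled as one register file per program.

```cpp
#include <array>
#include <vector>

using Vec4 = std::array<float, 4>;

// Hypothetical model of one vertex program's constant registers (its UBO on GL3).
struct ProgramConstants {
    std::array<Vec4, 96> c;
};

// Hypothetical model of driver-side constant state.
struct ConstantState {
    bool perProgram;              // what isUniformProgramState() would report
    std::array<Vec4, 96> shared;  // legacy drivers: one register file for every VP
};

// Set register `index` so that every program in `programs` sees `value`.
void setForAll(ConstantState &state, const std::vector<ProgramConstants *> &programs,
               int index, const Vec4 &value) {
    if (!state.perProgram) {
        state.shared[index] = value;  // one write is visible to all VPs
        return;
    }
    for (ProgramConstants *p : programs)  // GL3: each VP owns its own UBO
        p->c[index] = value;
}

// The value a program actually reads from register `index`.
Vec4 readConstant(const ConstantState &state, const ProgramConstants &p, int index) {
    return state.perProgram ? p.c[index] : state.shared[index];
}
```

When perProgram is true, writing a register only into the tile program's storage leaves the Far0/Far1 copies at zero. That mirrors the missing far landscape.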
To write nelvp programs that work on all drivers (legacy OpenGL, Direct3D, and GL3 via the converter):
- Use v[0], not v0.
- Use c[0], not c0.
- Use o[HPOS], not oPos.
- Use R0, not r0.
- End every instruction with ;.
- Use ARL (not MOV) to load the address register.
- Do not use NOP or macros.
- Do not write o[BFC0] or o[BFC1]; these are unsupported on many backends, including GL3.
- Do not read v[7] unless you know it carries valid data (it is IEmpty in the nelvp model but tangent in GL3).
- For LOG and EXP/EXPP, only rely on the .z component being consistent across backends.

The Direct3D driver automatically rewrites the syntax at load time (e.g. v[0] -> v0, ARL -> MOV). The nelvp→GLSL converter handles all register and instruction translation to GLSL 330.
From bloom_effect.cpp. Passes the input position through as the clip-space position (taking w from c[9].w), sets a constant color from c[8], and computes four offset texture coordinates for a blur pass.
```
!!VP1.0
MOV o[COL0].x, c[8].x;
MOV o[COL0].y, c[8].y;
MOV o[COL0].z, c[8].z;
MOV o[COL0].w, c[8].w;
MOV o[HPOS].x, v[OPOS].x;
MOV o[HPOS].y, v[OPOS].y;
MOV o[HPOS].z, v[OPOS].z;
MOV o[HPOS].w, c[9].w;
ADD o[TEX0], v[TEX0], c[10];
ADD o[TEX1], v[TEX0], c[11];
ADD o[TEX2], v[TEX0], c[12];
ADD o[TEX3], v[TEX0], c[13];
END
```
From meshvp_per_pixel_light.cpp. Computes the bitangent from the normal and tangent, transforms the light direction into tangent space for per-pixel normal mapping. Note that this program carries the tangent-space light vector in texture coordinate registers (o[TEX0]), not via a dedicated tangent varying; this is the nelvp idiom for tangent-space operations, and it works identically across all backends including the GL3 converter.
```
!!VP1.0
# Compute bitangent: B = N x T
MOV R6, v[2];
MUL R1, R6.yzxw, v[9].zxyw;
MAD R1, v[9].yzxw, -R6.zxyw, R1;
# Compute and normalize light direction L
ADD R2, c[4], -v[0]; # L = LightPos - VertexPos
DP3 R3, R2, R2;
RSQ R3, R3.x;
MUL R2, R3, R2;
# Transform L into tangent space [T B N]
DP3 o[TEX0].x, v[9], R2; # L dot T
DP3 o[TEX0].y, R1, R2; # L dot B
DP3 o[TEX0].z, R6, R2; # L dot N
END
```
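The same math can be mirrored on the CPU to sanity-check tangent data. In this sketch, `tangentSpaceLight` is a hypothetical helper. The normal, tangent, and positions are assumed to be in the same object space as v[2], v[9], v[0], and c[4].

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<float, 3>;

Vec3 cross(const Vec3 &a, const Vec3 &b) {
    return {a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]};
}

float dot(const Vec3 &a, const Vec3 &b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// CPU mirror of the program: light vector in [T B N] space (the o[TEX0].xyz value).
Vec3 tangentSpaceLight(const Vec3 &pos, const Vec3 &normal, const Vec3 &tangent,
                       const Vec3 &lightPos) {
    const Vec3 bitangent = cross(normal, tangent);     // B = N x T
    Vec3 l = {lightPos[0] - pos[0], lightPos[1] - pos[1], lightPos[2] - pos[2]};
    const float invLen = 1.0f / std::sqrt(dot(l, l));  // DP3 + RSQ
    for (float &x : l) x *= invLen;                    // MUL
    return {dot(tangent, l), dot(bitangent, l), dot(normal, l)};
}
```

With N = (0, 0, 1) and T = (1, 0, 0), B = N × T = (0, 1, 0). A light straight above the vertex maps to (0, 0, 1) in tangent space.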
From vegetable_manager.cpp. Uses EXP for fractional extraction and ARL for look-up table indexing to animate vegetation.
```
!!VP1.0
# Compute animation time: time * frequency + phase
MAD R0.x, c[17].x, v[9].z, v[9].y;
# Use EXP to get fractional part, scale to [0, 64[
EXP R0.y, R0.x;
MUL R0, R0.y, c[23].xyyy;
# Index into the 64-entry LUT
ARL A0.x, R0.x;
EXP R0.y, R0.x; # fractional interpolation factor
# Lookup and lerp from LUT
MAD R0.xy, R0.y, c[A0.x+32].zwww, c[A0.x+32].xyww;
# Apply wind bend with vertex bend factor (stored in v[0].w)
MAD R5, R0, v[0].w, v[0].xyzw;
# Renormalize and scale to original length (stored in v[9].x)
DP3 R0.x, R5, R5;
RSQ R0.x, R0.x;
MUL R0.x, R0.x, v[9].x;
MAD R5, R0.xxxw, R5, v[10];
# Transform to clip space (c[0]--c[3] = ModelViewProjection)
DP4 o[HPOS].x, c[0], R5;
DP4 o[HPOS].y, c[1], R5;
DP4 o[HPOS].z, c[2], R5;
DP4 o[HPOS].w, c[3], R5;
END
```
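A CPU mirror of the table lookup follows, as a sketch. It assumes c[23].x = 64. It also assumes each LUT entry stores a base value in .xy and the step to the next entry in .zw, which is what makes the final MAD a linear interpolation. `LutEntry` and `sampleWindLut` are hypothetical. The index is wrapped modulo 64 to guard against the float fract() rounding up to 1.0.

```cpp
#include <array>
#include <cmath>

// Hypothetical LUT entry layout: base value in .xy, step to the next entry in .zw.
struct LutEntry {
    float x, y, dx, dy;
};

// CPU mirror of: MAD (time) -> EXP (fract) -> MUL (x64) -> ARL (floor) -> EXP (fract) -> MAD.
std::array<float, 2> sampleWindLut(const std::array<LutEntry, 64> &lut, float time,
                                   float frequency, float phase) {
    const float t = time * frequency + phase;                     // MAD R0.x
    const float wrapped = t - std::floor(t);                      // EXP R0.y: fract
    const float scaled = wrapped * 64.0f;                         // MUL by c[23].x (assumed 64)
    const int index = static_cast<int>(std::floor(scaled)) % 64;  // ARL A0.x, kept in 0..63
    const float f = scaled - std::floor(scaled);                  // EXP R0.y: blend factor
    const LutEntry &e = lut[index];
    return {e.x + f * e.dx, e.y + f * e.dy};                      // MAD R0.xy
}
```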
From nel/samples/3d/nelvp/main.cpp. Displaces sphere vertices along their normals using a two-wave cosine wobble (8th-order Taylor series with EXP for fract() extraction), then computes Blinn-Phong lighting for two colored point lights in eye space using LIT. Demonstrates the full vertex transform pipeline with setUniformMatrix for MVP/MV/normal matrices.
```
!!VP1.0
# --- Wave 1: spatial pattern = pos.x + pos.y + pos.z ---
DP3 R0.x, v[OPOS], c[15].zzzz; # sum position components
MAD R0.x, R0.x, c[12].w, c[12].x; # * phase_scale + time
MUL R0.x, R0.x, c[12].z; # * frequency
# Wrap angle to [-Pi, Pi] via fract()
MUL R0.x, R0.x, c[14].y; # / (2*Pi)
EXP R1, R0.x; # R1.y = fract
MAD R0.x, R1.y, c[14].x, -c[14].z; # fract * 2*Pi - Pi
# cos(x) via 8th-order Taylor series
MUL R1.x, R0.x, R0.x; # x^2
MUL R1.y, R1.x, R1.x; # x^4
MUL R1.z, R1.y, R1.x; # x^6
MUL R1.w, R1.z, R1.x; # x^8
MAD R0.y, R1.x, c[13].y, c[13].x; # 1 - x^2/2
MAD R0.y, R1.y, c[13].z, R0.y; # + x^4/24
MAD R0.y, R1.z, c[13].w, R0.y; # - x^6/720
MAD R0.y, R1.w, c[14].w, R0.y; # + x^8/40320
# (Wave 2 omitted for brevity -- uses pos.x - pos.z spatial pattern
# and a different frequency for temporal decoherence)
# ...
# Combine waves and displace along normal
MAD R0.y, R0.z, c[15].y, R0.y; # cos1 + 0.5*cos2
MUL R0.y, R0.y, c[12].y; # * amplitude
MOV R4, v[OPOS];
MOV R3, v[NRML];
MAD R4.xyz, R3, R0.y, R4; # displace xyz, preserve w=1
# Transform to clip space
DP4 o[HPOS].x, c[0], R4;
DP4 o[HPOS].y, c[1], R4;
DP4 o[HPOS].z, c[2], R4;
DP4 o[HPOS].w, c[3], R4;
# Eye-space normal (note: maskless DP3 initializes all 4 components)
DP3 R6, c[8], R3;
DP3 R6.y, c[9], R3;
DP3 R6.z, c[10], R3;
DP3 R6.w, R6, R6;
RSQ R6.w, R6.w;
MUL R6.xyz, R6, R6.w;
# (R5 = eye-space vertex position and R9 = normalized view vector V are
#  computed here; omitted for brevity)
# Blinn-Phong per light (using LIT for diffuse+specular)
# L = normalize(lightPos - eyePos)
ADD R7, c[16], -R5;
DP3 R7.w, R7, R7;
RSQ R7.w, R7.w;
MUL R7.xyz, R7, R7.w;
DP3 R8, R6, R7; # NdotL (all components for LIT)
MAX R8.x, R8.x, c[15].x;
# H = normalize(L + V)
ADD R10, R7, R9;
DP3 R10.w, R10, R10;
RSQ R10.w, R10.w;
MUL R10.xyz, R10, R10.w;
DP3 R8.y, R6, R10; # NdotH
MOV R8.w, c[23].x; # specular power
LIT R11, R8; # (1, diff, spec, 1)
MOV R0, c[22]; # ambient
MAD R0.xyz, R11.y, c[17], R0; # += diffuse
MAD R0.xyz, R11.z, c[18], R0; # += specular
# (Light 1 follows same pattern...)
MOV o[COL0], R0;
END
```
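The wave computation can be checked on the CPU. This sketch mirrors the wrap and the Taylor chain with the hypothetical helpers `wrapAngle` and `taylorCos`. It assumes the constants hold 2π, 1/(2π), π, and the Taylor coefficients, as the comments suggest.

Two details follow from the math:

- fract(x / 2π) * 2π - π equals x - π modulo 2π, so the result approximates -cos(x) rather than cos(x). For a wobble, this half-period phase shift is harmless.
- The truncated series is accurate to about 0.026 at ±π.

```cpp
#include <cmath>

const float kPi = 3.14159265358979f;

// Mirror of the wave-1 wrap: fract(x / 2Pi) * 2Pi - Pi, which equals x - Pi (mod 2Pi).
float wrapAngle(float x) {
    const float t = x * (1.0f / (2.0f * kPi));  // MUL by c[14].y
    const float fr = t - std::floor(t);         // EXP R1 -> R1.y
    return fr * (2.0f * kPi) - kPi;             // MAD with c[14].x, -c[14].z
}

// 8th-order Taylor series for cos, as in the MUL/MAD chain.
float taylorCos(float x) {
    const float x2 = x * x, x4 = x2 * x2, x6 = x4 * x2, x8 = x6 * x2;
    return 1.0f - x2 / 2.0f + x4 / 24.0f - x6 / 720.0f + x8 / 40320.0f;
}
```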
Programs are typically embedded as C++ string literals and compiled at runtime:
```cpp
#include <string>
#include "nel/misc/debug.h"              // nlwarning
#include "nel/3d/driver.h"               // IDriver
#include "nel/3d/vertex_program_parse.h"
#include "nel/3d/vertex_program.h"

using namespace NL3D;

void compileNelvp(IDriver *driver)
{
	const char *src = "!!VP1.0\n DP4 o[HPOS].x, c[0], v[0]; ...\n END";

	// Create a vertex program object (parses internally on driver activation)
	CVertexProgram *vp = new CVertexProgram(src);
	driver->activeVertexProgram(vp);

	// Or parse manually for inspection:
	CVPParser parser;
	CVPParser::TProgram program;
	std::string error;
	if (!parser.parse(src, program, error))
	{
		nlwarning("VP parse error: %s", error.c_str());
		return; // program is not valid; do not inspect it further
	}

	// Dump back to text
	std::string text;
	CVPParser::dump(program, text);

	// Check if a specific input attribute is used
	bool usesNormal = CVPParser::isInputUsed(program, CVPOperand::INormal);
}
```