Very long instruction word

Very Long Instruction Word (VLIW) refers to a CPU architecture designed to take advantage of instruction-level parallelism (ILP) with a minimum of hardware complexity. (Alternatively, Variable Length Instruction Word, also abbreviated VLIW, refers to a CPU instruction (instruction set) designed to load (or copy) a literal value count of inline machine code to on-chip RAM for higher-speed CPU decoding.)

A processor that executes every instruction one after the other (i.e. a non-pipelined scalar architecture) may use processor resources inefficiently, leading to poor performance. Performance can be improved by using micro-architectural design techniques that exploit ILP, including:

Instruction pipelining, in which the execution of multiple instructions is partially overlapped; each instruction is divided into a series of sub-steps (termed micro-operations).
Superscalar execution, in which multiple execution units are used to execute multiple instructions in parallel.
Out-of-order execution, in which instructions execute in any order that does not violate data dependencies.
Register renaming, a technique used to avoid unnecessary serialization of program instructions caused by the reuse of registers by those instructions, in order to enable out-of-order execution.
Speculative execution, which allows the execution of complete instructions or parts of instructions before it is known whether this execution is required.
Branch prediction, which is used to avoid delays (termed stalls) while control dependencies are resolved. Branch prediction is used together with speculative execution.
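The benefit of overlapping instructions in a pipeline can be illustrated with a rough cycle count. The sketch below assumes an idealized pipeline in which every sub-step takes one cycle and no stalls occur; the function names and the figures used are illustrative assumptions, not measurements of any real processor.

```python
def cycles_unpipelined(num_instructions: int, num_stages: int) -> int:
    """Each instruction completes all of its sub-steps before the next one starts."""
    return num_instructions * num_stages


def cycles_pipelined(num_instructions: int, num_stages: int) -> int:
    """Idealized pipeline: once the pipeline is full, one instruction finishes per cycle."""
    if num_instructions == 0:
        return 0
    # The first instruction needs num_stages cycles; each later one adds a single cycle.
    return num_stages + (num_instructions - 1)


if __name__ == "__main__":
    # Assumed example: 10 instructions, 5 sub-steps each.
    print(cycles_unpipelined(10, 5))  # 50
    print(cycles_pipelined(10, 5))    # 14
```

Real pipelines fall short of this ideal whenever a stall occurs, which is what techniques such as branch prediction try to avoid.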

All of the above ILP techniques come at the cost of increased hardware complexity. Before executing any operations in parallel, the processor must verify that the instructions do not have interdependencies. There are many types of interdependencies, but a simple example is a program in which the first instruction's result is used as an input to the second instruction. The two clearly cannot execute at the same time, and the second instruction cannot be executed before the first. Modern out-of-order processors devote substantial resources to these techniques, since the scheduling of instructions must be determined dynamically, based on dependencies, as the program executes.
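The simple dependency described above, where one instruction reads the result of an earlier one, can be sketched as a check on register names. This is a minimal illustration only: the `Instr` type and the register names are hypothetical, and the check covers just this one kind of dependency (read-after-write), not the other kinds the text alludes to.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction model: one destination register, some source registers."""
    dest: str
    srcs: Tuple[str, ...]


def depends_on(later: Instr, earlier: Instr) -> bool:
    """True if `later` reads the register that `earlier` writes (read-after-write)."""
    return earlier.dest in later.srcs


first = Instr(dest="r1", srcs=("r2", "r3"))   # r1 = r2 + r3
second = Instr(dest="r4", srcs=("r1", "r5"))  # r4 = r1 * r5, needs the result in r1
third = Instr(dest="r6", srcs=("r7", "r8"))   # r6 = r7 - r8, unrelated to the others

if __name__ == "__main__":
    print(depends_on(second, first))  # True: must wait for `first`
    print(depends_on(third, first))   # False: may run in parallel with `first`
```

A dynamically scheduled processor has to perform checks of this kind in hardware, for many instruction pairs at once, on every cycle.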

The VLIW approach, on the other hand, executes operations in parallel based on a fixed schedule determined when programs are compiled. Since determining the order of execution of operations (including which operations can execute simultaneously) is handled by the compiler, the processor does not need the complex scheduling hardware required by the ILP techniques described above. As a result, VLIW CPUs offer significant computational power with less hardware complexity, at the price of greater compiler complexity.
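What a compile-time schedule might look like can be sketched with a small greedy scheduler that groups operations into bundles of operations issued together. This is a simplified illustration, not the algorithm of any particular compiler: the `Op` type and `schedule` function are hypothetical, every operation is assumed to take one cycle, each register is assumed to be written at most once, and only read-after-write dependencies are tracked.

```python
from typing import Dict, List, NamedTuple, Tuple


class Op(NamedTuple):
    """Hypothetical operation: a label, one destination register, some source registers."""
    name: str
    dest: str
    srcs: Tuple[str, ...]


def schedule(ops: List[Op], width: int) -> List[List[str]]:
    """Greedily place each operation in the earliest bundle that follows its producers.

    `width` is the number of operations that may issue together (the number of
    execution units). Operations are visited in program order.
    """
    bundles: List[List[str]] = []
    ready_at: Dict[str, int] = {}  # register -> index of first bundle that may read it
    for op in ops:
        # Earliest bundle in which all source values are available.
        slot = max((ready_at.get(src, 0) for src in op.srcs), default=0)
        # Skip bundles that already use every execution unit.
        while slot < len(bundles) and len(bundles[slot]) >= width:
            slot += 1
        while len(bundles) <= slot:
            bundles.append([])
        bundles[slot].append(op.name)
        ready_at[op.dest] = slot + 1  # result is usable from the next bundle on
    return bundles


if __name__ == "__main__":
    program = [
        Op("a", "r1", ("r2", "r3")),
        Op("b", "r4", ("r1", "r5")),  # needs a
        Op("c", "r6", ("r7", "r8")),  # independent of a and b
        Op("d", "r9", ("r4", "r6")),  # needs b and c
    ]
    print(schedule(program, width=2))  # [['a', 'c'], ['b'], ['d']]
```

Because the grouping is fixed before the program runs, the hardware only has to issue each bundle as given; it does not need to rediscover the dependencies at run time.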

The VLIW approach is only as useful as the code generated for it by the compiler, although a number of special-purpose instructions may be available to simplify certain complicated operations. The instruction encoding also differs from that of superscalar designs:

In superscalar designs, the number of execution units is invisible to the instruction set. Each instruction encodes only one operation. For most superscalar designs, the instruction width is 32 bits or less.
In contrast, one VLIW instruction encodes multiple operations; specifically, one instruction encodes at least one operation for each execution unit of the device. For example, if a VLIW device has five execution units, then a VLIW instruction for that device would have five operation fields, each field specifying what operation should be done on the corresponding execution unit. To make room for these operation fields, VLIW instructions are usually at least 64 bits wide, and on some architectures 128 bits or wider; this is the origin of the name.
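The five-field example can be sketched as bit packing. The field width of 16 bits, the NOP encoding, and the function names below are assumptions made purely for illustration; real VLIW architectures define their own field layouts.

```python
from typing import List

FIELD_BITS = 16  # assumed width of one operation field
NUM_UNITS = 5    # five execution units, as in the example in the text
NOP = 0          # assumed encoding meaning "this unit does nothing"


def pack(fields: List[int]) -> int:
    """Combine one operation field per execution unit into a single wide instruction word."""
    if len(fields) != NUM_UNITS:
        raise ValueError("expected exactly one field per execution unit")
    word = 0
    for unit, op in enumerate(fields):
        if not 0 <= op < (1 << FIELD_BITS):
            raise ValueError("operation does not fit in its field")
        word |= op << (unit * FIELD_BITS)  # unit 0 occupies the lowest bits
    return word


def unpack(word: int) -> List[int]:
    """Split a wide instruction word back into its per-unit operation fields."""
    mask = (1 << FIELD_BITS) - 1
    return [(word >> (unit * FIELD_BITS)) & mask for unit in range(NUM_UNITS)]


if __name__ == "__main__":
    fields = [0x0101, NOP, 0x0203, 0x0304, NOP]
    word = pack(fields)
    print(unpack(word) == fields)  # True
```

With these assumed sizes a single instruction occupies 5 × 16 = 80 bits, which shows how quickly the instruction word grows as execution units are added.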