The Journal of Instruction-Level Parallelism

Description of the Simulation Infrastructure

The provided evaluation framework includes a set of traces and a driver that
reads the traces and simulates the behavior of a branch predictor. The
framework models a simple out-of-order core with the following basic
parameters:

o A 256-entry reorder buffer and three schedulers: an integer scheduler with
64 entries, and FP and load/store schedulers with 32 entries each.

o The processor has a 14-stage, 4-wide pipeline, except in the execution
stage, where it is 12-wide (6 integer, 4 FP, and 2 load/store).

o The memory model consists of a 2-level cache hierarchy: split L1
instruction and data caches, and an L2 last-level cache. All caches use
64-byte lines. The L1 instruction cache is a 32 KB, 8-way set-associative
cache. The L1 data cache is 32 KB, 8-way set-associative. The L2 cache is a
4 MB, 8-way set-associative cache.

Traces

The
trace set includes 40 traces, classified into 5 categories: CLIENT, INT
(Integer), MM (Multimedia), SERVER, and WS (Workstation). Traces are
approximately 50 million micro-ops long and include both user and system
activity. The traces include both value and timing information for each
micro-op, generated by a detailed out-of-order timing simulator.
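As a rough illustration, each micro-op in a trace can be thought of as a record carrying this value and timing information. The field names and types below are assumptions made for the sketch; the framework's actual trace format is not specified here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical per-micro-op trace record. The real framework defines its own
# binary format; this sketch only mirrors the kinds of fields described above.
@dataclass
class TraceRecord:
    pc: int                    # program counter of the parent instruction
    uop_type: str              # e.g. "BR_CONDITIONAL", "LOAD", "STORE"
    src_regs: Tuple[int, ...]  # register source specifiers
    dst_regs: Tuple[int, ...]  # register destination specifiers
    result: Optional[int] = None          # value produced, if any
    mem_addr: Optional[int] = None        # load/store address, if any
    branch_taken: Optional[bool] = None   # branch outcome, if a branch
    branch_target: Optional[int] = None   # resolved target, if a branch

# A taken conditional branch as it might appear in a trace:
rec = TraceRecord(pc=0x401A2C, uop_type="BR_CONDITIONAL",
                  src_regs=(5,), dst_regs=(),
                  branch_taken=True, branch_target=0x401A40)
```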
The timing simulator is configured with perfect branch prediction, so there
are no wrong-path micro-ops in the traces. To achieve maximum transparency,
all traces are provided to the contestants, and the same traces will be used
to rank the contestants and crown the champion.

Driver

The
driver will read a trace and call the branch predictor through a standard
interface. The predictor can decide when and what predictions to provide to
the driver. The driver will record whether each prediction was correct and
when it was provided. A misprediction penalty value is then calculated for
each branch: the penalty is measured as the number of cycles the fetch unit
spent on the wrong path. At the end of the run, the driver will report two
final scores for the predictor, one for conditional branches and one for
indirect branches, expressed in Misprediction Penalty per Kilo Instructions
(MPPKI). The framework will include an example predictor to help guide
contestants.

The
driver will provide the predictor with both static and dynamic information
about the instructions and micro-ops in the trace. A static instruction
includes one or more static micro-ops. For each micro-op, the static
information passed to the predictor includes the instruction's program
counter, the micro-op type (BR_CONDITIONAL, BR_INDIRECT, BR_CALL, LOAD,
STORE, etc.), and the register source and destination specifiers. Dynamic
information, such as results, load and store addresses, and branch outcomes,
is made available at different pipeline stages.
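The driver/predictor exchange just described can be sketched as a pair of callbacks: the driver asks for a prediction when it sees a branch micro-op, then reports the outcome once the branch resolves. The class and method names, and the last-outcome prediction policy, are illustrative assumptions, not the framework's actual API.

```python
# Hypothetical sketch of the driver/predictor interface. Method names and
# argument shapes are assumptions; only the call pattern follows the text.
class ExamplePredictor:
    """Trivial last-outcome predictor for conditional branches."""

    def __init__(self):
        self.history = {}  # pc -> last observed outcome

    def get_prediction(self, pc, uop_type):
        # Static information (PC, micro-op type) is available at fetch time.
        if uop_type == "BR_CONDITIONAL":
            return self.history.get(pc, True)  # default: predict taken
        return None  # the predictor may decline to predict this micro-op

    def update(self, pc, uop_type, taken):
        # Dynamic information (the actual outcome) arrives later in the
        # pipeline; the driver feeds it back so the predictor can learn.
        if uop_type == "BR_CONDITIONAL":
            self.history[pc] = taken

# The driver would loop over trace records, interleaving the two calls:
pred = ExamplePredictor()
guess1 = pred.get_prediction(0x400100, "BR_CONDITIONAL")   # True (default)
pred.update(0x400100, "BR_CONDITIONAL", taken=False)
guess2 = pred.get_prediction(0x400100, "BR_CONDITIONAL")   # False (learned)
```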
The
organizers believe that this framework allows the implementation of most
published predictor algorithms in addition to providing some room for
innovation. We cannot provide the training and reference traces needed for
profile-based predictors, and the traces do not contain wrong-path
information, which unfortunately excludes some predictors.
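As a closing illustration, the MPPKI score described in the Driver section reduces to a simple normalization: total wrong-path penalty cycles divided by thousands of instructions. The function name and the example numbers below are illustrative, not taken from the framework.

```python
def mppki(total_penalty_cycles, instruction_count):
    """Misprediction Penalty per Kilo Instructions: wrong-path fetch
    cycles charged to mispredictions, per 1000 instructions."""
    return total_penalty_cycles / (instruction_count / 1000.0)

# E.g., 120,000 penalty cycles accumulated over 50 million instructions:
score = mppki(120_000, 50_000_000)   # 2.4 MPPKI
```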