SAPHO: an Introduction
Currently available Processor Soft Cores (PSCs) share a common characteristic: a fixed architecture. The same amount of hardware resources is therefore allocated regardless of the embedded program. Furthermore, the word size is fixed and typically oversized. This website presents a scalable soft-core processor developed at the Signal Processing and Instrumentation Core of the Engineering Faculty at UFJF (Federal University of Juiz de Fora).
Overview of SAPHO
SAPHO, which stands for Scalable Architecture Processor for Hardware Optimization, is an open-source soft-core processor designed to automatically allocate the necessary hardware resources when the embedded program is compiled. It is primarily used in multi-core processing frameworks and has been implemented in various established systems.
The functionality of SAPHO is based on its two compilers, accessible through its main Integrated Development Environment (IDE). The first compiler, C+−, translates programs written in a subset of the C language into Assembly language. The second compiler, the assembler, takes the Assembly code generated by the first compiler and translates it into machine code, allocating the required hardware resources and generating a configuration file in Verilog.
The assembler first identifies the instructions produced by the user-written program and then creates only the hardware resources needed for execution. Most configurations, such as the development of internal circuits for the Arithmetic Logic Unit (ALU), are automated. Programmers can modify additional features, including word size, ALU type (fixed-point or floating-point), and the number of input and output ports, using compilation directives.
A key distinction between SAPHO and available PSCs lies in its ability to adjust its architecture to the implemented program. The trade-off is that the program can no longer be updated at runtime. In many current applications, however, the advantages of resource optimization justify this limitation.
Description of SAPHO
SAPHO is based on a Harvard architecture and features a Reduced Instruction Set Computer (RISC) design, which allows it to determine the necessary hardware resources based on the user's application. The Arithmetic Logic Unit (ALU) within SAPHO can be configured for either fixed-point or floating-point operations, and its data and program memories feature self-scaling addresses.
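One plausible reading of "self-scaling addresses" is that each memory's address bus is only as wide as needed to index its contents. The Python sketch below illustrates that idea; the function name `address_width` and the ceil(log2) sizing rule are assumptions made for illustration and are not taken from the SAPHO sources.

```python
def address_width(depth: int) -> int:
    """Minimum number of address bits needed to index `depth` memory words.

    Assumed sizing rule (not confirmed by the SAPHO documentation):
    width = ceil(log2(depth)), with at least one bit.
    """
    if depth < 1:
        raise ValueError("memory depth must be at least 1")
    # (depth - 1).bit_length() equals ceil(log2(depth)) for depth >= 2;
    # a single-word memory still gets one address bit.
    return max(1, (depth - 1).bit_length())


# A program of 300 instructions would need a 9-bit program address,
# while 40 data words would need only a 6-bit data address.
program_bits = address_width(300)
data_bits = address_width(40)
```

Under this assumption, a small embedded program ends up with narrower address buses than a fixed-architecture core would provide.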
In addition to the processor itself, which includes its core and two memory units, several tools have been developed to facilitate programming. These tools include a development interface, a compiler that interprets a subset of the C language known as C+−, and an assembler.
Some of the advantages of using the SAPHO processor include:
- Single Instruction Execution per Clock Cycle: it can execute one instruction per clock cycle, even in routines with conditional jumps, without disrupting the pipeline, thanks to its three-stage pipeline architecture.
- Configurable ALU: the ALU can be adjusted for variable word sizes and can handle both fixed-point and floating-point operations.
- Parameterized Resource Allocation: resources are allocated in a parameterized manner at design time, tailored to the specific application developed.
- Programming Flexibility: users can program the processor using either a subset of the C language or directly in Assembly language.
Hardware Architecture of SAPHO
The primary blocks of the SAPHO architecture are illustrated in Figure 1, where solid lines indicate the flow of data and dashed lines represent control signals. The light yellow blocks are instantiated automatically on demand, depending on the embedded program, while the light blue blocks are always instantiated. Because both memory units are synchronous, the processor requires three pipeline stages, which are depicted in Figure 2.
Logical-arithmetic instructions, such as addition, multiplication, or comparisons, typically require two arguments to execute. However, in SAPHO, the assembler is designed to operate with just one memory address, as the second parameter for the operation is the value previously stored in the main accumulator (ACC) from the output of the ALU.
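The single-operand style described above can be sketched with a small accumulator model in Python. Only `MLT` appears in the instruction table on this page; the mnemonics `LOAD`, `ADD`, and `SET`, the tuple-based program format, and the function name `run_accumulator` are hypothetical and chosen purely for illustration.

```python
def run_accumulator(program, memory):
    """Execute single-operand instructions against one accumulator (ACC).

    `program` is a list of (mnemonic, variable_name) pairs and `memory`
    maps variable names to values. Every arithmetic instruction names a
    single memory operand; the other operand is always the current ACC.
    """
    acc = 0
    for mnemonic, name in program:
        if mnemonic == "LOAD":      # hypothetical: ACC <- memory
            acc = memory[name]
        elif mnemonic == "SET":     # hypothetical: memory <- ACC
            memory[name] = acc
        elif mnemonic == "ADD":     # hypothetical: ACC <- ACC + memory
            acc = acc + memory[name]
        elif mnemonic == "MLT":     # ACC <- ACC * memory
            acc = acc * memory[name]
        else:
            raise ValueError(f"unknown instruction: {mnemonic}")
    return acc


# r = (a + b) * c, expressed with one memory operand per instruction.
memory = {"a": 2, "b": 3, "c": 4, "r": 0}
program = [("LOAD", "a"), ("ADD", "b"), ("MLT", "c"), ("SET", "r")]
final_acc = run_accumulator(program, memory)
```

Because the accumulator is the implicit second operand, each instruction needs to encode only one memory address.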
Two pointers, the Stack Pointer and the Instruction Pointer, manage reads and writes on the data stack and the instruction stack, which reside in the data memory and the program memory, respectively. In the data memory, the stack is used for temporary data storage via PUSH and POP instructions and their variants. In the program memory, the stack enables the use of subroutines with CALL and RETURN instructions.
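As a rough illustration of how the two stacks play different roles, the Python sketch below models one stack for temporary data and one for return addresses. The exact semantics of PUSH and POP in SAPHO are not given here, so the sketch assumes they save and restore the accumulator; `LOADI`, `HALT`, and the function name `run_with_stacks` are hypothetical.

```python
def run_with_stacks(program):
    """Run a list of (mnemonic, argument) pairs; return the final accumulator."""
    pc = 0               # index of the next instruction
    acc = 0              # main accumulator
    data_stack = []      # models the stack kept in data memory
    return_stack = []    # models the stack kept in program memory
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "LOADI":      # hypothetical: load an immediate into ACC
            acc = arg
        elif op == "PUSH":     # assumed: save ACC on the data stack
            data_stack.append(acc)
        elif op == "POP":      # assumed: restore ACC from the data stack
            acc = data_stack.pop()
        elif op == "CALL":     # save the return address, jump to the subroutine
            return_stack.append(pc)
            pc = arg
        elif op == "RETURN":   # resume right after the matching CALL
            pc = return_stack.pop()
        elif op == "HALT":     # hypothetical: stop execution
            break
        else:
            raise ValueError(f"unknown instruction: {op}")
    return acc


# The subroutine at index 5 overwrites ACC; PUSH/POP around the CALL
# preserve the caller's value.
program = [
    ("LOADI", 7), ("PUSH", None), ("CALL", 5), ("POP", None), ("HALT", None),
    ("LOADI", 99), ("RETURN", None),
]
result = run_with_stacks(program)
```

Keeping return addresses apart from temporary data mirrors the separation between program and data memories in a Harvard architecture.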
When the ALU is configured for floating-point operations, transformation circuits between fixed-point and floating-point representations are automatically instantiated on the input and output buses, allowing the processor to communicate with I/O devices using integers in two's complement notation.
Figure 1: SAPHO processor block diagram.
Figure 2: Pipeline stages executed by SAPHO.
Below you can see a more detailed explanation of each processor block shown in Figure 1.
- Data and Program Memory
- Program Counter (PC)
- Prefetch
- Instruction Decoder
- Stack Pointer
- Register File
- Arithmetic Logic Unit (ALU)
Instruction | Circuit | ALU | Fixed Point | Floating Point |
---|---|---|---|---|
DIV | Division | X | X | X |
OR | Bitwise OR | X | X | |
LOR | Logical OR | X | X | |
GRE | Greater than | X | X | |
MOD | Modulo | X | X | X |
MLT | Multiplication | X | X | X |
LES | Less than | X | X | |
EQU | Equal to | X | X | |
AND | Bitwise AND | X | X | |
LAN | Logical AND | X | X | |
INV | Bitwise NOT | X | X | |
LIN | Logical NOT | X | X | |
SHR | Shift Right | X | X | |
SHL | Shift Left | X | X | |
SRS | Arithmetic Shift | X | X | |
CALL | Stack Pointer | X | | |
SRF | Indirect Addressing | X | | |
Table: Instructions and respective circuits created automatically.
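The on-demand instantiation summarized in the table can be sketched as a lookup from the mnemonics found in an Assembly listing to the circuits they require. The mapping below is copied from the table; the listing format (one instruction per line, mnemonic first), the `LOAD` mnemonic in the example, and the function name `required_circuits` are assumptions for illustration and do not reflect how the SAPHO assembler is actually implemented.

```python
# Instruction-to-circuit mapping, taken from the table above.
CIRCUIT_FOR = {
    "DIV": "Division", "OR": "Bitwise OR", "LOR": "Logical OR",
    "GRE": "Greater than", "MOD": "Modulo", "MLT": "Multiplication",
    "LES": "Less than", "EQU": "Equal to", "AND": "Bitwise AND",
    "LAN": "Logical AND", "INV": "Bitwise NOT", "LIN": "Logical NOT",
    "SHR": "Shift Right", "SHL": "Shift Left", "SRS": "Arithmetic Shift",
    "CALL": "Stack Pointer", "SRF": "Indirect Addressing",
}


def required_circuits(assembly: str) -> set:
    """Return the optional circuits needed by an Assembly listing.

    Assumes one instruction per line with the mnemonic as the first
    token; mnemonics outside the table need no optional circuit.
    """
    needed = set()
    for line in assembly.splitlines():
        tokens = line.split()
        if tokens and tokens[0].upper() in CIRCUIT_FOR:
            needed.add(CIRCUIT_FOR[tokens[0].upper()])
    return needed


listing = """
LOAD x
MLT y
CALL scale
SHR 2
"""
circuits = required_circuits(listing)
```

A program that never divides, for example, would not cause the division circuit to be instantiated under this model.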
Software Overview of SAPHO
The SAPHO toolchain automatically generates Verilog code from programs written in a language called C+−. This process is driven by an Integrated Development Environment (IDE) designed specifically to invoke the two necessary compilers: the C+− compiler and the Assembly compiler. After invoking these compilers, the IDE produces parameterized hardware description code, with the data and program memories correctly initialized.
This streamlined approach allows developers to efficiently convert high-level C+− code into a low-level hardware description suitable for FPGA implementation, optimizing the process of hardware design and ensuring that the necessary memory structures are in place for the intended application.