SAPHO - Scalable Architecture Processor for Hardware Optimization

SAPHO: an Introduction

Currently available Processor Soft Cores (PSCs) share a common characteristic: a fixed architecture. The same hardware resources are allocated regardless of the embedded program, and the word size is fixed and typically oversized for the application. This website presents a scalable soft-core processor developed at the Signal Processing and Instrumentation Core of the Faculty of Engineering at UFJF (Federal University of Juiz de Fora).

Overview of SAPHO

SAPHO (Scalable Architecture Processor for Hardware Optimization) is an open-source soft-core processor. It automatically allocates the hardware resources an embedded program needs when that program is compiled. It is primarily used in multi-core processing frameworks and has been implemented in various established systems.

SAPHO relies on two compilers, both accessible through its main Integrated Development Environment (IDE). The first, the C+− compiler, translates programs written in a subset of the C language into Assembly language. The second, the assembler, translates that Assembly code into machine code, allocates the required hardware resources and generates a configuration file in Verilog.

The assembler first identifies which instructions the user's program actually uses. It then creates only the hardware resources needed to execute them. Most of the configuration is automated, such as generating the internal circuits of the Arithmetic Logic Unit (ALU). Programmers can set other features through compilation directives, including the word size, the ALU type (fixed-point or floating-point) and the number of input and output ports.

What distinguishes SAPHO from available PSCs is that its architecture adapts to the implemented program. The trade-off is that the program cannot be updated at runtime, because the hardware was tailored to it. In many current applications, however, the resource savings justify this limitation.

Description of SAPHO

SAPHO is based on a Harvard architecture with a Reduced Instruction Set Computer (RISC) design, and the hardware resources it instantiates are determined by the user's application. Its Arithmetic Logic Unit (ALU) can be configured for either fixed-point or floating-point operation. Both of its memories (data and program) have self-scaling address widths.

The processor itself comprises its core and two memory units. Several tools have also been developed to facilitate programming: a development interface, a compiler for C+− (a subset of the C language) and an assembler.

Some of the advantages of using the SAPHO processor include:

Single Instruction Execution per Clock Cycle: it can execute one instruction per clock cycle, even in routines with conditional jumps, without disrupting the pipeline, thanks to its three-stage pipeline architecture.
Configurable ALU: the ALU can be adjusted for variable word sizes and can handle both fixed-point and floating-point operations.
Parameterized Resource Allocation: resources are allocated in a parameterized manner at design time, tailored to the specific application developed.
Programming Flexibility: users can program the processor using either a subset of the C language or directly in Assembly language.
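The one-instruction-per-clock-cycle claim rests on overlapping the three pipeline stages. The Python sketch below tabulates which instruction occupies each stage in every cycle of an ideal three-stage pipeline. Only the first stage is named in this text (prefetch); the names "decode" and "execute" are hypothetical. The sketch does not model how SAPHO handles conditional jumps without stalling.

```python
# Ideal 3-stage pipeline occupancy. Only "prefetch" is named in the text;
# "decode" and "execute" are hypothetical stage names.
STAGES = ("prefetch", "decode", "execute")


def pipeline_table(instructions: list[str]) -> list[dict[str, str]]:
    """Return, for each clock cycle, the instruction held in each stage."""
    n, k = len(instructions), len(STAGES)
    table = []
    for cycle in range(n + k - 1):  # n instructions drain in n + k - 1 cycles
        row = {}
        for s, stage in enumerate(STAGES):
            i = cycle - s  # index of the instruction currently in this stage
            row[stage] = instructions[i] if 0 <= i < n else "-"
        table.append(row)
    return table


if __name__ == "__main__":
    for cycle, row in enumerate(pipeline_table(["i0", "i1", "i2", "i3"])):
        print(cycle, row)
```

Four instructions finish in six cycles. Once the pipeline is full, one instruction completes on every cycle.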

Hardware Architecture of SAPHO

The primary blocks of the SAPHO architecture are illustrated in Figure 1. Solid lines indicate data flow and dashed lines represent control signals. The light yellow blocks are instantiated on demand, depending on the embedded program, while the light blue blocks are always instantiated. Because both memory units are synchronous, the processor requires three pipeline stages, which are depicted in Figure 2.

Arithmetic and logic instructions, such as addition, multiplication or comparisons, typically require two operands. In SAPHO, however, each instruction references only one memory address. The second operand is the value already held in the main accumulator (ACC), which stores the ALU output from the previous operation.
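To make the one-address scheme concrete, here is a toy Python model of an accumulator machine. Each instruction names at most one memory address, and the accumulator always supplies the other operand. Only `MLT` comes from SAPHO's instruction table; `LOAD`, `ADD` and `STORE` are hypothetical mnemonics chosen for readability.

```python
def run(program: list[tuple[str, int]], memory: list[int]) -> tuple[int, list[int]]:
    """Execute (opcode, address) pairs; ACC is the implicit second operand.

    LOAD, ADD and STORE are hypothetical mnemonics; MLT is from SAPHO's table.
    The memory list is modified in place and also returned.
    """
    acc = 0  # main accumulator: holds the result of the last ALU operation
    for opcode, addr in program:
        if opcode == "LOAD":     # ACC <- mem[addr]
            acc = memory[addr]
        elif opcode == "ADD":    # ACC <- ACC + mem[addr]
            acc = acc + memory[addr]
        elif opcode == "MLT":    # ACC <- ACC * mem[addr]
            acc = acc * memory[addr]
        elif opcode == "STORE":  # mem[addr] <- ACC
            memory[addr] = acc
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return acc, memory


if __name__ == "__main__":
    # y = (a + b) * c with a, b, c, y stored at addresses 0, 1, 2, 3
    prog = [("LOAD", 0), ("ADD", 1), ("MLT", 2), ("STORE", 3)]
    print(run(prog, [2, 3, 4, 0]))
```

Computing `(a + b) * c` takes four single-address instructions, because every intermediate result stays in ACC.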

Each stack is managed by its own pointer, which controls reads and writes: the Stack Pointer for the data stack and the Instruction Pointer for the instruction stack. The two stacks share space with the data and program memories, respectively. In the data memory, the stack holds temporary data through the PUSH and POP instructions and their variants. In the program memory, the stack enables subroutines through the CALL and RETURN instructions.

When the ALU is configured for floating-point operations, transformation circuits between fixed-point and floating-point representations are automatically instantiated on the input and output buses, allowing the processor to communicate with I/O devices using integers in two's complement notation.
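One plausible way to realize the integer-to-float conversion at the input bus, and its inverse at the output bus, is sketched below in Python. It assumes the sign/mantissa/exponent format described under Proposed Floating-Point Format, with the mantissa treated as an unsigned integer of `m_bits` bits. The truncation policy and the function names are illustrative assumptions, not SAPHO's actual circuits.

```python
def int_to_float(x: int, m_bits: int) -> tuple[int, int, int]:
    """Convert a two's-complement integer to (S, M, E).

    Assumption: M is an unsigned m_bits-wide magnitude. If |x| does not fit,
    low-order bits are truncated and E grows. The exponent range is not checked.
    """
    s = 1 if x < 0 else 0
    m = -x if s else x
    e = 0
    while m >= (1 << m_bits):  # mantissa too wide: drop one LSB, bump exponent
        m >>= 1
        e += 1
    return s, m, e


def float_to_int(s: int, m: int, e: int) -> int:
    """Convert (S, M, E) back to an integer, truncating any fractional part."""
    mag = m << e if e >= 0 else m >> -e
    return -mag if s else mag


if __name__ == "__main__":
    f = int_to_float(-1000, 8)
    print(f, float_to_int(*f))
```

With an 8-bit mantissa, -1000 becomes `(1, 250, 2)` and converts back exactly. A value such as 1001 loses its lowest bits and returns as 1000.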

Figure 1: SAPHO processor block diagram.

Figure 2: Pipeline stages executed by SAPHO.

The following subsections describe each processor block shown in Figure 1 in more detail.

Data and Program Memory

As previously mentioned, the sizes of the data and program memories are determined by the program code. The assembler counts the addresses required to store all of the program's instructions and variables, and uses these counts to parameterize the memory instances. It also generates the contents of both memories and saves them in memory initialization files (with a .mif extension). These files are instantiated in the FPGA along with the processor hardware.
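The Python sketch below writes a memory image as a .mif file. It assumes the standard Intel (Altera) MIF text layout with unsigned decimal addresses and binary data words. SAPHO's generated files may be laid out differently. The function name `to_mif` is hypothetical.

```python
def to_mif(words: list[int], width: int) -> str:
    """Render memory contents as an Intel/Altera-style MIF text file.

    DEPTH is the number of words, so the memory is exactly as deep as the
    program needs. Negative values are masked to `width` bits (two's complement).
    """
    lines = [
        f"WIDTH={width};",
        f"DEPTH={len(words)};",
        "ADDRESS_RADIX=UNS;",
        "DATA_RADIX=BIN;",
        "CONTENT BEGIN",
    ]
    for addr, word in enumerate(words):
        lines.append(f"    {addr} : {word & ((1 << width) - 1):0{width}b};")
    lines.append("END;")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_mif([1, -1, 5], 4), end="")
```

The memory depth comes straight from the length of the content list. This mirrors how the assembler sizes each memory from the program.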

Program Counter - PC

This block holds the address of the next instruction to be read from the program memory. During normal execution, it is incremented after each instruction. When a jump instruction is detected, it is instead loaded with the jump's target address. The width of the PC, in bits, is set by the assembler based on the size of the program memory.
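The sizing rule and the update rule can be sketched in Python as follows. The minimal-width formula is an assumption about how "based on the size of the program memory" is computed, and the function names are hypothetical.

```python
def pc_width(program_size: int) -> int:
    """Smallest PC width (bits) that can address every program-memory word."""
    return max(1, (program_size - 1).bit_length())


def next_pc(pc: int, jump_taken: bool, target: int, width: int) -> int:
    """Increment by default; load the jump target when a jump is taken."""
    nxt = target if jump_taken else pc + 1
    return nxt & ((1 << width) - 1)  # the PC register is only `width` bits wide


if __name__ == "__main__":
    w = pc_width(100)  # a 100-instruction program needs a 7-bit PC
    print(w, next_pc(5, False, 0, w), next_pc(5, True, 42, w))
```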

Prefetch

This block implements the first pipeline stage. While the processor executes the current instruction, it fetches the next instruction from the program memory (ROM). It separates the opcode from the operand, which are stored concatenated in the ROM, and controls the instruction decoder and the program counter.
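The opcode/operand split amounts to two bit-field extractions. The Python sketch below assumes a hypothetical layout with the opcode in the most significant bits. The 6-bit opcode width is only the minimum needed to encode the assembler's 49 instructions; SAPHO's real encoding may differ.

```python
OPCODE_BITS = (49 - 1).bit_length()  # 6 bits suffice for 49 distinct opcodes


def split_instruction(word: int, opcode_bits: int, operand_bits: int) -> tuple[int, int]:
    """Split a ROM word laid out as [opcode | operand] (hypothetical layout)."""
    operand = word & ((1 << operand_bits) - 1)  # low bits: operand/address
    opcode = (word >> operand_bits) & ((1 << opcode_bits) - 1)  # high bits
    return opcode, operand


if __name__ == "__main__":
    word = (5 << 10) | 0x3A  # opcode 5, 10-bit operand 0x3A
    print(split_instruction(word, OPCODE_BITS, 10))
```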

Instruction Decoder

The instruction decoder receives the opcode, interprets it and issues the corresponding control signals. These include selecting the ALU operation, enabling data output, assigning values in the Register File when arrays are used, enabling writes to memory and sending PUSH and POP signals to the Stack Pointer.

Stack Pointer

There are two Stack Pointers: one for data and one for instructions (the latter labeled Instruction Pointer in Figure 1). Each points to the top of its stack, which occupies the highest addresses of the data or program memory. The PUSH and POP assembly instructions add and remove elements at the top of the stack. When a function is called with the CALL instruction, its return address is pushed onto the instruction stack. The instruction stack and its pointer are generated only when the program uses subroutines (CALL and RETURN instructions).
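A stack that lives in the highest addresses of a memory can be modeled in a few lines of Python. The growth direction (downward from the last address) and the overflow checks are assumptions for illustration. The same class serves as a data stack (PUSH/POP) or as an instruction stack holding CALL return addresses.

```python
class TopOfMemoryStack:
    """Stack occupying the top `depth` words of a memory list.

    Assumption: it grows downward from the last address. `sp` points at
    the current top element (len(memory) when the stack is empty).
    """

    def __init__(self, memory: list[int], depth: int):
        self.memory = memory
        self.limit = len(memory) - depth  # lowest address the stack may use
        self.sp = len(memory)             # empty stack

    def push(self, value: int) -> None:
        if self.sp == self.limit:
            raise OverflowError("stack overflow")
        self.sp -= 1
        self.memory[self.sp] = value

    def pop(self) -> int:
        if self.sp == len(self.memory):
            raise IndexError("pop from empty stack")
        value = self.memory[self.sp]
        self.sp += 1
        return value


if __name__ == "__main__":
    ret_stack = TopOfMemoryStack([0] * 8, depth=3)
    ret_stack.push(11)     # CALL at address 10 saves return address 11
    print(ret_stack.pop())  # RETURN resumes at 11
```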

Register File

The Register File is created whenever the user's program uses arrays. Its function is to index array elements: it offsets the memory address of the array's first element by the element's index to obtain the address of the desired element.
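The address computation the Register File performs is a base-plus-offset sum, sketched below in Python. The bounds check is included only for illustration; nothing in this text says the hardware performs one. The function name is hypothetical.

```python
def element_address(base: int, index: int, length: int) -> int:
    """Address of array[index]: the base address of element 0 plus the index.

    The bounds check is illustrative only; arrays have a fixed size known
    at compile time.
    """
    if not 0 <= index < length:
        raise IndexError("index outside the fixed-size array")
    return base + index


if __name__ == "__main__":
    memory = list(range(100, 112))  # array of 8 ints starting at address 4
    print(memory[element_address(4, 3, 8)])  # reads the word at address 7
```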

Arithmetic Logic Unit (ALU)

The ALU of this soft-core processor is highly parameterizable, and most of this parameterization is automated, so the processor's structure adapts to its intended purpose. The assembler parameterizes the ALU from two sources: the instructions produced by the C+− compiler, and the parameters selected by the user at design time, such as fixed-point or floating-point arithmetic and the word size in bits. The circuits created automatically by the assembler are listed in Table 2. The third column indicates which circuits are created within the ALU (the others are created outside it). The last two columns show which circuits are created for fixed-point and for floating-point processors.

Instruction	Circuit	ALU	Fixed Point	Floating Point
DIV	Division	X	X	X
OR	Bitwise OR	X	X
LOR	Logical OR	X	X
GRE	Greater than	X	X
MOD	Modulo	X	X	X
MLT	Multiplication	X	X	X
LES	Less than	X	X
EQU	Equal to	X	X
AND	Bitwise AND	X	X
LAN	Logical AND	X	X
INV	Bitwise NOT	X	X
LIN	Logical NOT	X	X
SHR	Shift Right	X	X
SHL	Shift Left	X	X
SRS	Arithmetic Shift	X	X
CALL	Stack Pointer	X
SRF	Indirect Addressing	X

Table 2: Instructions and the respective circuits created automatically.
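The on-demand idea behind Table 2 can be sketched in Python. The sketch scans an assembly listing, collects the mnemonics it uses and looks up the optional circuits they require. The instruction-to-circuit mapping is copied from the table. The listing syntax (one instruction per line, mnemonic first) and the `LOD` mnemonic in the usage example are assumptions.

```python
# Instruction -> circuit mapping, copied from Table 2.
CIRCUITS = {
    "DIV": "Division", "OR": "Bitwise OR", "LOR": "Logical OR",
    "GRE": "Greater than", "MOD": "Modulo", "MLT": "Multiplication",
    "LES": "Less than", "EQU": "Equal to", "AND": "Bitwise AND",
    "LAN": "Logical AND", "INV": "Bitwise NOT", "LIN": "Logical NOT",
    "SHR": "Shift Right", "SHL": "Shift Left", "SRS": "Arithmetic Shift",
    "CALL": "Stack Pointer", "SRF": "Indirect Addressing",
}


def circuits_needed(asm: str) -> set[str]:
    """Return the optional circuits an assembly listing requires.

    Assumes one instruction per line with the mnemonic as the first token.
    Mnemonics not in the table need no optional circuit.
    """
    needed = set()
    for line in asm.splitlines():
        tokens = line.split()
        if tokens and tokens[0].upper() in CIRCUITS:
            needed.add(CIRCUITS[tokens[0].upper()])
    return needed


if __name__ == "__main__":
    print(circuits_needed("LOD x\nMLT y\nCALL f\n"))
```

A program that never divides gets no divider. That is where the resource savings come from.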

Software Overview of SAPHO

The SAPHO toolchain automatically generates Verilog code from programs written in a language called C+−. This process is driven by an Integrated Development Environment (IDE) designed to invoke the two required compilers: the C+− compiler and the Assembly compiler. After running both, the IDE produces parameterized hardware description code with the data and program memories correctly initialized.

This approach lets developers convert high-level C+− code into a low-level hardware description suitable for FPGA implementation. It also ensures that the memory structures the application needs are in place.

Integrated Development Environment (IDE) Overview

The SAPHO processor's IDE, developed in C#, provides essential features for parameterization and compiler invocation. The main interface includes five key areas:

Menu Bar: offers options for creating new projects, managing the current project, or exiting the application.
Toolbar: located below the menu bar, it features quick-access icons for adding new processors and invoking the C+− and Assembly compilers.
Project Hierarchy Tree: displays all processors within the project and allows access to C+− and assembly files for each processor.
Programming Window: enables users to write and view code in C+− and Assembly, organized through open tabs.
Console Window: displays compilation messages, including errors and warnings, and shows the number of instructions and variables after successful compilation.

Creating a New Project

Upon opening the IDE, users must create a new project by navigating to File > New Project, where they can name the project and select a directory (avoiding spaces or special characters). Once created, the project appears in the hierarchy tree, and options for adding processors become available.

Adding a Processor in the IDE

After creating a project, users can add a processor by clicking the add-processor icon in the toolbar, which is now enabled. This opens the processor configuration screen, which includes four main configuration fields:

General Settings: users provide a name for the processor, which will also be used for the generated .c and .asm files.
ALU Settings: users can choose the number of bits for fixed-point representation or specify the number of bits for the mantissa and exponent in floating-point representation.
Memory Stack Settings: the sizes of the Data and Instruction stacks are defined here if they are utilized.
Input/Output Settings: users establish the number of input and output addresses.

All configuration parameters are automatically written as compilation directives at the beginning of the .c file, serving as a header and influencing the hardware resources used.

Upon completing the compilation process (C+− followed by Assembly), two memory initialization files, data.mif and inst.mif, are created, containing the contents of the data and program memories, respectively. Additionally, a Verilog file is generated, encapsulating the parameters specified in the SAPHO IDE for hardware instantiation.

Figures illustrate the processor configuration screen, the project folder structure, and a flowchart detailing the project creation process in SAPHO.

Figure 3: Folder structure of a project in SAPHO.

Figure 4: Flowchart of the process of creating a project in SAPHO.

Compilers

C+− Compiler

The C+− compiler for the SAPHO processor is built with GNU Flex and Bison. Flex handles lexical analysis (keyword and pattern recognition) and Bison parses the grammar. The language is a subset of C, with a few specific keywords and operators that make the syntax friendlier. Notably, the compiler introduces additional operators not found in standard C, aimed at optimizing signal processing algorithms. Key features include:

Single-file Program Structure: the program must be contained in a single file. Subroutines are supported. When they are used, the instruction stack that stores return addresses is created automatically, which also enables recursive algorithms.
Pointer Limitations: pointers are restricted to indexing fixed-size one-dimensional arrays to facilitate prior memory size calculation.
No Dynamic Memory Allocation: this restriction ensures efficient hardware resource utilization. When arrays are declared, the compiler generates necessary indirect addressing circuits.
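Because pointers only index fixed-size arrays and nothing is allocated dynamically, every variable's address and the total data-memory depth are known at compile time. The Python sketch below shows that static allocation. The declaration format, the placement of the data stack above the variables and the function name are assumptions for illustration.

```python
def allocate(declarations: list[tuple[str, int]], stack_depth: int = 0) -> tuple[dict[str, int], int]:
    """Assign static base addresses and compute the data-memory depth.

    declarations: (name, length) pairs; scalars have length 1 and arrays
    their fixed size. The data stack is assumed to sit above all variables.
    """
    symbols = {}
    next_free = 0
    for name, length in declarations:
        symbols[name] = next_free  # base address of this variable or array
        next_free += length
    depth = next_free + stack_depth
    return symbols, depth


if __name__ == "__main__":
    print(allocate([("x", 1), ("buf", 16), ("y", 1)], stack_depth=4))
```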

The functions, keywords and operators supported by the C+− compiler are summarized in the following table.

Category	Content
Functions	in(), out()
Keywords	int, float, void, return, while, if, else
Operators	- + * / < > ! >> << >= <= == != && ||

Assembler Compiler

The assembler recognizes 49 instructions. It generates both the Verilog file that describes the processor's hardware and the initialization files for the associated memories. It is built with GNU Flex, which it uses to recognize opcodes and operands. Key capabilities include:

Instruction and Variable Recognition: it assesses the total number of program instructions and required variables to allocate the correct size for both program and data memory.

Proposed Floating-Point Format

In digital circuits, floating-point numbers are typically represented according to the IEEE 754 standard, which uses 32 bits for single precision and 64 bits for double precision. The standard also defines special values (such as infinities and NaNs) and the behavior of arithmetic operations.

The SAPHO processor adopts a simplified floating-point representation designed for hardware efficiency and data flow. It keeps the essential structure of IEEE 754: a sign bit (S), a mantissa (M) and an exponent (E). However, the exponent is represented in two's complement rather than in the biased form used by IEEE 754, which simplifies the hardware. The represented value is:

(-1)^{S} \times M \times 2^{E}

Both the mantissa width and the exponent width are chosen at design time, as described in the ALU Settings. Each application can therefore trade precision and dynamic range against hardware cost.
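The formula can be evaluated directly once the exponent field is decoded from two's complement. The Python sketch below does this with exact rational arithmetic. It assumes M is an unsigned integer magnitude (no hidden leading bit) and that E is stored in an `e_bits`-wide field; how the three fields are packed into a word is not specified here, so they are passed separately.

```python
from fractions import Fraction


def twos_complement(raw: int, bits: int) -> int:
    """Interpret an unsigned `bits`-wide field as a two's-complement integer."""
    return raw - (1 << bits) if raw & (1 << (bits - 1)) else raw


def sapho_value(s: int, m: int, e_raw: int, e_bits: int) -> Fraction:
    """Value of (-1)^S * M * 2^E with E stored as a two's-complement field.

    Assumption: M is an unsigned integer magnitude with no hidden bit.
    """
    e = twos_complement(e_raw, e_bits)
    scale = Fraction(2) ** e  # exact, including negative exponents
    return (-1) ** s * m * scale


if __name__ == "__main__":
    print(sapho_value(0, 5, 0b1110, 4))  # E = -2, so 5 * 2^-2 = 5/4
```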
