PROGRAMMABLE DIGITAL SIGNAL PROCESSORS
P.D. Sawaant
Contents
Commercial Digital Signal-Processing Devices
Architecture of TMS320C67xx Processors
Commercial Digital Signal-Processing Devices
The basic versions of most families of commercial DSP devices share the following features:
Harvard architecture.
A single-cycle hardware multiplier.
An address generation unit with dedicated address registers.
Special addressing modes.
On-chip peripheral interfaces.
The three most popular families are those from Texas Instruments, Motorola, and Analog Devices. Texas Instruments was one of the first to market a commercial programmable DSP, introducing the TMS32010 in 1982.
Summary of the Architectural Features of Three Fixed-Point DSPs
Architecture of TMS320C67xx Digital Signal Processors
The TMS320C6x devices are the first processors to use the VelociTI architecture, TI's implementation of the VLIW architecture. The TMS320C67x is a 32-bit floating-point DSP. The C6713 DSK is a low-cost standalone development platform that enables users to evaluate and develop applications for the TI C67xx DSP family. The 'C67x architecture is similar to that of the fixed-point 'C62x and 'C64x processors.
The 'C67x extends the 'C62x instruction set with floating-point arithmetic, so code written for the 'C62x is upward compatible with the 'C67x. The 'C67x processor is used in particular in applications that require high precision, a large dynamic range, and intensive computation, such as radar, sonar, 3-D graphics, wireless base stations, digital subscriber loops, and medical imaging.
Architecture of TMS320C67xx Digital Signal Processors
In general, the TMS320C6x devices execute up to eight 32-bit instructions per cycle. The 'C67x device core consists of the 'C6x CPU, which has the following features.
TMS320C67xx Features
Program fetch unit, instruction dispatch unit, and instruction decode unit.
Two data paths, each 32 bits wide and with four functional units.
Eight functional units in total: two multipliers and six ALUs.
Thirty-two 32-bit registers.
Control registers.
Control logic.
Test, emulation, and interrupt logic.
Parallel execution of eight instructions.
8/16/32-bit data support, providing efficient memory support for a variety of applications.
40-bit arithmetic options, which add extra precision for computationally intensive applications.
Architecture of TMS320C67xx digital signal processors
Architecture of TMS320C67xx Digital Signal Processors
Central Processing Unit (CPU)
The CPU contains a program fetch unit, an instruction dispatch unit, and an instruction decode unit. It fetches a very long instruction word (VLIW), 256 bits wide, to supply up to eight 32-bit instructions to the eight functional units during every clock cycle. A feature of the VLIW architecture is that not all eight units have to be supplied with instructions if they are not ready to execute. The least significant bit (the p-bit) of every 32-bit instruction determines whether the next instruction belongs to the same execute packet as the current instruction or should be executed in the following clock cycle as part of the next execute packet. Fetch packets are always 256 bits wide; however, execute packets can vary in size. Variable-length execute packets are a key memory-saving feature that distinguishes the C67x CPU from other VLIW architectures.
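The grouping of a fetch packet into execute packets can be illustrated with a short Python sketch. It assumes the chaining bit is the least significant bit of each 32-bit word (called the p-bit in TI's documentation) and that an execute packet is closed at the end of the fetch packet. The function name and the sample words are hypothetical and only the lowest bit of each word is inspected.

```python
FETCH_PACKET_WORDS = 8  # eight 32-bit instructions = 256 bits


def split_execute_packets(fetch_packet):
    """Group the eight words of one fetch packet into execute packets.

    Assumption of this sketch: bit 0 of word i set to 1 means that
    word i+1 executes in the same cycle as word i.
    """
    if len(fetch_packet) != FETCH_PACKET_WORDS:
        raise ValueError("a fetch packet holds exactly eight 32-bit words")
    packets, current = [], []
    for word in fetch_packet:
        current.append(word)
        if word & 1 == 0:  # chaining bit clear: the execute packet ends here
            packets.append(current)
            current = []
    if current:  # sketch-level choice: close the packet at the boundary
        packets.append(current)
    return packets


# Hypothetical instruction words; only the lowest bit matters here.
words = [0x1, 0x1, 0x0, 0x0, 0x1, 0x1, 0x1, 0x0]
sizes = [len(p) for p in split_execute_packets(words)]
print(sizes)  # [3, 1, 4]
```

In this example one 256-bit fetch packet is issued over three cycles as execute packets of three, one, and four instructions, so no padding instructions are needed for the idle units.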
TMS320C67xx DSP (Data Paths)
Central Processing Unit
The CPU also contains two data paths (containing register files A and B, respectively) in which the processing takes place. Each data path has four functional units (.L, .M, .S, and .D). The functional units execute logic, multiply, shift, and data-address operations. Figure 3.2 shows the simplified block diagram of the two data paths.
Simplified block diagram of the two data paths.
Architecture of TMS320C67xx Digital Signal Processors
Central Processing Unit
All instructions except loads and stores operate on the registers. All data transfers between the register files and memory take place only through the two data-addressing units (.D1 and .D2).
TMS320C67xx DSP (General-Purpose Register Files)
Central Processing Unit (cont.)
The CPU contains two general-purpose register files, A and B. The registers can be used for data or as data address pointers. Each file contains sixteen 32-bit registers (A0-A15 for file A and B0-B15 for file B). The registers A1, A2, B0, B1, and B2 can also be used as condition registers. The registers A4-A7 and B4-B7 can be used for circular addressing. The register files support 32-bit and 40-bit fixed-point data. A 32-bit value can be stored in any register. For a 40-bit value, the processor stores the 32 least significant bits in an even-numbered register and the remaining 8 most significant bits in the next higher (odd-numbered) register.
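The 40-bit storage rule can be made concrete with a minimal Python sketch that models the two registers of a pair as plain integers. The helper names are hypothetical, and the sketch treats the value as an unsigned bit pattern; sign handling is left out.

```python
MASK32 = 0xFFFFFFFF
MASK40 = 0xFFFFFFFFFF


def split_40bit(value):
    """Return (odd_reg, even_reg) for a 40-bit bit pattern.

    even_reg holds the 32 least significant bits,
    odd_reg holds the remaining 8 most significant bits.
    """
    value &= MASK40
    even_reg = value & MASK32
    odd_reg = (value >> 32) & 0xFF
    return odd_reg, even_reg


def join_40bit(odd_reg, even_reg):
    """Rebuild the 40-bit bit pattern from a register pair."""
    return ((odd_reg & 0xFF) << 32) | (even_reg & MASK32)


odd, even = split_40bit(0x12_3456_789A)
print(hex(odd), hex(even))  # 0x12 0x3456789a
```

With a pair such as A1:A0, the value printed first would sit in A1 and the second in A0.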
TMS320C67xx DSP (Functional Units)
Central Processing Unit (cont.)
The CPU features two sets of functional units. Each set contains four units and a register file. One set contains functional units .L1, .S1, .M1, and .D1; the other set contains units .D2, .M2, .S2, and .L2. The two register files each contain sixteen 32-bit registers, for a total of 32 general-purpose registers. Each functional unit has two 32-bit read ports for source operands and one 32-bit write port into a general-purpose register file.
TMS320C67xx DSP (Functional Units)
Central Processing Unit (cont.)
The functional units .L1, .S1, .M1, and .D1 write to register file A, and the functional units .L2, .S2, .M2, and .D2 write to register file B. As each unit has its own 32-bit write port, all eight ports can be used in parallel in every cycle.
The .L and .S functional units are ALUs that perform 32-bit/40-bit arithmetic and logical operations, and the .M units are the multipliers. The .S units also perform branching operations, and the .D units perform linear and circular address calculations. Only the .S2 unit can access the control register file.
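The difference between linear and circular address calculation can be sketched in Python. The sketch assumes a circular buffer whose block size is a power of two and whose start is aligned to that block size, so that only the low address bits change on an update. The function names and the sample address are hypothetical.

```python
def linear_update(address, offset):
    """Linear addressing: the pointer simply moves by offset bytes."""
    return address + offset


def circular_update(address, offset, block_size):
    """Circular addressing: the pointer wraps inside an aligned block.

    Assumes block_size is a power of two and the buffer is aligned to it,
    so the upper address bits stay fixed and the lower bits wrap around.
    """
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block_size must be a power of two")
    mask = block_size - 1
    base = address & ~mask
    return base | ((address + offset) & mask)


addr = 0x8000_0018  # hypothetical pointer, 8 bytes before the end of a 32-byte block
print(hex(linear_update(addr, 12)))        # 0x80000024
print(hex(circular_update(addr, 12, 32)))  # 0x80000004
```

The linear update runs past the end of the block, while the circular update wraps back to the start, which is the behaviour needed for delay lines in FIR filters.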
TMS320C67xx DSP (Functional Units)
ARCHITECTURE OF TMS320C6713 DSP PROCESSOR
Bus Structure
TMS320C67xx DSP
The C67xx P-DSP architecture is built around eight major 32-bit buses (four program/data buses and four address buses):
The program bus (PB) carries the instruction code and immediate operands from program memory.
Three data buses (CB, DB, and EB) interconnect various elements, such as the CPU, data address generation logic, program address generation logic, on-chip peripherals, and data memory. The CB and DB carry the operands that are read from data memory. The EB carries the data to be written to memory.
Four address buses (PAB, CAB, DAB, and EAB) carry the addresses needed for instruction execution.
The C67xx can generate up to two data-memory addresses per cycle using the two auxiliary register arithmetic units (ARAU0 and ARAU1) in the DAGEN block.
TMS320C67xx DSP (Memory System)
The memory of the TMS320C671x series processor implements a modified Harvard architecture, providing separate address spaces for instruction and data memory. The processor uses a two-level cache-based architecture and has a powerful and diverse set of peripherals. The Level 1 program cache (L1P) is a 4K-byte direct-mapped cache and the Level 1 data cache (L1D) is a 4K-byte 2-way set-associative cache. The Level 2 memory/cache (L2) consists of a 256K-byte memory space that is shared between program and data space. 64K bytes of the 256K bytes in L2 memory can be configured as mapped memory, cache, or combinations of the two. The remaining 192K bytes in L2 serve as mapped SRAM.
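The practical difference between the direct-mapped L1P and the 2-way set-associative L1D can be sketched by splitting a byte address into tag, set index, and byte offset. The cache sizes and associativities come from the text above; the line sizes (64 bytes for L1P, 32 bytes for L1D) are assumptions made for this sketch, and the function name and addresses are hypothetical.

```python
def cache_fields(address, cache_bytes, ways, line_bytes):
    """Split a byte address into (tag, set_index, byte_offset)."""
    num_sets = cache_bytes // (ways * line_bytes)
    offset = address % line_bytes
    set_index = (address // line_bytes) % num_sets
    tag = address // (line_bytes * num_sets)
    return tag, set_index, offset


# Sizes and associativity from the text; line sizes are assumed.
L1P = dict(cache_bytes=4 * 1024, ways=1, line_bytes=64)
L1D = dict(cache_bytes=4 * 1024, ways=2, line_bytes=32)

a, b = 0x0000_1040, 0x0000_2040  # hypothetical addresses 4K bytes apart

# Direct-mapped: same set, different tags, so a and b evict each other.
print(cache_fields(a, **L1P), cache_fields(b, **L1P))

# 2-way: same set, different tags, but both lines can be resident at once.
print(cache_fields(a, **L1D), cache_fields(b, **L1D))
```

Under these assumptions, two addresses that are a multiple of the cache size apart compete for a single line in L1P, whereas L1D can keep both lines in the two ways of one set.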
TMS320C67xx DSP (Peripherals of TMS320C6713)
The TMS320C67x devices contain peripherals for communication with off-chip memory, co-processors, host processors, and serial devices. The following subsections discuss the peripherals of the 'C6713 processor.
Enhanced DMA (EDMA)
Host Port Interface (HPI)
External Memory Interface (EMIF)
Multichannel Buffered Serial Port (McBSP)
Timers
Multichannel Audio Serial Port (McASP)
Power-Down Logic (PDL)