US20040263524A1 - Memory command handler for use in an image signal processor having a data driven architecture - Google Patents

Memory command handler for use in an image signal processor having a data driven architecture

Info

Publication number
US20040263524A1
Authority
US
United States
Prior art keywords
memory
cluster communication
memory address
data
image signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US10/609,042
Other versions
US7088371B2 (en)
Inventor
Louis Lippincott
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intel Corp
Original Assignee
Intel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corp
Priority to US10/609,042 (US7088371B2)
Assigned to INTEL CORPORATION: assignment of assignors interest (see document for details). Assignor: LIPPINCOTT, LOUIS A.
Priority to MYPI20041954A (MY137269A)
Priority to AT04754790T (ATE470189T1)
Priority to PCT/US2004/018291 (WO2005006207A2)
Priority to JP2006515360A (JP4344383B2)
Priority to EP04754790A (EP1639495B1)
Priority to KR1020057024970A (KR100818819B1)
Priority to DE602004027493T (DE602004027493D1)
Priority to TW093117042A (TWI269168B)
Publication of US20040263524A1
Publication of US7088371B2
Application granted
Adjusted expiration
Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/60 Memory management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining

Definitions

  • Embodiments of the invention relate to the field of image processing. More particularly, embodiments of the invention relate to a memory command handler for use in an image signal processor having a data driven architecture.
  • Image processing is the application of signal processing techniques to two-dimensional images such as photocopies, scanned images, photographs, video, etc.
  • image processing involves analyzing and manipulating images and is performed utilizing electronic means.
  • Image processing generally involves three operations: importing an image with a device such as an optical scanner or directly through some other type of device (e.g. digital camera, digital camcorder, etc.); manipulating or analyzing the image in some way; and lastly outputting the result.
  • the image is manipulated or analyzed by electronic means in a particular or pre-programmed way.
  • techniques such as image enhancement and data compression may be utilized.
  • the image may be analyzed or enhanced to find patterns that are not visible to the human eye.
  • the analysis of a picture may utilize techniques that can identify structures, shades, colors and relationships that cannot be perceived by the human eye.
  • image processing techniques can be utilized to identify problems, such as in forensic medicine.
  • image processing techniques may be used in creating weather maps from satellite pictures. More generally, image processing may deal with images in bitmapped graphics formats that have been scanned in, or captured with digital cameras or the like, that may then be analyzed, manipulated, enhanced, etc. Also, often an image is improved by image processing techniques such as by refining a degraded image that has been previously scanned or entered from a video source.
  • FIG. 1 is a block diagram illustrating an example of a system configuration for use in image processing, according to one embodiment of the present invention.
  • FIG. 2 is a block diagram illustrating an example of architecture for an image signal processor (ISP), according to one embodiment of the present invention.
  • FIG. 3 is a block diagram illustrating an example of a shared memory including a plurality of cluster communication registers (CCRs), according to one embodiment of the present invention.
  • FIG. 4 is a block diagram illustrating a memory command handler, according to one embodiment of the present invention.
  • FIG. 5 illustrates an example of memory address generators (MAGs) mapped to cluster communication registers (CCRs), according to one embodiment of the present invention.
  • FIG. 6 is a block diagram showing a variety of registers that may be utilized within a memory address generator (MAG), according to one embodiment of the present invention.
  • FIG. 7 lists an example of commands that may be supported by each of the memory address generators (MAGs), according to one embodiment of the present invention.
  • FIG. 8 is a diagram illustrating one example of a method of encoding commands, according to one embodiment of the present invention.
  • Embodiments of the invention relate to the field of image processing. More particularly, embodiments of the invention relate to a memory command handler (MCH) for use in an image signal processor having a data driven architecture. Even more particularly, embodiments of the invention relate to an image signal processor (ISP) for use in an image processor having a memory command handler (MCH) that acts as an address generator device to handle many different image processing tasks and to control a large number of parameters utilizing single commands. Further, the memory command handler includes powerful addressing capabilities by utilizing a plurality of memory address generators (MAGs), which allow for the automatic feeding of data held in, for example, two dimensional sub-sampled arrays.
  • the memory command handler using a data driven architecture in conjunction with its powerful addressing capabilities may be used to control parameters in the memory address generators to efficiently manage image processing tasks. For example, typically, in data flow applications, such as image processing tasks, a very small set of instructions may be efficiently utilized to operate on large data streams.
  • Embodiments of the invention relate to optimized architectures for maximizing the efficiency of image processing applications.
  • FIG. 1 is a block diagram illustrating an example of a system configuration 100 for use in image processing, according to one embodiment of the present invention.
  • the system configuration 100 includes at least one image processor 102 having a plurality of individual image signal processors (ISPs) 104 coupled to one another.
  • image processors typically include a number of individual image signal processors.
  • the image processor 102 implements a data-driven, shared register architecture, as discussed below.
  • the data paths between the image signal processors (ISPs) 104 may be 16-bits wide and may operate at a core frequency of approximately 266 MHz.
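  • By way of illustration only, the peak throughput implied by a 16-bit data path at approximately 266 MHz can be worked out as follows. The sketch assumes one 16-bit transfer per core clock cycle, which the description does not state; the function and constant names are hypothetical.

```python
# Illustrative arithmetic only.
# Assumption (not from the description): one 16-bit word moves per core clock cycle.
DATA_PATH_BITS = 16
CORE_FREQUENCY_HZ = 266_000_000  # approximately 266 MHz


def peak_bytes_per_second(width_bits: int, frequency_hz: int) -> int:
    """Peak throughput of a data path that moves one word per clock cycle."""
    return (width_bits // 8) * frequency_hz


peak = peak_bytes_per_second(DATA_PATH_BITS, CORE_FREQUENCY_HZ)
print(f"{peak / 1e6:.0f} Mbytes per second")  # 532 Mbytes per second
```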
  • the system configuration 100 is optimized to provide maximum performance on these types of image processing data flow applications.
  • the image processor 102 may include eight image signal processors 104 which are connected to each other through programmable ports.
  • the programmable ports may be quad ports.
  • the system configuration 100 is highly scalable and programmable to perform image processing tasks on an input pixel stream 105 from other devices (e.g. other image processing chips or devices capable of producing an input pixel stream).
  • memory devices 106 and 107 may be coupled to the image processor 102 .
  • the memory devices 106 and 107 may be Double Data Rate (DDR) Synchronous Dynamic Random Access Memories (SDRAMs).
  • these dual DDR SDRAM devices may provide more than 1 Gbyte per second of data movement bandwidth. This data flow bandwidth may be balanced with the processing performance of the image signal processors 104 .
  • the system configuration 100 includes at least one host processor 111 , input/output (I/O) interfaces 133 , and a network interface 134 coupled to the image processor 102 by a bus 103 .
  • the bus 103 may be a peripheral component interface (PCI) type of bus.
  • System memory devices 113 may also be coupled to processor 111 .
  • the system configuration 100 may include additional components (not shown) such as co-processors, modems, etc.—this being only a very basic example of a system configuration.
  • the host processor 111 may be coupled to the image processor 102 and the other components previously described by bus 103 .
  • The term "processor" or "CPU" refers to any machine that is capable of executing a sequence of instructions and shall be taken to include, but not be limited to, general purpose microprocessors, special purpose microprocessors, application specific integrated circuits (ASICs), multi-media controllers, microcontrollers, etc.
  • the processor 111 may be a general-purpose microprocessor that is capable of executing an INTEL ARCHITECTURE instruction set.
  • the processor 111 can be one of the PENTIUM classes of processors or one of the CELERON classes of processors.
  • the memory devices 106 and 107 can include any memory device adapted to store digital information, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), and double data rate (DDR) SDRAM or DRAM, etc.
  • the memory devices 106 and 107 and the system memory device 113 include volatile memory.
  • memory devices 106 and 107 and system memory device 113 can also consist of or include non-volatile memory such as read-only memory (ROM).
  • the system configuration 100 may include I/O interfaces and ports 133 to interface with I/O devices 139 .
  • I/O interfaces 133 include, but are not limited to, PCI slots, PCI agents, universal serial bus (USB) interfaces, Institute of Electrical and Electronics Engineers (IEEE) 1394 interfaces, parallel port interfaces, phone interfaces, integrated drive electronics (IDE) interfaces (e.g. for a hard drive), and high-speed serial interfaces, as should be appreciated by those of skill in the art. It should also be appreciated that there are a wide variety of different types of I/O devices that may be connected through suitable I/O interfaces to the system configuration 100 .
  • I/O devices may include any I/O device to perform suitable I/O functions.
  • I/O devices may include a monitor, a keypad, a modem, a printer, storage devices (e.g. Compact Disc Read-Only Memory (CD-ROM), Digital Video Disc (DVD), hard drive, floppy drive, etc.), media cards (e.g. audio, video, graphics, etc.) or any other types of I/O devices (e.g., input devices, mouse, trackball, pointer device).
  • I/O devices may include phones, fax machines, scanners, photocopy machines, digital copiers, video cameras, digital cameras, multi-function peripherals, as well as other types of devices suitable for coupling to an image processor.
  • a network interface 134 may be provided to couple the system configuration 100 to a network 140 .
  • the network interface 134 is provided to communicate with a network 140 (e.g. a Local Area Network (LAN), a Wide Area Network (WAN), the Internet, etc.) using a standard and suitable network protocol, as are known in the art.
  • the image processor 102 of exemplary system configuration 100 may utilize the external host processor 111 and bus 103 for downloading microcode, register configuration, register initialization, interrupt servicing and for uploading and downloading image data for image processing.
  • the data read from any of the interfaces described previously may be processed by the image signal processors 104 of the image processor 102 .
  • the image signal processors 104 may be connected to each other in a shared mesh topology through quad ports to facilitate rapid and flexible movement of data across the image processor 102 .
  • the image processor 102 may include eight image signal processors (ISPs) 104 .
  • a global bus 123 connects to the PCI bus 103 and is coupled to all of the image signal processors (ISPs) 104 of the image processor 102 , as well as all the other functional units (not shown) of the image processor 102 .
  • the various units and registers of the image processor 102 may be set up and controlled through the global bus 123 .
  • the global bus 123 is used to read and write the configuration registers of each ISP.
  • the global bus is 16 bits wide, with a 16-bit data bus and an 8-bit address bus, and conveys interrupt status to the PCI bus 103 .
  • FIG. 2 is a block diagram illustrating an example of architecture for an image signal processor (ISP) 200 , according to one embodiment of the present invention.
  • the ISP 200 includes an input programming element (Input PE) 202 , an output programming element (Output PE) 204 , a first Multiply-Accumulate programming element (MACPE) 206 , a second MACPE 208 , a general purpose programming element (GPE) 210 , first and second accelerator units 214 and 216 , respectively, and a memory command handler 220 , all of which are coupled to a shared memory 230 .
  • the shared memory 230 may include a plurality of cluster communication registers (CCRs). In an even more particular embodiment, each cluster communication register may be 16-bits wide.
  • the cluster communication registers (CCRs) 230 of the image signal processor 200 are used to store and transfer data between the processing elements, as is discussed below in more detail.
  • the input and output programming elements 202 and 204 are configured to route data through the image signal processor 200 .
  • the image signal processor 200 may include data storage memory 222 managed by the memory command handler 220 to maximize bandwidth access efficiency, as discussed below.
  • each programming element 202 , 204 , 206 , 208 , and 210 includes local registers (e.g. sixteen 16-bit local registers). Both the cluster communication registers 230 and the other local registers can be used in image processing applications, for example, on either 16-bit images or dual 8-bit images. All of the processing elements may run concurrently and implement a common base line instruction set that consists of flow control instructions and arithmetic logic unit (ALU) instructions, as discussed below. Particularly, the MAC programming elements 206 and 208 , respectively, may implement additional multiply-accumulate instructions and the general purpose programming element 210 may implement additional bit-rotation instructions.
  • the performance of the programming elements combined with the programmability of the image signal processor (ISP) 200 allows a programmer to develop and tune algorithms rapidly for optimum performance.
  • the ISP 200 may include dedicated hardware accelerators 214 and 216 .
  • these accelerator units 214 and 216 may include Huffman encoder/decoder accelerators, G4 engine accelerators, Fast 2D triangular filter accelerators, etc.
  • each programming element 202 , 204 , 206 , 208 , and 210 may implement a data flow/data driven architecture to implement image processing functions.
  • each programming element 202 , 204 , 206 , 208 , and 210 includes an instruction memory 240 to hold instructions.
  • the instruction memory 240 may be a 128-instruction memory.
  • an instruction set may be used that consists of three or four tight loops in addition to data flow and arithmetic instructions.
  • each programming element may have local registers (e.g.
  • the cluster communication registers 230 are used to exchange data between the programming elements. Particularly, in one embodiment, data passing through the cluster communication registers 230 may be tagged with data valid (DV) bits.
  • the data valid bits may be used to establish the ownership of the data storage resource and may establish one or more consumers of the data as discussed below.
  • the image processor 102 of the system configuration 100 includes a plurality of image signal processors (ISPs) 200 (FIG. 2).
  • the image processor includes eight image signal processors (ISPs) 200 .
  • programming elements 202 , 204 , 206 , 208 , and 210 and hard-wired accelerators 214 and 216 are interconnected through a shared memory 230 , which, in one embodiment, may be implemented as cluster communication registers (CCRs) (e.g. 16-bit registers).
  • the five programming elements in the image signal processor 200 communicate with each other through the cluster communication registers 230 .
  • the cluster communication registers 230 are the only mechanism by which the five programming elements in the image signal processor 200 can communicate with each other.
  • the memory command handler 220 is used to manage the data flow to the programming elements.
  • each programming element has its own set of local registers as well as operating in conjunction with the cluster communication registers 230 .
  • Both the local registers and the cluster communication registers may be 16-bits wide and can be used for either 16-bit operands or two 8-bit operands.
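  • As a minimal sketch of how one 16-bit register may carry either a single 16-bit operand or two 8-bit operands, the helpers below pack and unpack a pair of 8-bit values. The function names are hypothetical, and placing the first operand in the upper byte is an assumption; the description does not specify the byte order.

```python
from typing import Tuple

MASK_8 = 0xFF
MASK_16 = 0xFFFF


def pack_dual_8(high: int, low: int) -> int:
    """Pack two 8-bit operands into one 16-bit register value.

    Assumption: the first operand occupies the upper byte.
    """
    if not (0 <= high <= MASK_8 and 0 <= low <= MASK_8):
        raise ValueError("each operand must fit in 8 bits")
    return (high << 8) | low


def unpack_dual_8(word: int) -> Tuple[int, int]:
    """Split a 16-bit register value back into its two 8-bit operands."""
    if not 0 <= word <= MASK_16:
        raise ValueError("word must fit in 16 bits")
    return (word >> 8) & MASK_8, word & MASK_8


word = pack_dual_8(0x12, 0x34)
print(hex(word))            # 0x1234
print(unpack_dual_8(word))  # (18, 52)
```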
  • each programming element may be designed for a basic set of instructions. In one embodiment, the basic set of instructions may be divided into flow control instructions that support flow control, ALU instructions that support arithmetic and logic functions, as well as custom instructions.
  • the MAC programming elements (MACPEs) 206 and 208 have multiply accumulate instructions and the general purpose programming element (GPE) 210 may include bit-rotation instructions.
  • all of the programming elements 202 , 204 , 206 , 208 , and 210 support all of the flow control instructions and ALU instructions.
  • Examples of flow control instructions include: load, read instruction memory, write instruction memory, break, conditional call, interrupt control, conditional jump, indirect register control, loop instruction, no operation, repeat, return, stop, pack, and unpack.
  • Examples of ALU instructions include: absolute function, add function, add and accumulate function, add and shift function, subtract function, subtract and accumulate function, bit-wise AND function, bit-wise OR function, bit-wise exclusive OR function, min/max functions, store accumulator function, sign-extend function, shift left function, shift right function, and an instruction to test data valid bits.
  • exemplary instructions include: multiply and accumulate function, multiply instruction, and mode set.
  • an exemplary instruction includes a bit-rotation instruction.
  • the general purpose programming element (GPE) 210 includes the basic flow and ALU instructions previously discussed, along with the custom bit-rotation instruction, and is the basic programming element upon which all the other programming elements 202 , 204 , 206 , and 208 are built.
  • general purpose programming element (GPE) 210 implements the base line instruction set, as previously discussed.
  • the input programming element 202 is based on the general purpose programming element 210 (minus the bit rotation instruction) with a quad port interface as an input port to receive incoming data.
  • the output programming element 204 is likewise based on the general purpose programming element 210 (minus the bit rotation instruction) with a quad port interface as an output port to output data.
  • MAC programming elements 206 and 208 are likewise based on the general purpose programming element 210 (minus the bit rotation instruction) with enhanced MAC functionality provided by the multiply-accumulators 242 and 244 , respectively.
  • MAC programming elements 206 and 208 support dual Single Instruction Multiple Data (SIMD) 8×8 instructions or single 16×16 multiply-accumulate instructions.
  • MAC programming elements 206 and 208 provide a wide array of arithmetic and logic functions useful for implementing image processing algorithms. It should be appreciated that the multiply-accumulators 242 and 244 of the MAC programming elements 206 and 208 , respectively, like any other ALU unit, can utilize data from any of the cluster communication registers (CCRs) 230 or from local registers.
  • the shared memory 230 may be a plurality of cluster communication registers (CCRs), as previously discussed.
  • the cluster communication registers (CCRs) 230 may each be 16-bit registers.
  • the cluster communication registers 230 allow the processing elements 202 , 204 , 206 , 208 and 210 to exchange data and may be used as general purpose registers.
  • data valid (DV) bits may implement a semaphore system to coordinate data flow and cluster communication register ownership by the various processing elements.
  • FIG. 3 illustrates an example of a shared memory 230 including a plurality of cluster communication registers (CCRs) 302 1-N , according to one embodiment of the present invention.
  • each cluster communication register 302 1-N may include 16 data bits and has one additional data bit 304 added to it (e.g. PE 1 , PE 2 , PE 3 . . . PE n ) for each processing element in the image signal processor 200 .
  • these additional processing element identification bits 304 , termed data valid (DV) bits, may be used to indicate which processing element owns each cluster communication register.
  • the processor element identification data bit (DV bit) may be set high to indicate the processing element that currently owns the cluster communication register.
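  • The register layout described above, 16 data bits plus one data valid (DV) bit per processing element, may be modeled as in the following sketch. The class and method names are hypothetical, and clearing a reader's DV bit when it consumes the data is an assumption about how the semaphore might behave rather than something the description states.

```python
from typing import Iterable


class ClusterCommunicationRegister:
    """Hypothetical model: 16 data bits plus one DV bit per processing element."""

    def __init__(self, num_processing_elements: int) -> None:
        self.num_processing_elements = num_processing_elements
        self.data = 0
        self.dv_bits = 0  # bit i corresponds to processing element i

    def write(self, value: int, consumers: Iterable[int]) -> None:
        """Store a 16-bit value and set the DV bit of each intended consumer."""
        mask = 0
        for pe in consumers:
            if not 0 <= pe < self.num_processing_elements:
                raise ValueError(f"no processing element {pe}")
            mask |= 1 << pe
        self.data = value & 0xFFFF
        self.dv_bits = mask

    def is_valid_for(self, pe: int) -> bool:
        """Test whether the DV bit for the given processing element is set."""
        return bool(self.dv_bits & (1 << pe))

    def read(self, pe: int) -> int:
        """Return the data and clear the reader's DV bit (assumed behavior)."""
        if not self.is_valid_for(pe):
            raise RuntimeError("no valid data for this processing element")
        self.dv_bits &= ~(1 << pe)
        return self.data


ccr = ClusterCommunicationRegister(num_processing_elements=5)
ccr.write(0xABCD, consumers=[0, 2])
print(ccr.is_valid_for(0), ccr.is_valid_for(1))  # True False
```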
  • the memory command handler 220 may be coupled to memory 222 (e.g. data RAM), which allows for local storage of data, constants and instructions within the image signal processor (ISP) 200 . It provides a scalable mechanism for local storage optimized for access patterns characteristic of image processing.
  • the memory command handler 220 provides the means for accessing the data in structured patterns often required by image processing such as by component, by row, by column, or by 2D block.
  • the memory command handler 220 is utilized to support independent data streams using memory address generators (MAGs) in image processing applications, as discussed below in more detail.
  • the data that each programming element 202 , 204 , 206 , 208 , and 210 will process typically comes through one or more of the cluster communication registers (CCRs) 230 .
  • the cluster communication registers (CCRs) 230 are 16 bits wide, and therefore require 16-bit wide data paths.
  • all communication to the memory command handler 220 is therefore required to be done through the same 16-bit data path 221 .
  • the restricted 16-bit data path 221 creates several problems around setting up and controlling the memory command handler 220 . For example, the narrow data path limits the number of commands and the information contained in each command.
  • 16-bit data paths are optimal for image processing applications because a single pixel is usually represented by a subsampled color space in 16-bits, such as YU, YV or La, Lb or YCr, YCb, etc.
  • the memory command handler 220 may operate utilizing a 16-bit data path in a unique and efficient manner.
  • an image signal processor that includes a memory command handler having a plurality of memory address generators that are coupled to a local memory, which stores data related to image processing. Each memory address generator generates a memory address to the local memory and interprets a command to be performed on the data of the local memory located at the memory address to aid in image processing tasks.
  • a shared memory is coupled to the plurality of memory address generators and is used to store data to be sent to the local memory and commands to be performed by the memory address generators.
  • the shared memory may comprise a plurality of cluster communication registers that interface with the memory address generators by the use of a cluster communication register interface.
  • the plurality of cluster communication registers may include data cluster communication registers to store data and command cluster communication registers to store commands.
  • a pair of cluster communication registers may be assigned to each memory address generator, wherein each pair of cluster communication registers includes a data cluster communication register and a command cluster communication register.
  • the memory command handler includes an arbiter to arbitrate access to the local memory by the memory address generators.
  • the plurality of cluster communication registers may be at least 16-bit registers and 16-bit wide data paths may be used to couple the cluster communication registers to the memory address generators, the memory address generators to the arbiter, and the arbiter to the local memory.
  • FIG. 4 shows a block diagram illustrating a memory command handler 400 , according to one embodiment of the invention.
  • the memory command handler 400 , in conjunction with local memory 402 , provides a means for accessing data in structured patterns often required by image processing such as by component, by row, by column, or by 2D block.
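  • The access patterns named above (by row, by column, by 2D block) can be illustrated with a short address-sequence sketch. It assumes the image is held in row-major order as 16-bit words starting at a base address; the description does not state the storage layout, and all function and parameter names are hypothetical.

```python
from typing import List


def row_addresses(base: int, width: int, row: int) -> List[int]:
    """Word addresses of one image row (row-major layout assumed)."""
    return [base + row * width + col for col in range(width)]


def column_addresses(base: int, width: int, height: int, col: int) -> List[int]:
    """Word addresses of one image column; consecutive words are `width` apart."""
    return [base + row * width + col for row in range(height)]


def block_addresses(base: int, width: int, top: int, left: int,
                    block_height: int, block_width: int) -> List[int]:
    """Word addresses of a 2D block, visited row by row within the block."""
    return [base + (top + r) * width + (left + c)
            for r in range(block_height)
            for c in range(block_width)]


# A 4-word-wide, 3-row image stored from address 0.
print(row_addresses(0, 4, 1))             # [4, 5, 6, 7]
print(column_addresses(0, 4, 3, 2))       # [2, 6, 10]
print(block_addresses(0, 4, 1, 1, 2, 2))  # [5, 6, 9, 10]
```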
  • the memory command handler 400 operates in conjunction with local memory 402 and includes a plurality of memory address generators (MAGs) 404 , an arbiter 406 , a cluster communication register (CCR) interface 410 for coupling to cluster communication registers (CCRs) 414 , and a global bus interface 420 .
  • the memory address generators (MAGs) 404 (e.g. GB MAG, MAG 0 , MAG 1 . . . MAG 7 ) are utilized to support independent data streams.
  • any number of memory address generators may be utilized to support any number of suitable data streams.
  • Each of MAGs 0 - 7 is coupled to the cluster communication registers (CCRs) 414 through the cluster communication register (CCR) interface 410 , as well as to the arbiter 406 , and the GB MAG is coupled directly to the arbiter 406 .
  • the arbiter 406 on a clock cycle basis controls access to the memory 402 .
  • a global bus interface 420 couples the memory command handler 400 to the global bus 423 and all the other MAGs of all the other ISPs of the image processor.
  • the global bus 423 connects to the PCI bus and is coupled to all of the other image signal processors (ISPs) of the image processor, as well as all the other functional units (not shown) of the image processor.
  • the various units and registers of the image processor may be set up and controlled through the global bus 423 .
  • the global bus 423 is used to read and write the configuration registers of each ISP.
  • the global bus is 16 bits wide, with a 16-bit data bus and an 8-bit address bus, and conveys interrupt status to the PCI bus.
  • the memory command handler 400 may be programmed through the global bus interface 420 for all commands to the memory address generators 404 .
  • Data to the memory 402 is communicated through the cluster communication registers (CCRs) 414 , utilizing MAGs 0 - 7 404 , as well as through some commands. Additionally, data to the memory 402 may be communicated from the global bus 423 , including data from other MAGs of other ISPs, utilizing the GB MAG.
  • the memory 402 may be static random access memory (SRAM).
  • SRAM 402 in one embodiment, may be organized as N addresses of 16-bit words.
  • the SRAM 402 appears to software as one contiguous area of memory.
  • the SRAM block 402 has a data-in port (DI), a data-out port (DO), a control port (CNTL) and an address port (A), all of which are coupled to arbiter 406 .
  • the arbiter 406 accepts all requests for access to the SRAM block 402 from the memory address generators (MAGs) 404 and arbitrates for ownership of the memory control (CNTL), address (A), and data (DI and DO) busses.
  • the arbiter 406 implements, in one embodiment, a round-robin type of arbitration where the last memory address generator 404 granted access assumes the lowest priority, the next memory address generator 404 assumes the highest priority, and the descending priority chain is forwarded through each sequential memory address generator.
  • the arbiter 406 decides on a clock-by-clock basis which memory address generator 404 gets to perform a memory access cycle.
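  • The round-robin scheme described above, in which the last requester granted access drops to the lowest priority and the next one in sequence takes the highest, may be sketched as follows. This is a behavioral model under assumed names (RoundRobinArbiter, grant), not the circuit of the arbiter 406 ; each call to grant stands for one clock cycle.

```python
from typing import Optional, Sequence


class RoundRobinArbiter:
    """Behavioral sketch of round-robin arbitration among memory address generators."""

    def __init__(self, num_requesters: int) -> None:
        self.num_requesters = num_requesters
        # Start as if the last requester was just granted, so requester 0
        # has the highest priority on the first cycle.
        self.last_granted = num_requesters - 1

    def grant(self, requests: Sequence[bool]) -> Optional[int]:
        """Return the requester granted this cycle, or None if nobody requests."""
        for offset in range(1, self.num_requesters + 1):
            candidate = (self.last_granted + offset) % self.num_requesters
            if requests[candidate]:
                self.last_granted = candidate  # now the lowest priority
                return candidate
        return None


arbiter = RoundRobinArbiter(3)
print([arbiter.grant([True, True, True]) for _ in range(4)])  # [0, 1, 2, 0]
```

  • For the nine memory address generators of the embodiment described here (GB MAG and MAGs 0 - 7 ), the model would be instantiated with nine requesters; the order in which they are indexed is an assumption.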
  • the CCR interface 410 connects the memory address generators (MAGs) 404 (e.g. MAGs 0 - 7 ) to the cluster communication registers 414 for passing data and commands to and from the SRAM block 402 via the memory address generators.
  • Each pair of cluster communication registers (CCRs) includes one command CCR and one data CCR.
  • the command CCRs are used to send various read and write commands to the memory address generators 404 .
  • the data CCRs are used to send the data words to the SRAM block 402 via the memory address generators 404 .
  • an appropriate data valid (DV) bit may be set in the cluster communication register (CCR) 414 for the memory command handler 400 , and a specific memory address generator 404 (e.g. one of MAGs 0 - 7 ) is selected by using the cluster communication register for that particular memory address generator (MAG) 404 . If the memory command handler 400 data valid (DV) bit is not set, the memory command handler 400 does not respond to the command or data in the cluster communication register. This allows the cluster communication registers (CCRs) 414 to be used to communicate with other programming elements even though the cluster communication register is connected to a particular memory address generator within the memory command handler. Conversely, when the memory address generator returns data through a cluster communication register, any combination of data valid (DV) bits can be set, returning the data to any combination of programming elements in the overall image signal processor.
  • FIG. 5 illustrates an example of memory address generators (MAGs) mapped to particular cluster communication registers (CCRs), according to one embodiment of the present invention.
  • CCR 0 and CCR 1 are assigned to MAG 0 for command and data 502 and 504, respectively.
  • CCR 2 and CCR 3 are assigned to MAG 1 for command and data 506 and 508, respectively; CCR 4 and CCR 5 are assigned to MAG 2 for command and data 510 and 512, respectively; and so on.
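  • The pairing listed for FIG. 5 follows a simple pattern, which the short Python sketch below makes explicit. The function name `ccrs_for_mag` is hypothetical, and extending the pattern beyond the three MAGs named in the text is an assumption.

```python
from typing import Tuple


def ccrs_for_mag(mag_index: int) -> Tuple[int, int]:
    """Return (command_ccr, data_ccr) for a MAG.

    Matches the pairs given for MAG 0, MAG 1 and MAG 2; applying the same
    rule to higher-numbered MAGs is an assumption.
    """
    if mag_index < 0:
        raise ValueError("MAG index must be non-negative")
    command_ccr = 2 * mag_index      # even-numbered CCR carries commands
    data_ccr = command_ccr + 1       # the following odd-numbered CCR carries data
    return command_ccr, data_ccr
```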
  • the memory address generators (MAGs) 404 are used to generate the address for each word passed through the data cluster communication registers 414 into the SRAM memory block 402 and to interpret the commands sent to the memory address generators 404 through the command cluster communication registers 414 , as well as from the global bus 423 through the GB MAG.
  • There are nine memory address generators (MAGs) 404 in each memory command handler in each image signal processor.
  • Each memory address generator (MAG) includes a command interpreter that receives commands via the command cluster communication register (CCR) 414 or global bus 423 and decodes them into various functions supported by the memory address generator, as discussed below in more detail.
  • each memory address generator (MAG) 404 is particularly optimized for image processing algorithms, such that it can handle 2D arrays in a variety of formats and dimensions.
  • the power and flexibility of each memory address generator (MAG) 404 is created by the various parameters that are programmed by the processing elements through the cluster communication registers 414 .
  • embodiments of the invention allow the commands to be a single 16-bit word.
  • FIG. 6 is a block diagram showing a variety of registers that may be utilized within a memory address generator (MAG) 404 , according to one embodiment of the present invention.
  • each memory address generator 404 may contain a mask register 602, a data path data valid (DV) bits register 604, a base offset register 606, a memory pointer register 608, a first increment register 610, a second increment register 612, an operation complete register 614, and registers for various control bits 616.
  • each memory address generator 404 may include additional registers not shown here.
  • the encoding for commands may be accomplished in a way that allows for the maximum number of bits for parameters.
  • a command interpreter of each memory address generator 404 receives commands via a command cluster communication register (CCR) 414 and decodes them into various functions supported by the particular memory address generator.
  • FIG. 7 lists an example of commands that may be supported by each of the memory address generators (MAGs) 404, according to one embodiment of the present invention. As shown in FIG. 7,
  • these commands include, but are not limited to: a write mask command 702 utilized in address calculations to create circular buffer addressing; set data path data valid (DV) bits 704 to determine the target processing elements for the read data; a read immediate command 706 to read RAM from a specified address; a write immediate command 708 to write RAM from the data cluster communication register (CCR) to a specified RAM address; a write MPR command 710 which provides an initial offset value to be used in address calculations; a write increment register command 712 to provide X and Y increment values for 1D or 2D addressing; a write base offset register command 714 to set the base offset register used in addressing; a read indirect, N words command 716 to read N words into the data cluster communication register (CCR) using the memory address generator (MAG) memory pointer; a write indirect, N words command 718 to write N words from the data CCR using the MAG memory pointer; a read op complete command 720 which is used to signal the memory command handler (MCH) controlling a processing element that
  • FIG. 8 shows a diagram illustrating one example of a method of encoding commands, according to one embodiment of the present invention.
  • the read immediate command is encoded as 00 in the top two bits of the 16-bit data path example. This leaves 14 bits for the immediate address included in the command, which gives each memory address generator (MAG) the ability to directly address, in one example, 32 KB of data (16,384 16-bit words).
  • the read immediate bits are purposely chosen to be 00 so that a lookup table can be implemented by simply writing the lookup table input as the command.
  • For example, writing 0057h to the command cluster communication register (CCR) will cause a read immediate from location 0057h in the RAM, which would return the new output value for that input value.
  • the base offset register can offset the look up table address.
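  • The read immediate encoding and the lookup-table trick can be illustrated with a small Python model. The opcode value 00, the 14-bit address field, the 0057h example and the 32 KB figure come from the text; the function names, the list used to stand in for the RAM, and the assumption that the base offset is simply added to the immediate address are illustrative only.

```python
from typing import Sequence

READ_IMMEDIATE_OPCODE = 0b00           # top two bits of the 16-bit command
ADDRESS_BITS = 14                      # remaining bits hold the immediate address
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1
BYTES_PER_WORD = 2                     # 16-bit data words
ADDRESSABLE_BYTES = (1 << ADDRESS_BITS) * BYTES_PER_WORD


def encode_read_immediate(address: int) -> int:
    """Build a 16-bit read immediate command for a 14-bit word address."""
    if not 0 <= address <= ADDRESS_MASK:
        raise ValueError("address does not fit in 14 bits")
    return (READ_IMMEDIATE_OPCODE << ADDRESS_BITS) | address


def is_read_immediate(command: int) -> bool:
    """True when the top two bits of the 16-bit command are 00."""
    return ((command & 0xFFFF) >> ADDRESS_BITS) == READ_IMMEDIATE_OPCODE


def lookup(ram: Sequence[int], command: int, base_offset: int = 0) -> int:
    """Model a table lookup: the input value itself is the command word.

    Assumption: the base offset register is added to the immediate address.
    """
    if not is_read_immediate(command):
        raise ValueError("not a read immediate command")
    return ram[base_offset + (command & ADDRESS_MASK)]
```

  • Because the opcode is 00, `encode_read_immediate(0x0057)` is just `0x0057`, so any input value below 4000h can be written to the command register unchanged and the table entry comes back as the read data.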
  • some commands do not need as much parameter data and may be encoded as longer commands.
  • An example of this is the set read operation complete register, which, in this example, needs only nine bits of parameter data. For example, as shown in FIG. 8, many of the commands have fixed values (i.e. 0 or 1).
  • the image processor having image signal processors (ISPs) utilizing the memory command handler (MCH) provides address generating functionality that can handle many different image processing tasks and can control a large number of parameters with, in one example, single 16-bit commands.
  • the powerful addressing capability of the memory address generators (MAGs) and the memory command handler advantageously provides for the automatic feeding of data, such as in two dimensional, sub-sampled arrays.
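  • As a minimal sketch of how such automatic feeding of a two dimensional, sub-sampled array might work, the Python generator below walks a block using a base offset, a memory pointer, X and Y increments, and an optional mask. The text names these registers but does not give the address formula, so the way they are combined here (base offset plus masked pointer, X increment per word, Y increment per row) and every identifier are assumptions made for illustration.

```python
from typing import Iterator, Optional


def indirect_addresses(
    base_offset: int,
    memory_pointer: int,
    x_increment: int,
    y_increment: int,
    words_per_row: int,
    rows: int,
    mask: Optional[int] = None,
) -> Iterator[int]:
    """Yield word addresses for a rows x words_per_row access pattern.

    Hypothetical model: the pointer advances by x_increment within a row and
    each row starts y_increment after the previous row's start. When a mask
    is given, the pointer is ANDed with it to wrap like a circular buffer.
    """
    for row in range(rows):
        pointer = memory_pointer + row * y_increment
        for _ in range(words_per_row):
            wrapped = pointer & mask if mask is not None else pointer
            yield base_offset + wrapped
            pointer += x_increment
```

  • For an 8-word-wide image stored row by row, `indirect_addresses(0, 0, 2, 16, 4, 2)` visits every second word of every second row (0, 2, 4, 6, 16, 18, 20, 22), and passing `mask=0x7` with an X increment of 1 makes the pointer wrap inside an 8-word circular buffer.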
  • the image processor having image signal processors (ISPs) utilizing the memory command handler (MCH) provides the performance of an ASIC with the programmability of a processor.
  • the architecture provides the flexibility to implement, in one embodiment, a wide range of document processing image paths (e.g. high ppm monochrome, binary color, contone color, MRC-based algorithms, etc.) while accelerating the execution of frequently used imaging functions (e.g. color conversion, compression, and filter operations).
  • the embodiments of the present invention can be implemented in hardware, software, firmware, middleware or a combination thereof and utilized in systems, subsystems, components, or sub-components thereof.
  • the elements of the present invention are the instructions/code segments to perform the necessary tasks.
  • the program or code segments can be stored in a machine readable medium (e.g. a processor readable medium or a computer program product), or transmitted by a computer data signal embodied in a carrier wave, or a signal modulated by a carrier, over a transmission medium or communication link.
  • the machine-readable medium may include any medium that can store or transfer information in a form readable and executable by a machine (e.g. a processor, a computer, etc.).
  • Examples of the machine-readable medium include an electronic circuit, a semiconductor memory device, a ROM, a flash memory, an erasable programmable ROM (EPROM), a floppy diskette, a compact disk (CD-ROM), an optical disk, a hard disk, a fiber optic medium, a radio frequency (RF) link, etc.
  • the computer data signal may include any signal that can propagate over a transmission medium such as electronic network channels, optical fibers, air, electromagnetic, RF links, bar codes, etc.
  • the code segments may be downloaded via networks such as the Internet, Intranet, etc.

Abstract

Disclosed is an image signal processor for use in an image processing system. The image signal processor includes a local memory to store data and a memory command handler having a plurality of memory address generators. Each memory address generator generates a memory address to the local memory and interprets a command to perform an operation on the data of the local memory located at the memory address to aid in image processing tasks.

Description

    FIELD
  • Embodiments of the invention relate to the field of image processing. More particularly, embodiments of the invention relate to a memory command handler for use in an image signal processor having a data driven architecture. [0001]
  • DESCRIPTION OF RELATED ART
  • Generally, image processing is the application of signal processing techniques to the domain of two-dimensional images such as photocopies, scanned images, photographs, video, etc. Typically, image processing involves analyzing and manipulating images and is performed utilizing electronic means. Image processing generally involves three operations: importing an image with a device such as an optical scanner or directly through some other type of device (e.g. digital camera, digital camcorder, etc.); manipulating or analyzing the image in some way; and lastly outputting the result. [0002]
  • In many instances, although not always, the image is manipulated or analyzed by electronic means in a particular or pre-programmed way. In this stage, techniques such as image enhancement and data compression may be utilized. For example, the image may be analyzed or enhanced to find patterns that are not visible to the human eye. The analysis of a picture may utilize techniques that can identify structures, shades, colors and relationships that cannot be perceived by the human eye. In this way, image processing techniques can be utilized to identify problems, such as in forensic medicine. As another example, image processing techniques may be used in creating weather maps from satellite pictures. More generally, image processing may deal with images in bitmapped graphics formats that have been scanned in, or captured with digital cameras or the like, that may then be analyzed, manipulated, enhanced, etc. Also, an image is often improved by image processing techniques, such as by refining a degraded image that has been previously scanned or entered from a video source. [0003]
  • As one particular example, image processing in document imaging applications has traditionally been handled by high-performance fixed-function Application Specific Integrated Circuits (ASICs). However, these fixed-function ASICs provide little flexibility in image processing tasks. On the other hand, programmable solutions (e.g. Field Programmable Gate Arrays (FPGAs), Digital Signal Processors (DSPs), etc.), while offering more flexibility in image processing tasks, have not offered the price vs. performance characteristics required for these applications to be competitive and widely utilized. Moreover, because of the lack of affordable, programmable and scalable solutions, products across different performance segments have not been standardized onto a common platform. [0004]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a block diagram illustrating an example of a system configuration for use in image processing, according to one embodiment of the present invention. [0005]
  • FIG. 2 is a block diagram illustrating an example of architecture for an image signal processor (ISP), according to one embodiment of the present invention. [0006]
  • FIG. 3 is a block diagram illustrating an example of a shared memory including a plurality of cluster communication registers (CCRs), according to one embodiment of the present invention. [0007]
  • FIG. 4 is a block diagram illustrating a memory command handler, according to one embodiment of the present invention. [0008]
  • FIG. 5 illustrates an example of memory address generators (MAGs) mapped to cluster communication registers (CCRs), according to one embodiment of the present invention. [0009]
  • FIG. 6 is a block diagram showing a variety of registers that may be utilized within a memory address generator (MAG), according to one embodiment of the present invention. [0010]
  • FIG. 7 lists an example of commands that may be supported by each of the memory address generators (MAGs), according to one embodiment of the present invention. [0011]
  • FIG. 8 is a diagram illustrating one example of a method of encoding commands, according to one embodiment of the present invention. [0012]
  • DETAILED DESCRIPTION
  • In the following description, the various embodiments of the invention will be described in detail. However, such details are included to facilitate understanding of the invention and to describe exemplary embodiments for employing the invention. Such details should not be used to limit the invention to the particular embodiments described because other variations and embodiments are possible while staying within the scope of the invention. Furthermore, although numerous details are set forth in order to provide a thorough understanding of the embodiments of the invention, it will be apparent to one skilled in the art that these specific details are not required in order to practice the embodiments of the invention. In other instances, details such as well-known methods, types of data, protocols, procedures, components, electrical structures and circuits are not described in detail, or are shown in block diagram form, in order not to obscure the invention. Furthermore, embodiments of the invention will be described in particular embodiments but may be implemented in hardware, software, firmware, middleware, or a combination thereof. [0013]
  • Embodiments of the invention relate to the field of image processing. More particularly, embodiments of the invention relate to a memory command handler (MCH) for use in an image signal processor having a data driven architecture. Even more particularly, embodiments of the invention relate to an image signal processor (ISP) for use in an image processor having a memory command handler (MCH) that acts as an address generator device to handle many different image processing tasks and to control a large number of parameters utilizing single commands. Further, the memory command handler includes powerful addressing capabilities by utilizing a plurality of memory address generators (MAGs), which allow for the automatic feeding of data held in, for example, two dimensional sub-sampled arrays. [0014]
  • Advantageously, the memory command handler, using a data driven architecture in conjunction with its powerful addressing capabilities, may be used to control parameters in the memory address generators to efficiently manage image processing tasks. For example, in data flow applications such as image processing tasks, a very small set of instructions may typically be utilized to operate efficiently on large data streams. Embodiments of the invention relate to optimized architectures for maximizing the efficiency of image processing applications. [0015]
  • FIG. 1 is a block diagram illustrating an example of a system configuration 100 for use in image processing, according to one embodiment of the present invention. The system configuration 100 includes at least one image processor 102 having a plurality of individual image signal processors (ISPs) 104 coupled to one another. As is known in the art, image processors typically include a number of individual image signal processors. [0016]
  • However, according to embodiments of the present invention, the image processor 102 implements a data-driven, shared register architecture, as discussed below. In one embodiment of the invention, the data paths between the image signal processors (ISPs) 104 may be 16 bits wide and may operate at a core frequency of approximately 266 MHz. As previously discussed, typically in data flow applications, a very small set of instructions operate on large data streams. The system configuration 100, according to embodiments of the present invention, is optimized to provide maximum performance on these types of image processing data flow applications. In one particular embodiment, the image processor 102 may include eight image signal processors 104 which are connected to each other through programmable ports. In one example, the programmable ports may be quad ports. [0017]
  • The system configuration 100, as discussed below, is highly scalable and programmable to perform image processing tasks on an input pixel stream 105 from other devices (e.g. other image processing chips or devices capable of producing an input pixel stream). [0018]
  • Further, in one embodiment, memory devices 106 and 107 may be coupled to the image processor 102. In one embodiment, the memory devices 106 and 107 may be Double Data Rate (DDR) Synchronous Dynamic Random Access Memories (SDRAMs). In one example, these dual DDR SDRAM devices may provide more than 1 Gbyte per second of data movement bandwidth. This data flow bandwidth may be balanced with the processing performance of the image signal processors 104. [0019]
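  • To put the quoted figures side by side, the following Python sketch estimates the peak rate of a single 16-bit data path at the approximate 266 MHz core frequency. The assumption that one 16-bit word moves on every clock is not stated in the text, and the function and constant names are hypothetical.

```python
DATA_PATH_BITS = 16            # data path width given in the text
CORE_CLOCK_HZ = 266_000_000    # approximate core frequency given in the text


def peak_bytes_per_second(bits: int, clock_hz: int, words_per_clock: float = 1.0) -> float:
    """Peak throughput of one data path.

    Assumption: words_per_clock words are transferred on every clock cycle.
    """
    return bits / 8 * clock_hz * words_per_clock


single_path_rate = peak_bytes_per_second(DATA_PATH_BITS, CORE_CLOCK_HZ)
```

  • Under that assumption a single path peaks at about 532 million bytes per second, so the dual DDR SDRAM bandwidth of more than 1 Gbyte per second corresponds to roughly two such paths running at full rate.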
  • As shown in FIG. 1, the system configuration 100 includes at least one host processor 111, input/output (I/O) interfaces 133, and a network interface 134 coupled to the image processor 102 by a bus 103. The bus 103, for example, may be a Peripheral Component Interconnect (PCI) type of bus. System memory devices 113 may also be coupled to processor 111. Additionally, the system configuration 100 may include additional components (not shown) such as co-processors, modems, etc.; this is only a very basic example of a system configuration. As previously discussed, the host processor 111 may be coupled to the image processor 102 and the other components previously described by bus 103. For the purposes of the present specification, the term “processor” or “CPU” refers to any machine that is capable of executing a sequence of instructions and shall be taken to include, but not be limited to, general purpose microprocessors, special purpose microprocessors, application specific integrated circuits (ASICs), multi-media controllers, microprocessors, microcontrollers, etc. In one embodiment, the processor 111 may be a general-purpose microprocessor that is capable of executing an INTEL ARCHITECTURE instruction set. For example, the processor 111 can be one of the PENTIUM classes of processors or one of the CELERON classes of processors. [0020]
  • The memory devices 106 and 107, as well as the system memory device 113, can include any memory device adapted to store digital information, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), and double data rate (DDR) SDRAM or DRAM, etc. Thus, in one embodiment, the memory devices 106 and 107 and the system memory device 113 include volatile memory. Further, it should be appreciated that memory devices 106 and 107 and system memory device 113 can also consist of or include non-volatile memory such as read-only memory (ROM). [0021]
  • Also, the system configuration 100 may include I/O interfaces and ports 133 to interface with I/O devices 139. I/O interfaces 133 include, but are not limited to, PCI slots, PCI agents, universal serial bus (USB) interfaces, Institute of Electrical and Electronics Engineers (IEEE) 1394 interfaces, parallel port interfaces, phone interfaces, Integrated Drive Electronics (IDE) interfaces (e.g. for a hard drive), high-speed serial interfaces, as well as other types of interfaces, as should be appreciated by those of skill in the art. It should also be appreciated that there are a wide variety of different types of I/O devices that may be connected through suitable I/O interfaces to the system configuration 100. Examples of I/O devices may include any I/O device to perform suitable I/O functions. For example, I/O devices may include a monitor, a keypad, a modem, a printer, storage devices (e.g. Compact Disk ROM (CD-ROM), Digital Video Disk (DVD), hard drive, floppy drive, etc.) or any other types of I/O devices (e.g. input devices, mouse, trackball, pointer device), media cards (e.g. audio, video, graphics, etc.). More particularly, I/O devices may include phones, fax machines, scanners, photocopy machines, digital copiers, video cameras, digital cameras, multi-function peripherals, as well as other types of devices suitable for coupling to an image processor. Moreover, a network interface 134 may be provided to couple the system configuration 100 to a network 140. The network interface 134 is provided to communicate with a network 140 (e.g. a Local Area Network (LAN), a Wide Area Network (WAN), the Internet, etc.) using a standard and suitable network protocol, as is known in the art. [0022]
  • The image processor 102 of exemplary system configuration 100 may utilize the external host processor 111 and bus 103 for downloading microcode, register configuration, register initialization, interrupt servicing and for uploading and downloading image data for image processing. The data read from any of the interfaces described previously may be processed by the image signal processors 104 of the image processor 102. In one embodiment, the image signal processors 104 may be connected to each other in a shared mesh topology through quad ports to facilitate rapid and flexible movement of data across the image processor 102. In one example, the image processor 102 may include eight image signal processors (ISPs) 104. Further, a global bus 123 connects to the PCI bus 103 and is coupled to all of the image signal processors (ISPs) 104 of the image processor 102, as well as all the other functional units (not shown) of the image processor 102. The various units and registers of the image processor 102 may be set up and controlled through the global bus 123. Particularly, the global bus 123 is used to read and write the configuration registers of each ISP. In one embodiment, the global bus has a 16-bit data bus and an 8-bit address bus and conveys interrupt status to the PCI bus 103. [0023]
  • Turning now to FIG. 2, a block diagram illustrates an example of an architecture for an image signal processor (ISP) 200, according to one embodiment of the present invention. As can be seen in FIG. 2, the ISP 200 includes an input programming element (Input PE) 202, an output programming element (Output PE) 204, a first Multiply-Accumulate programming element (MACPE) 206, a second MACPE 208, a general purpose programming element (GPE) 210, first and second accelerator units 214 and 216, respectively, and a memory command handler 220, all of which are coupled to a shared memory 230. In one embodiment, the shared memory 230 may include a plurality of cluster communication registers (CCRs). In an even more particular embodiment, each cluster communication register may be 16 bits wide. The cluster communication registers (CCRs) 230 of the image signal processor 200 are used to store and transfer data between the processing elements, as is discussed below in more detail. The input and output programming elements 202 and 204 are configured to route data through the image signal processor 200. Also, the image signal processor 200 may include data storage memory 222 managed by the memory command handler 220 to maximize bandwidth access efficiency, as discussed below. [0024]
  • Further, each programming element 202, 204, 206, 208, and 210, respectively, includes local registers (e.g. sixteen 16-bit local registers). Both the cluster communication registers 230 and the other local registers can be used in image processing applications, for example, on either 16-bit images or dual 8-bit images. All of the processing elements may run concurrently and implement a common base line instruction set that consists of flow control instructions and arithmetic logic unit (ALU) instructions, as discussed below. Particularly, the MAC programming elements 206 and 208, respectively, may implement additional multiply-accumulate instructions and the general purpose programming element 210 may implement additional bit-rotation instructions. The performance of the programming elements combined with the programmability of the image signal processor (ISP) 200 allows a programmer to develop and tune algorithms rapidly for optimum performance. Also, the ISP 200 may include dedicated hardware accelerators 214 and 216. For example, these accelerator units 214 and 216 may include Huffman encoder/decoder accelerators, G4 engine accelerators, Fast 2D triangular filter accelerators, etc. [0025]
  • As previously discussed, image processing applications typically perform a small set of operations on large amounts of data. The image processor, and each image signal processor (ISP) 200 of the image processor, may implement a data flow/data driven architecture to implement image processing functions. In each image signal processor 200, each programming element 202, 204, 206, 208, and 210, respectively, includes an instruction memory 240 to hold instructions. In one embodiment, the instruction memory 240 may be a 128-instruction memory. Typically, an instruction set may be used that consists of three or four tight loops in addition to data flow and arithmetic instructions. Further, each programming element may have local registers (e.g. sixteen 16-bit registers) for local data storage and also has read/write access to all of the cluster communication registers (CCRs) 230. The cluster communication registers 230 are used to exchange data between the programming elements. Particularly, in one embodiment, data passing through the cluster communication registers 230 may be tagged with data valid (DV) bits. The data valid bits may be used to establish the ownership of the data storage resource and may establish one or more consumers of the data as discussed below. [0026]
  • Briefly summarizing, the image processor 102 of the system configuration 100 (FIG. 1) includes a plurality of image signal processors (ISPs) 200 (FIG. 2). In one embodiment the image processor includes eight image signal processors (ISPs) 200, although it should be appreciated that any suitable number of image signal processors can be utilized without any significant change to the architecture. Further, programming elements 202, 204, 206, 208, and 210 and hard-wired accelerators 214 and 216 are interconnected through a shared memory 230, which, in one embodiment, may be implemented as cluster communication registers (CCRs) (e.g. 16-bit registers). In one embodiment, the five programming elements in the image signal processor 200 communicate with each other through the cluster communication registers 230. In this embodiment, the cluster communication registers 230 are the only mechanism by which the five programming elements in the image signal processor 200 can communicate with each other. The memory command handler 220, as discussed below, is used to manage the data flow to the programming elements. [0027]
  • Looking now at some of the individual components of the image signal processor 200, the programming elements (PEs) 202, 204, 206, 208, and 210 will be discussed in particular. As previously mentioned, each programming element has its own set of local registers and operates in conjunction with the cluster communication registers 230. Both the local registers and the cluster communication registers, in one embodiment of the invention, may be 16 bits wide and can be used for either 16-bit operands or two 8-bit operands. Further, each programming element may be designed for a basic set of instructions. In one embodiment, the basic set of instructions may be divided into flow control instructions that support flow control, ALU instructions that support arithmetic and logic functions, as well as custom instructions. For example, as to custom instructions, the MAC programming elements (MACPEs) 206 and 208 have multiply accumulate instructions and the general purpose programming element (GPE) 210 may include bit-rotation instructions. However, all of the programming elements 202, 204, 206, 208, and 210 support all of the flow control instructions and ALU instructions. [0028]
  • Examples of flow control instructions include: load, read instruction memory, write instruction memory, break, conditional call, interrupt control, conditional jump, indirect register control, loop instruction, no operation, repeat, return, stop, pack, and unpack. Examples of ALU instructions include: absolute function, add function, add and accumulate function, add and shift function, subtract function, subtract and accumulate function, bit-wise AND function, bit-wise OR function, bit-wise exclusive OR function, min/max functions, store accumulator function, sign-extend function, shift left function, shift right function, and an instruction to test data valid bits. As to custom instructions for use with the MAC programming elements (MACPEs) 206 and 208, exemplary instructions include: multiply and accumulate function, multiply instruction, and mode set. As to custom instructions for the general purpose programming element (GPE) 210, an exemplary instruction includes a bit-rotation instruction. [0029]
  • The general purpose programming element (GPE) 210 includes the basic flow and ALU instructions previously discussed, along with the custom bit-rotation instruction, and is the basic programming element upon which all the other programming elements 202, 204, 206, and 208 are built. Thus, general purpose programming element (GPE) 210 implements the base line instruction set, as previously discussed. The input programming element 202 is based on the general purpose programming element 210 (minus the bit rotation instruction) with a quad port interface as an input port to receive incoming data. Similarly, the output programming element 204 is likewise based on the general purpose programming element 210 (minus the bit rotation instruction) with a quad port interface as an output port to output data. MAC programming elements 206 and 208 are likewise based on the general purpose programming element 210 (minus the bit rotation instruction) with enhanced MAC functionality provided by the multiply-accumulators 242 and 244, respectively. In one embodiment, MAC programming elements 206 and 208 support dual Single Instruction Multiple Data (SIMD) 8×8 instructions or single 16×16 multiply-accumulate instructions. In addition to these basic capabilities, MAC programming elements 206 and 208 provide a wide array of arithmetic and logic functions useful for implementing image processing algorithms. It should be appreciated that the multiply-accumulators 242 and 244 of the MAC programming elements 206 and 208, respectively, like any other ALU unit, can utilize data from any of the cluster communication registers (CCRs) 230 or from local registers. [0030]
  • Turning now to the shared memory 230, in one embodiment, the shared memory 230 may be a plurality of cluster communication registers (CCRs), as previously discussed. In one example, the cluster communication registers (CCRs) 230 may each be 16-bit registers. The cluster communication registers 230 allow the processing elements 202, 204, 206, 208 and 210 to exchange data and may be used as general purpose registers. In one embodiment, data valid (DV) bits may implement a semaphore system to coordinate data flow and cluster communication register ownership by the various processing elements. [0031]
  • Referring briefly to FIG. 3, an example is illustrated of a shared memory 230 including a plurality of cluster communication registers (CCRs) 302 1-N, according to one embodiment of the present invention. In one embodiment, each cluster communication register 302 1-N may include 16 data bits and has one additional data bit 304 added to it (e.g. PE1, PE2, PE3 . . . PEn) for each processing element in the image signal processor 200. In this way, these additional processing element identification bits 304, termed data valid (DV) bits, may be used to indicate which processing element owns each cluster communication register. For example, the processor element identification data bit (DV bit) may be set high to indicate the processing element that currently owns the cluster communication register. [0032]
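  • A register with 16 data bits plus one DV bit per processing element can be modeled as below. This is a hedged sketch only: the text says the DV bits establish ownership and identify consumers, but the specific protocol shown (a write is refused while any DV bit is set, and a read clears the reader's own bit) and all class and method names are assumptions.

```python
from typing import Iterable, Optional


class ClusterCommunicationRegister:
    """Hypothetical model of one CCR: 16 data bits plus one DV bit per PE."""

    def __init__(self, num_processing_elements: int) -> None:
        self.num_processing_elements = num_processing_elements
        self.data = 0
        self.dv_bits = 0  # bit i set: processing element i has yet to consume the data

    def is_free(self) -> bool:
        """The register can be rewritten once every DV bit is clear."""
        return self.dv_bits == 0

    def write(self, value: int, consumers: Iterable[int]) -> bool:
        """Store a word and tag it for the given consumers; False if still busy."""
        targets = list(consumers)
        for pe in targets:
            if not 0 <= pe < self.num_processing_elements:
                raise ValueError("unknown processing element")
        if not self.is_free():
            return False
        self.data = value & 0xFFFF  # keep only the 16 data bits
        for pe in targets:
            self.dv_bits |= 1 << pe
        return True

    def read(self, pe: int) -> Optional[int]:
        """Return the word if it is tagged for this PE and clear that DV bit."""
        if not self.dv_bits & (1 << pe):
            return None
        self.dv_bits &= ~(1 << pe)
        return self.data
```

  • In this model the producer and consumers never need a separate lock: the DV bits themselves act as the semaphore that says whether the register holds unread data and for whom.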
  • [0033] Returning again to FIG. 2, the memory command handler 220 will now be discussed in more particular detail. The memory command handler 220 may be coupled to memory 222 (e.g. data RAM), which allows for local storage of data, constants and instructions within the image signal processor (ISP) 200. It provides a scalable mechanism for local storage optimized for access patterns characteristic of image processing. The memory command handler 220 provides the means for accessing the data in structured patterns often required by image processing such as by component, by row, by column, or by 2D block. The memory command handler 220 is utilized to support independent data streams using memory address generators (MAGs) in image processing applications, as discussed in more detail below.
  • [0034] As previously discussed, the data that each programming element 202, 204, 206, 208, and 210 will process typically comes through one or more of the cluster communication registers (CCRs) 230. In one embodiment of the invention, the cluster communication registers (CCRs) 230 are 16 bits wide, and therefore require 16-bit wide data paths. Thus, in this embodiment, all communication to the memory command handler 220 is required to be done through the same 16-bit data path 221. The restricted 16-bit data path 221 creates several problems around setting up and controlling the memory command handler 220. For example, the narrow data path limits the number of commands and the information contained in each command. However, 16-bit data paths are optimal for image processing applications because a single pixel is usually represented by a subsampled color space in 16 bits, such as YU, YV or La, Lb or YCr, YCb, etc. Moreover, as an example, the memory command handler 220, according to one embodiment, as discussed below, may operate utilizing a 16-bit data path in a unique and efficient manner.
  • [0035] As a brief overview, generally, embodiments of the present invention relate to an image signal processor that includes a memory command handler having a plurality of memory address generators that are coupled to a local memory, which stores data related to image processing. Each memory address generator generates a memory address to the local memory and interprets a command to be performed on the data of the local memory located at the memory address to aid in image processing tasks. A shared memory is coupled to the plurality of memory address generators and is used to store data to be sent to the local memory and commands to be performed by the memory address generators. In one embodiment, the shared memory may comprise a plurality of cluster communication registers that interface with the memory address generators by the use of a cluster communication register interface. The plurality of cluster communication registers may include data cluster communication registers to store data and command cluster communication registers to store commands.
  • [0036] Particularly, a pair of cluster communication registers may be assigned to each memory address generator, wherein each pair of cluster communication registers includes a data cluster communication register and a command cluster communication register. Further, as discussed below, the memory command handler includes an arbiter to arbitrate access to the local memory by the memory address generators. Also, in one particular embodiment, the plurality of cluster communication registers may be at least 16-bit registers and 16-bit wide data paths may be used to couple the cluster communication registers to the memory address generators, the memory address generators to the arbiter, and the arbiter to the local memory.
  • [0037] With reference now to FIG. 4, FIG. 4 shows a block diagram illustrating a memory command handler 400, according to one embodiment of the invention. The memory command handler 400, in conjunction with local memory 402, provides a means for accessing data in structured patterns often required by image processing such as by component, by row, by column, or by 2D block. The memory command handler 400 operates in conjunction with local memory 402 and includes a plurality of memory address generators (MAGs) 404, an arbiter 406, a cluster communication register (CCR) interface 410 for coupling to cluster communication registers (CCRs) 414, and a global bus interface 420. The memory address generators (MAGs) 404 (e.g. GB MAG, MAG0, MAG1 . . . MAG7) are utilized to support independent data streams.
  • [0038] Particularly, in the embodiment shown in FIG. 4, nine memory address generators (MAGs) 404 are utilized to support up to eight independent data streams to and from the cluster communication registers (CCRs) 414 utilizing MAGs 0-7 and another independent data stream from the global bus 423 through the global bus interface 420 utilizing the global bus (GB) MAG. However, it should be appreciated that any number of memory address generators may be utilized to support any number of suitable data streams. Each of the MAGs 0-7 is coupled to the cluster communication registers (CCRs) 414 through the cluster communication register (CCR) interface 410, as well as to the arbiter 406, and the GB MAG is coupled directly to the arbiter 406. The arbiter 406 controls access to the memory 402 on a clock-cycle basis.
  • [0039] Further, a global bus interface 420 couples the memory command handler 400 to the global bus 423 and all the other MAGs of all the other ISPs of the image processor. The global bus 423 connects to the PCI bus and is coupled to all of the other image signal processors (ISPs) of the image processor, as well as all the other functional units (not shown) of the image processor. The various units and registers of the image processor may be set up and controlled through the global bus 423. Particularly, the global bus 423 is used to read and write the configuration registers of each ISP. In one embodiment, the global bus is 16 bits wide, includes a 16-bit data bus and an 8-bit address bus, and conveys interrupt status to the PCI bus. The memory command handler 400 may be programmed through the global bus interface 420 for all commands to the memory address generators 404.
  • [0040] Data to the memory 402 is communicated through the cluster communication registers (CCRs) 414, utilizing MAGs 0-7 404, as well as through some commands. Additionally, data to the memory 402 may be communicated from the global bus 423, including data from other MAGs of other ISPs, utilizing the GB MAG.
  • [0041] Each of the components of the memory command handler 400 and the memory 402 will now be particularly discussed. Starting with the memory 402, in one embodiment, the memory 402 may be static random access memory (SRAM). The SRAM 402, in one embodiment, may be organized as N addresses of 16-bit words. The SRAM 402 appears to software as one contiguous area of memory. As shown in FIG. 4, the SRAM block 402 has a data-in port (DI), a data-out port (DO), a control port (CNTL) and an address port (A), all of which are coupled to arbiter 406.
  • [0042] Looking now to the arbiter 406, the arbiter 406 accepts all requests for access to the SRAM block 402 from the memory address generators (MAGs) 404 and arbitrates for ownership of the memory control (CNTL), address (A), and data (DI and DO) busses. The arbiter 406 implements, in one embodiment, a round-robin type of arbitration where the last memory address generator 404 granted access assumes the lowest priority, the next memory address generator 404 assumes the highest priority, and the descending priority chain is forwarded through each sequential memory address generator. The arbiter 406 decides on a clock-by-clock basis which memory address generator 404 gets to perform a memory access cycle.
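The round-robin scheme described above can be sketched in a few lines. This is an illustrative software model only, not the hardware arbiter; the class name `RoundRobinArbiter` and the convention that requester 0 starts with the highest priority are assumptions.

```python
class RoundRobinArbiter:
    """Illustrative model: one grant per clock, rotating priority."""

    def __init__(self, num_requesters):
        self.n = num_requesters
        # Assumption: treat the last requester as "most recently granted"
        # at reset, so requester 0 starts with the highest priority.
        self.last_granted = num_requesters - 1

    def grant(self, requests):
        """Return the index granted this clock, or None if nobody requests.

        requests is a sequence of booleans, one per requester.
        """
        # Walk the priority chain starting just after the last winner, so
        # the last winner is examined last (lowest priority).
        for offset in range(1, self.n + 1):
            candidate = (self.last_granted + offset) % self.n
            if requests[candidate]:
                self.last_granted = candidate
                return candidate
        return None
```

Calling `grant` once per simulated clock with the current request lines reproduces the clock-by-clock decision described in the text: a requester that has just been served cannot be served again until every other pending requester has had a turn.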
  • [0043] Turning now to the cluster communication register (CCR) interface 410, the CCR interface 410 connects the memory address generators (MAGs) 404 (e.g. MAGs 0-7) to the cluster communication registers 414 for passing data and commands to and from the SRAM block 402 via the memory address generators. The memory address generators 404 (e.g. MAGs 0-7) are connected to specific pairs of cluster communication registers (CCRs) 414. Each pair of cluster communication registers (CCRs) includes one command CCR and one data CCR. The command CCRs are used to send various read and write commands to the memory address generators 404. The data CCRs are used to send the data words to the SRAM block 402 via the memory address generators 404.
  • [0044] In one embodiment, an appropriate data valid (DV) bit may be set in the cluster communication register (CCR) 414 for the memory command handler 400, and a specific memory address generator (MAG) 404 (e.g. one of MAGs 0-7) is selected by using the cluster communication register assigned to that particular memory address generator. If the memory command handler 400 data valid (DV) bit is not set, the memory command handler 400 does not respond to the command or data in the cluster communication register. This allows the cluster communication registers (CCRs) 414 to be used to communicate with other programming elements even though the cluster communication register is connected to a particular memory address generator within the memory command handler. Conversely, when the memory address generator returns data through a cluster communication register, any combination of data valid (DV) bits can be set, returning the data to any combination of programming elements in the overall image signal processor.
  • [0045] Looking briefly at FIG. 5, FIG. 5 illustrates an example of memory address generators (MAGs) mapped to particular cluster communication registers (CCRs), according to one embodiment of the present invention. As shown in FIG. 5, CCR0 and CCR1 are assigned to MAG0 for command and data 502 and 504, respectively. CCR2 and CCR3 are assigned to MAG1 for command and data 506 and 508, respectively, CCR4 and CCR5 are assigned to MAG2 for command and data 510 and 512, respectively, etc.
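The pairing listed for FIG. 5 follows a simple pattern, sketched below. The text gives the assignments only for MAG0 through MAG2 followed by "etc."; extending the same even/odd pattern up to MAG7 is an assumption, and the function names are hypothetical.

```python
from typing import Tuple

NUM_CCR_MAGS = 8  # MAG0..MAG7; the GB MAG is fed from the global bus instead


def ccr_pair_for_mag(mag: int) -> Tuple[int, int]:
    """Return (command CCR index, data CCR index) for a given MAG.

    Assumes the FIG. 5 pattern (even CCR = command, odd CCR = data)
    continues unchanged for every MAG.
    """
    if not 0 <= mag < NUM_CCR_MAGS:
        raise ValueError(f"no CCR-connected MAG with index {mag}")
    return 2 * mag, 2 * mag + 1


def mag_for_ccr(ccr: int) -> Tuple[int, str]:
    """Inverse mapping: which MAG a CCR feeds, and in which role."""
    if not 0 <= ccr < 2 * NUM_CCR_MAGS:
        raise ValueError(f"CCR {ccr} is not mapped to a MAG in this sketch")
    return ccr // 2, "command" if ccr % 2 == 0 else "data"
```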
  • [0046] Returning to FIG. 4, the memory address generators (MAGs) 404 will now be discussed. The memory address generators (MAGs) 404 are used to generate the address for each word passed through the data cluster communication registers 414 into the SRAM memory block 402 and to interpret the commands sent to the memory address generators 404 through the command cluster communication registers 414, as well as from the global bus 423 through the GB MAG. In one embodiment there are nine memory address generators (MAGs) 404 in each memory command handler in each image signal processor. Each memory address generator (MAG) includes a command interpreter that receives commands via the command cluster communication register (CCR) 414 or global bus 423 and decodes them into various functions supported by the memory address generator, as discussed below in more detail.
  • [0047] Advantageously, each memory address generator (MAG) 404 is particularly optimized for image processing algorithms, such that it can handle 2D arrays in a variety of formats and dimensions. The power and flexibility of each memory address generator (MAG) 404 are created by the various parameters that are programmed by the processing elements through the cluster communication registers 414. Advantageously, embodiments of the invention allow the commands to be a single 16-bit word.
  • [0048] The memory address generators (MAGs) 404 of the memory command handler 400, according to embodiments of the invention, obtain their flexible capabilities through the use of several offset registers, counters, pointers, etc. Turning briefly to FIG. 6, FIG. 6 is a block diagram showing a variety of registers that may be utilized within a memory address generator (MAG) 404, according to one embodiment of the present invention. As shown in FIG. 6, each memory address generator 404 may contain: a mask register 602, a data path data valid (DV) bits register 604, a base offset register 606, a memory pointer register 608, a first increment register 610, a second increment register 612, an operation complete register 614, and registers for various control bits 616. Of course, each memory address generator 404 may include additional registers not shown here. Further, the encoding for commands may be accomplished in a way that allows for the maximum number of bits for parameters.
  • [0049] As previously discussed, a command interpreter of each memory address generator 404 receives commands via a command cluster communication register (CCR) 414 and decodes them into various functions supported by the particular memory address generator. With reference now to FIG. 7, FIG. 7 lists an example of commands that may be supported by each of the memory address generators (MAGs) 404, according to one embodiment of the present invention. As shown in FIG. 7, these commands include, but are not limited to: a write mask command 702 utilized in address calculations to create circular buffer addressing; a set data path data valid (DV) bits command 704 to determine the target processing elements for the read data; a read immediate command 706 to read RAM from a specified address; a write immediate command 708 to write RAM from the data cluster communication register (CCR) to a specified RAM address; a write MPR command 710 which provides an initial offset value to be used in address calculations; a write increment register command 712 to provide X and Y increment values for 1D or 2D addressing; a write base offset register command 714 to set the base offset register used in addressing; a read indirect, N words command 716 to read N words into the data cluster communication register (CCR) using the memory address generator (MAG) memory pointer; a write indirect, N words command 718 to write N words from the data CCR using the MAG memory pointer; a read op complete command 720 which is used to signal the memory command handler (MCH) controlling a processing element that a block transfer is complete; and an infinite indirect operation command 722 to set infinite indirect memory command handler (MCH) operations.
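To make the interplay of the mask, base offset, memory pointer, and increment registers more concrete, the following is a minimal one-dimensional sketch of a "read indirect, N words" operation. The text does not give the address formula, so the one used here (base offset plus masked pointer, with the pointer advanced by a single increment after each word) is an assumption chosen to produce circular-buffer behavior; `MagState` and `read_indirect` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class MagState:
    """Hypothetical subset of the MAG registers shown in FIG. 6."""

    base_offset: int = 0
    memory_pointer: int = 0
    increment: int = 1
    mask: int = 0x3FFF  # assumed default: pass all 14 address bits


def read_indirect(mag, ram, n_words):
    """Read n_words from ram using the MAG memory pointer.

    Assumed address formula: base_offset + (memory_pointer & mask).
    With mask = 2**k - 1 the pointer wraps inside a 2**k-word window,
    which yields circular-buffer addressing.
    """
    words = []
    for _ in range(n_words):
        address = mag.base_offset + (mag.memory_pointer & mag.mask)
        words.append(ram[address])
        mag.memory_pointer = (mag.memory_pointer + mag.increment) & 0xFFFF
    return words
```

For example, with `base_offset=8`, `mask=0x3`, and `memory_pointer=2`, six reads visit addresses 10, 11, 8, 9, 10, 11, wrapping within a four-word buffer that starts at address 8.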
  • [0050] Turning now to FIG. 8, FIG. 8 shows a diagram illustrating one example of a method of encoding commands, according to one embodiment of the present invention. For example, as shown in FIG. 8, the read immediate command is encoded as 00 in the top two bits of the 16-bit data path example. This leaves 14 bits for the immediate address that would be included in the command, which gives each memory address generator (MAG) the ability to directly address, in one example, 32 KB of data (16K 16-bit words). It should also be noted that the read immediate bits are purposely chosen to be 00 so that a lookup table can be implemented by simply writing the lookup table input as the command. For example, writing 0057h to the command cluster communication register (CCR) will cause a read immediate to location 0057h in the RAM, which would return the new output value for that input value. Of course, the base offset register can offset the lookup table address. Also, it should be noted that some commands do not need as much parameter data and may be encoded as longer commands. An example of this is the set read operation complete register, which, in this example, needs only nine bits of parameter data. For example, as shown in FIG. 8, many of the commands have fixed values (i.e. 0 or 1).
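The read immediate encoding and the lookup-table trick described above can be sketched as follows. Only the 00 opcode in the top two bits and the 14-bit address field come from the text; the function names are hypothetical, and the encodings of the other commands in FIG. 8 are not modeled.

```python
READ_IMMEDIATE_OPCODE = 0b00       # top two bits of the 16-bit command
ADDRESS_BITS = 14                  # remaining bits hold the address
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1


def encode_read_immediate(address):
    """Build the 16-bit command word for a read immediate."""
    if not 0 <= address <= ADDRESS_MASK:
        raise ValueError("address does not fit in 14 bits")
    return (READ_IMMEDIATE_OPCODE << ADDRESS_BITS) | address


def is_read_immediate(command):
    """True if the top two bits of the command are 00."""
    return ((command >> ADDRESS_BITS) & 0b11) == READ_IMMEDIATE_OPCODE


def lookup(ram, input_value):
    """Use an input value directly as a read immediate command.

    Because the opcode is 00, any input below 2**14 is already a valid
    read immediate command whose address equals the input itself.
    """
    command = input_value & 0xFFFF
    if not is_read_immediate(command):
        raise ValueError("input is outside the 14-bit lookup range")
    return ram[command & ADDRESS_MASK]
```

Since the opcode bits are zero, `encode_read_immediate(0x0057)` is simply `0x0057`, which is why a table input can be written to the command CCR unchanged. The 14-bit field spans 16,384 word addresses, and at 2 bytes per 16-bit word that is the 32 KB stated in the text.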
  • [0051] Thus, the image processor having image signal processors (ISPs) utilizing the memory command handler (MCH) according to embodiments of the present invention provides address generating functionality that can handle many different image processing tasks and can control a large number of parameters with, in one example, single 16-bit commands. The powerful addressing capability of the memory address generators (MAGs) and the memory command handler advantageously provides for the automatic feeding of data, such as in two dimensional, sub-sampled arrays. Further, the image processor having image signal processors (ISPs) utilizing the memory command handler (MCH) according to embodiments of the present invention provides the performance of an ASIC with the programmability of a processor. Moreover, the architecture provides the flexibility to implement, in one embodiment, a wide range of document processing image paths (e.g. high ppm monochrome, binary color, contone color, MRC-based algorithms, etc.) while accelerating the execution of frequently used imaging functions (e.g. color conversion, compression, and filter operations).
  • [0052] While embodiments of the present invention and its various functional components have been described in particular embodiments, it should be appreciated that the embodiments of the present invention can be implemented in hardware, software, firmware, middleware or a combination thereof and utilized in systems, subsystems, components, or sub-components thereof. When implemented in software or firmware, the elements of the present invention are the instructions/code segments to perform the necessary tasks. The program or code segments can be stored in a machine readable medium (e.g. a processor readable medium or a computer program product), or transmitted by a computer data signal embodied in a carrier wave, or a signal modulated by a carrier, over a transmission medium or communication link. The machine-readable medium may include any medium that can store or transfer information in a form readable and executable by a machine (e.g. a processor, a computer, etc.). Examples of the machine-readable medium include an electronic circuit, a semiconductor memory device, a ROM, a flash memory, an erasable programmable ROM (EPROM), a floppy diskette, a compact disk CD-ROM, an optical disk, a hard disk, a fiber optic medium, a radio frequency (RF) link, etc. The computer data signal may include any signal that can propagate over a transmission medium such as electronic network channels, optical fibers, air, electromagnetic, RF links, bar codes, etc. The code segments may be downloaded via networks such as the Internet, Intranet, etc.
  • [0053] Further, while embodiments of the invention have been described with reference to illustrative embodiments, these descriptions are not intended to be construed in a limiting sense. Various modifications of the illustrative embodiments, as well as other embodiments of the invention, which are apparent to persons skilled in the art to which embodiments of the invention pertain, are deemed to lie within the spirit and scope of the invention.

Claims (38)

What is claimed is:
1. An image signal processor comprising:
a local memory to store data; and
a memory command handler including a plurality of memory address generators, each memory address generator to generate a memory address to the local memory and to interpret a command to be performed on the data of the local memory located at the memory address to aid in image processing tasks.
2. The image signal processor of claim 1, further comprising a shared memory coupled to the plurality of the memory address generators, the shared memory storing data to be sent to the local memory and commands to be performed by the memory address generators.
3. The image signal processor of claim 2, wherein the shared memory comprises a plurality of cluster communication registers.
4. The image signal processor of claim 3, further comprising a cluster communication register interface to couple the plurality of cluster communication registers to the plurality of memory address generators.
5. The image signal processor of claim 3, wherein the plurality of cluster communication registers include data cluster communication registers to store data and command cluster communication registers to store commands.
6. The image signal processor of claim 5, wherein a pair of cluster communication registers are assigned to each memory address generator.
7. The image signal processor of claim 6, wherein each pair of cluster communication registers includes a data cluster communication register and a command cluster communication register.
8. The image signal processor of claim 3, further comprising an arbiter to arbitrate access to the local memory by the memory address generators.
9. The image signal processor of claim 8, wherein the plurality of cluster communication registers are at least 16-bit registers.
10. The image signal processor of claim 9, further comprising 16-bit data paths that couple the cluster communication registers to the memory address generators, the memory address generators to the arbiter, and the arbiter to the local memory.
11. The image signal processor of claim 10, wherein the local memory includes static random access memory (SRAM).
12. A method comprising:
storing data in a local memory of an image signal processor;
generating a memory address to the local memory utilizing a memory address generator within the image signal processor; and
performing an operation on the data of the local memory located at the memory address utilizing the memory address generator to aid in image processing tasks.
13. The method of claim 12, further comprising:
storing data to be sent to the local memory in a shared memory of the image signal processor; and
storing commands in the shared memory to be performed on the data in the local memory.
14. The method of claim 13, wherein the shared memory comprises a plurality of cluster communication registers.
15. The method of claim 14, wherein the plurality of cluster communication registers include data cluster communication registers to store data and command cluster communication registers to store commands.
16. The method of claim 15, further comprising assigning a pair of cluster communication registers to one of a plurality of memory address generators, each memory address generator to generate a memory address to the local memory within the image signal processor and to perform an operation on the data of the local memory located at the memory address to aid in image processing tasks.
17. The method of claim 16, further comprising arbitrating access to the local memory by the plurality of memory address generators.
18. The method of claim 14, wherein the plurality of cluster communication registers are at least 16-bit registers.
19. The image processor of claim 18, wherein 16-bit data paths couple the cluster communication registers to the memory address generators and the memory address generators to the local memory.
20. A machine-readable medium having stored thereon instructions, which when executed by a machine, cause the machine to perform the following operations comprising:
storing data in a local memory of an image signal processor;
generating a memory address to the local memory utilizing a memory address generator within the image signal processor; and
performing an operation on the data of the local memory located at the memory address utilizing the memory address generator to aid in image processing tasks.
21. The machine-readable medium of claim 20, further comprising:
storing data to be sent to the local memory in a shared memory of the image signal processor; and
storing commands in the shared memory to be performed on the data in the local memory.
22. The machine-readable medium of claim 21, wherein the shared memory comprises a plurality of cluster communication registers.
23. The machine-readable medium of claim 22, wherein the plurality of cluster communication registers include data cluster communication registers to store data and command cluster communication registers to store commands.
24. The machine-readable medium of claim 23, further comprising assigning a pair of cluster communication registers to one of a plurality of memory address generators, each memory address generator to generate a memory address to the local memory within the image signal processor and to perform an operation on the data of the local memory located at the memory address to aid in image processing tasks.
25. The machine-readable medium of claim 24, further comprising arbitrating access to the local memory by the plurality of memory address generators.
26. The machine-readable medium of claim 22, wherein the plurality of cluster communication registers are at least 16-bit registers.
27. The machine-readable medium of claim 26, wherein 16-bit data paths couple the cluster communication registers to the memory address generators and the memory address generators to the local memory.
28. An image processor system comprising:
a processor coupled to an image processor; and
a double data rate synchronous dynamic random access memory (DDR SDRAM) coupled to the image processor, the image processor including a plurality of image signal processors coupled to one another, each image signal processor including:
a local memory to store data, and
a memory command handler including a plurality of memory address generators, each memory address generator to generate a memory address to the local memory and to interpret a command to be performed on the data of the local memory located at the memory address to aid in image processing tasks.
29. The image processor system of claim 28, further comprising a shared memory coupled to the plurality of the memory address generators, the shared memory storing data to be sent to the local memory and commands to be performed by the memory address generators.
30. The image processor system of claim 29, wherein the shared memory comprises a plurality of cluster communication registers.
31. The image processor system of claim 30, further comprising a cluster communication register interface to couple the plurality of cluster communication registers to the plurality of memory address generators.
32. The image processor system of claim 30, wherein the plurality of cluster communication registers include data cluster communication registers to store data and command cluster communication registers to store commands.
33. The image processor system of claim 32, wherein a pair of cluster communication registers are assigned to each memory address generator.
34. The image processor system of claim 33, wherein each pair of cluster communication registers includes a data cluster communication register and a command cluster communication register.
35. The image processor system of claim 30, further comprising an arbiter to arbitrate access to the local memory by the memory address generators.
36. The image processor system of claim 35, wherein the plurality of cluster communication registers are at least 16-bit registers.
37. The image processor system of claim 36, further comprising 16-bit data paths that couple the cluster communication registers to the memory address generators, the memory address generators to the arbiter, and the arbiter to the local memory.
38. The image processor system of claim 37, wherein the local memory includes static random access memory (SRAM).
US10/609,042 2003-06-27 2003-06-27 Memory command handler for use in an image signal processor having a data driven architecture Expired - Fee Related US7088371B2 (en)

Priority Applications (9)

Application Number Priority Date Filing Date Title
US10/609,042 US7088371B2 (en) 2003-06-27 2003-06-27 Memory command handler for use in an image signal processor having a data driven architecture
MYPI20041954A MY137269A (en) 2003-06-27 2004-05-21 Memory command handler for use in an image signal processor having a data driven architecture
KR1020057024970A KR100818819B1 (en) 2003-06-27 2004-06-09 Memory command handler for use in an image signal processor having a data driven architecture
PCT/US2004/018291 WO2005006207A2 (en) 2003-06-27 2004-06-09 Memory command handler for use in an image signal processor having a data driven architecture
JP2006515360A JP4344383B2 (en) 2003-06-27 2004-06-09 Memory command handler for use in an image signal processor having a data driven architecture
EP04754790A EP1639495B1 (en) 2003-06-27 2004-06-09 Memory command handler for use in an image signal processor having a data driven architecture
AT04754790T ATE470189T1 (en) 2003-06-27 2004-06-09 MEMORY COMMAND HANDLER FOR USE IN AN IMAGE SIGNAL PROCESSOR HAVING A DATA-DRIVEN ARCHITECTURE
DE602004027493T DE602004027493D1 (en) 2003-06-27 2004-06-09 MEMORY COMMANDER FOR USE IN A PICTURE SIGNAL PROCESSOR WITH A DATA-BASED ARCHITECTURE
TW093117042A TWI269168B (en) 2003-06-27 2004-06-14 Image signal processor, method of processing image signals, machine-readable medium having stored thereon instructions and image processor system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US10/609,042 US7088371B2 (en) 2003-06-27 2003-06-27 Memory command handler for use in an image signal processor having a data driven architecture

Publications (2)

Publication Number Publication Date
US20040263524A1 true US20040263524A1 (en) 2004-12-30
US7088371B2 US7088371B2 (en) 2006-08-08

Family

ID=33540745

Family Applications (1)

Application Number Title Priority Date Filing Date
US10/609,042 Expired - Fee Related US7088371B2 (en) 2003-06-27 2003-06-27 Memory command handler for use in an image signal processor having a data driven architecture

Country Status (9)

Country Link
US (1) US7088371B2 (en)
EP (1) EP1639495B1 (en)
JP (1) JP4344383B2 (en)
KR (1) KR100818819B1 (en)
AT (1) ATE470189T1 (en)
DE (1) DE602004027493D1 (en)
MY (1) MY137269A (en)
TW (1) TWI269168B (en)
WO (1) WO2005006207A2 (en)

Cited By (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070217670A1 (en) * 2006-03-02 2007-09-20 Michael Bar-Am On-train rail track monitoring system
WO2013100783A1 (en) * 2011-12-29 2013-07-04 Intel Corporation Method and system for control signalling in a data path module
US9262704B1 (en) * 2015-03-04 2016-02-16 Xerox Corporation Rendering images to lower bits per pixel formats using reduced numbers of registers
US20160314546A1 (en) * 2015-04-27 2016-10-27 First Advantage Corporation Device and method for performing validation and authentication of a physical structure or physical object
US10331583B2 (en) 2013-09-26 2019-06-25 Intel Corporation Executing distributed memory operations using processing elements connected by distributed channels
US10380063B2 (en) 2017-09-30 2019-08-13 Intel Corporation Processors, methods, and systems with a configurable spatial accelerator having a sequencer dataflow operator
US10387319B2 (en) 2017-07-01 2019-08-20 Intel Corporation Processors, methods, and systems for a configurable spatial accelerator with memory system performance, power reduction, and atomics support features
US10402168B2 (en) 2016-10-01 2019-09-03 Intel Corporation Low energy consumption mantissa multiplication for floating point multiply-add operations
US10417175B2 (en) 2017-12-30 2019-09-17 Intel Corporation Apparatus, methods, and systems for memory consistency in a configurable spatial accelerator
US10416999B2 (en) 2016-12-30 2019-09-17 Intel Corporation Processors, methods, and systems with a configurable spatial accelerator
US10445250B2 (en) 2017-12-30 2019-10-15 Intel Corporation Apparatus, methods, and systems with a configurable spatial accelerator
US10445098B2 (en) 2017-09-30 2019-10-15 Intel Corporation Processors and methods for privileged configuration in a spatial array
US10445451B2 (en) 2017-07-01 2019-10-15 Intel Corporation Processors, methods, and systems for a configurable spatial accelerator with performance, correctness, and power reduction features
US10445234B2 (en) 2017-07-01 2019-10-15 Intel Corporation Processors, methods, and systems for a configurable spatial accelerator with transactional and replay features
US10459866B1 (en) 2018-06-30 2019-10-29 Intel Corporation Apparatuses, methods, and systems for integrated control and data processing in a configurable spatial accelerator
US10467183B2 (en) 2017-07-01 2019-11-05 Intel Corporation Processors and methods for pipelined runtime services in a spatial array
US10469397B2 (en) 2017-07-01 2019-11-05 Intel Corporation Processors and methods with configurable network-based dataflow operator circuits
US10474375B2 (en) 2016-12-30 2019-11-12 Intel Corporation Runtime address disambiguation in acceleration hardware
US10496574B2 (en) 2017-09-28 2019-12-03 Intel Corporation Processors, methods, and systems for a memory fence in a configurable spatial accelerator
US10515046B2 (en) 2017-07-01 2019-12-24 Intel Corporation Processors, methods, and systems with a configurable spatial accelerator
US10515049B1 (en) 2017-07-01 2019-12-24 Intel Corporation Memory circuits and methods for distributed memory hazard detection and error recovery
US10558575B2 (en) 2016-12-30 2020-02-11 Intel Corporation Processors, methods, and systems with a configurable spatial accelerator
US10564980B2 (en) 2018-04-03 2020-02-18 Intel Corporation Apparatus, methods, and systems for conditional queues in a configurable spatial accelerator
US10565134B2 (en) 2017-12-30 2020-02-18 Intel Corporation Apparatus, methods, and systems for multicast in a configurable spatial accelerator
US10572376B2 (en) 2016-12-30 2020-02-25 Intel Corporation Memory ordering in acceleration hardware
US10678724B1 (en) 2018-12-29 2020-06-09 Intel Corporation Apparatuses, methods, and systems for in-network storage in a configurable spatial accelerator
US10817291B2 (en) 2019-03-30 2020-10-27 Intel Corporation Apparatuses, methods, and systems for swizzle operations in a configurable spatial accelerator
US10853073B2 (en) 2018-06-30 2020-12-01 Intel Corporation Apparatuses, methods, and systems for conditional operations in a configurable spatial accelerator
US10891240B2 (en) 2018-06-30 2021-01-12 Intel Corporation Apparatus, methods, and systems for low latency communication in a configurable spatial accelerator
US10915471B2 (en) 2019-03-30 2021-02-09 Intel Corporation Apparatuses, methods, and systems for memory interface circuit allocation in a configurable spatial accelerator
US10965536B2 (en) 2019-03-30 2021-03-30 Intel Corporation Methods and apparatus to insert buffers in a dataflow graph
US11029927B2 (en) 2019-03-30 2021-06-08 Intel Corporation Methods and apparatus to detect and annotate backedges in a dataflow graph
US11037050B2 (en) 2019-06-29 2021-06-15 Intel Corporation Apparatuses, methods, and systems for memory interface circuit arbitration in a configurable spatial accelerator
US11086816B2 (en) 2017-09-28 2021-08-10 Intel Corporation Processors, methods, and systems for debugging a configurable spatial accelerator
US11200186B2 (en) 2018-06-30 2021-12-14 Intel Corporation Apparatuses, methods, and systems for operations in a configurable spatial accelerator
US11307873B2 (en) 2018-04-03 2022-04-19 Intel Corporation Apparatus, methods, and systems for unstructured data flow in a configurable spatial accelerator with predicate propagation and merging
US11907713B2 (en) 2019-12-28 2024-02-20 Intel Corporation Apparatuses, methods, and systems for fused operations using sign modification in a processing element of a configurable spatial accelerator

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8766995B2 (en) * 2006-04-26 2014-07-01 Qualcomm Incorporated Graphics system with configurable caches
US20070268289A1 (en) * 2006-05-16 2007-11-22 Chun Yu Graphics system with dynamic reposition of depth engine
US8884972B2 (en) 2006-05-25 2014-11-11 Qualcomm Incorporated Graphics processor with arithmetic and elementary function units
US8869147B2 (en) * 2006-05-31 2014-10-21 Qualcomm Incorporated Multi-threaded processor with deferred thread output control
US8644643B2 (en) 2006-06-14 2014-02-04 Qualcomm Incorporated Convolution filtering in a graphics processor
US8766996B2 (en) * 2006-06-21 2014-07-01 Qualcomm Incorporated Unified virtual addressed register file
US7529849B2 (en) * 2006-07-27 2009-05-05 International Business Machines Corporation Reduction of message flow between bus-connected consumers and producers
KR100867640B1 (en) * 2007-02-06 2008-11-10 삼성전자주식회사 System on chip including image processing memory with multiple access
JP2008249977A (en) * 2007-03-30 2008-10-16 Seiko Epson Corp Drawing circuit of electro-optical display device, drawing method of electro-optical display device, electro-optical display device and electronic equipment
US8260002B2 (en) * 2008-09-26 2012-09-04 Axis Ab Video analytics system, computer program product, and associated methodology for efficiently using SIMD operations
KR20100078193A (en) * 2008-12-30 2010-07-08 주식회사 동부하이텍 Slave and method for communicating between the slave and master
CA2825937A1 (en) 2011-01-28 2012-08-02 Eye IO, LLC Encoding of video stream based on scene type
US10506161B2 (en) 2017-10-26 2019-12-10 Qualcomm Incorporated Image signal processor data traffic management

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5790130A (en) * 1995-06-08 1998-08-04 Hewlett-Packard Company Texel cache interrupt daemon for virtual memory management of texture maps
US6370601B1 (en) * 1998-09-09 2002-04-09 Xilinx, Inc. Intelligent direct memory access controller providing controlwise and datawise intelligence for DMA transfers
US6421744B1 (en) * 1999-10-25 2002-07-16 Motorola, Inc. Direct memory access controller and method therefor
US20020156993A1 (en) * 2001-03-22 2002-10-24 Masakazu Suzuoki Processing modules for computer architecture for broadband networks
US6779098B2 (en) * 2001-01-24 2004-08-17 Renesas Technology Corp. Data processing device capable of reading and writing of double precision data in one cycle
US6785743B1 (en) * 2000-03-22 2004-08-31 University Of Washington Template data transfer coprocessor
US20040236877A1 (en) * 1997-12-17 2004-11-25 Lee A. Burton Switch/network adapter port incorporating shared memory resources selectively accessible by a direct execution logic element and one or more dense logic devices in a fully buffered dual in-line memory module format (FB-DIMM)
US20040250042A1 (en) * 2003-05-30 2004-12-09 Mehta Kalpesh Dhanvantrai Management of access to data from memory

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4384828B2 (en) 2001-11-22 2009-12-16 ユニヴァーシティ オブ ワシントン Coprocessor device and method for facilitating data transfer

Cited By (44)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070217670A1 (en) * 2006-03-02 2007-09-20 Michael Bar-Am On-train rail track monitoring system
US8942426B2 (en) * 2006-03-02 2015-01-27 Michael Bar-Am On-train rail track monitoring system
WO2013100783A1 (en) * 2011-12-29 2013-07-04 Intel Corporation Method and system for control signalling in a data path module
US10157060B2 (en) 2011-12-29 2018-12-18 Intel Corporation Method, device and system for control signaling in a data path module of a data stream processing engine
US10942737B2 (en) 2011-12-29 2021-03-09 Intel Corporation Method, device and system for control signalling in a data path module of a data stream processing engine
US10331583B2 (en) 2013-09-26 2019-06-25 Intel Corporation Executing distributed memory operations using processing elements connected by distributed channels
US10853276B2 (en) 2013-09-26 2020-12-01 Intel Corporation Executing distributed memory operations using processing elements connected by distributed channels
US9262704B1 (en) * 2015-03-04 2016-02-16 Xerox Corporation Rendering images to lower bits per pixel formats using reduced numbers of registers
US20160314546A1 (en) * 2015-04-27 2016-10-27 First Advantage Corporation Device and method for performing validation and authentication of a physical structure or physical object
US11562448B2 (en) * 2015-04-27 2023-01-24 First Advantage Corporation Device and method for performing validation and authentication of a physical structure or physical object
US10402168B2 (en) 2016-10-01 2019-09-03 Intel Corporation Low energy consumption mantissa multiplication for floating point multiply-add operations
US10416999B2 (en) 2016-12-30 2019-09-17 Intel Corporation Processors, methods, and systems with a configurable spatial accelerator
US10572376B2 (en) 2016-12-30 2020-02-25 Intel Corporation Memory ordering in acceleration hardware
US10558575B2 (en) 2016-12-30 2020-02-11 Intel Corporation Processors, methods, and systems with a configurable spatial accelerator
US10474375B2 (en) 2016-12-30 2019-11-12 Intel Corporation Runtime address disambiguation in acceleration hardware
US10469397B2 (en) 2017-07-01 2019-11-05 Intel Corporation Processors and methods with configurable network-based dataflow operator circuits
US10467183B2 (en) 2017-07-01 2019-11-05 Intel Corporation Processors and methods for pipelined runtime services in a spatial array
US10445234B2 (en) 2017-07-01 2019-10-15 Intel Corporation Processors, methods, and systems for a configurable spatial accelerator with transactional and replay features
US10445451B2 (en) 2017-07-01 2019-10-15 Intel Corporation Processors, methods, and systems for a configurable spatial accelerator with performance, correctness, and power reduction features
US10515046B2 (en) 2017-07-01 2019-12-24 Intel Corporation Processors, methods, and systems with a configurable spatial accelerator
US10515049B1 (en) 2017-07-01 2019-12-24 Intel Corporation Memory circuits and methods for distributed memory hazard detection and error recovery
US10387319B2 (en) 2017-07-01 2019-08-20 Intel Corporation Processors, methods, and systems for a configurable spatial accelerator with memory system performance, power reduction, and atomics support features
US10496574B2 (en) 2017-09-28 2019-12-03 Intel Corporation Processors, methods, and systems for a memory fence in a configurable spatial accelerator
US11086816B2 (en) 2017-09-28 2021-08-10 Intel Corporation Processors, methods, and systems for debugging a configurable spatial accelerator
US10380063B2 (en) 2017-09-30 2019-08-13 Intel Corporation Processors, methods, and systems with a configurable spatial accelerator having a sequencer dataflow operator
US10445098B2 (en) 2017-09-30 2019-10-15 Intel Corporation Processors and methods for privileged configuration in a spatial array
US10417175B2 (en) 2017-12-30 2019-09-17 Intel Corporation Apparatus, methods, and systems for memory consistency in a configurable spatial accelerator
US10445250B2 (en) 2017-12-30 2019-10-15 Intel Corporation Apparatus, methods, and systems with a configurable spatial accelerator
US10565134B2 (en) 2017-12-30 2020-02-18 Intel Corporation Apparatus, methods, and systems for multicast in a configurable spatial accelerator
US10564980B2 (en) 2018-04-03 2020-02-18 Intel Corporation Apparatus, methods, and systems for conditional queues in a configurable spatial accelerator
US11307873B2 (en) 2018-04-03 2022-04-19 Intel Corporation Apparatus, methods, and systems for unstructured data flow in a configurable spatial accelerator with predicate propagation and merging
US10853073B2 (en) 2018-06-30 2020-12-01 Intel Corporation Apparatuses, methods, and systems for conditional operations in a configurable spatial accelerator
US10891240B2 (en) 2018-06-30 2021-01-12 Intel Corporation Apparatus, methods, and systems for low latency communication in a configurable spatial accelerator
US11593295B2 (en) 2018-06-30 2023-02-28 Intel Corporation Apparatuses, methods, and systems for operations in a configurable spatial accelerator
US10459866B1 (en) 2018-06-30 2019-10-29 Intel Corporation Apparatuses, methods, and systems for integrated control and data processing in a configurable spatial accelerator
US11200186B2 (en) 2018-06-30 2021-12-14 Intel Corporation Apparatuses, methods, and systems for operations in a configurable spatial accelerator
US10678724B1 (en) 2018-12-29 2020-06-09 Intel Corporation Apparatuses, methods, and systems for in-network storage in a configurable spatial accelerator
US10817291B2 (en) 2019-03-30 2020-10-27 Intel Corporation Apparatuses, methods, and systems for swizzle operations in a configurable spatial accelerator
US11029927B2 (en) 2019-03-30 2021-06-08 Intel Corporation Methods and apparatus to detect and annotate backedges in a dataflow graph
US10965536B2 (en) 2019-03-30 2021-03-30 Intel Corporation Methods and apparatus to insert buffers in a dataflow graph
US10915471B2 (en) 2019-03-30 2021-02-09 Intel Corporation Apparatuses, methods, and systems for memory interface circuit allocation in a configurable spatial accelerator
US11693633B2 (en) 2019-03-30 2023-07-04 Intel Corporation Methods and apparatus to detect and annotate backedges in a dataflow graph
US11037050B2 (en) 2019-06-29 2021-06-15 Intel Corporation Apparatuses, methods, and systems for memory interface circuit arbitration in a configurable spatial accelerator
US11907713B2 (en) 2019-12-28 2024-02-20 Intel Corporation Apparatuses, methods, and systems for fused operations using sign modification in a processing element of a configurable spatial accelerator

Also Published As

Publication number Publication date
WO2005006207A2 (en) 2005-01-20
EP1639495A2 (en) 2006-03-29
ATE470189T1 (en) 2010-06-15
KR20060024444A (en) 2006-03-16
MY137269A (en) 2009-01-30
EP1639495B1 (en) 2010-06-02
DE602004027493D1 (en) 2010-07-15
US7088371B2 (en) 2006-08-08
TW200500859A (en) 2005-01-01
KR100818819B1 (en) 2008-04-02
TWI269168B (en) 2006-12-21
WO2005006207A3 (en) 2005-10-20
JP4344383B2 (en) 2009-10-14
JP2007520767A (en) 2007-07-26

Similar Documents

Publication Publication Date Title
US7088371B2 (en) Memory command handler for use in an image signal processor having a data driven architecture
US10776126B1 (en) Flexible hardware engines for handling operating on multidimensional vectors in a video processor
US6237079B1 (en) Coprocessor interface having pending instructions queue and clean-up queue and dynamically allocating memory
EP3161783B1 (en) Data distribution fabric in scalable gpus
US10754657B1 (en) Computer vision processing in hardware data paths
US8754893B2 (en) Apparatus and method for selectable hardware accelerators
US10694201B2 (en) Image processor, image processing system including image processor, system-on-chip including image processing system, and method of operating image processing system
KR100823379B1 (en) Data filtering apparatus, method, and system for image processing
US8711170B2 (en) Edge alphas for image translation
US6785800B1 (en) Single instruction stream multiple data stream processor
US6757430B2 (en) Image processing architecture
EP4198717A1 (en) Register file virtualization: applications and methods
US20220417542A1 (en) Image processing device, image processing system including image processing device, system-on-chip including image processing system, and method of operating image processing system
EP4195062A1 (en) Method and apparatus for separable convolution filter operations on matrix multiplication arrays
CN111209041B (en) Neural network processor, system on chip and electronic equipment
US8395630B2 (en) Format conversion apparatus from band interleave format to band separate format
JP3647078B2 (en) Processor
JP2006330871A (en) Signal processor
AU717168B2 (en) General image processor
AU717336B2 (en) Graphics processor architecture
JP2003280932A (en) Functional system, functional system management method, data processing device and computer program
JP2019074573A (en) Image processing apparatus
JP2011141791A (en) Parallel signal processor
JP2004165766A (en) Image processing method
Fujii et al. Design of multiprocessor DSP chip set for superhigh-definition image processing

Legal Events

Date Code Title Description
AS Assignment
Owner name: INTEL CORPORATION, CALIFORNIA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:LIPPINCOTT, LOUIS A.;REEL/FRAME:014181/0106
Effective date: 20031204

FPAY Fee payment
Year of fee payment: 4

FPAY Fee payment
Year of fee payment: 8

FEPP Fee payment procedure
Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.)

LAPS Lapse for failure to pay maintenance fees
Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCH Information on status: patent discontinuation
Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee
Effective date: 20180808