
FPGA-based neural network accelerator outperforms GPUs

Xilinx Developer Forum: Claimed to be the highest-performance convolutional neural network (CNN) on an FPGA, Omnitek’s CNN is available now. The deep learning processing unit (DPU) is future-proofed by the programmability of the FPGA, explained CEO Roger Fawcett.

It was demonstrated as a GoogLeNet Inception-v1 CNN, using eight-bit integer resolution. It achieved 16.8 tera operations per second (TOPS) and can run inference on over 5,300 images per second on a Xilinx Virtex UltraScale+ XCVU9P-3 FPGA. The modular, scalable approach makes it suitable for object detection and video processing applications at the edge and in the cloud, explained Fawcett, as well as for inference in data centres and intelligent cameras.
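To put the two quoted figures side by side, the short Python sketch below divides the quoted throughput by the quoted image rate. It assumes, purely for illustration, that the whole 16.8 TOPS is spent on inference; the article does not say whether that figure is a peak or a sustained rate, and the function name `ops_per_image` is made up for this example.

```python
def ops_per_image(tops: float, images_per_second: float) -> float:
    """Operations available per image, assuming (hypothetically) that the
    full quoted throughput is spent on inference."""
    return tops * 1e12 / images_per_second


# Figures quoted in the article for the XCVU9P-3 demonstration.
quoted_tops = 16.8
quoted_images_per_second = 5300  # the article says "over 5,300"

budget = ops_per_image(quoted_tops, quoted_images_per_second)
print(f"{budget / 1e9:.2f} giga-operations per image")
# prints "3.17 giga-operations per image"
```

Because the article quotes "over 5,300" images per second, this is an upper bound on the per-image budget under that assumption, not a measured cost of the network.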


The DPU can be configured to provide optimal compute performance for neural network topologies in machine learning applications, using the parallel DSP architecture, distributed memory and reconfigurability of logic and connectivity for different algorithms.
The DPU achieves over 50% higher performance than any competing CNN and outperforms GPUs for a given power or cost budget, the company claims. “The FPGA is a world-beating platform and architecture, which is very flexible for future-proofing and can outperform GPUs in AI, with lower latency,” added Fawcett.
The company has also announced it is sponsoring a DPhil (PhD) at Oxford University to study techniques for implementing deep learning acceleration on FPGAs. The work will be in collaboration with Omnitek’s own research into AI compute engines and algorithms.
