A novel SDK that opens up the world of FPGAs to today's developers

Altera Technology Roadshow 2013

## **Today's News**

Altera today announces its SDK for OpenCL

- OpenCL allows software developers to boost system performance by using an FPGA's massively parallel architecture
- Increases designer productivity by raising the level of design abstraction

## **Performance Challenge**

#### Performance Wanted

#### Multimedia

- HD video processing
- Image processing

#### Medical

- Medical imaging
- Bioinformatics

#### **Military**

- Radar image processing
- Persistent surveillance

#### **High-Performance Computing**

- Financial modeling
- Big data analytics
- Scientific computing

#### Performance Challenges

- **Single Core**
- **Multiple Cores**
- **100s of Cores**

## Modern Altera FPGA: Massively Parallel

- >1M logic elements
- >3.9 billion transistors
- >50 Mb of integrated memory
- Variable precision floating-point DSP blocks
- High-speed serial transceivers

## Altera Is Driving Silicon Convergence

Need for Efficiency » « Need for **Flexibility**

#### DSP

- Software programmable
- Great flexibility
- Poor power efficiency

#### **FPGAs**

- Hardware and software programmable
- Great flexibility
- Good power efficiency

#### **Application-Specific**

- Not programmable, hard wired
- Inflexible
- Great power efficiency
- Many contain embedded processors

FPGA Combines the **Best** of All Four:

FPGA = Microprocessor + DSP + Application-Specific IP + Programmable Fabric

## **Improving FPGA Development Productivity**

## **OpenCL for Heterogeneous Solutions**

- C-based language with extensions:
  - Standard C language
  - Altera OpenCL C extensions (adds parallelism to C)
- API (open standard for different devices)

The OpenCL programming model supports parallelism in heterogeneous systems.

## Introducing Altera SDK for OpenCL

## **OpenCL Implementation on Altera FPGAs**

## **Accelerating Performance with SoC FPGAs**

## Single-Chip OpenCL Solution: SoC = ARM + FPGA
**Integration Enables:**

- Higher bandwidth and lower latency between FPGA and processor
  - >125 Gbps interconnect
- Processor integration reduces system cost

# OpenCL Example #1 – Faster Time-to-Market

## Video Camera Requiring Intense Video Processing

- Proprietary video codec algorithms
- Captures frames with different exposure levels → retains highlight and shadow details

#### Let Customer Implement Code in an FPGA in <1 Week

- Port C code to OpenCL to FPGA implementation
- C → HDL typically requires 3-6 months

## Saved Months of Development

# OpenCL Example #2 – Higher Performance

## Financial Marketplace Monte-Carlo Black Scholes Simulations

- Calculate the value of trading options with multiple sources of uncertainty
- FPGA delivers higher performance at a fraction of the power

| OpenCL MCBS            | Quad-core µP | Comparable GPU | Stratix IV FPGA |
|------------------------|--------------|----------------|-----------------|
| Number of Cores        | 8            | 448            | N/A             |
| Simulations per Second | 240M         | 2100M          |                 |
| Power (Watts)          | 130W         | 215W           | 21W             |

*Figure: stock price chart.*

>9X Higher Performance vs CPU Alone

# OpenCL Example #3 – Power Efficiency

### **Document Search / Filtering Algorithm**

- Review incoming stream (documents) and return best match
- E.g., monitors news feeds and recommends others
- Power savings = cost savings (huge issue for server farms)

| Platform                  | Quad-core µP | Comparable GPU | Stratix IV FPGA |
|---------------------------|--------------|----------------|-----------------|
| Number of Cores           | 6            | 448            | N/A             |
| Performance / Watt (MT/J) | 15.9         | 15.1           |                 |

*Figure: search profile (#, Wt) × document representation (#, Freq) → score.*

### >5X Performance/Watt vs. GPU

## **Benefits of Altera OpenCL for FPGA**

#### ✓ Superior Design Productivity

- Quick and easy evaluation of different solutions
- Fast development / debug / optimization cycles
- Faster time-to-market

#### ✓ Higher Performance

>9X greater performance vs CPU alone running a Monte-Carlo Black Scholes simulation

#### ✓ Improved Power Savings

>5X performance/Watt vs GPU-based heterogeneous systems running a document search algorithm

#### ✓ Greater Portability

Reuse across multiple platforms, multiple generations
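
The Monte-Carlo Black Scholes workload named in Example #2 is a natural fit for a massively parallel device because every simulated path is independent of the others. The sketch below illustrates that structure in plain Python. It is not OpenCL and not the benchmark code behind the figures above. It assumes a European call option with a single source of uncertainty (the slides mention several), and all function names and parameter values are hypothetical. The per-path function plays the role an OpenCL work-item would play, and the closed-form Black-Scholes price serves as a sanity check.

```python
import math
import random


def simulate_path_payoff(rng, spot, strike, rate, vol, maturity):
    """Payoff of one simulated path; paths share no state, so they could run in parallel."""
    z = rng.gauss(0.0, 1.0)  # one standard normal draw per path
    drift = (rate - 0.5 * vol * vol) * maturity
    diffusion = vol * math.sqrt(maturity) * z
    terminal_price = spot * math.exp(drift + diffusion)
    return max(terminal_price - strike, 0.0)  # European call payoff


def monte_carlo_call(spot, strike, rate, vol, maturity, num_paths, seed=0):
    """Discounted average payoff over num_paths independent paths."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_paths):
        total += simulate_path_payoff(rng, spot, strike, rate, vol, maturity)
    return math.exp(-rate * maturity) * total / num_paths


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot, strike, rate, vol, maturity):
    """Closed-form Black-Scholes price of a European call, used as a reference."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * maturity) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    return spot * normal_cdf(d1) - strike * math.exp(-rate * maturity) * normal_cdf(d2)


if __name__ == "__main__":
    # Hypothetical option parameters chosen only for illustration.
    estimate = monte_carlo_call(100.0, 100.0, 0.05, 0.2, 1.0, num_paths=100_000, seed=1)
    reference = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
    print(f"Monte Carlo estimate: {estimate:.4f}")
    print(f"Closed-form price:    {reference:.4f}")
```

Because the loop body touches no shared state apart from the running sum, the same computation maps onto many parallel units with a final reduction, which is the property the "simulations per second" comparison in Example #2 relies on.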