A close-up of SDK 3.1, Part 2: Building examples with make.footer The Cell/B.E. SDK 3.1 supports a pseudo "build environment" by including
a make.footer file that you can include in a makefile to help you build
examples and demonstrations. In this article, you can read about some of the
features and functions available in the make.footer file and how they are used
to construct the SDK examples. |
Debugging common DMA errors To access main storage, Cell Broadband Engine(TM) SPEs use direct memory
access commands (DMA), which transfer data between the main storage and their
private local memory. Although this organization of distributed storage
promotes high performance, it requires the SPE programmer to explicitly handle
the DMA transfers between main and local storage. Errors during these
transfers can be difficult to detect and debug. This article
provides techniques for handling common problems with SPE-initiated DMA
transfers. |
Programmability, Part 1: Exploring different approaches to programming for Cell/B.E. platforms The programming flexibility available for the Cell Broadband Engine(TM) is a
hot topic in the multicore community. This article discusses leveraging your existing
skills to program for Cell/B.E.(TM), offers three programming approaches for Cell/B.E.
systems, and introduces the various tools, software, and hardware available
for the platform. |
TechReview, Part 2: Program applications with the LAPACK library For application programmers using the IBM Software Development
Kit for Multicore Acceleration (SDK), this article explains how to program
with the IBM Linear Algebra Package (LAPACK) library using a sample application
designed to get an inverse matrix. The article also offers four pieces of advice on optimizing
LAPACK programs, and it outlines the package's optimized APIs. LAPACK is based
on a published standard interface for commonly used linear algebra
operations in high-performance computing and other scientific domains. |
Enabling applications, Part 1: Is your application ready for Cell/B.E.? Learn from the experts how to evaluate your application's
appropriateness for the Cell/B.E.(TM) platform from the standpoints of
performance and power needs, the opportunities that exist for parallelism,
whether the algorithms line up nicely, and whether your application has access
to a Cell/B.E.-enabled library. This article is Part 1 of a 3-part series from
the IBM Redbook(R) "Programming the Cell Broadband Engine: Examples and Best
Practices." [09/10/08 update: Made various changes based on updates since the
IBM Redbook was published.--Ed.] |
TechReview, Part 1: Discover the LAPACK library For application programmers using the IBM Software Development
Kit for Multicore Acceleration (SDK), this article explains the basic
structure of the IBM Linear Algebra Package (LAPACK) library. LAPACK is based on a
published standard interface for commonly used linear algebra operations in
high performance computing and other scientific domains. |
A close-up on SDK 3.0, Part 1: Rebuilding code from src.rpm The Cell/B.E. SDK 3.0 includes several src.rpm packages that contain the
source code for some of the SDK libraries. This article describes
the steps needed to install the src.rpm, unpack the source into a
directory where it can be viewed and changed, and rebuild a new rpm. |
Core partners, Part 5: Increasing SPU performance with instruction scheduling The collection of processors in a Cell Broadband Engine(TM) (Cell/B.E.) processor
displays a DSP-like architecture. This means that the order in which the SPUs
execute the instructions can have a significant effect on performance. Without a good scheduling
mechanism in place, data dependencies can stall processor performance. In
this article, learn from a Cmpware expert how and why to use the Cmpware
CMP-DK Cell/B.E. SPU Scheduling Tool, which permits fast and easy analysis
of SPU code in an intuitive, graphical format. |
Fun with DaCS, Part 1: Using an error handler In this Cell Broadband Engine(TM) (Cell/B.E.) series, learn how to create and
register a user error handler for use with the Data Communication and
Synchronization library (DaCS). The "Data Communication and Synchronization Library for
Cell Broadband Engine Programmer's Guide and API Reference, Version 3.0" (see Resources) is the source for the content.
|
Complex networking using Linux on Power blades Blades are an excellent choice for many applications and services,
especially in the telecommunications service provider industry. But the unique
requirements of these provider networks often require configurations that are
complex and need up-front focus and planning so all the stringent functional
requirements are met. In this article, learn how to plan and set up the
necessary network configurations for a POWER6 JS22 blade deployment. |
The little broadband engine that could: More on rendering fractals on the SPE In the previous article in the series, you learned about the challenges
of rendering fractals on the SPE. That article focused on the SPEs copying their rendering results directly into the
target data buffer. This article shows you how the fractal generator can be
optimized further by taking advantage of the SPE's fondness for vector
operations. |
Fun with ALF, Part 6: Using task dependency In this Cell Broadband Engine(TM) (Cell/B.E.) series, learn how to use the
Accelerated Library Framework (ALF) task dependency in a two-stage pipeline
application. The "ALF for Cell/B.E.
Programmer's Guide and API Reference, Version 3.0" (see Resources) is the source for the content. |
Cell/B.E. SDK: Code sample directory In this article, you'll find tables indicating the locations of code
samples that illustrate how to use the IBM SDK for Multicore Acceleration.
This article will be updated with new code samples. |
BladeCenter QS: Maximizing memory performance This article compares the CBEA processor memory access
model (with a focus on the IBM BladeCenter(R) QS21 and QS22) with that of general
purpose processors, providing programmer guidelines to ensure that
applications can be developed for maximum memory performance. This article also
describes how to use the Cell Performance Counter tool when
monitoring memory access activities for tuning and debugging memory
performance. |
Core partners, Part 4: Managing the PlayStation 3 Wi-Fi network Terra Soft Solutions IT Manager Aaron Johnson shows you, step-by-step, how to configure and encrypt the built-in Wi-Fi network that comes with the
Cell Broadband Engine(TM)-based Sony PlayStation 3. And, as a little bonus, get 16 quick
steps that explain how to switch from a wireless network back to a wired network on the PS3. |
The little broadband engine that could: Rendering fractals on the SPE In the previous article in the series, you learned some reasons why there
were no appreciable performance gains when you migrated the
fractal-rendering program from running on one SPE to running on multiple SPEs. This
article examines the
challenge of rendering fractals on the SPE. The focus is on the SPEs copying their
rendering results directly into the target data buffer. |
Fun with ALF, Part 5: Using overlapped I/O buffers to add matrices In this Cell Broadband Engine(TM) (Cell/B.E.) series, learn how to use the
Accelerated Library Framework (ALF) overlapped input-output buffers to perform
matrix addition. The "ALF for Cell/B.E.
Programmer's Guide and API Reference, Version 3.0" (see Resources) is the source for the content. |
The little broadband engine that could: Looking at some DaCS performance fine-tuning issues In the previous article in the series, you migrated a fractal-rendering program from earlier in the
series to run using the DaCS data library with no appreciable performance gains when
going from running on one SPE to running on multiple SPEs. This article explores ways to optimize
performance. |
Fun with ALF, Part 4: Determining the dot product of large vectors In this Cell Broadband Engine(TM) (Cell/B.E.) series, learn how to use the
Accelerated Library Framework (ALF) bundled work block distribution and
the task context to manage situations in which the work block cannot hold the
partitioned data because of a local memory size limit. The "ALF for Cell/B.E.
Programmer's Guide and API Reference, Version 3.0" (see Resources) is the source for the content.
|
IBM BladeCenter QS21 hardware performance glossary Although there is extensive published data about the hardware performance
features of a single Cell Broadband Engine(TM) (Cell/B.E.) processor (and about the performance of a
multitude of applications ported to it), there is little on the specific hardware
performance features of the IBM BladeCenter(R) QS21 using a coherent SMP node of two
Cell/B.E. processors as well as an elaborate I/O subsystem. This glossary goes with
the article "Evaluating IBM BladeCenter QS21 hardware performance."
In that article, the
authors close the performance gap by providing information about basic latencies, throughputs,
and relative execution times for some key computational benchmark kernels, such as
Linpack and SPEC2000. The article also delivers a basic architectural overview of
the system. You can also get tips on how to optimize application
performance. |
Evaluating IBM BladeCenter QS21 hardware performance Although there is extensive published data about the hardware performance
features of a single Cell Broadband Engine(TM) (Cell/B.E.) processor (and about the performance of a
multitude of applications ported to it), there is little on the specific hardware
performance features of the IBM BladeCenter(R) QS21 using a coherent SMP node of two
Cell/B.E. processors as well as an elaborate I/O subsystem. In this article, the
authors close that gap by providing information about basic latencies, throughputs,
and relative execution times for some key computational benchmark kernels, such as
Linpack and SPEC2000. The article also delivers a basic architectural overview of
the system. You can also get tips on how to optimize application
performance. |
Fun with ALF, Part 3: Finding minimum and maximum values In this Cell Broadband Engine(TM) (Cell/B.E.) series, learn how to use the
Accelerated Library Framework (ALF) task context to keep the partial computing
results for each task instance and then combine them. The "ALF for Cell/B.E.
Programmer's Guide and API Reference, Version 3.0" (see Resources) is the source for the content. |
The little broadband engine that could: DaCS--flexible and complex In an earlier article in this series, the author introduced a fractal-generation
program built around the IDL interface that showcased the strength of IDL's
straightforward API. Executing the program was almost like calling a function and
getting results. In this article (and using the same basic program), the author
demonstrates the Data Communication and Synchronization library's (DaCS) greater
flexibility and the tradeoff: additional complexity. With DaCS, it's possible to pass the fractal pattern in as an initial argument,
then use buffers to pass data back and forth as they are processed. While this requires
more design work, it might actually be more efficient. This article also shows that DaCS allows
for much more carefully tuned inputs and outputs. |
Cell/B.E. SDK 3.0 tools, Part 1: Using performance tools This introductory tutorial, designed as a companion for the IBM SDK for
Multicore Acceleration, Version 3.0 (otherwise known as the Cell Broadband
Engine(R) SDK), teaches you how to use five performance tools that reside in the SDK
3.0: OProfile, Cell Performance Counter, Performance Debugging Tool, the PDT Trace
Reader, and FDPR-Pro. The Visual Performance Analyzer, available separately, is also highlighted. |
Core partners, Part 3: Transforming Gedae-built portable apps This concise study examines the portability of
applications developed in Gedae by analyzing the work required to move an example
application from a simulation on a PC to actually running on a DSP board (the
Mercury Computer System AdapDev system) to running on a multicore Cell Broadband
Engine(TM) (Cell/B.E.). The article illustrates how architecture considerations were taken into account
when porting the application to each system. You can see the amount of work required to
port the application and the performance of the application on each system. |
Fun with ALF, Part 2: Converting I/O data In this Cell Broadband Engine(TM) (Cell/B.E.) series, learn how to use the
Accelerated Library Framework (ALF) task context buffer as a large lookup table to
convert the 16-bit input data to 8-bit output data. The "ALF for Cell/B.E.
Programmer's Guide and API Reference, Version 3.0" (see Resources) is the source for the content. |
Fun with ALF, Part 1: Adding large matrices together In this Cell Broadband Engine(TM) (Cell/B.E.) series, learn how to use the
Accelerated Library Framework (ALF) in the IBM SDK for Multicore Acceleration 3.0 to
add two large matrices together. There is one example for host data
partitioning and one for accelerator data partitioning. The "ALF for Cell/B.E.
Programmer's Guide and API Reference, Version 3.0" (see Resources) is the source for the content. |
The little broadband engine that could: IDL is dead--long live DaCS! In SDK 3.0, the Data Communication and Synchronization library (DaCS)
provides a sparkling substitute for IDL. DaCS is a set of services to aid the development
of applications and application frameworks in a heterogeneous multi-tiered system.
This article takes you on a tour of the DaCS process model and
explores general DaCS principles, including communication and memory access. |
The little broadband engine that could: Reviewing the newest little SDK that installs natively on PS3 Come along on a little train tour of the SDK for Multicore Acceleration 3.0
to see what's different for developers and how you can make good use of the SDK,
including native installation on PS3, support for FC7 and RHEL 5.1, enhanced compilers,
Fortran and Ada support, BLAS, ALF, and DaCS--oh my! |
Core partners, Part 2: Using DDT to clean up Cell/B.E. app bugs Allinea Software's Distributed Debugging Tool (DDT)
provides an easy-to-use, capable debugger that is able to debug complete Cell
Broadband Engine applications, including multiple threads within a single Cell/B.E.
processor and clusters of Cell/B.E. processors. |
Cell/B.E. container virtualization, Part 2: Implementation issues This three-part series illustrates a
hardware-resource-focused form of software virtualization known as container
virtualization (or operating system virtualization), demonstrated through the open
source project OpenVZ. The series provides a comprehensive overview of all the
components and techniques needed to virtualize the Cell/B.E. processor with software
methods. This second article of the series details the implementation of
dedicated virtualization and partitioning that was described in Part 1 of the series. |
Cell/B.E. container virtualization, Part 1: Concepts, architectures, and tools This three-part series illustrates a
hardware-resource-focused form of software virtualization known as container
virtualization (or operating system virtualization), demonstrated through the open
source project OpenVZ. The series provides a comprehensive overview of all the
components and techniques needed to virtualize the Cell/B.E. processor with software
methods. This first article of the series discusses the basic concepts
involved, illustrates the salient points of the OpenVZ and Cell/B.E. architectures
and how they work together, and describes some of the OpenVZ tools. |
Cell/B.E. SDK 3.0, Part 6: Use simulator consoles, use the ALF wizard, and set IDE preferences This introductory tutorial, designed for the IBM SDK for Multicore
Acceleration, Version 3.0 (otherwise known as the Cell Broadband Engine SDK),
explores the Cell/B.E. processor IDE and gives developers a click-for-click
walk-through of building a simple project in this environment. This tutorial is broken into six quick-perform parts dealing with creating an SPU
project, creating a PPU project, creating the Cell/B.E. simulator, configuring the application
launcher, debugging and doing performance analysis, using simulator consoles,
using the ALF wizard, and setting IDE preferences. |
Cell/B.E. SDK 3.0, Part 5: Debug and complete dynamic or static performance This introductory tutorial, designed for the IBM SDK for Multicore
Acceleration, Version 3.0 (otherwise known as the Cell Broadband Engine SDK),
explores the Cell/B.E. processor IDE and gives developers a click-for-click
walk-through of building a simple project in this environment. This tutorial is broken into six quick-perform parts dealing with creating an SPU
project, creating a PPU project, creating the Cell/B.E. simulator, configuring the application
launcher, debugging and doing performance analysis, using simulator consoles,
using the ALF wizard, and setting IDE preferences. |
Cell/B.E. SDK 3.0, Part 4: Configure the application launcher This introductory tutorial, designed for the IBM SDK for Multicore
Acceleration, Version 3.0 (otherwise known as the Cell Broadband Engine SDK),
explores the Cell/B.E. processor IDE and gives developers a click-for-click
walk-through of building a simple project in this environment. This tutorial is broken into six quick-perform parts dealing with creating an SPU
project, creating a PPU project, creating the Cell/B.E. simulator, configuring the application
launcher, debugging and doing performance analysis, using simulator consoles,
using the ALF wizard, and setting IDE preferences. |
Cell/B.E. SDK 3.0, Part 3: Create the Cell/B.E. simulator environment This introductory tutorial, designed for the IBM SDK for Multicore
Acceleration, Version 3.0 (otherwise known as the Cell Broadband Engine SDK),
explores the Cell/B.E. processor IDE and gives developers a click-for-click
walk-through of building a simple project in this environment. This tutorial is broken into six quick-perform parts dealing with creating an SPU
project, creating a PPU project, creating the Cell/B.E. simulator, configuring the application
launcher, debugging and doing performance analysis, using simulator consoles,
using the ALF wizard, and setting IDE preferences. |
Cell/B.E. SDK 3.0, Part 2: Create a PPU project This introductory tutorial, designed for the IBM SDK for Multicore
Acceleration, Version 3.0 (otherwise known as the Cell Broadband Engine SDK),
explores the Cell/B.E. processor IDE and gives developers a click-for-click
walk-through of building a simple project in this environment. This tutorial is broken into six quick-perform parts dealing with creating an SPU
project, creating a PPU project, creating the Cell/B.E. simulator, configuring the application
launcher, debugging and doing performance analysis, using simulator consoles,
using the ALF wizard, and setting IDE preferences. |
Cell/B.E. SDK 3.0, Part 1: Create an SPU project This introductory tutorial, designed for the IBM SDK for Multicore
Acceleration, Version 3.0 (otherwise known as the Cell Broadband Engine SDK),
explores the Cell/B.E. processor IDE and gives developers a click-for-click
walk-through of building a simple project in this environment. This tutorial is broken into six quick-perform parts dealing with creating an SPU
project, creating a PPU project, creating the Cell/B.E. simulator, configuring the application
launcher, debugging and doing performance analysis, using simulator consoles,
using the ALF wizard, and setting IDE preferences. |
Porting workshop, Part 7: Getting the most performance The seven quick-read parts of this "Porting workshop" series take
you on a real-world trip from strategy and planning through workload execution,
performance tweaking, optimization, and a solid conclusion. The series describes how to
most effectively port compute-intensive applications to the Cell Broadband Engine
platform. In part seven, the authors evaluate the performance data to date. |
Cell/B.E. SDK: Understanding the terminology A quick-reference glossary of terms you might encounter when installing and
using the Cell Broadband Engine (Cell/B.E.) processor
SDK. |
Porting workshop, Part 6: Tying it all together The seven quick-read parts of this "Porting workshop" series take
you on a real-world trip from strategy and planning through workload execution,
performance tweaking, optimization, and a solid conclusion. The series describes how to
most effectively port compute-intensive applications to the Cell Broadband Engine
platform. In this Part 6, the authors provide a summary of what the series has
covered so far. |
Minimize recoding impact, Part 2: Removing obstacles to speedy performance The first article in the series describes how to do a basic port to the Cell Broadband Engine processor. This
second article goes further in hammering out the details, including removing limitations
based on DMA-transfer size, partitioning the program across multiple SPEs, and
improving the program's speed even more. |
Porting workshop, Part 5: Mixed-precision workloads The seven quick-read parts of this "Porting workshop" series take
you on a real-world trip from strategy and planning through workload execution,
performance tweaking, optimization, and a solid conclusion. The series describes how to
most effectively port compute-intensive applications to the Cell Broadband Engine
platform. In this Part 5, the authors determine how to make mixed-precision
calculations work with the sample application. |
PS3 fab-to-lab, Part 2: Generating and analyzing signals How do you take the Cell Broadband Engine (Cell/B.E.) processor from an
off-the-shelf Sony PLAYSTATION 3 (PS3) and use it to construct a piece of
Linux(R)-based laboratory equipment (in essence, take the Cell/B.E. from fab to hab
to lab)? In this series, Lewin Edwards shows you how to go from game console to
simple audio-bandwidth spectrum analyzer and function generator. In this article,
the author shows you how to build on the infrastructure from Part 1 to make the
system into a fully operational, if primitive, spectrum analyzer. |
IBM Installation Toolkit: Loading Linux on POWER The IBM Installation Toolkit for Linux on POWER simplifies the installation of Linux on
virtualized and non-virtualized Power machines, gives you a bootable rescue DVD, and
provides the software needed to fully exploit the Power platform. Learn to use the
toolkit to install Red Hat Enterprise Linux and SUSE Linux Enterprise Server on
IBM System p and System
i5 machines. |
Porting workshop, Part 4: Mersenne-Twister The seven quick-read parts of this "Porting workshop" series take
you on a real-world trip from strategy and planning through workload execution,
performance tweaking, optimization, and a solid conclusion. The series describes how to
most effectively port compute-intensive applications to the Cell Broadband Engine
platform. In this Part 4, the authors explore the Mersenne-Twister random-number
generator to determine its effect. |
The little broadband engine that could: Use multiple SPEs for a single task Peter Seebach uses a simple, iterative-function fractal generator program to describe how to use multiple
Synergistic Processor Elements (SPEs) to vectorize a single task using the job queue model. |
Minimize recoding impact, Part 1: How to make an SPE and existing code work together Traditional porting requires identifying and abstracting out the
architecture-dependent code: making code endian-independent, working through minor
API differences, and including the appropriate header files and libraries. While
this procedure works for getting code to run on the Cell Broadband Engine
(Cell/B.E.) processor, to actually use the extra processing elements, you have to
put in extra work, including reworking the code and rethinking the build process. In
this series, learn to take advantage of the Synergistic
Processor Elements (SPEs) in existing code with only minimal impact on the existing code and build process. |
Porting workshop, Part 3: Initial performance results The seven quick-read parts of this "Porting workshop" series take
you on a real-world trip from strategy and planning through workload execution,
performance tweaking, optimization, and a solid conclusion. The series describes how to
most effectively port compute-intensive applications to the Cell Broadband Engine
platform. In part three, the authors run and review performance tests and data on the modified code. |
Porting workshop, Part 2: Original code analysis The seven quick-read parts of this "Porting workshop" series take
you on a real-world trip from strategy and planning through workload execution,
performance tweaking, optimization, and a solid conclusion. The series describes how to
most effectively port compute-intensive applications to the Cell Broadband Engine
platform. In part two, explore the original code with Linux profiling tools. |