This page is a static HTML representation of the slides at
https://peteasa.github.io/parapg/parapg.html




from open source
by

Peter Saunderson

2016




Contents

PeterSaunderson 23rd August 2016 at 7:44pm

Proof Of Concept


  • A proof of concept supercomputer was created in 2015 (Supercomputer.io)
  • Used the Epiphany-III 16-core processor (32 GFLOPS with an 8Gbps interface)
    • note: early Parallella tests did not use the 8Gbps OH! e-link interface
  • Can start small at little cost (<£100) and grow as required
  • Not restricted to the MPMD architecture

Changing Environment

New Software Stack

Building the System

System Requirements

  • Need to keep up to date with the latest
    • interrupt-driven OH! e-link interface
    • updated kernel drivers and the latest eSDK or PAL libraries
  • Easy method of distributing the software to the cluster
  • Would like easy extension of the fpga, kernel and software
  • Would like to build Epiphany software on the build machine

Building Open Hardware fpga

The programmable Xilinx Zynq 7020 is flexible: it can contain the 8Gbps link to the Epiphany-III and a RISC-V core, or the e-link and Analog Devices HDMI, or something else entirely

  • Necessary tools are readily available; see the getting started guide "Building 7020_hdmi"
  • OH! HDL code for the 8Gbps link is available
  • All that is needed is hard work to add functionality and keep the rest of the system up to date (see parallella playground)
  • Adding hardware requires updates to the Linux drivers

Parallella Yocto Build

All the recipes to build a Linux distribution in one place:
https://github.com/peteasa/parallella-yoctobuild/wiki

meta-exotic

meta-exotic: a Yocto layer to support cross compiler creation for exotic or "foreign" microcontrollers

  • Off-the-shelf Yocto has no support for foreign microcontrollers
    • the ARM cores are the main target of Yocto
    • the Epiphany chip does not run ARM code, so it is "foreign"
    • the compiler needed to build "foreign" code can't normally be used within Yocto
  • The meta-exotic layer changes this (see meta-exotic/wiki)
  • A generic layer that could be used for other microcontrollers
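For orientation, a Yocto layer is registered through its conf/layer.conf file. The snippet below is a generic illustration of that standard shape only; the collection name "exotic" and the paths are placeholders, not the actual meta-exotic contents:

```
# conf/layer.conf - generic illustrative shape of a Yocto layer (not the real meta-exotic file)
BBPATH .= ":${LAYERDIR}"
BBFILES += "${LAYERDIR}/recipes-*/*/*.bb ${LAYERDIR}/recipes-*/*/*.bbappend"
BBFILE_COLLECTIONS += "exotic"
BBFILE_PATTERN_exotic = "^${LAYERDIR}/"
BBFILE_PRIORITY_exotic = "6"
```

With a file of this shape in place, the layer is enabled by adding its path to BBLAYERS in the build's bblayers.conf.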

Testing the System

https://wiki.yoctoproject.org/wiki/Regression_Test

Future Work

In no particular order:

  • Built-in support for the PAL libraries
  • A cluster management tool such as SLURM
  • Update of meta-exotic for the gcc 5.x tools and the addition of gdb
    • also prove it with another processor (RISC-V)
  • North / south direct Epiphany connection
  • Updating the various repositories that make up https://github.com/peteasa/parallella/wiki takes time

Contributors or sponsors for this work are always welcome!

@paracpg and #parapg on Twitter; Peter on GitHub

The End

Thank you

for reading


Glossary
References

Additional Material


Architectures


Taxonomy first published by Michael J Flynn (1972), now extended:

  1. Single Instruction Single Data
    SISD: simple single core computer
  2. Multiple Instruction Single Data
    MISD: fault tolerant system, or pipeline system
  3. Single Instruction Multiple Data
    SIMD: one microinstruction can operate at the same time on multiple data items
  4. Multiple Instruction Multiple Data
    MIMD: machines have a number of processors that function asynchronously and independently
  5. Single Program Multiple Data
    SPMD: multiple autonomous processors simultaneously execute the same program at independent points
  6. Multiple Program Multiple Data
    MPMD: server farms are a good example of a MPMD system

Multiple Instruction Multiple Data

MIMD: Multiple instruction streams, multiple data streams

  1. Multiple instructions
    • it is possible to get a lockout condition where two processors compete for the same data resource
    • one processor must delay to allow the other processor to finish its operation on the shared data
  2. Multiple Data streams may share storage at some level in the system to enable cooperative execution of a multi task program
  3. Symmetric Multiprocessors (SMP) and Massively Parallel Processors (MPP) are examples of MIMD architectures

Multiple Instruction Single Data

MISD: Multiple instruction streams, single data stream

Examples:

  1. a pipeline where one data stream is accessed by a series of processors and each processor performs a different operation on the data stream
  2. a redundant system where the two instruction streams are deliberately chosen to be the same; the processors then act on the same data and should, in theory, generate the same results

Multiple Program Multiple Data

MPMD: possibly unrelated work is split up into multiple programs and run simultaneously on multiple processors with different inputs in order to obtain multiple results

  • server farms are a good example of a MPMD system

Single Instruction Multiple Data

SIMD: Single Instruction Stream, Multiple Data Streams.

  1. Multiple data streams may access the same data at some point in the system
  2. A single instruction: so there is no lockout condition where two ALUs try to access the same data
  3. It is possible to implement an efficient data move instruction where data is moved from one memory location to another with one instruction

Single Instruction Multiple Data - 2

Standardised in OpenMP 4.0
What usually requires a loop of instructions can be performed in one instruction

Examples:

 #pragma omp simd
 for (i = 0; i < n; i++){
     a[i] = a[i] + b[i] + c[i] + d[i] + e[i];
 }

Single Instruction Single Data

SISD: Single Instruction Stream, Single Data Stream

  1. The Harvard Architecture: separate instruction and data memory
  2. The Von Neumann Architecture: a single combined data and instruction memory

Single Program Multiple Data

SPMD: tasks are split up and run simultaneously on multiple processors with different input in order to obtain results faster.

  1. standardized in MPI and OpenSHMEM
  2. the most common style of parallel programming
  3. different from SIMD, which requires a Vector Processor to manipulate data streams
  4. Shared memory could be used for message passing
  5. SIMD and SPMD are not mutually exclusive

Using Multiple Cores

A Programming Language

APL: array-oriented programming language that uses pictorial symbols for its language constructs

Bulk Synchronous Parallel

BSP: library to assist in running tasks in a parallel processing system

  • Relies on the processing units being tightly coupled
    • a network that routes messages between pairs of components
    • a facility (barrier) that allows for the synchronisation of all or a subset of components
  • Processing proceeds as follows
    • processors perform local computations making use of values stored in the local fast memory
    • asynchronous calculations proceed in parallel with put or get data calls to other processors (one sided)
    • the barrier is used to ensure that shared resources are handled atomically
    • the barrier causes all other processes to wait until they reach the same barrier avoiding deadlock

C++ Accelerated Massive Parallelism

C++AMP: a Microsoft library built on DirectX 11

  • hardware-agnostic interface for exploiting accelerator hardware
  • language extensions and an STL-like library component

Cal Actor Language with Networking

CAL: a dataflow language geared towards applications such as multimedia processing

Erlang Language

Erlang: a programming language originally developed at the Ericsson Computer Science Laboratory

Message Passing Interface

MPI: specification used in a distributed memory model

https://www.mpi-forum.org/

  • well-structured interface for message-passing between parallel systems
  • "industry standard" for writing message passing programs on HPC platforms
  • MPI is a specification for how an MPI-conformant library should be used

Message Passing Interface 2

Example API use:

#include "mpi.h"
..
   MPI_Status Stat;   // required variable for receive routines
   MPI_Init(&argc,&argv);
   MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
..
     MPI_Send(&outmsg, 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
     MPI_Recv(&inmsg, 1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &Stat);

Open Accelerators

OpenACC: extends OpenMP for support of accelerators

Ref: http://www.openacc.org/

  • open standard started by CAPS, Cray, NVIDIA, and PGI

Examples of the pragma used are:

#pragma acc parallel
#pragma acc kernels

Open Computing Language

OpenCL: using task-based and data-based parallelism

https://www.khronos.org/opencl/

  • Rich interface specification that enables remote tasks to be managed
  • Message passing used to send and receive data

Open Computing Language 2

Example data transfer

// Create buffers on host and device
size_t size = 100000 * sizeof(int);
int* h_buffer = (int*)malloc(size);
cl_mem d_buffer = clCreateBuffer(context, CL_MEM_READ_WRITE, size, NULL, NULL);
...
// Write to buffer object from host memory
clEnqueueWriteBuffer(cmd_queue, d_buffer, CL_FALSE, 0, size, h_buffer, 0, NULL, NULL);
...
// Read from buffer object to host memory
clEnqueueReadBuffer(cmd_queue, d_buffer, CL_TRUE, 0, size, h_buffer, 0, NULL, NULL);

Open Hybrid Multicore Parallel Programming

OpenHMPP: provides extensions for hardware accelerators

  • Hard to find specifications
  • Creates codelets for NVIDIA CUDA

Example of the pragma used to create a codelet:

 #pragma hmpp simple1 codelet, args[outv].io=inout, target=CUDA

Pragma used to run the codelet:

 #pragma hmpp simple1 callsite, args[outv].size={n}

Open Multi Processing

OpenMP: shared-memory parallel programming interface

http://openmp.org/

  1. Simple and easy to demonstrate
  2. Built in to most compilers (gcc -fopenmp)
  3. Building with or without OpenMP requires stubs or a pre-processor switch in the code

Parallel Architectures Library

PAL: library for coprocessors and parallel machines

https://github.com/parallella/pal

  • Think of it as a POSIX library for coprocessors and parallel machines
  • Plan to implement maths, fft, dsp libraries
    • redesign c libraries from bottom up to support parallel architectures
    • functions such as load, open, close, read, write would be used to implement OpenMP, MPI, and OpenCL
  • The fine-grained nature of the PAL libraries required for the Epiphany-III will likely scale easily to larger, more sophisticated processor architectures

Parallel Architectures Library 2

Example host application

#include "pal_base.h"
...
    team0 = p_open(dev0, 0, NUMBEROFCORES);              // open a team of processors
    results_mem = p_map(dev0, MEM_RESULTS, SIZERESULTS); // map the shared results memory
    err = p_run(prog0, func, team0, 0, NUMBEROFCORES, 0, NULL, 0);
    size = p_read(&results_mem, results, 0, sizeof(*results) * NUMBEROFCORES, 0);

Example device library calls

#include "pal_math.h"
...
    rank = p_team_rank(P_TEAM_DEFAULT);
...
    for (i = 0; i < n; i++)
        out[i] = p_sqrt(ai[i]);

Using Directive Pragma

The compiler preprocessor is given additional instructions to identify parts of the software that can be run in parallel

Using Libraries

Additional software libraries are provided that enable the application code to run in parallel

Open Multi Processing 2

Example (http://sc14.supercomputing.org/program/tutorials.html):

#include <omp.h>
...
    #pragma omp parallel

#ifdef _OPENMP
    printf("Hello, world from thread %d\n", omp_get_thread_num());
#else
    printf("Hello, world!\n");
#endif

Glossary

APL
A Programming Language: a high-level, concise, array-oriented programming language that uses pictorial symbols for its language constructs; also known as an Array Manipulation Language
BSP
Bulk Synchronous Parallel: library to assist in running tasks in a parallel processing system
C++AMP
C++ Accelerated Massive Parallelism: a Microsoft library built on DirectX 11 used to exploit accelerator hardware
CAL
Cal Actor Language with Networking: a dataflow language geared towards applications such as multimedia processing
Erlang
Erlang Language: a programming language for building robust fault-tolerant distributed applications
MIMD
Multiple Instruction Multiple Data: machines have a number of processors that function asynchronously and independently
MISD
Multiple Instruction Single Data: fault tolerant system, or pipeline system
MPI
Message Passing Interface: specification used in a distributed memory model
MPMD
Multiple Program Multiple Data: server farms are a good example of a MPMD system
OpenACC
Open Accelerators: extends OpenMP for support of accelerators
OpenCL
Open Computing Language: one of the most dominant libraries for parallel computing using task-based and data-based parallelism
OpenHMPP
Open Hybrid Multicore Parallel Programming: provides extensions for hardware accelerators. "What OpenMP is for multi-threaded programming"
OpenMP
Open Multi Processing: shared-memory parallel programming interface
OpenSHMEM
Open Symmetric Hierarchical Memory access: parallel programming libraries
PAL
Parallel Architectures Library: library for coprocessors and parallel machines
SHMEM
Symmetric Hierarchical Memory access: parallel programming libraries
SIMD
Single Instruction Multiple Data: one microinstruction can operate at the same time on multiple data items
SISD
Single Instruction Single Data: simple single core computer
SLURM
Simple Linux Utility for Resource Management: a fault-tolerant and highly scalable cluster management and job scheduling system
SPMD
Single Program Multiple Data: multiple autonomous processors simultaneously execute the same program at independent points
Vector Processor
CPU with instructions that act on one dimensional arrays of data (vectors)



References

Andreas Olofsson
Andreas Olofsson, Tomas Nordström, Zain Ul-Abdin "Kickstarting high-performance energy-efficient manycore architectures with Epiphany" 2014 48th Asilomar Conference on Signals, Systems and Computers
David A. Richie
David A. Richie and James A. Ross "OpenCL + OpenSHMEM Hybrid Programming Model for the Adapteva Epiphany Architecture", OpenSHMEM 2016, Third workshop on OpenSHMEM.
Elias Kouskoumvekakis
"RISC-V port to Parallella", Google Summer of Code 2016
M. Mitchell Waldrop
article: "The chips are down for Moore's law", Nature weekly journal of science, 2016
Michael J Flynn
"Some Computer Organizations and Their Effectiveness", IEEE Transactions on Computers. Vol. c-21, No.9, September 1972
Simon McIntosh-Smith
slides: "It's the end of the world as we know it ...", University of Bristol, HPC Research Group, 2015
wikipedia
various articles