Having both written a thesis on parallel processing at Bangor University, North Wales, back in the mid 80s and developed a few production systems for the sonar industry using up to 40 Transputers in a pipelined architecture, I thought it would be good to remember a star of the 80s – the Transputer.

Contents

  • Introduction
  • Transputer Chips
  • Transputer Serial Links
  • Support for Parallelism
  • Array Structures
  • OCCAM
  • Links of Interest

Introduction

Introduced in the mid 1980s in the United Kingdom by a company called INMOS (which became SGS Thomson, then STMicroelectronics), the Transputer was a physical implementation of a parallel programming method based on communicating sequential processes, or CSP. A new language called OCCAM was introduced alongside it, providing a small number of simple parallel programming constructs that compiled easily to the Transputer. The combination of the two was very easy to adopt and use in many embedded and supercomputing applications. The main attraction of the Transputer family was the ability to connect directly to other Transputers via a high speed serial link that was an integral part of the OCCAM programming language.

Adopted by a number of embedded image processing applications, the Transputer had a relatively short life as processors go, with the last known iteration produced by STMicroelectronics and rebranded as the non-parallel ST20. The Transputer was a fun and novel architecture in its day and was able to take on a number of tasks, in both commercial and military applications, that were previously implemented using either bit slice technology or custom processors.

The architecture eventually died with the introduction of very high speed shared memory architectures that could move data back and forth between processors more quickly. However, with the introduction of high speed general purpose PCI Express and Intel’s upcoming QuickPath technology, there is a resurgence of interest in serial communications channels between processors, the heart of the Transputer architecture.

Transputer Chips

There were several Transputers in the family: the 16 bit IMS T212, T222 and T225 devices, the 32 bit IMS T414 and T425 versions, and the 32 bit IMS T800, T801 and T805 devices with an integrated floating point processor. One application specific version, the M212, which included disk controller interfaces, was also introduced at one point. What made the Transputer unique in its day was the use of four integrated 10Mbps or 20Mbps serial links (or Transputer Links) that allowed an array or grid of Transputers to easily communicate with one another. In addition, a 32 link crossbar switch, the C004, was offered that provided static switching of connected Transputers to allow for a programmable array topology.

The Transputer processor itself was either a 16 or 32 bit CPU with only 16 directly executable op-codes defined, making it, technically speaking, a RISC microprocessor. The memory architecture consisted of fast on-chip RAM which was directly memory mapped (i.e. not a cache). The T805, for example, had 4K bytes of on-chip memory. The external memory interface was a multiplexed address-data bus supporting DRAM via additional supporting components, along with any other directly memory mapped peripheral (e.g. a high speed I/O port, A/D converter, etc).

There was no general purpose maskable interrupt support. Instead, there was only a single “event” pin pair (event request and event acknowledge) that caused the Transputer to jump to a specific parallel execution process to handle the event.

Transputer Serial Links

The key element of the Transputer architecture was the seamless connectivity between processors and execution processes. A single instruction could be used to transfer data between parallel processes regardless of whether they were running on a single chip (performed as a block move) or on individual processors connected via serial links, which operated at speeds of up to 20Mbps in each direction. A built in on-chip scheduler would take care of the setup, transfer, synchronization and eventual completion of the communication between the two processors or processes. This made for a very efficient programming model, especially when using the native high level language OCCAM. Link adapters (C011/C012) were also introduced that converted the serial links to parallel I/O, allowing real-time conversion to the native serial link format required to talk to the Transputers without having to create a custom interface attached to an individual Transputer chip’s memory bus. While the link adapters were useful, many of the early implementations instead used a memory mapped peripheral to increase system performance or to meet specific I/O speeds beyond that of a serial link (i.e. greater than 10 or 20Mbps).
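To get a feel for why a memory mapped peripheral was sometimes preferred over a link, here is a rough back-of-the-envelope sketch in Python. It divides the payload size by the raw link bit rate, so it ignores any protocol overhead on the link and gives only a lower bound on transfer time. The 512 x 512 image size is a hypothetical example, not a figure from any particular system.

```python
def link_transfer_seconds(payload_bytes: int, link_mbps: float) -> float:
    """Lower bound on transfer time: payload bits divided by the raw bit rate.

    Assumption: protocol overhead on the link is ignored, so real
    transfers would take longer than this.
    """
    return payload_bytes * 8 / (link_mbps * 1_000_000)


# Hypothetical frame: 512 x 512 pixels, one byte per pixel.
frame_bytes = 512 * 512

for mbps in (10, 20):
    seconds = link_transfer_seconds(frame_bytes, mbps)
    print(f"{mbps}Mbps link: at least {seconds * 1000:.1f} ms per frame, "
          f"at most {1 / seconds:.1f} frames per second")
```

Even at the faster link speed, a single link cannot carry such a frame at video rates, which is the kind of case where a memory mapped interface made more sense.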

One of the cool features of the Transputer architecture was the ability to “boot from link”. This was a hardware selectable pin on the Transputer chip that caused it to boot off an internal microcode instruction set that initialized the chip, then waited for incoming data on any of its four links to load into memory and then execute. Those familiar with the Transputer will remember the famous “worm program” that would be loaded onto an array of Transputers for the purposes of discovering how they were all connected, the processor type (T4, T8, etc) and so on. Typically, the boot from link was used to load a bootstrap loader program which funneled the application code for the other processors through the array to each target Transputer, before loading its own program and executing. This allowed for a very flexible and programmable architecture, especially when the links were connected via the INMOS C004 switch device (a 32 port Transputer Link static switching device).
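The idea behind the worm program can be sketched as a breadth-first walk over the link wiring. The Python below is only a simulation under stated assumptions: the wiring table, the `explore` and `make_wiring` helpers and the four-node pipeline are all hypothetical, and the real worm ran on the Transputers themselves and probed the hardware links rather than a dictionary.

```python
from collections import deque

LINKS_PER_NODE = 4  # each Transputer had four links


def make_wiring(cables):
    """Turn a list of cables ((a, link_a), (b, link_b)) into a two-way map."""
    wiring = {}
    for end_a, end_b in cables:
        wiring[end_a] = end_b
        wiring[end_b] = end_a
    return wiring


def explore(wiring, root):
    """Breadth-first walk over the wiring, starting from the booted root.

    wiring maps (node, link) -> (neighbour, neighbour_link); unconnected
    links are simply absent. Returns {node: [(link, neighbour), ...]}.
    """
    discovered = {root: []}
    frontier = deque([root])
    while frontier:
        node = frontier.popleft()
        for link in range(LINKS_PER_NODE):
            peer = wiring.get((node, link))
            if peer is None:
                continue  # nothing answered on this link
            neighbour, _ = peer
            discovered[node].append((link, neighbour))
            if neighbour not in discovered:
                discovered[neighbour] = []
                frontier.append(neighbour)
    return discovered


# Hypothetical four-node pipeline: link 1 of each node feeds link 0 of the next.
cables = [((n, 1), (n + 1, 0)) for n in range(3)]
topology = explore(make_wiring(cables), root=0)
for node, links in sorted(topology.items()):
    print(node, links)
```

Each entry in the result lists which link leads to which neighbour, which is the sort of map a loader would need before funneling code through the array.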

Support for Parallelism

The Transputer had an integrated microcoded scheduler on-chip which meant that there was no software based kernel or multi-threading employed. Tasks were simply scheduled and the Transputer’s built in engine handled the context switching between any number of given tasks. In addition, there were two priority levels assignable to a parallel task or process, high and low. Simplistic, and sometimes limiting, but good enough for the majority of embedded applications the Transputer was considered for. The important aspect of this simplistic and hardware based approach to scheduling is that an inactive process never consumed any processor time whatsoever.
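A toy model may help show why a waiting process costs nothing. The Python sketch below is not how the microcoded scheduler worked internally. It is a cooperative simulation with hypothetical names (`Scheduler`, `event_handler`, `worker`), in which a blocked process is parked off the ready queues and is never visited until something wakes it.

```python
from collections import deque


class Scheduler:
    """Toy model: two ready queues plus a parking area for blocked processes."""

    def __init__(self):
        self.ready = {"high": deque(), "low": deque()}
        self.waiting = {}    # event name -> parked (name, priority, process)
        self.cpu_steps = {}  # process name -> times it was given the processor
        self.trace = []      # order in which processes were run

    def start(self, name, priority, process):
        self.cpu_steps[name] = 0
        self.ready[priority].append((name, priority, process))

    def run(self):
        while self.ready["high"] or self.ready["low"]:
            # A ready high priority process always wins over low priority ones.
            level = "high" if self.ready["high"] else "low"
            entry = self.ready[level].popleft()
            name, priority, process = entry
            self.cpu_steps[name] += 1
            self.trace.append(name)
            try:
                request = next(process)
            except StopIteration:
                continue  # process finished
            if request is None:
                self.ready[priority].append(entry)  # still runnable
            elif request[0] == "wait":
                self.waiting[request[1]] = entry    # parked: never polled
            elif request[0] == "signal":
                woken = self.waiting.pop(request[1], None)
                if woken is not None:
                    self.ready[woken[1]].append(woken)
                self.ready[priority].append(entry)


def event_handler():
    yield ("wait", "event")    # sleep until the event arrives
    yield None                 # one step of work to handle it


def worker():
    for _ in range(3):
        yield None             # three steps of background work
    yield ("signal", "event")  # stands in for the external event
    yield None                 # one more step of background work


sched = Scheduler()
sched.start("handler", "high", event_handler())
sched.start("worker", "low", worker())
sched.run()
print(sched.trace)
print(sched.cpu_steps)
```

While the handler is parked it never appears in the trace, and once it is signalled it runs ahead of the low priority worker.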

Array Structures

Individual Transputers were often connected in a pipeline architecture or simple distributed array as shown. In the early years, the pipeline became the most popular approach as it allowed a single group of parallel processes to be simply replicated using the powerful OCCAM PLACED PAR i = 0 FOR N statement (where N is the total number of processors). Bit mapped graphics pixel processing suited this approach, as the same program could handle different parts of the overall graphics display, much like an NVIDIA SLI or ATI Crossfire solution works today by dividing up the screen into different parts. The often demonstrated Mandelbrot program was a common user of this architecture. The distributed array model was harder to program and load balance and was used less often. Of course, each application potentially demanded a different connectivity model or array configuration, which is what drove the introduction of the INMOS C004 static configuration switch. This allowed most architectures to be set up via an independent control processor.
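The replication idea can be illustrated with a small Python sketch. It is only an analogy: a thread pool stands in for the N processors, and the display size, iteration limit and function names are hypothetical choices for illustration. The point is that every worker runs identical code and only the replication index i differs.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT, MAX_ITER = 32, 16, 30  # hypothetical tiny "display"
N = 4                                 # simulated processors; HEIGHT must divide evenly by N


def escape_count(c, max_iter=MAX_ITER):
    """Iterations of z = z*z + c before |z| exceeds 2, capped at max_iter."""
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return i
    return max_iter


def render_band(i):
    """The same code runs on every worker; only the index i differs."""
    rows_per_band = HEIGHT // N
    first = i * rows_per_band
    band = []
    for y in range(first, first + rows_per_band):
        row = []
        for x in range(WIDTH):
            c = complex(-2.0 + 3.0 * x / WIDTH, -1.0 + 2.0 * y / HEIGHT)
            row.append(escape_count(c))
        band.append(row)
    return band


# One worker per band, loosely analogous to replicating a process N times.
with ThreadPoolExecutor(max_workers=N) as pool:
    bands = list(pool.map(render_band, range(N)))

image = [row for band in bands for row in band]
for row in image:
    print("".join("#" if n == MAX_ITER else "." for n in row))
```

Because each band is independent, the work divides cleanly, which is why the Mandelbrot set made such a convenient demonstration.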

OCCAM

OCCAM was the initial preferred programming model for Transputers, with native support for the communication links along with some basic primitives to support parallel processes. Constructs that set it apart included SEQ (execute the following in sequential order), PAR (execute the following processes in parallel) and the link or channel input/output commands using the ? and ! operators, e.g. output_channel ! x outputs the value x to the link, while input_channel ? y receives a value from the link and assigns it to variable y. Some other concepts, such as alternation using the ALT command, also made it possible to choose between two alternative actions or processes depending on, say, which of two input channels receives data.
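For readers who have never seen OCCAM, these constructs can be loosely approximated in Python with threads and queues. This is a sketch under clear assumptions rather than a translation: OCCAM channels were synchronous while queue.Queue is buffered, and Python's standard library has no direct ALT equivalent, so here both producers write tagged values into one shared queue and the consumer takes whichever arrives first. All names are hypothetical.

```python
import queue
import threading


def producer(name, channel, values):
    # Like SEQ: the sends happen one after another, in order.
    for value in values:
        channel.put((name, value))  # roughly: channel ! value
    channel.put((name, None))       # end-of-stream marker (an assumption of this sketch)


def consumer(channel, sources, results):
    open_sources = set(sources)
    while open_sources:
        # Roughly an ALT: accept input from whichever source is ready first.
        source, value = channel.get()  # roughly: channel ? value
        if value is None:
            open_sources.discard(source)
        else:
            results.append((source, value))


merged = queue.Queue()
results = []

# Like PAR: start all three processes, then wait for all of them to finish.
processes = [
    threading.Thread(target=producer, args=("left", merged, [1, 2, 3])),
    threading.Thread(target=producer, args=("right", merged, [10, 20])),
    threading.Thread(target=consumer, args=(merged, ["left", "right"], results)),
]
for process in processes:
    process.start()
for process in processes:
    process.join()

print(results)
```

The interleaving of left and right values can vary from run to run, but the order within each source is preserved, which mirrors the way an ALT simply services whichever input is ready.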

Links of Interest