What is a special purpose computer that functions as a component in a large product?

A general-purpose computer is one that, given the appropriate application and required time, should be able to perform most common computing tasks.  

Personal computers, including desktops, notebooks, smartphones and tablets, are all examples of general-purpose computers. The term is used to differentiate general-purpose computers from other types, in particular the specialized embedded computers used in intelligent systems.

ENIAC, designed and built in the 1940s, was the first general-purpose electronic computer. ENIAC weighed 30 tons and covered an area of about 1,800 square feet. In contrast, a current smartphone weighs a few ounces and is small enough to slip into a pocket.

Embedded Systems Landscape

Peter Barry, Patrick Crowley, in Modern Embedded Computing, 2012

System Resources and Features

General-purpose and embedded computer systems differ most in the variability of system resources and features rather than in their quantity. Embedded computer systems are typically designed and deployed with a relatively static and predetermined set of system resources and features.

This fact simplifies systems software and certain system processes, such as booting the system or diagnosing problems. For example, the boot process for an IA-32-based general-purpose computer, and the design of the software that implements that process, must be organized to contend with an unpredictable set of memory and I/O resources when the system starts. This resource uncertainty is not present in most embedded computer systems; hence, embedded system boot processes are shorter and simpler.

URL: https://www.sciencedirect.com/science/article/pii/B9780123914903000011

Larger computers

G.R. Wilson, in Embedded Systems and Computer Architecture, 2002

14.3 Storage within a computer

A general-purpose computer, such as a popular personal computer, contains various storage devices, such as main memory, magnetic disks, and optical disks. Optical disks and floppy magnetic disks are used principally to store programs and data on a device that is external to the computer. These storage media are convenient for the retail distribution of programs and for the archiving of data in a way that is secure against a failure of the computer. Magnetic hard disks are used to store programs and data in a form that is ready to be accessed by the computer without the user having to insert an optical or floppy disk. The main memory store in a computer is made from a number of RAM devices, and is used to store code and data for programs that the computer is currently using. Finally, the microprocessor itself contains registers that store the data that is currently being processed.

We can regard these storage devices as being in a hierarchy, ordered according to how close they are to the microprocessor, as shown in Figure 14.1. In general, a high access speed implies small size and high cost per byte.

Figure 14.1. Memory hierarchy

URL: https://www.sciencedirect.com/science/article/pii/B9780750650649500158

From the Ground Up!

Luis F. Chaparro, Aydin Akan, in Signals and Systems Using MATLAB (Third Edition), 2019

0.3 Implementation of Digital Signal Processing Algorithms

Continuous-time signals are typically processed using analog systems composed of electrical circuit components such as resistors, capacitors, and inductors, together with semiconductor devices such as diodes, transistors, and operational amplifiers. Digital signals, on the other hand, are sequences of numbers, and processing them requires numerical manipulation of these sequences. Simple addition, multiplication, and delay operations are enough to implement many discrete-time systems. Digital signal processing systems are therefore easier to design, develop, simulate, test, and implement than analog systems, using flexible, reconfigurable, and reliable software and hardware tools. Digital signal processing systems are employed these days in many applications, such as cell phones, household appliances, cars, ships and airplanes, smart home applications, and many other consumer electronic devices. The fast development of digital technology has made high-capacity processing hardware available at reasonable cost for real-time applications. Refer to [44,54] for in-depth details.

A digital signal processing system may be used to perform a task on an analog signal x(t), or on an inherently discrete-time signal x[n]. In the former case, the analog signal is first converted into digital form by an analog-to-digital converter, which samples the analog signal, quantizes the samples, and encodes the amplitude values in a binary representation. A digital signal processing system may be represented by a mathematical equation defining the output signal as a function of the input using arithmetic operations. Designing such a system requires developing an algorithm that implements these arithmetic operations.
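
For a concrete illustration of such an input-output equation, the sketch below implements a three-point moving-average filter, y[n] = (x[n] + x[n-1] + x[n-2])/3, using only the addition, scaling, and delay operations mentioned above. The filter choice, class name, and test signal are illustrative assumptions, not an example taken from the chapter.

// Minimal sketch (in Java): a 3-point moving-average filter applied to a sequence x[n].
public class MovingAverage {
    public static double[] filter(double[] x) {
        double[] y = new double[x.length];
        for (int n = 0; n < x.length; n++) {
            double x1 = (n >= 1) ? x[n - 1] : 0.0;  // delayed sample x[n-1]
            double x2 = (n >= 2) ? x[n - 2] : 0.0;  // delayed sample x[n-2]
            y[n] = (x[n] + x1 + x2) / 3.0;          // additions and one constant scaling
        }
        return y;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5};               // arbitrary test signal
        System.out.println(java.util.Arrays.toString(filter(x)));
    }
}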

A general-purpose computer may be used to develop and test these algorithms. Algorithm development, debugging, and testing are generally done using a high-level programming tool such as MATLAB or C/C++. Upon successful development of the algorithm, and after running simulations on test signals, the algorithm is ready to be implemented in hardware. Digital signal processing applications often require heavy arithmetic, e.g., repeated multiplications and additions, and as such dedicated hardware may be required. Possible hardware platforms for a real-time implementation of the developed algorithms are:

General-purpose microprocessors (μPs) and micro-controllers (μCs).

General-purpose digital signal processors (DSPs).

Field-programmable gate arrays (FPGAs).

Selecting the best implementation hardware depends on the requirements of the application such as performance, cost, size, and power consumption.

0.3.1 Microprocessors and Micro-Controllers

With increasing clock frequencies (for processing fast-changing signals) and lower costs, general-purpose microprocessors and micro-controllers have become capable of handling many digital signal processing applications. However, complex operations such as multiplication and division are time consuming on general-purpose microprocessors, since they require a series of simpler operations. These processors do not have the architecture or on-chip facilities required for efficient digital signal processing operations, and they are usually not cost effective or power efficient for many applications.

Micro-controllers are application-specific micro-computers that contain built-in hardware components such as a central processing unit (CPU), memory, and input/output (I/O) ports; as such, they are referred to as embedded controllers. A variety of consumer and industrial electronic products, such as home appliances, automotive control applications, medical devices, space and military applications, wireless sensor networks, smart phones, and games, are designed using micro-controllers. They are preferred in many applications due to their small size, low cost, and integration of processor, program memory, and random-access memory (RAM) in a single chip.

A very popular micro-controller platform is the Arduino electronic board, which provides an on-board micro-controller and the necessary input/output ports. Arduino is an open-source and flexible platform that offers a very simple way to design a digital signal processing application. The built-in micro-controller has an architecture with a powerful arithmetic logic unit that enables fast execution of operations. A user-friendly software development environment is available for free, which makes it easy to design digital signal processing systems on Arduino boards.

0.3.2 Digital Signal Processors

A digital signal processor is a fast special-purpose microprocessor with an architecture and instruction set designed specifically for efficient implementation of digital signal processing algorithms. Digital signal processors are used for a wide range of applications, from communications and control to speech and image processing. Embedded digital signal processors are often found in consumer products such as mobile phones, fax/modems, disk drives, radios, printers, medical and health care devices, MP3 players, high-definition television (HDTV), and digital cameras. These processors have become a very popular choice for a wide range of consumer applications, since they are very cost effective. Software development for digital signal processors is facilitated by specially designed tools, typically including a project build environment, a source code editor, a C/C++ compiler, a debugger, a profiler, a simulator, and a real-time operating system; DSPs may also be reprogrammed in the field to upgrade the product or to fix software bugs. Digital signal processors provide the advantages of microprocessors while being easy to use, flexible, and lower in cost.

0.3.3 Field Programmable Gate Arrays

Another way to implement a digital signal processing algorithm is to use field-programmable gate arrays (FPGAs): programmable logic devices that contain fields of small logic blocks (usually NAND gates) and interconnect elements. The logic block size of a field-programmable logic device is referred to as its “granularity”, which is related to the effort required to complete the wiring between the blocks. There are three main granularity classes:

Fine granularity or Pilkington (sea of gates) architecture

Medium granularity

Large granularity (Complex Programmable Logic Devices)

Wiring or linking between the gates is realized using a programming tool. Field-programmable logic devices are produced in various memory technologies that allow the device to be reprogrammed, require short programming times, and provide protection against unauthorized use. For many high-bandwidth signal processing applications, such as wireless, multimedia, and satellite communications, FPGA technology provides a better solution than digital signal processors.

URL: https://www.sciencedirect.com/science/article/pii/B9780128142042000090

Low-Level Efficiency Issues

Peter Norvig, in Paradigms of Artificial Intelligence Programming, 1992

10.1 Use Declarations

On general-purpose computers running Lisp, much time is spent on type-checking. You can gain efficiency at the cost of robustness by declaring, or promising, that certain variables will always be of a given type. For example, consider the following function to compute the sum of the squares of a sequence of numbers:

(defun sum-squares (seq)
  (let ((sum 0))
    (dotimes (i (length seq))
      (incf sum (square (elt seq i))))
    sum))

(defun square (x) (* x x))

If this function will only be used to sum vectors of fixnums, we can make it a lot faster by adding declarations:

(defun sum-squares (vect)
  (declare (type (simple-array fixnum *) vect)
           (inline square) (optimize speed (safety 0)))
  (let ((sum 0))
    (declare (fixnum sum))
    (dotimes (i (length vect))
      (declare (fixnum i))
      (incf sum (the fixnum (square (svref vect i)))))
    sum))

The fixnum declarations let the compiler use integer arithmetic directly, rather than checking the type of each addend. The (the fixnum …) special form is a promise that the argument is a fixnum. The (optimize speed (safety 0)) declaration tells the compiler to make the function run as fast as possible, at the possible expense of making the code less safe (by ignoring type checks and so on). Other quantities that can be optimized are compilation-speed, space, and, in ANSI Common Lisp only, debug (ease of debugging). Quantities can be given a number from 0 to 3 indicating how important they are; 3 is most important and is the default if the number is left out.

The (inline square) declaration allows the compiler to generate the multiplication specified by square right in the loop, without explicitly making a function call to square. The compiler will create a local variable for (svref vect i) and will not execute the reference twice—inline functions do not have any of the problems associated with macros as discussed on page 853. However, there is one drawback: when you redefine an inline function, you may need to recompile all the functions that call it.

You should declare a function inline when it is short and the function-calling overhead will thus be a significant part of the total execution time. You should not declare a function inline when the function is recursive, when its definition is likely to change, or when the function's definition is long and it is called from many places.

In the example at hand, declaring the function inline saves the overhead of a function call. In some cases, further optimizations are possible. Consider the predicate starts-with:

(defun starts-with (list x)
  "Is this a list whose first element is x?"
  (and (consp list) (eql (first list) x)))

Suppose we have a code fragment like the following:

(if (consp list) (starts-with list x) …)

If starts-with is declared inline this will expand to:

(if (consp list) (and (consp list) (eql (first list) x)) …)

which many compilers will simplify to:

(if (consp list) (eql (first list) x) …)

Very few compilers do this kind of simplification across functions without the hint provided by inline.

Besides eliminating run-time type checks, declarations also allow the compiler to choose the most efficient representation of data objects. Many compilers support both boxed and unboxed representations of data objects. A boxed representation includes enough information to determine the type of the object. An unboxed representation is just the "raw bits" that the computer can deal with directly. Consider the following function, which is used to clear a 1024 × 1024 array of floating point numbers, setting each one to zero:

(defun clear-m-array (array)
  (declare (optimize (speed 3) (safety 0)))
  (declare (type (simple-array single-float (1024 1024)) array))
  (dotimes (i 1024)
    (dotimes (j 1024)
      (setf (aref array i j) 0.0))))

In Allegro Common Lisp on a Sun SPARCstation, this compiles into quite good code, comparable to that produced by the C compiler for an equivalent C program. If the declarations are omitted, however, the performance is about 40 times worse.

The problem is that without the declarations, it is not safe to store the raw floating point representation of 0.0 in each location of the array. Instead, the program has to box the 0.0, allocating storage for a typed pointer to the raw bits. This is done inside the nested loops, so the result is that each call to the version of clear-m-array without declarations calls the floating-point-boxing function 1,048,576 times, allocating a megaword of storage. Needless to say, this is to be avoided.

Not all compilers heed all declarations; you should check before wasting time with declarations your compiler may ignore. The function disassemble can be used to show what a function compiles into. For example, consider the trivial function to add two numbers together. Here it is with and without declarations:

(defun f (x y)
  (declare (fixnum x y) (optimize (safety 0) (speed 3)))
  (the fixnum (+ x y)))

(defun g (x y) (+ x y))

Here is the disassembled code for f from Allegro Common Lisp for a Motorola 68000-series processor:

> (disassemble 'f)
;; disassembling #
;; formals: x y
;; code vector @ #x83ef44
0: link a6,#0
4: move.l a2,-(a7)
6: move.l a5,-(a7)
8: move.l 7(a2),a5
12: move.l 8(a6),d4 ; y
16: add.l 12(a6),d4 ; x
20: move.l #1,d1
22: move.l -8(a6),a5
26: unlk a6
28: rtd #8

This may look intimidating at first glance, but you don't have to be an expert at 68000 assembler to gain some appreciation of what is going on here. The instructions labeled 0–8 (labels are in the leftmost column) comprise the typical function preamble for the 68000. They do subroutine linkage and store the new function object and constant vector into registers. Since f uses no constants, instructions 6, 8, and 22 are really unnecessary and could be omitted. Instructions 0, 4, and 26 could also be omitted if you don't care about seeing this function in a stack trace during debugging. More recent versions of the compiler will omit these instructions.

The heart of function f is the two-instruction sequence 12–16. Instruction 12 retrieves y, and 16 adds y to x, leaving the result in d4, which is the "result" register. Instruction 20 sets d1, the "number of values returned" register, to 1.

Contrast this to the code for g, which has no declarations and is compiled at default speed and safety settings:

> (disassemble 'g)
;; disassembling #
;; formals: x y
;; code vector @ #x83db64
0: add.l #8,31(a2)
4: sub.w #2,d1
6: beq.s 12
8: jmp 16(a4) ; wnaerr
12: link a6,#0
16: move.l a2,-(a7)
18: move.l a5,-(a7)
20: move.l 7(a2),a5
24: tst.b -208(a4) ; signal-hit
28: beq.s 34
30: jsr 872(a4) ; process-sig
34: move.l 8(a6),d4 ; y
38: move.l 12(a6),d0 ; x
42: or.l d4,d0
44: and.b #7,d0
48: bne.s 62
50: add.l 12(a6),d4 ; x
54: bvc.s 76
56: jsr 696(a4) ; add-overflow
60: bra.s 76
62: move.l 12(a6),-(a7) ; x
66: move.l d4,-(a7)
68: move.l #2,d1
70: move.l -304(a4),a0 ; +_2op
74: jsr (a4)
76: move.l #1,d1
78: move.l -8(a6),a5
82: unlk a6
84: rtd #8

See how much more work is done. The first four instructions ensure that the right number of arguments have been passed to g. If not, there is a jump to wnaerr (wrong-number-of-arguments-error). Instructions 12–20 have the argument loading code that was at 0–8 in f. At 24–30 there is a check for asynchronous signals, such as the user hitting the abort key. After x and y are loaded, there is a type check (42–48). If the arguments are not both fixnums, then the code at instructions 62–74 sets up a call to +_2op, which handles type coercion and non-fixnum addition. If all goes well, we don't have to call this routine, and do the addition at instruction 50 instead. But even then we are not done—just because the two arguments were fixnums does not mean the result will be. Instructions 54–56 check and branch to an overflow routine if needed. Finally, instructions 76–84 return the final value, just as in f.

Some low-quality compilers ignore declarations altogether. Other compilers don't need certain declarations, because they can rely on special instructions in the underlying architecture. On a Lisp Machine, both f and g compile into the same code:

6 PUSH ARG|0 ; X
7 + ARG|1 ; Y
8 RETURN PDL-POP

The Lisp Machine has a microcoded + instruction that simultaneously does a fixnum add and checks for non-fixnum arguments, branching to a subroutine if either argument is not a fixnum. The hardware does the work that the compiler has to do on a conventional processor. This makes the Lisp Machine compiler simpler, so compiling a function is faster. However, on modern pipelined computers with instruction caches, there is little or no advantage to microcoding. The current trend is away from microcode toward reduced instruction set computers (RISC).

On most computers, the following declarations are most likely to be helpful:

fixnum and float. Numbers declared as fixnums or floating-point numbers can be handled directly by the host computer's arithmetic instructions. On some systems, float by itself is not enough; you have to say single-float or double-float. Other numeric declarations will probably be ignored. For example, declaring a variable as integer does not help the compiler much, because bignums are integers. The code to add bignums is too complex to put inline, so the compiler will branch to a general-purpose routine (like +_2op in Allegro), the same routine it would use if no declarations were given.

list and array. Many Lisp systems provide separate functions for the list- and array- versions of commonly used sequence functions. For example, (delete x (the list l)) compiles into (sys:delete-list-eql x l) on a TI Explorer Lisp Machine. Another function, sys:delete-vector, is used for arrays, and the generic function delete is used only when the compiler can't tell what type the sequence is. So if you know that the argument to a generic function is either a list or an array, then declare it as such.

simple-vector and simple-array. Simple vectors and arrays are those that do not share structure with other arrays, do not have fill pointers, and are not adjustable. In many implementations it is faster to aref a simple-vector than a vector. It is certainly much faster than taking an elt of a sequence of unknown type. Declare your arrays to be simple (if they in fact are).

(array type). It is often important to specialize the type of array elements. For example, an (array short-float) may take only half the storage of a general array, and such a declaration will usually allow computations to be done using the CPU's native floating-point instructions, rather than converting into and out of Common Lisp's representation of floating points. This is very important because the conversion normally requires allocating storage, but the direct computation does not. The specifiers (simple-array type) and (vector type) should be used instead of (array type) when appropriate. A very common mistake is to declare (simple-vector type). This is an error because Common Lisp expects (simple-vector size)—don't ask me why.

(array *dimensions). The full form of an array or simple-array type specifier is (array type dimensions). So, for example, (array bit (* *)) is a two-dimensional bit array, and (array bit (1024 1024)) is a 1024 × 1024 bit array. It is very important to specify the number of dimensions when known, and less important to specify the exact size, although with multidimensional arrays, declaring the size is more important. The format for a vector type specifier is (vector type size).

Note that several of these declarations can apply all at once. For example, in

(position #\. (the simple-string file-name))

the variable file-name has been declared to be a vector, a simple array, and a sequence of type string-char. All three of these declarations are helpful. The type simple-string is an abbreviation for (simple-array string-char).

This guide applies to most Common Lisp systems, but you should look in the implementation notes for your particular system for more advice on how to fine-tune your code.

URL: https://www.sciencedirect.com/science/article/pii/B9780080571157500108

J.D. Nicoud, in Encyclopedia of Physical Science and Technology (Third Edition), 2003

I.A Introduction

In any general-purpose computer, workstation, or dedicated controller based on a microprocessor, data transfers are continuously being performed between the processor, the memory, and the input/output (I/O) devices. Frequent transfers imply a high bandwidth, economically feasible only for short distances. For distances greater than a few meters, the cost of the electrical or optical lines forces the serialization of information.

A typical computer system consists of the processor (master) and several memory and I/O devices (slaves) interconnected by a set of data and control lines named buses (Fig. 1). These devices are generally clearly recognizable when they are connected by a backplane bus (Fig. 12). They are frequently mixed on a single board computer. The bus allows bidirectional transfers between a possibly variable set of devices. The links toward the peripherals have a simpler structure since they are point to point. Connecting several devices on a bus, or transferring data over long distances, implies solving many electrical problems correctly and taking care of the propagation time inside devices and over the transmission lines.

FIGURE 1. Typical computer system.

FIGURE 12. Typical board size for standard buses.

URL: https://www.sciencedirect.com/science/article/pii/B0122274105004397

Symmetric Multiprocessor Architecture

Thomas Sterling, ... Maciej Brodowicz, in High Performance Computing, 2018

6.2 Architecture Overview

An SMP is a full-standing self-sufficient computer system with all subsystems and components needed to serve the requirements and support actions necessary to conduct the computation of an application. It can be employed independently for user applications cast as shared-memory multiple-threaded programs or as one of many equivalent subsystems integrated to form a scalable distributed-memory massively parallel processor (MPP) or commodity cluster. It can also operate as a throughput computer supporting multiprogramming of concurrent independent jobs or as a platform for multiprocess message passing jobs, even though the interprocess data exchange is achieved through shared memory transparent to the parallel programming interface. The following sections describe the key subsystems in some detail to convey how they contribute to achieving performance, principally through parallelism and diverse functionality with distinct technologies. This section begins with a brief overview of the full organization of an SMP architecture and the basic purposes of its major components, to provide a context for the later detailed discussions.

Like any general-purpose computer, an SMP serves a key set of functions on behalf of the user application, either directly in hardware or indirectly through the supporting operating system. These are typically:

instruction issue and operation functions through the processor core

program instruction storage and application data storage upon which the processor cores operate

mass and persistent storage to hold all information required over long periods of time

internal data movement communication paths and control to transfer intermediate values between subsystems and components within the SMP

input/output (I/O) interfaces to external devices outside the SMP, including other mass storage, computing systems, interconnection networks, and user interfaces, and

control logic and subsystems to manage SMP operation and coordination among processing, memory, internal data paths, and external communication channels.

The SMP processor cores perform the primary execution functions for the application programs. While these devices incorporate substantial complexity of design (described later), their principal operation is to identify the next instruction in memory to execute, read that instruction into a special instruction register, and decode the binary instruction coding to determine the purpose of the operation and the sequence of hardware signals to be generated to control the execution. The instruction is issued to the pipelined execution unit, and with its related data it proceeds through a sequence of microoperations to determine a final result. Usually the initial and resulting data is acquired from and deposited to special storage elements called registers: very high-speed (high bandwidth, low latency) latches that hold temporary values. Somewhat simplistically, there are five classes of operations that make up the overall functionality of the processor core.

1. The basic register-to-register integer, logic, and character operations.

2. Floating-point operations on real values.

3. Conditional branch operations to control the sequence of operations performed dependent on intermediate data values (usually Boolean).

4. Memory access operations to move data to and from registers and the main memory system.

5. Actions that initiate control of data through external I/O channels, including transfer to mass storage.

Until 2005 essentially all processors in the age of very large-scale integration (VLSI) technology were single-microprocessor integrated circuits. But with the progress of semiconductor technology reflecting Moore's law and the limitations of instruction-level parallelism (ILP) and clock rates due to power constraints, multicore processors (or sockets) starting with dual-core sockets have dominated the processor market over the last decade. Today processors may comprise a few cores, 6–16, with new classes of lightweight architectures permitting sockets of greater than 60 cores on a chip. An SMP may incorporate one or more such sockets to provide its processing capability (Fig. 6.1). Peak performance of an SMP is approximated by the product of the number of sockets, the number of cores per socket, the number of operations per instruction, and the clock rate that usually determines the instruction issue rate. This is summarized in Eq. (6.1).

Figure 6.1. Internal to the SMP are the intranode data paths, standard interfaces, and motherboard control elements.

(6.1)  P_peak ∼ N_sockets × N_cores_per_socket × R_clock × N_operations_per_instruction
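
As a rough worked example of Eq. (6.1), the sketch below uses hypothetical numbers that are not taken from the text (2 sockets, 16 cores per socket, a 2.5 GHz clock, and 8 operations per instruction); these give 2 × 16 × 8 × 2.5 × 10^9 = 640 Gflops of peak performance.

// Minimal sketch (in Java) of Eq. (6.1); all parameter values are hypothetical.
public class PeakPerformance {
    static double peak(int sockets, int coresPerSocket, double clockHz, int opsPerInstruction) {
        return (double) sockets * coresPerSocket * opsPerInstruction * clockHz;
    }

    public static void main(String[] args) {
        double flops = peak(2, 16, 2.5e9, 8);                   // 6.4e11 operations per second
        System.out.printf("Peak ~ %.0f Gflops%n", flops / 1e9); // ~640 Gflops
    }
}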

The SMP memory consists of multiple layers of semiconductor storage with complex control logic to manage the access of data from the memory by the processor cores, transparent vertical migration through the cache hierarchy, and cache consistency across the many cache stacks supporting the processor core and processor stack caches. The SMP memory in terms of the location of data that is being operated on is, in fact, three separate kinds of hardware. Already mentioned are the processor core registers: very high-speed latches that have their own namespace and provide the fastest access time (less than one cycle) and lowest latency. Each core has its own sets of registers that are unique to it and separated from all others. The main memory of the SMP is a large set of memory modules divided into memory banks that are accessible by all the processors and their cores. Main memory is implemented on separate dynamic random access memory (DRAM) chips and plugged into the SMP motherboard's industry-standard memory interfaces (physical, logical, and electrical). Data in the main memory is accessed through a virtual address that the processor translates to a physical address location in the main memory. Typically an SMP will have from 1 to 4 gigabytes of main memory capacity per processor core.

Between the processor core register sets and the SMP main memory banks are the caches. Caches bridge the gap of speeds between the rate at which the processor core accesses data and the rate at which the DRAM can provide it. The difference between these two is easily two orders of magnitude, with a core fetch rate in the order of two accesses per nanosecond and the memory cycle time in the order of 100 ns. To achieve this, the cache layers exploit temporal and spatial locality. In simple terms, this means that the cache system relies on data reuse. Ideally, data access requests will be satisfied with data present in the level 1 (L1) cache that operates at a throughput equivalent to the demand rate of a processor core and a latency of one to four cycles. This assumes that the sought-after data has already been acquired before (temporal locality) or that it is very near data already accessed (spatial locality). Under these conditions, a processor core could operate very near its peak performance capability. But due to size and power requirements, L1 caches (both data and instruction) are relatively small and susceptible to overflow; there is a need for more data than can be held in the L1 cache alone. To address this, a level 2 (L2) cache is almost always incorporated, again on the processor socket for each core or sometimes shared among cores. The L2 cache holds both data and instructions and is much larger than the L1 caches, although much slower. L1 and L2 caches are implemented with static random access memory (SRAM) circuit design. As the separation between core clock rates and main memory cycle times grew, a third level of cache, L3, was included, although these were usually implemented as a DRAM chip integrated within the same multi-chip module packaging of the processor socket. The L3 cache will often be shared among two or more cores on the processor package.
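The payoff from spatial locality is easy to demonstrate: traversing a two-dimensional array row by row touches consecutive addresses and reuses cache lines, while traversing the same data column by column does not. The sketch below illustrates the idea; the array size, class name, and timing approach are arbitrary assumptions, and the measured ratio will vary with the cache sizes of the machine.

// Sketch (in Java): row-major vs. column-major traversal of the same array.
// The row-major loop benefits from spatial locality; the column-major loop
// strides across rows and misses in cache far more often.
public class LocalityDemo {
    static final int N = 4096;
    static final int[][] a = new int[N][N];

    static long rowMajor() {
        long s = 0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) s += a[i][j];   // consecutive addresses
        return s;
    }

    static long columnMajor() {
        long s = 0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++) s += a[i][j];   // large stride between accesses
        return s;
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime(); rowMajor();
        long t1 = System.nanoTime(); columnMajor();
        long t2 = System.nanoTime();
        System.out.printf("row-major %d ms, column-major %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}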

This contributes to achieving the second critical property of the SMP memory hierarchy: cache coherency. The symmetric multiprocessing attribute requires copies of main memory data values that are held in caches for fast access to be consistent. When two or more copies of a value with a virtual address are in distinct physical caches, a change to the value of one of those copies must be reflected in the values of all others. Sometimes the actual value may be changed to the updated value, although more frequently the other copies are merely invalidated so an obsolete value is not read and used. There are many hardware protocols that ensure the correctness of data copies, starting as early as the 1980s with the modified exclusive shared invalid (MESI) [1] family of protocols. The necessity to maintain such data coherence across caches within an SMP adds design complexity, increases data access time, and consumes additional energy.

Many SMP systems incorporate their own secondary storage to hold large quantities of information, both program codes and user data, and do so in a persistent manner so as to not lose stored information after the associated applications finish, other users employ the system, or the system is powered down. Mass storage has usually been achieved through hard magnetic disk technology with one or more spinning disk drives. More recently, although with somewhat lower density, solid-state drives (SSDs) have served this purpose. While more expensive, SSDs exhibit superior access and cycle times and better reliability as they have no moving parts. Mass storage presents two logic interfaces to the user. Explicitly, it supports the file system consisting of a graph structure of directories, each holding other directories and end-user files of data and programs. A complete set of specific file and directory access service calls is made available to users as part of the operating system to use the secondary storage. A second abstraction presented by mass storage is as part of the virtual memory system, where “pages” of block data with virtual addresses may be kept on disk and swapped in and out of main memory as needed. When a page request is made for data that is not found in memory, a page fault is indicated and the operating system performs the necessary tasks to make room for the requested page in main memory by moving a less-used page on to disk and then bringing the desired page into memory while updating various tables. This is performed transparently to the user, but can take more than a million times longer than a similar data access request to cache. Some SMP nodes, especially those used as subsystems of commodity clusters or MPPs, may not include their own secondary storage. Referred to as “diskless nodes”, these will instead share secondary storage which is itself a subsystem of the supercomputer or even external file systems shared by multiple computers and workstations. Diskless nodes are smaller, cheaper, lower energy, and more reliable.

Every SMP has multiple I/O channels that communicate with external devices (outside the SMP), user interfaces, data storage, system area networks, local area networks, and wide area networks, among others. Every user is familiar with many of these, as they are also found on deskside and laptop systems. For local area and system area networks, interfaces are most frequently provided to Ethernet and InfiniBand (IB) to connect to other SMPs of a larger cluster or institutional environments such as shared mass storage, printers, and the internet. The universal serial bus (USB) has become so widely employed for diverse purposes, including portable flash drives, that it is ubiquitous and available on essentially everything larger than a screen pad or laptop, and certainly on any deskside or rack-mounted SMP. JTAG is widely employed for system administration and maintenance. The Serial Advanced Technology Attachment (SATA) is widely used for external disk drives. Video graphics array and high-definition multimedia interface provide direct connection to high-resolution video screens. There is usually a connection specifically provided for a directly connected user keyboard. Depending on the system, there may be a number of other I/O interfaces.

URL: https://www.sciencedirect.com/science/article/pii/B978012420158300006X

Sensor Network Platforms and Tools

Feng Zhao, Leonidas J. Guibas, in Wireless Sensor Networks, 2004

7.1 Sensor Node Hardware

Sensor node hardware can be grouped into three categories, each of which entails a different set of trade-offs in the design choices.

Augmented general-purpose computers: Examples include low-power PCs, embedded PCs (e.g., PC104), custom-designed PCs (e.g., Sensoria WINS NG nodes), and various personal digital assistants (PDAs). These nodes typically run off-the-shelf operating systems such as Win CE, Linux, or real-time operating systems, and they use standard wireless communication protocols such as Bluetooth or IEEE 802.11. Because of their relatively higher processing capability, they can accommodate a wide variety of sensors, ranging from simple microphones to more sophisticated video cameras.

Compared with dedicated sensor nodes, PC-like platforms are more power hungry. However, when power is not an issue, these platforms have the advantage that they can leverage the availability of fully supported networking protocols, popular programming languages, middleware, and other off-the-shelf software.

Dedicated embedded sensor nodes: Examples include the Berkeley mote family [98], the UCLA Medusa family [202], Ember nodes, and MIT µAMP [32]. These platforms typically use commercial off-the-shelf (COTS) chip sets with emphasis on small form factor, low-power processing and communication, and simple sensor interfaces. Because of their COTS CPU, these platforms typically support at least one programming language, such as C. However, in order to keep the program footprint small enough to fit their small memory, programmers of these platforms are given full access to hardware but barely any operating system support. A classical example is the TinyOS platform and its companion programming language, nesC. We will discuss these platforms in Sections 7.3.1 and 7.3.2.

System-on-chip (SoC) nodes: Examples of SoC hardware include smart dust [109], the BWRC picoradio node [187], and the PASTA node. Designers of these platforms try to push the hardware limits by fundamentally rethinking the hardware architecture trade-offs for a sensor node at the chip design level. The goal is to find new ways of integrating CMOS, MEMS, and RF technologies to build extremely low power and small footprint sensor nodes that still provide certain sensing, computation, and communication capabilities. Since most of these platforms are currently in the research pipeline with no predefined instruction set, there is no software platform support available.

Among these hardware platforms, the Berkeley motes, due to their small form factor, open source software development, and commercial availability, have gained wide popularity in the sensor network research community. In the following section, we give an overview of the Berkeley MICA mote.

7.1.1 Berkeley Motes

The Berkeley motes are a family of embedded sensor nodes sharing roughly the same architecture. Figure 7.1 shows a comparison of a subset of mote types.

Figure 7.1. A comparison of Berkeley motes.

Let us take the MICA mote as an example. The MICA motes have a two-CPU design, as shown in Figure 7.2. The main microcontroller (MCU), an Atmel ATmega103L, takes care of regular processing. A separate and much less capable coprocessor is only active when the MCU is being reprogrammed. The ATmega103L MCU has 128 KB of integrated flash memory and 4 KB of data memory. Given these small memory sizes, writing software for motes is challenging. Ideally, programmers should be relieved from optimizing code at assembly level to keep the code footprint small. However, high-level support and software services are not free. Being able to mix and match only the necessary software components to support a particular application is essential to achieving a small footprint. A detailed discussion of the software architecture for motes is given in Section 7.3.1.

Figure 7.2. MICA mote architecture.

In addition to the memory inside the MCU, a MICA mote also has a separate 512 KB flash memory unit that can hold data. Since the connection between the MCU and this external memory is via a low-speed serial peripheral interface (SPI) protocol, the external memory is more suited for storing data for later batch processing than for storing programs. The RF communication on MICA motes uses the TR1000 chip set (from RF Monolithics, Inc.) operating in the 916 MHz band. With hardware accelerators, it can achieve a maximum raw data rate of 50 kbps. MICA motes implement a 40 kbps transmission rate. The transmission power can be digitally adjusted by software through a potentiometer (Maxim DS1804). The maximum transmission range is about 300 feet in open space.

Like other types of motes in the family, MICA motes support a 51-pin I/O extension connector. Sensors, actuators, serial I/O boards, or parallel I/O boards can be connected via the connector. A sensor/actuator board can host a temperature sensor, a light sensor, an accelerometer, a magnetometer, a microphone, and a beeper. The serial I/O (UART) connection allows the mote to communicate with a PC in real time. The parallel connection is primarily for downloading programs to the mote.

It is interesting to look at the energy consumption of various components on a MICA mote. As shown in Figure 7.3, radio transmission has the highest power consumption. However, each radio packet (e.g., 30 bytes) takes only 4 ms to send, while listening for incoming packets keeps the radio receiver on all the time. The energy required to send one packet could power the radio receiver for only about 27 ms. Another observation is that there are huge differences among the power consumption levels in the active mode, the idle mode, and the suspend mode of the MCU. It is thus worthwhile, from an energy-saving point of view, to suspend the MCU and the RF receiver as long as possible.

Figure 7.3. Power consumption of MICA motes.

URL: https://www.sciencedirect.com/science/article/pii/B9781558609143500079

The iPod

Mike Kuniavsky, in Smart Things, 2010

9.2.2.1 iTunes

iTunes is a general purpose computer avatar. It is distinguished from other Store avatars by its breadth of functionality and its role as a gateway to the Store for devices that could not directly connect. Other store avatars specialize in a certain subset of functionality, but iTunes (Figure 9-7) contains nearly all of the management and playback functionality of the other products. This functional heterogeneity also distinguishes it from Apple's other software products, which typically focus on creating and editing only a single media format. In contrast, the iTunes feature list includes functions as varied as CD burning, Internet radio listening, podcast subscription, ringtone creation, and digital video downloading.

Figure 9-7. The author's iTunes 8, showing basic music playing mode.

In addition to delivering all of the elements of the service described above, it also controls the other avatars. It is used to load content onto iPods, to synchronize downloaded video content with Apple TV, and to send content to AirTunes, an Apple technology for streaming music between devices over a local network.

With the iPhone, Apple placed the iTunes Store on the actual device. By making iTunes unnecessary to buy music, it moved the control point directly to the hardware avatar. Until this change, the iTunes service was organized as a hub-and-spoke model, in which iTunes was the hub and each specialized avatar was a spoke.

URL: https://www.sciencedirect.com/science/article/pii/B9780123748997000096

Concurrency in the Cloud

Dan C. Marinescu, in Cloud Computing (Second Edition), 2018

3.14 Multithreading and Concurrency in Java; FlumeJava

Java is a general-purpose computer programming language designed with portability in mind at Sun Microsystems. Java applications are typically compiled to bytecode and can run on a Java Virtual Machine (JVM) regardless of the computer architecture. Java is a class-based, object-oriented language with support for concurrency. It is one of the most popular programming languages and is widely used for a wide range of applications running on mobile devices and computer clouds.

Java Threads. Java supports processes and threads. Recall that a process has a self-contained execution environment, with its own private address space and run-time resources. A thread is a lightweight entity within a process. A Java application starts with one thread, the main thread, which can create additional threads.

Memory consistency errors occur when different threads have inconsistent views of the same data. Synchronized methods and synchronized statements are the two idioms for synchronization. Critical sections are serialized by specifying the synchronized attribute in the definition of a method or block. This guarantees that only one thread at a time can execute the critical section, and that each thread entering the section sees the modifications done by earlier ones. Synchronized statements must specify the object that provides the intrinsic lock.
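
A minimal sketch of the two idioms follows; the Counter class and its fields are illustrative assumptions, not an example from the text.

// Synchronized method and synchronized statement (in Java).
class Counter {
    private int value = 0;

    synchronized void increment() {     // synchronized method: the intrinsic lock of
        value++;                        // this Counter instance serializes increments
    }

    int get() {
        synchronized (this) {           // synchronized statement naming the lock object
            return value;
        }
    }
}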

The current versions of Java support atomic operations on several datatypes, with methods such as getAndDecrement(), getAndIncrement() and getAndSet(). An effective way to control data sharing among threads is to share only immutable data among threads. A class is made immutable by marking all its fields as final and declaring the class as final.
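
For instance, with java.util.concurrent.atomic.AtomicInteger the counter above needs no explicit lock; the class below is a sketch, not code from the text.

import java.util.concurrent.atomic.AtomicInteger;

class AtomicCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    int next() {
        return value.getAndIncrement();  // atomic read-modify-write, no intrinsic lock needed
    }
}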

A Thread in the java.lang.Thread class executes an object of type java.lang.Runnable. The java.util.concurrent package provides better support for concurrency than the Thread class. This package reduces the overhead of thread creation and prevents too many threads from overloading the CPU and depleting the available storage. A thread pool consists of a collection of worker threads that execute Runnable objects taken from a queue of tasks waiting to be executed.
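
A fixed-size thread pool from java.util.concurrent might be used as in the sketch below; the pool size, task count, and task bodies are arbitrary choices.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolDemo {
    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);   // 4 worker threads
        for (int i = 0; i < 10; i++) {
            final int id = i;
            pool.submit(() -> System.out.println("task " + id + " on "
                    + Thread.currentThread().getName()));
        }
        pool.shutdown();   // stop accepting new tasks; workers finish what is queued
    }
}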

Threads can communicate with one another via interrupts. A thread sends an interrupt by invoking interrupt() on the Thread object of the thread to be interrupted. The thread to be interrupted is expected to support its own interruption. Thread.sleep causes the current thread to suspend execution for a specified period.
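
A common pattern is a worker that sleeps between iterations and treats InterruptedException as the signal to stop; the sketch below is illustrative only.

public class InterruptDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Thread.sleep(1000);      // suspend this thread for one second
                } catch (InterruptedException e) {
                    break;                   // interrupted while sleeping: exit cleanly
                }
            }
        });
        worker.start();
        Thread.sleep(50);                    // let the worker run briefly
        worker.interrupt();                  // request interruption from the main thread
    }
}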

The executor framework works with Runnable objects, which cannot return results to the caller. The alternative is to use java.util.concurrent.Callable; submitting a Callable object to an executor returns an object of type java.util.concurrent.Future. The Future object can be used to check the status of the Callable object and to retrieve its result. Yet the Future interface has limitations for asynchronous execution, and the CompletableFuture class extends the functionality of the Future interface for asynchronous execution.
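
A sketch of the Callable/Future pattern is shown below; the computation itself is an arbitrary placeholder.

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Callable<Integer> task = () -> 6 * 7;        // a task that returns a value
        Future<Integer> result = pool.submit(task);  // submission returns a Future
        System.out.println(result.get());            // blocks until the result is ready: 42
        pool.shutdown();
    }
}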

Non-blocking algorithms based on low-level atomic hardware primitives such as compare-and-swap (CAS) are supported by Java 5.0 and later versions. The fork-join framework introduced in Java 7 supports distributing work to several workers and then waiting for their completion. The join method allows one thread to wait for the completion of another.
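
A minimal fork-join sketch that sums an array by splitting it recursively is given below; the threshold, data, and class name are arbitrary assumptions.

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class SumTask extends RecursiveTask<Long> {
    private final long[] data;
    private final int lo, hi;

    SumTask(long[] data, int lo, int hi) { this.data = data; this.lo = lo; this.hi = hi; }

    @Override
    protected Long compute() {
        if (hi - lo <= 1000) {                        // small enough: sum sequentially
            long s = 0;
            for (int i = lo; i < hi; i++) s += data[i];
            return s;
        }
        int mid = (lo + hi) / 2;
        SumTask left = new SumTask(data, lo, mid);
        left.fork();                                  // hand the left half to another worker
        long right = new SumTask(data, mid, hi).compute();
        return right + left.join();                   // wait for the left half and combine
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        java.util.Arrays.fill(data, 1L);
        System.out.println(new ForkJoinPool().invoke(new SumTask(data, 0, data.length)));
    }
}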

FlumeJava. A Java library used to develop, test, and run efficient data parallel pipelines is described in [92]. FlumeJava is used to develop data parallel applications such as MapReduce discussed in Section 7.5.

At the heart of the system is the concept of a parallel collection, which abstracts the details of data representation. Data in a parallel collection can be an in-memory data structure, one or more files, BigTable, discussed in Section 6.9, or a MySQL database. Data-parallel computations are implemented by composing several operations on parallel collections.

In turn, parallel operations are implemented using deferred evaluation. The invocation of a parallel operation records the operation and its arguments in an internal graph structure representing the execution plan. Once completed, the execution plan is optimized.

The most important classes of the FlumeJava library are PCollection<T>, used to specify an immutable bag of elements of type T, and PTable<K,V>, representing an immutable multi-map with keys of type K and values of type V. The internal state of a PCollection object is either deferred or materialized, i.e., not yet computed or computed, respectively. The PObject<T> class is a container for a single Java object of type T and can be either deferred or materialized.

parallelDo() supports element-wise computation over an input PCollection<T> to produce a new output PCollection<S>. This primitive takes as its main argument a DoFn<T,S>, a function-like object defining how to map each value in the input into zero or more values in the output. In the following example from [92], collectionOf(strings()) specifies that the parallelDo() operation should produce an unordered PCollection whose String elements should be encoded using UTF-8.
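
The example in [92] is essentially the sketch below, which assumes the DoFn/EmitFn signatures described in that paper; lines is an existing PCollection<String>, and splitIntoWords is a hypothetical helper that tokenizes a line.

PCollection<String> words =
    lines.parallelDo(new DoFn<String, String>() {
        void process(String line, EmitFn<String> emitFn) {
            for (String word : splitIntoWords(line)) {
                emitFn.emit(word);                     // zero or more outputs per input
            }
        }
    }, collectionOf(strings()));                       // unordered PCollection of UTF-8 strings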

Other primitive operations are groupByKey(), combineValues(), and flatten().

groupByKey() converts a multi-map of type PTable<K,V>, in which multiple key/value pairs may share the same key, into a uni-map of type PTable<K, Collection<V>>, where each key maps to an unordered, plain Java Collection of all the values with that key.

combineValues() takes an input PTable<K, Collection<V>> and an associative combining function on the values, and returns a PTable<K,V> where each input collection of values has been combined into a single output value.

flatten() takes a list of PCollection<T>s and returns a single PCollection<T> that contains all the elements of the input PCollections.

Pipelined operations are implemented by concatenation of functions. For example, if the output of function f is applied as input of function g in a ParallelDo operation, then two ParallelDo operations compute f and f⊗g. The optimizer is only concerned with the structure of the execution plan and not with the optimization of user-defined functions.

FlumeJava traverses the operations in the plan of a batch application in forward topological order, and executes each operation in turn. Independent operations are executed simultaneously. FlumeJava exploits not only the task parallelism but also the data parallelism within operations.

URL: https://www.sciencedirect.com/science/article/pii/B9780128128107000042

Database Machines

Catherine M. Ricardo, in Encyclopedia of Information Systems, 2003

II. Functions of a Database Machine

In a traditional database environment, a general-purpose computer is used to run the database management system (DBMS), as well as a variety of other software and applications under its operating system. The database files reside on a disk that is under the computer's control. When a user or application program requests data, the computer processes the request and manages the disk controllers to access the data files. In a database machine environment, the general-purpose computer, called the host, does not run the DBMS software. Instead the DBMS runs on the database machine, a separate computer that controls the devices on which the database files reside. When a user or program requests data access, the request is submitted to the host, which passes it to the database machine for processing. The dedicated machine then performs the following functions:

Accepts the data request and identifies which stored records will be needed to satisfy the request

Checks that the user is authorized to access those items and to perform the requested operations on them

Chooses the best path for data access

Performs concurrency control so that other data requests submitted at the same time do not cause errors; this is necessary if at least one of the requests is an update

Handles the recovery subsystem, to ensure that the database can be restored to a correct state in the event of a transaction or system failure

Maintains data integrity, checking that no integrity constraints are violated

Directs the actual data access using its device controllers

Handles data encryption, if used

Formats the retrieved data, if any

Returns the data or results to the host machine

URL: https://www.sciencedirect.com/science/article/pii/B0122272404000277

What type of computer is a special-purpose computer?

Examples include personal digital assistants (PDAs), mobile phones, palm-top computers, pocket PCs, etc. As they are handheld devices, their weight and size are limited; as a result, they are equipped with small memories, slow processors, small display screens, etc.

What are special purpose computers used for?

A special-purpose computer is designed for a specific application, in contrast to a general-purpose computer, which is designed to perform a variety of tasks and is therefore more versatile.

What computer is a special purpose computer that is used inside of a device and is usually dedicated to specific functions?

An embedded computer is a special-purpose computer used inside a device and is usually dedicated to specific functions. Embedded computers are commonly used in items such as answering machines, washing machines, cameras, cars, motors, sewing machines, clocks and microwaves.