Embedded Systems

Monday, August 25, 2008

Intel, x86 processors, and the IBM PC

Despite the ultimate importance of the microprocessor, the 4004 and its successors the 8008 and the 8080 were never major revenue contributors at Intel. As the next processor, the 8086 (and its variant the 8088) was completed in 1978, Intel embarked on a major marketing and sales campaign for that chip nicknamed "Operation Crush", and intended to win as many customers for the processor as possible. One design win was the newly-created IBM PC division, though the importance of this was not fully realized at the time.

IBM introduced its personal computer in 1981, and it was rapidly successful. In 1982, Intel created the 80286 microprocessor, which, two years later, was used in the IBM PC/AT. Compaq, the first IBM PC "clone" manufacturer, in 1985 produced a desktop system based on the faster 80286 processor and in 1986 quickly followed with the first 80386-based system, beating IBM and establishing a competitive market for PC-compatible systems and setting up Intel as a key component supplier.

In 1975 the company had started a project to develop a highly-advanced 32-bit microprocessor, finally released in 1981 as the Intel iAPX 432. The project was too ambitious and the processor was never able to meet its performance objectives, and it failed in the marketplace. Intel extended the x86 architecture to 32 bits instead.

Intel Corporation

Intel Corporation (NASDAQ: INTC is the world's largest semiconductor company and the inventor of the x86 series of microprocessors, the processors found in most personal computers. Founded on July 18, 1968 as Integrated Electronics Corporation and based in Santa Clara, California, USA, Intel also makes motherboard chipsets, network cards and ICs, flash memory, graphic chips, embedded processors, and other devices related to communications and computing. Founded by semiconductor pioneers Robert Noyce and Gordon Moore, and widely associated with the executive leadership and vision of Andrew Grove, Intel combines advanced chip design capability with a leading-edge manufacturing capability. Originally known primarily to engineers and technologists, Intel's successful "Intel Inside" advertising campaign of the 1990s made it and its Pentium processor household names.

Intel was an early developer of SRAM and DRAM memory chips, and this represented the majority of its business until the early 1980s. While Intel created the first commercial microprocessor chip in 1971, it was not until the success of the personal computer (PC) that this became their primary business. During the 1990s, Intel invested heavily in new microprocessor designs and in fostering the rapid growth of the PC industry. During this period Intel became the dominant supplier of microprocessors for PCs, and was known for aggressive and sometimes controversial tactics in defense of its market position, as well as a struggle with Microsoft for control over the direction of the PC industry. The 2007 rankings of the world's 100 most powerful brands published by Millward Brown Optimor showed the company's brand value falling 10 places – from number 15 to number 25.

In addition to its work in semiconductors, Intel has begun research in electrical transmission and generation.

History of ARM

The ARM design was started in 1983 as a development project at Acorn Computers Ltd to build a compact RISC CPU. Led by Sophie Wilson and Steve Furber, a key design goal was achieving low-latency input/output (interrupt) handling like the MOS Technology 6502 used in Acorn's existing computer designs. The 6502's memory access architecture allowed developers to produce fast machines without the use of costly direct memory access hardware. The team completed development samples called ARM1 by April 1985, and the first "real" production systems as ARM2 the following year.

The ARM2 featured a 32-bit data bus, a 32-bit (4 Gbyte) address space and sixteen 32-bit registers. Program code had to lie within the first 64 Mbyte of the memory, as the program counter was limited to 26 bits because the top 6 bits of the 32-bit register served as status flags. The ARM2 was possibly the simplest useful 32-bit microprocessor in the world, with only 30,000 transistors (compare with Motorola's six-year older 68000 model with around 70,000 transistors). Much of this simplicity comes from not having microcode (which represents about one-quarter to one-third of the 68000) and, like most CPUs of the day, not including any cache. This simplicity led to its low power usage, while performing better than the Intel 80286. A successor, ARM3, was produced with a 4KB cache, which further improved performance.

In the late 1980s Apple Computer and VLSI Technology started working with Acorn on newer versions of the ARM core. The work was so important that Acorn spun off the design team in 1990 into a new company called Advanced RISC Machines Ltd. For this reason, ARM is sometimes expanded as Advanced RISC Machine instead of Acorn RISC Machine. Advanced RISC Machines became ARM Ltd when its parent company, ARM Holdings plc, floated on the London Stock Exchange and NASDAQ in 1998.

The new Apple-ARM work would eventually turn into the ARM6, first released in 1991. Apple used the ARM6-based ARM 610 as the basis for their Apple Newton PDA. In 1994, Acorn used the ARM 610 as the main CPU in their Risc PC computers. DEC licensed the ARM6 architecture (which caused some confusion because they also produced the DEC Alpha) and produced the StrongARM. At 233 MHz this CPU drew only 1 watt of power (more recent versions draw far less). This work was later passed to Intel as a part of a lawsuit settlement, and Intel took the opportunity to supplement their aging i960 line with the StrongARM. Intel later developed its own high performance implementation known as XScale which it has since sold to Marvell.

The ARM core has remained largely the same size throughout these changes. ARM2 had 30,000 transistors, while the ARM6 grew to only 35,000. ARM's business has always been to sell IP cores, which licensees use to create microcontrollers and CPUs based on this core. The most successful implementation has been the ARM7TDMI with hundreds of millions sold in almost every kind of microcontroller equipped device. The idea is that the Original Design Manufacturer combines the ARM core with a number of optional parts to produce a complete CPU, one that can be built on old semiconductor fabs and still deliver substantial performance at a low cost. As of January 2008, over 10 billion ARM cores have been built, and iSuppli predicts that 5 billion a year will ship in 2011.

The common architecture supported on smartphones, Personal Digital Assistants and other handheld devices is ARMv4. XScale and ARM926 processors are ARMv5TE, and are now more numerous in high-end devices than the StrongARM, ARM925T and ARM7TDMI based ARMv4 processors.

ARM architecture

The ARM architecture (previously, the Advanced RISC Machine, and prior to that Acorn RISC Machine) is a 32-bit RISC processor architecture developed by ARM Limited that is widely used in embedded designs. Because of their power saving features, ARM CPUs are dominant in the mobile electronics market, where low power consumption is a critical design goal[citation needed].

Today, the ARM family accounts for approximately 75% of all embedded 32-bit RISC CPUs,[1] making it one of the most widely used 32-bit architectures. ARM CPUs are found in most corners of consumer electronics, from portable devices (PDAs, mobile phones, media players, handheld gaming units, and calculators) to computer peripherals (hard drives, desktop routers); however it no longer has significant penetration as the main processor in the desktop computer market and has never been used in a supercomputer or cluster. Important branches in this family include Marvell's XScale and the Texas Instruments OMAP series.

Dubious features of 6052

The original 6502 and its NMOS derivatives are noted for having a variety of undocumented instructions, which vary from one chip manufacturer to the next. The 6502's instruction decoding is implemented in a hardwired logic array (similar to a programmable logic array) which is only defined for 151 of the 256 available opcodes. The remaining 105 trigger strange and hard-to-predict actions (e.g., immediately crashing the processor, performing several valid instructions at once, or simply doing nothing at all). Eastern House Software developed the "Trap65", a device that plugged between the processor and its socket to convert (trap) unimplemented opcodes into BRK (software interrupt) instructions. Some programmers utilized this feature to extend the 6502's instruction set by providing functionality for the unimplemented opcodes with specially written software intercepted at the BRK instruction's 0xFFFE vector. All of the undefined opcodes have been replaced by NOP instructions in the 65C02 CMOS version (although with varying byte sizes and execution times).

The 6502's memory indirect jump instruction, JMP (

), is partially broken. If

was hex xxFF (i.e. any word ending in FF), the processor would not jump to the address stored in xxFF and xxFF+1, but rather the one in xxFF and xx00. This defect continued through the entire NMOS line, but was fixed in the CMOS derivatives.

The N (result negative), V (sign bit overflow) and Z (result zero) status flags are not valid when performing arithmetic operations while the processor is in BCD mode, as these flags reflect the binary, not BCD, result. This limitation was removed in the CMOS derivatives. Therefore, this feature may be used to cleanly distinguish CMOS from NMOS CPU versions without using any illegal opcodes.

If the processor happens to be in BCD mode when a hardware interrupt occurs it will not revert to binary mode. This quirk could result in hard-to-solve bugs in the interrupt service routine if it failed to clear BCD mode before performing any arithmetic operations. For example, the Commodore 64's kernel did not correctly handle this processor characteristic, requiring that IRQs be disabled or revectored during BCD math operations. This issue was addressed in the CMOS derivatives as well.

The SO pin (Set Overflow) was intended for use in high-speed device drivers. Asserting it would immediately set the processor's Overflow (V) status register bit. Successful use of this feature could eliminate a load instruction from a high-speed device driver, reducing the number of instructions in a data transfer loop by 25%, but obviously great care was required in the system design in order not to corrupt general computation.

The 6502 instruction set includes BRK (opcode $00), which is technically a software interrupt (similar in spirit to the SWI mnemonic of the 6800). BRK is most often used to interrupt program execution and start a machine code monitor for testing and debugging during software development. It could also be used to route program execution using a simple jump table (analogous to the manner in which the 8088 and derivatives handle software interrupts by number). Unfortunately, if a hardware interrupt occurs at the same time the processor is fetching a BRK instruction, the NMOS version of the processor will fail to execute BRK and instead proceed as if only a hardware interrupt had occurred. This fault was corrected in the CMOS implementation of the processor.

The JSR (call subroutine) instruction pushes the address of the last byte of the call instruction on to the stack (the program counter would have been increased after execution has been done) . The RTS (return) instruction pulls the return address off the stack and increments it before placing it into the program counter, resulting in automatic compensation for this design quirk. This characteristic would go unnoticed unless you pulled the return address to pick up parameters in the code stream (a common 6502 programming idiom). It remains a characteristic of 6502 derivatives to this day.

Technical description of 6052

The 6502 is an 8-bit processor with a 16-bit address bus. The internal logic runs at the same speed as the external clock rate, but despite the slow clock speeds (typically in the neighborhood of 1 or 2 MHz), the 6502's performance was actually competitive with other CPUs using significantly faster clocks. This is partly due to a simplistic state machine implemented by combinatorial (clockless) logic to a greater extent than in many other designs; the two phase clock (supplying two synchronizations per cycle) can thereby control the whole machine-cycle directly. Like most simple CPUs of the era, the dynamic NMOS 6502 chip was not sequenced by a microcode ROM but used a PLA (which occupied about 15% of the chip area) for instruction decoding and sequencing. Like most typical eight-bit microprocessors, the chip does some limited overlapping of fetching and execution.

The low clock frequency moderated the speed requirement of memory and peripherals attached to the CPU, as only about 50% of the clock cycle was available for memory access (due to the asynchronous design, this percentage varied strongly among chip versions). This was critical at a time when affordable memory had access times in the range 450-250ns. The original NMOS 6502 was minimalistically engineered and efficiently manufactured and therefore cheap—an important factor in getting design wins in the very price-sensitive game console and home computer markets.

6502 Pin configuration (40-Pin DIP)Like its precursor, the Motorola 6800 (but unlike Intel 8080 and similar microprocessors) the 6502 has very few registers. At the time the processor was designed, small bipolar memories were relatively fast, so it made sense to rely on RAM instead of wasting expensive NMOS chip area on CPU-registers.

The 6502's registers included one 8-bit accumulator register (A), two 8-bit index registers (X and Y), an 8-bit processor status register (P), an 8-bit stack pointer (S), and a 16-bit program counter (PC). The subroutine call/scratchpad stack's address space was hardwired to memory page $01, i.e. the address range $0100–$01FF (256–511). Software access to the stack was done via four implied addressing mode instructions whose functions were to push or pop (pull) the accumulator or the processor status register. The same stack was also used for subroutine calls via the JSR (Jump to Subroutine) and RTS (Return from Subroutine) instructions, and for interrupt handling.

The chip used the index and stack registers effectively with several addressing modes, including a fast "direct page" or "zero page" mode, similar to that found on the PDP-8, that accessed memory locations from address 0 to 255 with a single 8-bit address (saving the cycle normally required to fetch the high-order byte of the address)—code for the 6502 used the zero page much as code for other processors would have used registers. On some 6502-based microcomputers with an operating system, the OS would use most of zero page, leaving only a handful of locations for the user.

Addressing modes also included implied (1 byte instructions); absolute (3 bytes); indexed absolute (3 bytes); indexed zero-page (2 bytes); relative (2 bytes); accumulator (1); indirect,x and indirect,y (2); and immediate (2). Absolute mode was a general-purpose mode. Branch instructions used a signed 8-bit offset relative to the instruction after the branch; the numerical range -128..127 therefore translates to 128 bytes backward and 127 bytes forward from the instruction following the branch (which is 126 bytes backward and 129 bytes forward from the start of the branch instruction). Accumulator mode used the accumulator as an effective address, and did not need any operand data. Immediate mode used an 8-bit literal operand.

The indirect modes were useful for array processing and other looping. With the 5/6 cycle "(indirect),y" mode, the 8-bit Y register was added to a 16-bit base address in zero page, located by a single byte following the opcode. As the resulting address could be anywhere in the 16-bit memory range, the Y register was a true index register, as opposed to the 6800, which had one 16-bit address register. Incrementing the index register to walk the array byte-wise took only two additional cycles. With the less frequently used "(indirect,x)" mode the effective address for the operation was found at the zero page address formed by adding the second byte of the instruction to the contents of the X register. Using the indexed modes, the zero page effectively acted as a set of 128 additional (though very slow) address registers.

The 6502 also included a set of binary coded decimal (BCD) instructions, a feature normally implemented in software. Placing the CPU into BCD allowed numbers to be manipulated in base-10, with a set of conversion instructions to convert between base-10 and binary (base-2). For instance, with the "D" flag set, 99+1 would result in 00 and the carry flag being set. These instructions made implementing a BASIC programming language easier, removing the need to convert numbers for display in the BASIC interpreter itself. However, this feature meant other useful instructions could not be implemented due to a lack of CPU real estate, and was sometimes removed to make room for custom instructions.

A Byte magazine article once referred to the 6502 as "the original RISC processor," due to its efficient, simplistic, and nearly orthogonal instruction set (most instructions work with most addressing modes), as well as its 256 zero-page "registers". The 6502 is technically not a RISC design however, as arithmetic operations can read any memory cell (not only zero-page), and some instructions (inc, rol etc.) even modify memory contrary to the basic load/store philosophy of RISC. Furthermore, orthogonality is equally often associated with "CISC". However the 6502 performed reasonably well compared to other contemporaneous processors such as the Z80, which used a much faster clock rate, and the 6502 has been credited as being inspirational to RISC processors such as the ARM.

History and use of 6052

The 6502 was designed primarily by the same engineering team that had designed the Motorola 6800. After resigning from Motorola en masse, the team went looking for another company that would be interested in hosting a design team, and found MOS Technology, then a small chipmaking company whose main product was a single-chip implementation of the popular Pong video game.

At MOS, they quickly designed the 6501, a completely new processor that was pin-compatible with the 6800 (that is, it could be plugged into motherboards designed for the Motorola processor, although its instruction set was different). Motorola sued immediately, and MOS agreed to stop producing the 6501 and went back to the drawing board. The result was the "lawsuit-compatible" 6502, which was by design unusable in a 6800 motherboard but otherwise identical to the 6501. Motorola had no objection. However, this left MOS with the problem of getting developers to try their processor, so engineer Peddle designed the KIM-1 simple single-board computer. Much to their surprise, the KIM-1 sold well to hobbyists and tinkerers as well as to the engineers it was intended for. The related Rockwell AIM 65 control/training/development system also did well. Another roughly similar product was the Synertek SYM-1.

Apple IIeThe 6502 was introduced at $25 at the Westcon show in September 1975. The company had an off-floor suite with a wooden barrel full of the chips, although this early run meant only the ones at the top of the barrel worked. At the same show the 6800 and Intel 8080 were selling for $179. At first many people thought the new chip's price was a hoax or a mistake, but while the show was still ongoing both Motorola and Intel had dropped their chips to $79. These price reductions legitimized the 6502, which started selling by the hundreds.

One of the first "public" uses for the design was the Apple I computer, introduced in 1976. The 6502 was next used in the Commodore PET and the Apple II. It was later used in the Atari home computers, the BBC Micro family, the Commodore VIC-20 and a large number of other designs both for home computers and business, such as Ohio Scientific.

Commodore 64The 6510, a direct successor of the 6502 with a digital I/O port and a three-state address bus, was the CPU utilized in the Commodore 64 home computer. (Commodore's disk drive, the 1541, had a processor of its own—it too was a 6502.)

Atari 2600Another important use of the 6500 family was in video games. The first to make use was the Atari 2600 videogame console. The 2600 used an offshoot of the 6502 called the 6507, which had fewer pins and, as a result, could address only 8 KB of memory. Millions of the Atari consoles would be sold, each with a MOS processor. Another significant use was by the Nintendo Famicom, a Japanese video game console. Its international equivalent, the Nintendo Entertainment System, also used the processor. The 6502 used in the NES was a second source version by Ricoh, a partial system-on-a-chip, that lacked a binary-coded decimal mode but added 22 memory-mapped registers for sound generation, joypad reading, and sprite list DMA. Called 2A03 in NTSC consoles and 2A07 in PAL consoles (the difference being the memory divider ratio and a lookup table for audio sample rates), this processor was produced exclusively for Nintendo.

Even as of 2006, some universities, including the Eindhoven University of Technology, the Netherlands, University of Tasmania, the University of Applied Sciences in Cologne, Germany, University of Exeter in Devon, England, Carleton College, Hull University, Matthew Boulton College, University of Brescia in Italy and Universidad APEC in Santo Domingo, Dominican Republic still use the processor to teach assembly language, computer architecture and digital integrated systems.