Sunday 29 January 2017

The complete Papilio One constraint file

##################################################################################
## BPC3003_2.03+.ucf
##
## Author: Jack Gassett
##
## Details: http://gadgetforge.gadgetfactory.net/gf/project/butterfly_one/
##
## Contains assignment and iostandard information for
## all used pins as well as timing and area constraints for Papilio One 2.03 and higher
## boards. Papilio One boards started using 32MHz oscillators at version 2.02 and above.
##
##################################################################################

# Crystal Clock - use 32MHz onboard oscillator
NET "clk" LOC = "P89" | IOSTANDARD = LVCMOS25 | PERIOD = 31.25ns ;

# Wing1 Column A
NET "W1A<0>" LOC = "P18" ; # LogicStart 7Seg anode(0)
NET "W1A<1>" LOC = "P23" ; # LogicStart 7seg Decimal Point
NET "W1A<2>" LOC = "P26" ; # LogicStart 7Seg anode(1)
NET "W1A<3>" LOC = "P33" ; # LogicStart 7seg segment E
NET "W1A<4>" LOC = "P35" ; # LogicStart 7seg segment F
NET "W1A<5>" LOC = "P40" ; # LogicStart 7seg segment C
NET "W1A<6>" LOC = "P53" ; # LogicStart 7seg segment D
NET "W1A<7>" LOC = "P57" ; # LogicStart 7seg segment A
NET "W1A<8>" LOC = "P60" ; # LogicStart 7seg anode(2)
NET "W1A<9>" LOC = "P62" ; # LogicStart 7seg segment G
NET "W1A<10>" LOC = "P65" ; # LogicStart 7seg segment B
NET "W1A<11>" LOC = "P67" ; # LogicStart 7seg anode(3)
NET "W1A<12>" LOC = "P70" ; # LogicStart A2D SPI_CS
NET "W1A<13>" LOC = "P79" ; # LogicStart A2D SPI_DOUT
NET "W1A<14>" LOC = "P84" ; # LogicStart A2D SPI_DIN
NET "W1A<15>" LOC = "P86" ; # LogicStart A2D SPI_SCLK

# Wing1 Column B
NET "W1B<0>" LOC = "P85" ; # LogicStart vsync
NET "W1B<1>" LOC = "P83" ; # LogicStart hsync
NET "W1B<2>" LOC = "P78" ; # LogicStart blue1
NET "W1B<3>" LOC = "P71" ; # LogicStart blue2
NET "W1B<4>" LOC = "P68" ; # LogicStart green0
NET "W1B<5>" LOC = "P66" ; # LogicStart green1
NET "W1B<6>" LOC = "P63" ; # LogicStart green2
NET "W1B<7>" LOC = "P61" ; # LogicStart red0
NET "W1B<8>" LOC = "P58" ; # LogicStart red1
NET "W1B<9>" LOC = "P54" ; # LogicStart red2
NET "W1B<10>" LOC = "P41" ; # LogicStart audio
NET "W1B<11>" LOC = "P36" ; # LogicStart joystick right
NET "W1B<12>" LOC = "P34" ; # LogicStart joystick left
NET "W1B<13>" LOC = "P32" ; # LogicStart joystick down
NET "W1B<14>" LOC = "P25" ; # LogicStart Joystick up
NET "W1B<15>" LOC = "P22" ; # LogicStart Joystick Select

# Wing2 Column C
NET "W2C<0>" LOC = "P91" ; # LogicStart Switch 7
NET "W2C<1>" LOC = "P92" ; # LogicStart Switch 6
NET "W2C<2>" LOC = "P94" ; # LogicStart Switch 5
NET "W2C<3>" LOC = "P95" ; # LogicStart Switch 4
NET "W2C<4>" LOC = "P98" ; # LogicStart Switch 3
NET "W2C<5>" LOC = "P2" ; # LogicStart Switch 2
NET "W2C<6>" LOC = "P3" ; # LogicStart Switch 1
NET "W2C<7>" LOC = "P4" ; # LogicStart Switch 0
NET "W2C<8>" LOC = "P5" ; # LogicStart LED 7
NET "W2C<9>" LOC = "P9" ; # LogicStart LED 6
NET "W2C<10>" LOC = "P10" ; # LogicStart LED 5
NET "W2C<11>" LOC = "P11" ; # LogicStart LED 4
NET "W2C<12>" LOC = "P12" ; # LogicStart LED 3
NET "W2C<13>" LOC = "P15" ; # LogicStart LED 2
NET "W2C<14>" LOC = "P16" ; # LogicStart LED 1
NET "W2C<15>" LOC = "P17" ; # LogicStart LED 0

## RS232
NET "rx" LOC = "P88" | IOSTANDARD = LVCMOS25 ;
NET "tx" LOC = "P90" | IOSTANDARD = LVCMOS25 | DRIVE = 4 | SLEW = SLOW ;

The complete Basys2 constraint file

# This file is a general .ucf for Basys2 rev C board
# To use it in a project:
# - remove or comment the lines corresponding to unused pins
# - rename the used signals according to the project
# clock pin for Basys2 Board

NET "mclk" LOC = "B8"; # Bank = 0, Signal name = MCLK
NET "uclk" LOC = "M6"; # Bank = 2, Signal name = UCLK
NET "mclk" CLOCK_DEDICATED_ROUTE = FALSE;
NET "uclk" CLOCK_DEDICATED_ROUTE = FALSE;

# Pin assignment for EppCtl
# Connected to Basys2 onBoard USB controller

NET "EppAstb" LOC = "F2"; # Bank = 3
NET "EppDstb" LOC = "F1"; # Bank = 3
NET "EppWR" LOC = "C2"; # Bank = 3
NET "EppWait" LOC = "D2"; # Bank = 3
NET "EppDB<0>" LOC = "N2"; # Bank = 2
NET "EppDB<1>" LOC = "M2"; # Bank = 2
NET "EppDB<2>" LOC = "M1"; # Bank = 3
NET "EppDB<3>" LOC = "L1"; # Bank = 3
NET "EppDB<4>" LOC = "L2"; # Bank = 3
NET "EppDB<5>" LOC = "H2"; # Bank = 3
NET "EppDB<6>" LOC = "H1"; # Bank = 3
NET "EppDB<7>" LOC = "H3"; # Bank = 3

# Pin assignment for DispCtl
# Connected to Basys2 onBoard 7seg display

NET "seg<0>" LOC = "L14"; # Bank = 1, Signal name = CA
NET "seg<1>" LOC = "H12"; # Bank = 1, Signal name = CB
NET "seg<2>" LOC = "N14"; # Bank = 1, Signal name = CC
NET "seg<3>" LOC = "N11"; # Bank = 2, Signal name = CD
NET "seg<4>" LOC = "P12"; # Bank = 2, Signal name = CE
NET "seg<5>" LOC = "L13"; # Bank = 1, Signal name = CF
NET "seg<6>" LOC = "M12"; # Bank = 1, Signal name = CG
NET "dp" LOC = "N13"; # Bank = 1, Signal name = DP
NET "an<3>" LOC = "K14"; # Bank = 1, Signal name = AN3
NET "an<2>" LOC = "M13"; # Bank = 1, Signal name = AN2
NET "an<1>" LOC = "J12"; # Bank = 1, Signal name = AN1
NET "an<0>" LOC = "F12"; # Bank = 1, Signal name = AN0

# Pin assignment for LEDs

NET "Led<7>" LOC = "G1" ; # Bank = 3, Signal name = LD7
NET "Led<6>" LOC = "P4" ; # Bank = 2, Signal name = LD6
NET "Led<5>" LOC = "N4" ; # Bank = 2, Signal name = LD5
NET "Led<4>" LOC = "N5" ; # Bank = 2, Signal name = LD4
NET "Led<3>" LOC = "P6" ; # Bank = 2, Signal name = LD3
NET "Led<2>" LOC = "P7" ; # Bank = 3, Signal name = LD2
NET "Led<1>" LOC = "M11" ; # Bank = 2, Signal name = LD1
NET "Led<0>" LOC = "M5" ; # Bank = 2, Signal name = LD0

# Pin assignment for SWs

NET "sw<7>" LOC = "N3"; # Bank = 2, Signal name = SW7
NET "sw<6>" LOC = "E2"; # Bank = 3, Signal name = SW6
NET "sw<5>" LOC = "F3"; # Bank = 3, Signal name = SW5
NET "sw<4>" LOC = "G3"; # Bank = 3, Signal name = SW4
NET "sw<3>" LOC = "B4"; # Bank = 3, Signal name = SW3
NET "sw<2>" LOC = "K3"; # Bank = 3, Signal name = SW2
NET "sw<1>" LOC = "L3"; # Bank = 3, Signal name = SW1
NET "sw<0>" LOC = "P11"; # Bank = 2, Signal name = SW0
NET "btn<3>" LOC = "A7"; # Bank = 1, Signal name = BTN3
NET "btn<2>" LOC = "M4"; # Bank = 0, Signal name = BTN2
NET "btn<1>" LOC = "C11"; # Bank = 2, Signal name = BTN1
NET "btn<0>" LOC = "G12"; # Bank = 0, Signal name = BTN0

# Loop back/demo signals
# Pin assignment for PS2

NET "PS2C" LOC = "B1" | DRIVE = 2 | PULLUP ; # Bank = 3, Signal name = PS2C
NET "PS2D" LOC = "C3" | DRIVE = 2 | PULLUP ; # Bank = 3, Signal name = PS2D

# Pin assignment for VGA

NET "HSYNC" LOC = "J14" | DRIVE = 2 | PULLUP ; # Bank = 1, Signal name = HSYNC
NET "VSYNC" LOC = "K13" | DRIVE = 2 | PULLUP ; # Bank = 1, Signal name = VSYNC
NET "OutRed<2>" LOC = "F13" | DRIVE = 2 | PULLUP ; # Bank = 1, Signal name = RED2
NET "OutRed<1>" LOC = "D13" | DRIVE = 2 | PULLUP ; # Bank = 1, Signal name = RED1
NET "OutRed<0>" LOC = "C14" | DRIVE = 2 | PULLUP ; # Bank = 1, Signal name = RED0
NET "OutGreen<2>" LOC = "G14" | DRIVE = 2 | PULLUP ; # Bank = 1, Signal name = GRN2
NET "OutGreen<1>" LOC = "G13" | DRIVE = 2 | PULLUP ; # Bank = 1, Signal name = GRN1
NET "OutGreen<0>" LOC = "F14" | DRIVE = 2 | PULLUP ; # Bank = 1, Signal name = GRN0
NET "OutBlue<2>" LOC = "J13" | DRIVE = 2 | PULLUP ; # Bank = 1, Signal name = BLU2
NET "OutBlue<1>" LOC = "H13" | DRIVE = 2 | PULLUP ; # Bank = 1, Signal name = BLU1

# Loop Back only tested signals

NET "PIO<72>" LOC = "B2" | DRIVE = 2 | PULLUP ; # Bank = 1, Signal name = JA1
NET "PIO<73>" LOC = "A3" | DRIVE = 2 | PULLUP ; # Bank = 1, Signal name = JA2
NET "PIO<74>" LOC = "J3" | DRIVE = 2 | PULLUP ; # Bank = 1, Signal name = JA3
NET "PIO<75>" LOC = "B5" | DRIVE = 2 | PULLUP ; # Bank = 1, Signal name = JA4
NET "PIO<76>" LOC = "C6" | DRIVE = 2 | PULLUP ; # Bank = 1, Signal name = JB1
NET "PIO<77>" LOC = "B6" | DRIVE = 2 | PULLUP ; # Bank = 1, Signal name = JB2
NET "PIO<78>" LOC = "C5" | DRIVE = 2 | PULLUP ; # Bank = 1, Signal name = JB3
NET "PIO<79>" LOC = "B7" | DRIVE = 2 | PULLUP ; # Bank = 1, Signal name = JB4
NET "PIO<80>" LOC = "A9" | DRIVE = 2 | PULLUP ; # Bank = 1, Signal name = JC1
NET "PIO<81>" LOC = "B9" | DRIVE = 2 | PULLUP ; # Bank = 1, Signal name = JC2
NET "PIO<82>" LOC = "A10" | DRIVE = 2 | PULLUP ; # Bank = 1, Signal name = JC3
NET "PIO<83>" LOC = "C9" | DRIVE = 2 | PULLUP ; # Bank = 1, Signal name = JC4
NET "PIO<84>" LOC = "C12" | DRIVE = 2 | PULLUP ; # Bank = 1, Signal name = JD1
NET "PIO<85>" LOC = "A13" | DRIVE = 2 | PULLUP ; # Bank = 2, Signal name = JD2
NET "PIO<86>" LOC = "C13" | DRIVE = 2 | PULLUP ; # Bank = 1, Signal name = JD3
NET "PIO<87>" LOC = "D12" | DRIVE = 2 | PULLUP ; # Bank = 2, Signal name = JD4

Saturday 28 January 2017

Using tri-state logic

After reviewing all the learning to date I realised that I have failed to cover tri-state logic! Although common when building projects using individual chips it only really makes an appearance in FPGA designs when interfacing to external components (explaining why it was only seen when interfacing to the Basys2’s bidirectional EPP port).

What is tri-state logic?

Put simply, tri-state logic is where a signal can be either "logic high level", "logic low level" or "not actively driven" - 1, 0 and Z in VHDL. This allows the same wire / signal to be used as both an input and an output, or allows multiple devices to "share" a common bus.

The most familiar example is a RAM chip’s data bus. During the read cycles the memory chip drives the data bus, and during write cycles the memory controller drives the data bus. To enable this, most RAM chips have a signal called "Output Enable" ("OE") that tells the chip when to drive the bus.

On a tri-state bus all devices on the bus can read the value of the bus at any time, but to avoid data corruption your design must ensure that only one device drives the bus at any time. Should two or more devices try to drive the bus to different values at the same time, the data on the bus will be corrupted. If this overlap of multiple devices driving the bus lasts for only a short time then an error may not occur, but you will get increased power usage and signal integrity issues as the output drivers are saturated.
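
As a rough illustration of the rule above, here is a small C model of how one bus wire resolves when several devices are attached. The type `level_t` and the function `resolve_bus()` are names made up for this sketch; `LVL_X` stands in for the "corrupted" value you get when two drivers disagree.

```c
#include <stddef.h>

/* Possible values on a single bus wire: driven low, driven high,
   not driven (high-Z), or unknown because of contention. */
typedef enum { LVL_0, LVL_1, LVL_Z, LVL_X } level_t;

/* Work out what every device reads from a wire, given what each
   attached device is trying to drive onto it. */
level_t resolve_bus(const level_t *drivers, size_t count)
{
    level_t result = LVL_Z;              /* nobody is driving yet */
    for (size_t i = 0; i < count; i++) {
        if (drivers[i] == LVL_Z)
            continue;                    /* this device has released the bus */
        if (drivers[i] == LVL_X)
            return LVL_X;                /* an unknown driver poisons the wire */
        if (result == LVL_Z)
            result = drivers[i];         /* first active driver sets the value */
        else if (result != drivers[i])
            return LVL_X;                /* two drivers disagree: contention */
    }
    return result;
}
```

With one active driver the wire simply takes that driver's value; with none it floats at Z; with a 0 and a 1 at the same time the model reports X.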

How is tri-state logic used within an FPGA?

In short, for the Spartan 3E it isn’t. To avoid timing and power issues, the design tools ensure that any signals are only ever driven by one device.

Any internal tri-state logic within a design is mapped into hidden "input" and "output" signals. The bus is then implemented with a multiplexer that selects the active output signal and then delivers that signal to all the inputs.

How is tri-state logic used when interfacing with an FPGA?

Most general purpose I/O pins of an FPGA are driven by a tri-state driver, and the pin is monitored by an input buffer. When any internal tri-state signal is attached to an I/O pin it is implemented as three signals driving an IOBUF component:


• T controls the state of the tri-state driver (it is active high - when T is '1' the driver is in high-Z)
• O is the value of the pin
• I is the value that will be driven onto the pin when T is '0'

Yes, the signal names do seem the wrong way around, but they are from the IOBUF’s point of view. 

Project - using tri-state logic

Sadly this project is Basys2 only, as on the Papilio One the LogicStart MegaWing uses all the I/O pins. It is possible to remove the MegaWing and connect directly to the headers on the Papilio One if you want.

• Create a new project
• Configure two of the PMOD pins. Remember to define the PMOD pins as "INOUT"!
• Have 2 LEDs show the status of the two pins on a PMOD connector:

led(0) <= pmod(0);
led(1) <= pmod(1);

• Connect two slide switches to these pins

pmod(0) <= sw(0);
pmod(1) <= sw(1);

• Put a resistor of 300 Ohm or more between the two pins (to limit the current if both pins are driven at once)
• Put a voltmeter across the resistor
• Play around with the design
– What is the highest voltage you can place over the resistor?
– How much power is this? (Remember P = V^2/R)
• Using a third slide switch decide which of the pins will be in high-Z mode. Something like:

process(sw)
begin
  if sw(2) = '1' then
    pmod(0) <= 'Z';
    pmod(1) <= sw(1);
  else
    pmod(0) <= sw(0);
    pmod(1) <= 'Z';
  end if;
end process;

• Play around with it
– What is the highest voltage you can get over the resistor now?
– How much power is this?

Friday 27 January 2017

Using an ADC

This chapter is only applicable to the Papilio One board when used with the LogicStart MegaWing, as the Basys2 does not include any ADC functionality - it is still a useful read as it shows how simple peripherals can be to interface to.

Unlike other projects so far, I’ve included the full code for the module, giving some sort of reference implementation that can be used to verify your own design.

The ADC

The ADC on the LogicStart is an eight-channel 12-bit ADC, with a serial interface compatible with the Serial Peripheral Interface Bus ("SPI") standard. The reference voltage for the ADC is 3.3V, giving a resolution of about 0.8mV.
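
The resolution figure comes from dividing the 3.3V reference by the 4096 codes of a 12-bit converter (3.3V / 4096 is roughly 0.806mV). A minimal host-side sketch of that conversion follows; the function name `adc_code_to_uv()` and the choice of microvolts as the unit are assumptions made for this example only.

```c
#include <stdint.h>

#define ADC_VREF_UV  3300000u   /* 3.3V reference, expressed in microvolts */
#define ADC_STEPS    4096u      /* a 12-bit ADC has 2^12 output codes */

/* Convert a raw 12-bit sample into microvolts using integer maths only.
   The multiplication is done in 64 bits because 4095 * 3300000 does not
   fit in 32 bits. */
uint32_t adc_code_to_uv(uint16_t code)
{
    uint64_t scaled = (uint64_t)(code & 0x0FFF) * ADC_VREF_UV;
    return (uint32_t)(scaled / ADC_STEPS);
}
```

A code of 1 comes out as 805 microvolts, which is the "about 0.8mV" step size quoted above.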

The SPI bus uses four logic signals. They are called:

• SCLK: serial clock (output from master);
• MOSI; SIMO: master output, slave input (output from master);
• MISO; SOMI: master input, slave output (output from slave);
• SS: slave select (active low, output from master).

But for this design I’m following the names used in the datasheet - which are named from the perspective of the slave device:

• CS; Chip Select
• DIN; Data In
• DOUT; Data Out
• SCLK; Serial Clock

To read channel 0 of the ADC it is pretty simple:

• Hold DIN low (this ensures that you read channel 0)
• Hold CS high while the ADC is idle
• Lower CS when you are ready to convert a sample
• Send 16 clock pulses with a frequency somewhere between 8MHz and 16MHz
• Raise CS when finished

The data bits will be available on DOUT for clock pulses 4 through 16.


Reading a different channel is a little harder - you need to give the ADC the bits to select the channel for the next sample on clock pulses 2, 3 and 4. These bits are sent in MSB first order. This sounds simple enough, but as ever the difficulty is in the details. To make this work the setup and hold times must be factored in:

• CS must go low a few ns before the SCLK line drops for the first time
• The DOUT signal transitions just after the rising edge of the SCLK signal. For reliable results it needs to be sampled in the middle of the clock pulse
• The DIN signal must be given enough time to be stable before the SCLK falls

I decided that the easiest way to do this is to run a counter at the 32MHz clock of the crystal. Using the bit numbering of the VHDL below, the gross timings for the signals are:
• the SCLK signal is generated from counter(1), giving 32MHz / 4 = 8MHz
• counter(5 downto 2) indicates which bit of the frame we are on
• if any of counter(22 downto 6) is set, then CS is held high
• data is sampled when the lowest two bits are "11"
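
To make the counter decoding easier to follow, here is a small C model of it. It follows the bit positions used in the VHDL listing below (counter(1) for SCLK, counter(5 downto 2) for the frame bit, counter(22 downto 6) for the idle test, and sampling when the lowest two bits are "11"), and it ignores the shift-register delays. The struct and function names are invented for this sketch.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    bool     active;     /* conversion in progress, so CS is driven low */
    bool     sclk;       /* serial clock level before the delay registers */
    unsigned bit_index;  /* which of the 16 frame bits we are on (0..15) */
    bool     sample;     /* capture DOUT on this system clock */
} adc_timing_t;

/* Decode a 23-bit free-running counter into the ADC control signals. */
adc_timing_t adc_decode(uint32_t counter)
{
    adc_timing_t t;
    counter &= 0x7FFFFF;                           /* keep 23 bits */
    t.active    = (counter >> 6) == 0;             /* upper 17 bits all zero */
    t.sclk      = !t.active || ((counter >> 1) & 1); /* idles high */
    t.bit_index = (counter >> 2) & 0xF;            /* four system clocks per bit */
    t.sample    = t.active && ((counter & 3) == 3);
    return t;
}
```

One frame therefore lasts 64 system clocks (16 SCLK cycles of 4 clocks each), and the ADC sits idle for the rest of the counter's range.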

To ensure that I don’t have any setup and hold time issues with the interface, a shift register is used to delay the SCLK signal by one cycle, and a second shift register is used to delay DIN by three clocks. This ensures that CS and DIN have plenty of setup and hold time.

VHDL for the interface

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;

entity AtoD is
  port
  (
    clk      : IN  std_logic;
    -- user interface
    switches : IN  std_logic_vector(2 downto 0);
    leds     : OUT std_logic_vector(7 downto 0);
    -- Signals to the ADC
    ADC_CS_N : OUT std_logic;
    ADC_SCLK : OUT std_logic;
    ADC_DIN  : OUT std_logic;
    ADC_DOUT : IN  std_logic
  );
end entity;

architecture rtl of AtoD is
  -- Counter - the lowest 6 bits are used to control signals to the ADC.
  -- The rest are used to activate the ADC when 0
  signal counter          : std_logic_vector(22 downto 0) := (others => '0');
  -- shift registers to delay output signals
  signal clk_shiftreg     : std_logic_vector( 1 downto 0) := (others => '0');
  signal dataout_shiftreg : std_logic_vector( 2 downto 0) := (others => '0');
  -- shift register to collect incoming bits
  signal datain_shiftreg  : std_logic_vector(11 downto 0) := (others => '0');
  -- register to hold the current channel
  signal channel_hold     : std_logic_vector( 2 downto 0) := (others => '0');
  signal adc_active       : std_logic;
begin
  -- set outgoing signals
  adc_din  <= dataout_shiftreg(2);
  adc_sclk <= clk_shiftreg(1);

  with counter(22 downto 6) select adc_active <= '1' when "00000000000000000",
                                                 '0' when others;

  process (clk)
  begin
    if rising_edge(clk) then
      -- A small shift register delays the clk by one cycle (31.25ns) to ensure timings are met.
      clk_shiftreg(1) <= clk_shiftreg(0);

      -- Including adc_cs_n in a clocked process ensures that it is implemented in a flipflop
      adc_cs_n <= not(adc_active);

      if adc_active = '1' then
        clk_shiftreg(0) <= counter(1);
      else
        clk_shiftreg(0) <= '1';
      end if;

      -- This controls where we send out the address to the ADC (bits 2, 3 and 4 of the stream).
      -- We use a short shift register to ensure that the ADC_DIN transitions are delayed
      -- 31 ns or so from the clk transitions
      dataout_shiftreg(2 downto 1) <= dataout_shiftreg(1 downto 0);

      if adc_active = '1' then
        case counter(5 downto 2) is
          when "0010" => dataout_shiftreg(0) <= channel_hold(2);
          when "0011" => dataout_shiftreg(0) <= channel_hold(1);
          when "0100" => dataout_shiftreg(0) <= channel_hold(0);
          when others => dataout_shiftreg(0) <= '0';
        end case;

        -- Latch the channel selection at the start of each frame
        if counter(5 downto 0) = "000000" then
          channel_hold <= switches;
        end if;

        -- As counter(1) is used to generate sclk, this test ensures that we
        -- capture bits right in the middle of the clock pulse
        if counter(1 downto 0) = "11" then
          datain_shiftreg <= datain_shiftreg(10 downto 0) & adc_dout;
        end if;

        -- When we have captured the last bit it is the time to update the output.
        if counter(5 downto 0) = "111111" then
          -- Normally you would grab "datain_shiftreg(10 downto 0) & adc_dout" for 12 bits
          LEDs <= datain_shiftreg(10 downto 3);
        end if;
      else
        dataout_shiftreg(0) <= '0';
      end if;

      counter <= counter+1;
    end if;
  end process;
end rtl;

Constraints for the Papilio One board:

The constraints required to implement the interface are:
--------------------------------------
NET LEDs(7) LOC = "P5";
NET LEDs(6) LOC = "P9";
NET LEDs(5) LOC = "P10";
NET LEDs(4) LOC = "P11";
NET LEDs(3) LOC = "P12";
NET LEDs(2) LOC = "P15";
NET LEDs(1) LOC = "P16";
NET LEDs(0) LOC = "P17";
NET switches(2) LOC = "P2";
NET switches(1) LOC = "P3";
NET switches(0) LOC = "P4";
NET ADC_CS_N LOC="P70";
NET ADC_SCLK LOC="P86";
NET ADC_DOUT LOC="P79";
NET ADC_DIN LOC="P84";
NET "clk" LOC="P89" | IOSTANDARD=LVCMOS25 | PERIOD=31.25ns;
--------------------------------------
Project - Playing with the ADC

• Modify the above project to output all 12 bits, and display the value on the Seven Segment display in hex. 

A jumper wire with a 100 Ohm resistor is useful for testing, but only test using the GND, 2.5V and 3.3V signals - connecting the ADC to 5V will damage it! Another option is to use one of the colour channels on the VGA socket, giving you a range of sixteen test values.

• If you multiply the value received by 129/16, you have a range of 0 to 33016 - very close to 10,000*Vin. The multiplication is easy to do in logic, but can you convert the resulting binary back to decimal to display on the seven segment display? One easy way would be to build a decimal counter, that counts up to the sampled value.
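
As a starting point for checking your hardware, here is a C sketch of the same arithmetic. The scaling uses the fact that 129 = 128 + 1, so it needs only shifts and one addition. The digit conversion uses repeated subtraction, which is a different approach from the counting method suggested above and is shown only as one possible alternative. Both function names are invented for this example.

```c
#include <stdint.h>

/* Scale a 12-bit sample by 129/16 using only shifts and one add. */
uint32_t scale_129_over_16(uint16_t sample)
{
    uint32_t s = sample & 0x0FFF;
    return ((s << 7) + s) >> 4;     /* (128*s + s) / 16, truncated */
}

/* Split a value below 100000 into five decimal digits, most significant
   first, by repeatedly subtracting powers of ten. */
void to_decimal_digits(uint32_t value, uint8_t digits[5])
{
    static const uint32_t weights[5] = { 10000, 1000, 100, 10, 1 };
    for (int i = 0; i < 5; i++) {
        digits[i] = 0;
        while (value >= weights[i]) {
            value -= weights[i];
            digits[i]++;
        }
    }
}
```

Note that the shift truncates rather than rounds, so a full-scale sample of 4095 gives 33015 here rather than 33016.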

Wednesday 25 January 2017

Binary Multiplication

Up to now we have managed to complete all the projects using only logical operations and addition or subtraction. But then there comes a time when you need multiplication - and this is where FPGAs really shine.

After going through the basics of binary multiplication you’ll be introduced to the embedded multiplier blocks in the Spartan 3E. The XC3S250E FPGAs have twelve of these blocks, enough for well over two billion 18-bit multiplications per second, which lets them compete with a desktop CPU core.

Performance of binary multiplication

Binary multiplication is complex - implementing multiplication of any two n-bit numbers requires approximately n*(n-1) full adders, n half adders and n*n AND operations.

To multiply the four-bit binary numbers "abcd" by "efgh" the following operations are needed (where & is a binary AND):

+                       a&h  b&h  c&h  d&h
+                  a&g  b&g  c&g  d&g   0
+             a&f  b&f  c&f  d&f   0    0
+        a&e  b&e  c&e  d&e   0    0    0
    ---  ---  ---  ---  ---  ---  ---  ---
=    ?    ?    ?    ?    ?    ?    ?    ?
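
The table can be read as a shift-and-add algorithm: each row is one number ANDed with a single bit of the other, shifted left by that bit's position. A minimal C sketch of this for 4-bit inputs follows (the function name `mul4x4()` is made up for the example; real hardware adds all the rows in parallel rather than in a loop).

```c
#include <stdint.h>

/* Multiply two 4-bit numbers by summing shifted partial products,
   one row of the table per loop iteration. */
uint8_t mul4x4(uint8_t x, uint8_t y)
{
    uint8_t product = 0;
    x &= 0x0F;
    y &= 0x0F;
    for (int bit = 0; bit < 4; bit++) {
        /* A row is x ANDed with one bit of y... */
        uint8_t row = ((y >> bit) & 1) ? x : 0;
        /* ...shifted left by that bit's position before being added. */
        product = (uint8_t)(product + (row << bit));
    }
    return product;
}
```

The largest result, 15 x 15 = 225, still fits in the eight output bits shown in the table.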

Multiplication also has a big implication for your design’s performance - because of the carries required, multiplying two n-bit numbers takes around twice as long as adding two n-bit numbers.

It also consumes a very large amount of logic if multiplication is implemented in the generic logic blocks within the FPGA.

Multiplication in FPGAs

To get around this, most FPGAs include multiple multiplier blocks - an XC3S100E has four 18 bit x 18 bit multipliers, and an XC3S250E has twelve!

To improve performance, multipliers also include additional registers, allowing the multiplicands and result to be registered within the multiplier block. There are also optional registers within the multiplier that hold the partial result half way through the multiplication.

Using these internal registers greatly improves throughput performance by removing the routing delays to get the inputs to and from the multipliers, but at the cost of increased latency - measured in either time between two numbers being given to the multiplier and the result being available, or the number of clock cycles.

When all these internal registers are enabled the multiplier works as follows:

Clock cycle   Action
0             A and B inputs are latched
1             The first half of the multiplication is performed
2             The second half of the multiplication is performed
3             The result of the multiplication is available on the P output

Multipliers can accept a new set of A and B values each clock cycle, so up to three can be in flight at any one time. In some cases this is useful but in other cases it can be annoying.

A useful case might be processing Red/Green/Blue video values, where each channel is separate.

An annoying case is where feedback is needed of the output value back into the input. If the math isn’t in your favor you may be better off not using any registers at all - it may even be slightly faster and running at one-third the clock speed will use less power.

What if 18x18 isn’t wide enough?

What if you want to use bigger numbers? Say 32 bits? Sure!

Just as in decimal, where the product of the two-digit numbers "ab" and "cd" is calculated as "a*c*10*10 + b*c*10 + a*d*10 + b*d", the same works in binary - just replace each of a, b, c and d with an 18-bit number, and each 10 with 2^18.
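
Here is a C sketch of that decomposition for two unsigned 32-bit inputs, split into an upper piece and a lower 18-bit piece. The names are invented for the example, and it is a model of the arithmetic only: the hardware multiplier blocks treat their inputs as signed, which this unsigned sketch ignores.

```c
#include <stdint.h>

#define PIECE_BITS 18
#define PIECE_MASK ((1u << PIECE_BITS) - 1u)

/* Build a 32x32 multiply from four small multiplies and three adds. */
uint64_t mul32_from_pieces(uint32_t x, uint32_t y)
{
    uint64_t a = x >> PIECE_BITS;    /* upper piece of x (14 bits used) */
    uint64_t b = x & PIECE_MASK;     /* lower 18 bits of x */
    uint64_t c = y >> PIECE_BITS;    /* upper piece of y */
    uint64_t d = y & PIECE_MASK;     /* lower 18 bits of y */

    /* a*c*2^36 + b*c*2^18 + a*d*2^18 + b*d */
    return ((a * c) << (2 * PIECE_BITS))
         + ((b * c) << PIECE_BITS)
         + ((a * d) << PIECE_BITS)
         + (b * d);
}
```

Each of the four products takes inputs of at most 18 bits, so each one would fit a single multiplier block; only the shifts and additions happen outside.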

As the designer you have the choice of either:

• using four multipliers and three adders, with a best-performance latency of 5 cycles, and with a throughput of one pair of A and B values per clock
• using the same multiplier to calculate each of the four intermediate products, with a best-performance latency of 13 cycles (four 3-cycle multiplications plus the final addition) and with careful scheduling you can process three input pairs every 12 cycles

Project - Digital volume control

• Revisit the Audio output project
• Use the CORE Generator to add an IP multiplier, with an 8 bit unsigned input for the volume and the other matching your BRAM’s sample size
• Add a multiplier between the block BRAM and the DAC, using the value of switches as the other input
• Use the highest output bits of the multiplier to feed the DAC
• If you get the correct signed / unsigned settings for each input of the multiplier you will now be able to control the volume with the switches

Unless you are careful you may have issues with mismatching signed and unsigned values - it pays to simulate this carefully!

• You can also implement multiplication using the generic logic ("LUT"s). If interested, you can change the IP multiplier to use LUTs instead of the dedicated multiplier blocks and compare maximum performance and resource usage.

A high speed external interface

This chapter is only applicable to the Basys2 board - the Papilio board only has a serial port. It also assumes that you are using the Windows OS - but I’m sure that only minor changes are needed for it all to work under Linux too.

The Digilent Parallel Interface

Digilent FPGA boards have a port of the USB interface wired to the FPGA. I’ve used this to transfer data at up to 11 megabytes per second (but only on a Nexys2 - the interface on the Basys2 is much slower!). The supplied documentation is pretty terse, so here is a quick start guide.

The interface implements the long obsolete EPP protocol that was traditionally used to talk to parallel port scanners. It allows the connected device to address up to 256 8-bit registers that can be implemented within the FPGA.

These registers can either be read by the host PC one byte at a time, or a "Repeat" function can be called to read multiple bytes from the same register.

The "make or break" shortcoming of this interface is that there is no interrupt signal going back to the host which would allow the FPGA to get its attention. Unlike when using RS-232, this forces the host software to poll the FPGA at regular intervals - which is not ideal for responsiveness or CPU usage.

The FPGA side of the interface

The following signals make up the interface:

Name             Type    Description
DB(7 downto 0)   INOUT   Data bus
WRITE            IN      Write enable (active low) - data will be written from the host during this cycle
ASTB             IN      Address strobe (active low) - data bus will be captured into the address register
DSTB             IN      Data strobe (active low) - the bus will be captured into the currently selected data register
WAIT             OUT     Asserted when FPGA is ready to accept data
INT              OUT     Interrupt request - not used
RESET            IN      Reset - not used

Read Transaction

The steps in a read transaction are:

• Host lowers ASTB or DSTB to commence read of either the address register or the selected data register
• FPGA presents data on data bus
• FPGA raises WAIT indicating that the data is valid
• Host captures the data
• Host raises ASTB or DSTB
• FPGA removes the data from the data bus
• FPGA lowers WAIT to finish transaction

Write Transaction

The steps in a write transaction are:

• Host presents data on the data bus
• Host lowers write enable to 0
• Host lowers either ASTB or DSTB to commence write of either the address register or the selected data register
• FPGA raises WAIT once data is captured
• Host raises ASTB or DSTB, removes data from bus and raises write enable
• FPGA lowers WAIT to finish transaction

FSM diagram
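
The transitions between the idle state and the four transfer states can also be written down as a next-state function. The C sketch below mirrors the state names and the priority order used in the VHDL for the FPGA interface; the C identifiers themselves are invented for this example, and WAIT is treated as a simple function of the state, whereas in the VHDL it is registered and so changes one clock later.

```c
#include <stdbool.h>

typedef enum { EPP_IDLE, EPP_DATA_READ, EPP_DATA_WRITE,
               EPP_ADDR_READ, EPP_ADDR_WRITE } epp_state_t;

/* Next state for one clock tick. All three inputs are active low,
   so "false" means the host has asserted that signal. */
epp_state_t epp_next_state(epp_state_t state, bool wr, bool astb, bool dstb)
{
    switch (state) {
    case EPP_DATA_READ:
    case EPP_DATA_WRITE:
        return dstb ? EPP_IDLE : state;   /* wait for the host to raise DSTB */
    case EPP_ADDR_READ:
    case EPP_ADDR_WRITE:
        return astb ? EPP_IDLE : state;   /* wait for the host to raise ASTB */
    default:                              /* idle: look for a new strobe */
        if (!wr) {
            if (!astb) return EPP_ADDR_WRITE;
            if (!dstb) return EPP_DATA_WRITE;
        } else {
            if (!dstb) return EPP_DATA_READ;
            if (!astb) return EPP_ADDR_READ;
        }
        return EPP_IDLE;
    }
}

/* WAIT is raised in every state except idle. */
bool epp_wait(epp_state_t state)
{
    return state != EPP_IDLE;
}
```

Stepping this function by hand through the read and write transactions listed above is a quick way to check that every path returns to idle once the host releases its strobe.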


Constraints for the BASYS2 board

The constraints required to implement the interface are:

NET "EppAstb" LOC = "F2"; # Bank = 3
NET "EppDstb" LOC = "F1"; # Bank = 3
NET "EppWR" LOC = "C2"; # Bank = 3
NET "EppWait" LOC = "D2"; # Bank = 3
NET "EppDB<0>" LOC = "N2"; # Bank = 2
NET "EppDB<1>" LOC = "M2"; # Bank = 2
NET "EppDB<2>" LOC = "M1"; # Bank = 3
NET "EppDB<3>" LOC = "L1"; # Bank = 3
NET "EppDB<4>" LOC = "L2"; # Bank = 3
NET "EppDB<5>" LOC = "H2"; # Bank = 3
NET "EppDB<6>" LOC = "H1"; # Bank = 3
NET "EppDB<7>" LOC = "H3"; # Bank = 3

VHDL for the FPGA interface

This source allows you to set the LEDs and read the switches from the PC. It has a few VHDL features that you won’t have seen up to now:

• The EppDB (EPP Data Bus) is INOUT - a tri-state bidirectional bus. When you assign "ZZZZZZZZ" (high impedance) to the signal it will then ’read’ as the input from the outside world. This is only really useful on I/O pins - within the FPGA all tri-state logic is implemented using multiplexers.
• It uses an enumerated type to hold the FSM state. This is only really useful if you don’t want to use individual bits within the state value to drive logic (which is usually a good way to get glitch free outputs)

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity epp_interface is
  port (Clk     : in    std_logic;
        -- EPP interface
        EppAstb : in    std_logic;
        EppDstb : in    std_logic;
        EppWR   : in    std_logic;
        EppWait : out   std_logic;
        EppDB   : inout std_logic_vector(7 downto 0);
        -- Feedback
        switches: in    std_logic_vector(7 downto 0);
        leds    : out   std_logic_vector(7 downto 0)
       );
end epp_interface;

architecture Behavioral of epp_interface is
  type epp_state is (idle, data_read, data_write, addr_read, addr_write);
  signal state     : epp_state := idle;
  signal address   : std_logic_vector(7 downto 0) := (others => '0');
  signal port0data : std_logic_vector(7 downto 0) := (others => '0');
begin
  process(clk)
  begin
    if rising_edge(clk) then
      case state is
        when data_read =>
          EppWait <= '1';
          case address is
            when "00000000" =>
              -- register 0 reads back the inverse of what was written
              EppDB <= not port0data;
            when "00000001" =>
              EppDB <= switches;
            when others =>
              null;
          end case;
          if EppDstb = '1' then
            state <= idle;
          end if;

        when data_write =>
          EppWait <= '1';
          case address is
            when "00000000" =>
              port0data <= EppDB;
            when "00000001" =>
              leds <= EppDB;
            when others =>
              null;
          end case;
          if EppDstb = '1' then
            state <= idle;
          end if;

        when addr_read =>
          EppWait <= '1';
          EppDB   <= address;
          if EppAstb = '1' then
            state <= idle;
          end if;

        when addr_write =>
          EppWait <= '1';
          address <= EppDB;
          if EppAstb = '1' then
            state <= idle;
          end if;

        when others =>
          -- idle: release the data bus and wait for a strobe
          EppWait <= '0';
          EppDB   <= "ZZZZZZZZ";
          if EppWR = '0' then
            if EppAstb = '0' then
              state <= addr_write;
            elsif EppDstb = '0' then
              state <= data_write;
            end if;
          else
            if EppDstb = '0' then
              state <= data_read;
            elsif EppAstb = '0' then
              state <= addr_read;
            end if;
          end if;
      end case;
    end if;
  end process;
end Behavioral;

The PC side of the interface

Header files and libraries

These are in the Adept SDK, which can be downloaded from http://www.digilentinc.com/Products/Detail.cfm?Prod=ADEPT2
The zip file includes all the files you need, including documentation, libraries and examples.

The following header files are needed in your C code:
• gendefs.h
• dpcdefs.h
• dpcutil.h
You will also need to add the path to the libraries into your project’s linking settings.

Connecting to a device

Connecting isn’t that simple, but it’s not that hard either. Three functions are needed:

• DpcInit()
• DvmgGetDefaultDev()
• DvmgGetDevName()

if (!DpcInit(&erc)) {
    printf("Unable to initialise\n");
    return 0;
}
id = DvmgGetDefaultDev(&erc);
if (id == -1) {
    printf("No default device\n");
    goto error;
}
if (!DvmgGetDevName(id, device, &erc)) {
    printf("No device name\n");
    goto error;
}

The first time you make use of the interface you may need to call one more function, once only, to present a dialogue box allowing you to select which FPGA board will be your default device:

• DvmgStartConfigureDevices()

Once used, the settings will be saved in the registry and will persist.

Connecting to the EPP port of that device

One function is used to connect to the device (vs connecting to the JTAG port):
• DpcOpenData()

if (!DpcOpenData(&hif, device, &erc, NULL)) {
    goto fail;
}

Reading a port

Reading a port is achieved with either of these functions:

• DpcGetReg() - Read a single byte from a register
• DpcGetRegRepeat() - Read multiple bytes from a register
Here’s an example function that opens the EPP port and reads a single register:

static int GetReg(unsigned char r) {
    unsigned char b;
    ERC erc;
    HANDLE hif;

    /* Open the EPP port */
    if (!DpcOpenData(&hif, device, &erc, NULL)) {
        goto fail;
    }

    /* Read register r into b */
    if (!DpcGetReg(hif, r, &b, &erc, NULL)) {
        DpcCloseData(hif, &erc);
        goto fail;
    }

    erc = DpcGetFirstError(hif);
    DpcCloseData(hif, &erc);
    if (erc == ercNoError)
        return b;

fail:
    return -1;
}

Writing to a register

Writing to a port is achieved with either of these functions:

• DpcPutReg() - Write a single byte to a register
• DpcPutRegRepeat() - Write multiple bytes to a register
Here’s an example function that opens the EPP port and writes to a single register:

static int PutReg(unsigned char r, unsigned char b) {
    ERC erc;
    HANDLE hif;

    printf("Put %i %i\n", r, b);

    /* Open the EPP port */
    if (!DpcOpenData(&hif, device, &erc, NULL)) {
        goto fail;
    }

    /* Write b to register r */
    if (!DpcPutReg(hif, r, b, &erc, NULL)) {
        DpcCloseData(hif, &erc);
        goto fail;
    }

    erc = DpcGetFirstError(hif);
    DpcCloseData(hif, &erc);
    if (erc == ercNoError)
        return 0;

fail:
    return -1;
}

Closing the EPP port

One function is used to close the EPP port:
• DpcCloseData()

DpcCloseData(hif, &erc);
if (erc == ercNoError)
    return b;

Closing the interface

It is always good to clean up after yourself. Use the following function to do so:
• DpcTerm()
DpcTerm();

Project - Using the PC end of the interface

• Download and configure your board with the "Adept I/O expansion reference design" project from http://www.digilentinc.com/Products/Detail.cfm?Prod=BASYS2
• Check that the Adept I/O expansion tab responds to changes in the switches


• Create a C program that opens the interface and reads a single byte from registers 5 and 6 and displays the value to the screen
• Close off Adept and check that your C program also shows the state of the switches on the Basys2
• Expand your C program to write the value of the switches to register 1 - this is the LEDs

You now have the host side of bidirectional communication sorted!

Project - Implementing the FPGA end of the interface

• Create a new FPGA project
• Create a module that implements the EPP protocol - or use one of Digilent’s reference designs if you want
• Connect writes of register 1 to the LEDs
• Connect reads of register 5 or 6 to the switches
• Test that your design works just as well with your program as Digilent’s reference design

Monday 23 January 2017

Communicating with the outside world

So, after displaying something on a VGA monitor, how do we talk to a PC?

In this chapter you will build the transmit part of a serial (RS-232) interface, using shift registers. On the Papilio One you can talk directly to the USB interface, but on the Basys2 you will need a USB to 3.3V Serial breakout board.

What is RS-232?

RS-232 is a very old standard originally used to interface digital systems with analogue phone lines and other data circuits. It enables relatively low speed communication between devices, and is relatively simple to implement.

If hardware handshaking is not used only three wires are needed:

Wire                         Use
GND                   Signal Ground
TX                      Transmitted data
RX                      Received data

Which signal a device listens on and which signal it transmits on is a common source of confusion. If the device is "DTE" (Data Terminal Equipment) it transmits on TX and listens on RX. If the device is "DCE" (Data Communications Equipment, e.g., a modem) it listens for data on TX and transmits on RX.

The standard speeds range from 75 baud up to 115,200 baud, with 9600 or 19200 being the most common speeds for data that is presented to people (such as on a serial console).

As well as baud speed, both ends of a connection must be using the same frame parameters - the most common being one start bit, eight data bits, no parity bit and one stop bit. As the frame is ten bits long, at 9600 baud you can send 960 bytes per second.
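
The arithmetic above can be sketched in C. The function names are made up for this illustration, and the result assumes frames are sent back to back with no idle time between them:

```c
/* Number of bits on the wire for one frame: one start bit plus the
 * data, parity and stop bits. For "8N1" this is 1 + 8 + 0 + 1 = 10. */
unsigned frame_bits(unsigned data_bits, unsigned parity_bits, unsigned stop_bits)
{
    return 1u + data_bits + parity_bits + stop_bits;
}

/* Upper limit on frames (bytes) per second at a given baud rate,
 * assuming there are no gaps between frames. */
unsigned long frames_per_second(unsigned long baud, unsigned bits_in_frame)
{
    return baud / bits_in_frame;
}
```

With eight data bits, no parity and one stop bit, frame_bits() gives 10, and frames_per_second(9600, 10) gives the 960 bytes per second quoted above.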

There is a whole lot more to the standard, mostly around how senders and receivers control the flow of data to ensure that data does not overrun receiving buffers. When using modern hardware at slow speeds handshaking isn’t really an issue.

Here is what the signal should look like on the wire:


Generating an RS-232 signal

For this project we need a shift register (well two actually). So what does a shift register look like in VHDL?

Here is a 16-bit register that loops from bit 0 to bit 15 - a much simpler way to generate one pulse every 16 cycles than using a counter.
...
signal shiftreg : std_logic_vector(15 downto 0) := "0000000000000001";
...
if rising_edge(clk) then
shiftreg <= shiftreg(0) & shiftreg(15 downto 1);
end if;

For RS-232 we use pretty much this construct, but feed in the idle bit value (1). This code will send the Z character once (after which the shift register is filled with ’1’s):
...
signal shiftreg : std_logic_vector(9 downto 0) := "1010110100";
...
data_out <= shiftreg(0);
...
if rising_edge(clk) then
  shiftreg <= '1' & shiftreg(9 downto 1);
end if;

The user data is bits 8 downto 1 - this is the "byte" of user data - bit 0 is the start bit, and bit 9 is the stop bit. I chose the ASCII code for Z as it will still be a Z regardless of if the least or most significant bit gets transferred first - very useful for initial testing!
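
To check the bit layout and the claim about Z on a PC, here is a small C sketch. The helper names make_frame() and reverse_bits() are hypothetical and exist only for this illustration:

```c
#include <stdint.h>

/* Build the 10-bit value loaded into the shift register:
 * bit 9 = stop bit (1), bits 8..1 = data byte, bit 0 = start bit (0). */
uint16_t make_frame(uint8_t byte)
{
    return (uint16_t)((1u << 9) | ((unsigned)byte << 1));
}

/* Reverse the bit order of a byte (bit 7 swaps with bit 0, and so on). */
uint8_t reverse_bits(uint8_t byte)
{
    uint8_t out = 0;
    for (int i = 0; i < 8; i++) {
        out = (uint8_t)((out << 1) | ((byte >> i) & 1u));
    }
    return out;
}
```

make_frame('Z') returns 0x2B4, which is "1010110100" in binary, the same constant used to initialise shiftreg. reverse_bits('Z') returns 'Z' again because 0x5A reads the same in both directions, whereas a character such as 'A' (0x41) would become 0x82.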

The only problem with the code so far is that we are transmitting at the clock speed - either 32,000,000 or 50,000,000 baud! To control the rate of sending we also need a counter that allows a bit to be sent at 9600 baud - once every 3,333 cycles (at 32MHz) or once every 5,208 cycles (at 50MHz):
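
If you want to try other clock or baud rates, the divider can be worked out as below. This is only a sketch with made-up function names; it rounds to the nearest whole cycle, which is an assumption rather than something the text prescribes:

```c
/* Clock cycles per transmitted bit, rounded to the nearest cycle. */
unsigned long cycles_per_bit(unsigned long clock_hz, unsigned long baud)
{
    return (clock_hz + baud / 2) / baud;
}

/* Value to compare the counter against. The counter starts at zero,
 * so it must wrap one count before cycles_per_bit(). */
unsigned long counter_limit(unsigned long clock_hz, unsigned long baud)
{
    return cycles_per_bit(clock_hz, baud) - 1;
}
```

For a 32MHz clock at 9600 baud this gives 3,333 cycles per bit and a compare value of 3332; for 50MHz it gives 5,208 and 5207. Both compare values fit in a 13-bit counter.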
...
signal shiftreg : std_logic_vector(9 downto 0) := "1010110100";
signal counter : std_logic_vector(12 downto 0) := (others => '0');
...
data_out <= shiftreg(0);
...
if rising_edge(clk) then
  if counter = 3332 then
    shiftreg <= '1' & shiftreg(9 downto 1);
    counter <= (others => '0');
  else
    counter <= counter+1;
  end if;
end if;

We can make it send the same data over and over again by looping the shift register’s output back to its input. To do this it needs a longer shift register, ensuring that we have some quiet space following the stop bit to allow the receiver to frame the data correctly:
...
signal shiftreg : std_logic_vector(15 downto 0) := "1111111010110100";
signal counter : std_logic_vector(12 downto 0) := (others => '0');
...
data_out <= shiftreg(0);
...
if rising_edge(clk) then
  if counter = 3332 then
    shiftreg <= shiftreg(0) & shiftreg(15 downto 1);
    counter <= (others => '0');
  else
    counter <= counter+1;
  end if;
end if;

This code should be enough to let you test that your RS-232 port actually sends data as expected.

Sending variable data

To make this useful you really need to be able to send different data bytes. And to do this correctly you have to know when the interface is busy. 

The easiest way to do this is to have a second shift register which is filled with ’1’s when the character is loaded into the data shift register, and filled with ’0’s as bits are transmitted. Once this second shift register is all zeros, then things are ready for the next byte to be sent:

...
signal busyshiftreg : std_logic_vector(9 downto 0) := (others => '0');
signal datashiftreg : std_logic_vector(9 downto 0) := (others => '1');
signal counter : std_logic_vector(12 downto 0) := (others => '0');
...
data_out <= datashiftreg(0);
busy_out <= busyshiftreg(0);
...
if rising_edge(clk) then
  if busyshiftreg(0) = '0' then
    -- Idle: load stop bit, data byte and start bit, and mark as busy
    busyshiftreg <= (others => '1');
    datashiftreg <= '1' & databyte & '0';
    counter <= (others => '0');
  else
    if counter = 3332 then
      datashiftreg <= '1' & datashiftreg(9 downto 1);
      busyshiftreg <= '0' & busyshiftreg(9 downto 1);
      counter <= (others => '0');
    else
      counter <= counter+1;
    end if;
  end if;
end if;

It is important to remember to reset counter when a new byte is loaded into datashiftreg. Failing to do this will cause the start bit to vary in length - the project will work correctly when streaming bytes to the host, but the host will sometimes receive garbage for the first few bytes of a message until it recovers from the bad bit.
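
If you would like to step through this logic on a PC before putting it on the FPGA, the following C model mirrors the two shift registers and the counter. It is only a sketch: the struct and function names are made up for this illustration, and it assumes the 32MHz clock (3,333 cycles per bit).

```c
#include <stdint.h>

#define CYCLES_PER_BIT 3333u   /* assumes 32MHz clock and 9600 baud */

typedef struct {
    uint16_t datashiftreg;   /* 10 bits, bit 0 drives data_out */
    uint16_t busyshiftreg;   /* 10 bits, bit 0 drives busy_out */
    uint16_t counter;
} uart_tx_model;

/* Power-up state: line idle (all ones) and not busy. */
void uart_tx_reset(uart_tx_model *u)
{
    u->datashiftreg = 0x3FF;
    u->busyshiftreg = 0;
    u->counter = 0;
}

/* Model of one rising clock edge. */
void uart_tx_clock(uart_tx_model *u, uint8_t databyte)
{
    if ((u->busyshiftreg & 1u) == 0) {
        /* Idle: load stop bit, data and start bit, restart the bit timer */
        u->busyshiftreg = 0x3FF;
        u->datashiftreg = (uint16_t)((1u << 9) | ((unsigned)databyte << 1));
        u->counter = 0;
    } else if (u->counter == CYCLES_PER_BIT - 1) {
        /* End of a bit period: shift in idle '1's and busy '0's */
        u->datashiftreg = (uint16_t)((1u << 9) | (u->datashiftreg >> 1));
        u->busyshiftreg = (uint16_t)(u->busyshiftreg >> 1);
        u->counter = 0;
    } else {
        u->counter++;
    }
}

int uart_tx_data_out(const uart_tx_model *u) { return u->datashiftreg & 1; }
int uart_tx_busy_out(const uart_tx_model *u) { return u->busyshiftreg & 1; }
```

Clocking the model once with 'Z' loads the frame. Sampling uart_tx_data_out() once every 3,333 clocks after that gives the start bit, the eight data bits least significant bit first, and the stop bit, after which uart_tx_busy_out() drops to 0.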

Connecting your FPGA board to a PC

Caution

Connecting the FPGA directly to your serial port will most likely ruin your FPGA. Most modern PCs do not have RS-232 ports, and if they do they are expecting the higher voltage levels that standard RS-232 uses - the standard uses up to +/- 25V!

To connect to a PC over USB you can use something like Sparkfun’s "FTDI Basic 3.3V - USB to Serial" (http://www.sparkfun.com/products/9893) and jumper wires. Here’s my setup:



Tip
If you are using the Basys2 and want to talk to a true standards compliant RS-232 port, or if you want to avoid issues caused by loose wires you can use the RS-232 PMOD http://www.digilentinc.com/Products/Detail.cfm?Prod=PMOD-RS232 with your Basys2.

Project 16

• Create a project that sends ’Z’ over RS-232
• Create a project that sends the state of switches(3 downto 0) over RS-232
– You could increase the length of the shift register and send multiple bytes
– You could convert the data to ASCII and send four switches in a single byte
– You could map the 16 possible values into 16 contiguous printable characters (maybe characters A through P)
• Change it to only send a byte when the switches change
• Extend the project to send the state of all eight switches

Challenge

• What would happen if the input to the RS-232 TX component was to change, and then change back to its original state in less than 1/960th of a second? Can loss of data be avoided?

Sunday 22 January 2017

Generating a VGA signal

Aims of module

• Generate tight tolerance signals
• Display something on a VGA monitor

Let me know if I haven’t given enough directions on how to implement this module. I think that the less hand-holding given the greater the joy when your project actually displays something for the first time.

Special note for Basys2 users

The Basys2 reference manual implies that the oscillator on the board isn’t too stable. Digilent recommends using a quality aftermarket oscillator to correct this, but the reference manual has the wrong part number - you want to order a SGR-8002DCPCC- N from DigiKey (the only place that seems to have it!). If you suspect that this is an issue, you can test your board/monitor compatibility using the board self test that is in the flash, or using the file from Digilent’s web site. I have not been able to get a current 1080p HD LCD monitor to display a picture (although I’ve only tried two), but it works on plenty of CRTs. A cheap fix may be to add additional load to the power supply with a 150 Ohm resistor - see http://www.youtube.com/watch?v=bVee4dDwO1k. I have had no such issues with my Papilio One - I’ve even generated 1920 x 1080 @ 60Hz signals (145MHz).

VGA signal timing

For this demo we will be aiming at 640x480. As detailed on http://tinyvga.com/vga-timing/640x480@60Hz this requires a pixel clock of 25.175MHz. 25MHz is close enough for most monitors to sync up and display a stable image, and using a DCM we can generate that frequency from either the 32MHz of the Papilio’s crystal or the 50MHz of the Basys2 clock generator.

How does the VGA interface work?

In the "good ol’ days" monitors were analogue devices, using Cathode Ray Tubes. Two signals are used to control the position of the electron beam on the display.

Vertical sync (vsync)

This signal is pulsed every 60th of a second, and takes the electron beam back to the top of the screen. Once back at the top of the screen the monitor would scan the beam slowly down the screen.
In this video mode the pulse is a negative pulse of 0.063555ms duration, every 16.6832ms.

Horizontal sync (hsync)

This signal is pulsed every 1/31,468th of a second, and takes the electron beam to the left hand side of the monitor. Once there the beam scans rather more rapidly to the right hand side. In this video mode, it is a negative pulse of 3.8133068us duration every 31.777557us.

When properly timed, the hsync and vsync signals cause the electron beam to scan the whole visible area, so all that is needed is the colour signals.

The colour signals - red, green and blue

These are analogue signals which control the intensity of each colour, and each pixel lasts 1/25,175,000th of a second. These signals should only be driven for the correct portion of the horizontal scan, as the monitor uses the "blanking interval" to register what voltages are used for black. There are two blanking intervals - the horizontal blanking interval (either side of the hsync pulse) and the vertical blanking interval (either side of the vsync pulse).

Pins used to drive the VGA connector

Ten pins are used to drive the VGA connector - the Red, Green and Blue signals use a passive D2A converter made out of resistors.

The constraints for the Papilio board are:

NET "HSYNC" LOC = "J14" | DRIVE = 2;
NET "VSYNC" LOC = "K13" | DRIVE = 2;
NET "Red<2>" LOC = "F12" | DRIVE = 2;
NET "Red<1>" LOC = "D13" | DRIVE = 2;
NET "Red<0>" LOC = "C14" | DRIVE = 2;
NET "Green<2>" LOC = "G14" | DRIVE = 2;
NET "Green<1>" LOC = "G13" | DRIVE = 2;
NET "Green<0>" LOC = "F14" | DRIVE = 2;
NET "Blue<2>" LOC = "J13" | DRIVE = 2;
NET "Blue<1>" LOC = "H13" | DRIVE = 2;

The constraints for the Basys2 board are:

NET "HSYNC" LOC = "J14" | DRIVE = 2;
NET "VSYNC" LOC = "K13" | DRIVE = 2;
NET "Red<2>" LOC = "F13" | DRIVE = 2;
NET "Red<1>" LOC = "D13" | DRIVE = 2;
NET "Red<0>" LOC = "C14" | DRIVE = 2;
NET "Green<2>" LOC = "G14" | DRIVE = 2;
NET "Green<1>" LOC = "G13" | DRIVE = 2;
NET "Green<0>" LOC = "F14" | DRIVE = 2;
NET "Blue<2>" LOC = "J13" | DRIVE = 2;
NET "Blue<1>" LOC = "H13" | DRIVE = 2;

Making the timings easy to implement

If you multiply the hsync and vsync timings by the pixel clock you will get something close to the following numbers:

Scanline (Horizontal) timing                      Duration in pixel clocks
Visible area                                                            640
Front porch                                                            16
Sync pulse                                                             96
Back porch                                                            48
Whole line                                                             800

The horizontal blanking interval is the front porch + sync pulse + back porch = 160 pixel clocks

Frame (vertical) timing                 Duration in lines (800 pixel clocks)
Visible area                                                    480
Front porch                                                     10
Sync pulse                                                       2
Back porch                                                      33
Whole frame                                                   525

The vertical blanking interval is the front porch + sync pulse + back porch = 45 lines
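
The totals in the two tables, and the effect of using 25MHz instead of 25.175MHz, can be checked with a few lines of C. This is a sketch only, and the constant and function names are made up for this illustration:

```c
/* Horizontal timing, in pixel clocks */
enum { H_VISIBLE = 640, H_FRONT = 16, H_SYNC = 96, H_BACK = 48 };
enum { H_TOTAL = H_VISIBLE + H_FRONT + H_SYNC + H_BACK };   /* 800 */

/* Vertical timing, in lines */
enum { V_VISIBLE = 480, V_FRONT = 10, V_SYNC = 2, V_BACK = 33 };
enum { V_TOTAL = V_VISIBLE + V_FRONT + V_SYNC + V_BACK };   /* 525 */

/* Lines per second for a given pixel clock. */
double line_rate_hz(double pixel_clock_hz)
{
    return pixel_clock_hz / H_TOTAL;
}

/* Frames per second for a given pixel clock. */
double frame_rate_hz(double pixel_clock_hz)
{
    return pixel_clock_hz / ((double)H_TOTAL * V_TOTAL);
}
```

At 25.175MHz the line rate is 31,468.75 lines per second and the frame rate is just under 60Hz. At 25MHz the frame rate drops to roughly 59.5Hz.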

The RGB signal

Both boards can generate only 256 colours - eight shades of red, eight shades of green and four shades of blue. They do this using a passive D2A converter made up of a dozen or so resistors. There really isn’t much more to say!
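
The 256 comes from three bits of red, three of green and two of blue. As a sketch, here is how a host program might count the colours and pack one into a single byte. The RRRGGGBB byte layout and the function names are assumptions made for this illustration, not something the boards define:

```c
#include <stdint.h>

/* Number of distinct colours for the given bits per channel. */
unsigned colour_count(unsigned red_bits, unsigned green_bits, unsigned blue_bits)
{
    return 1u << (red_bits + green_bits + blue_bits);
}

/* Pack 3-bit red, 3-bit green and 2-bit blue into one byte as RRRGGGBB.
 * Out-of-range inputs are masked down to the available bits. */
uint8_t pack_rgb332(unsigned red, unsigned green, unsigned blue)
{
    return (uint8_t)(((red & 7u) << 5) | ((green & 7u) << 2) | (blue & 3u));
}
```

colour_count(3, 3, 2) returns 256, and pack_rgb332(7, 7, 3) returns 0xFF (white).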

Pseudo-code implementation

Implementation of the hsync and vsync signals should be becoming clear. Here it is in pseudo-code:

hcount and vcount are 10-bit counters

every 1/25,000,000th of a second
  if hcount == 799 then
    hcount = 0
    if vcount == 524 then
      vcount = 0
    else
      vcount = vcount + 1
    end if
  else
    hcount = hcount + 1
  end if

  if vcount >= 490 and vcount < 492 then
    vsync = 0
  else
    vsync = 1
  end if

  if hcount >= 656 and hcount < 752 then
    hsync = 0
  else
    hsync = 1
  end if

  if hcount < 640 and vcount < 480 then
    display a colour on the RGB signals
  else
    display black colour on the RGB signals
  end if
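
Before writing the VHDL it can be reassuring to check the counter limits in software. The C model below follows the pseudo-code above; the type and function names are made up for this illustration, and it is a sketch rather than a replacement for simulating the real design:

```c
typedef struct {
    unsigned hcount;   /* 0..799, position within the line   */
    unsigned vcount;   /* 0..524, line number within the frame */
} vga_state;

/* Advance the counters by one pixel clock. */
void vga_clock(vga_state *s)
{
    if (s->hcount == 799) {
        s->hcount = 0;
        s->vcount = (s->vcount == 524) ? 0 : s->vcount + 1;
    } else {
        s->hcount++;
    }
}

/* Both sync signals are active low, as in the pseudo-code. */
int vga_hsync(const vga_state *s)
{
    return !(s->hcount >= 656 && s->hcount < 752);
}

int vga_vsync(const vga_state *s)
{
    return !(s->vcount >= 490 && s->vcount < 492);
}

/* Non-zero while the RGB signals should carry a colour. */
int vga_active(const vga_state *s)
{
    return s->hcount < 640 && s->vcount < 480;
}
```

Stepping the model through one whole frame (800 x 525 clocks) and counting the clocks where each signal is asserted should give 96 clocks of hsync per line, 2 lines of vsync, and 640 x 480 active pixels.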

Project - Displaying something on a VGA monitor

• Create a new project to drive the VGA device. It needs to accept a clk signal and generate hsync, vsync, red(2 downto 0), green(2 downto 0) and blue(2 downto 1) outputs. Note that it is Blue(2 downto 1), not Blue(2 downto 0).
• Add an implementation constraint file and add the definitions for clk and the 10 VGA signals.
• Implement the horizontal counter (you will need a ten-bit counter). Remember to include the unsigned library so you will be able to do numeric operations on STD_LOGIC_VECTOR signals.
• Run it in the simulator, and verify the pulse widths and direction.
• Implement the vertical counter (once again you will need a ten-bit counter). You can also verify this in the simulator, but as you need to simulate 16,667us to see the whole frame it can take a while!
• To generate a white image, assign ’1’s to all the RGB signals during the active time. Test this too in the simulator. You only want to see ’1’s for the first 640 pixel clocks of the first 480 lines.
• If all looks correct, plug a VGA monitor into your board. It should detect the signal and display an image.
• Rather than assigning ’1’s to the RGB values, experiment with assigning different bits out of hcounter and vcounter - you can make colour bars and checkerboard patterns.
• Look really closely at the simulation. Do the RGB values go to 1 when hcounter transitions from 799 back to 0? If not, why not?

A common cause of problems

It looks as though this code doesn’t need to go into an "if rising_edge(clk) then ..." block:

if hcount >= 656 and hcount < 752 then
  hsync = 0
else
  hsync = 1
end if

if vcount >= 490 and vcount < 492 then
  vsync = 0
else
  vsync = 1
end if

For maximum reliability, it does. As the counters ripple between two values (remember, at about 0.1ns per bit) the binary value of the counters will be in transition. If the signals are not buffered in a flip-flop, the hsync and vsync signals can contain unpredictable pulses of around 1ns wide. You won’t see these in simulation, and not many of us have a 1GHz Logic Analyser or ’scope, but they really are there.
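
To get a feel for why this happens, here is a rough C sketch. It is a deliberately pessimistic model and not a timing simulation: it assumes every bit that changes may settle at its own time, in any order, so any mix of old and new bits can briefly appear on the counter. The function name could_glitch() is made up for this illustration.

```c
/* Returns 1 if, while a counter changes from `from` to `to`, some
 * in-between bit pattern could land in the range lo <= value < hi.
 * Assumes each changing bit can settle independently of the others. */
int could_glitch(unsigned from, unsigned to, unsigned lo, unsigned hi)
{
    unsigned changing = from ^ to;   /* bits that differ between the two values */

    /* Walk every proper, non-empty subset of the changing bits. Each
     * subset is one possible set of bits that have already switched. */
    for (unsigned sub = (changing - 1u) & changing; sub != 0;
         sub = (sub - 1u) & changing) {
        unsigned value = from ^ sub;
        if (value >= lo && value < hi) {
            return 1;
        }
    }
    return 0;
}
```

When hcount steps from 511 to 512 all ten bits change, and under this model some in-between pattern falls inside the 656 to 751 hsync window, so could_glitch(511, 512, 656, 752) returns 1 even though neither 511 nor 512 is inside it. A step from 0 to 1 changes a single bit, so there is no in-between pattern and the function returns 0.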

I’ve generated a 1440x900 signal (105MHz clock rate) and used logic to display objects on the screen. If I didn’t buffer the RGB outputs, the objects wouldn’t show correctly or had fuzzy edges. Registering all the VGA signals made these problems go away, as the signals were solidly high or low for the entire clock duration.

This is only an annoyance while generating VGA signals, but if you are interfacing with other devices (e.g., SRAM) this can cause you no end of heartache. A few implementation time tool options are available that can alter this too, by forcing all the I/O flip-flops to be put as close to the pin as possible, instead of being buried away in the middle of the FPGA fabric. It is also possible to add an "IOB=TRUE" constraint to your UCF file to enable this behaviour on a pin by pin basis.