RAM address conflicts and a Vivado synthesis bug

Introduction

My post yesterday got me digging into some issues regarding Vivado and BRAM usage. Vivado has a synthesis bug where it generates incorrect logic when you use the so-called “asynchronous” style of reading from a block RAM. You can read my previous post about placing bypass logic to emulate the asynchronous read style in a block RAM.

A quick note though. Xilinx calls this style of RAM read asynchronous. But it is only asynchronous if the read address is combinational. If you register the read address then the access is really synchronous.

Also, note that Xilinx will surely say that this is not a bug. The Vivado synthesis user’s guide says that in this case simulation won’t match synthesized results. I would argue that that is precisely the definition of a synthesis bug. Vivado should raise an error or at least a warning in this case. In fact, if you try to do a truly asynchronous read on a RAM, it will indicate that it cannot use a block RAM and it will instead use a distributed RAM.

An example

Here is a rather contrived example of a module with two RAMs. One uses the Xilinx “synchronous” read style and the other uses the “asynchronous” read style.

Example Module

`timescale 1ns/1ns
module read_first_example
  (
   input clk,
   input reset,
   input [31:0] sync_wr_data,
   input [31:0] async_wr_data,
   input [8:0] sync_wr_addr,
   input [8:0] sync_rd_addr,
   input [8:0] async_wr_addr,
   input [8:0] async_rd_addr,
   input [31:0] sync_compare_value,
   input [31:0] async_compare_value,
   output sync_compare,
   output async_compare
   );

  reg [31:0] ram_sync_out;

  reg [8:0] sync_rd_addr_q, async_rd_addr_q;
  always @(posedge clk)
    sync_rd_addr_q <= sync_rd_addr;
  
  always @(posedge clk)
    async_rd_addr_q <= async_rd_addr;
  
  (* ram_style = "block" *) reg [31:0] ram_sync [511:0];
  always @(posedge clk)
    ram_sync[sync_wr_addr] <= sync_wr_data;

  always @(posedge clk)
    ram_sync_out <= ram_sync[sync_rd_addr_q];

  assign sync_compare = ram_sync_out == sync_compare_value;

  (* ram_style = "block" *) reg [31:0] ram_async [511:0];
  always @(posedge clk)
    ram_async[async_wr_addr] <= async_wr_data;
  
  wire [31:0] ram_async_out;
  assign ram_async_out = ram_async[async_rd_addr_q];

  assign async_compare = ram_async_out == async_compare_value;

endmodule

Example Test Driver

And here is some test code that uses the module.

`timescale 1ns/1ns
module read_first_example_test;

  reg clk = 1;
  always #5 clk = ~clk;
  
  reg reset = 1;
  initial begin repeat(10) @(posedge clk); reset <= 0; end
  
  wire sync_compare, async_compare;

  reg [31:0] counter, counter1, counter2;
  
  always @(posedge clk)
    begin
      counter1 <= counter;
      counter2 <= counter1;
      if (reset)
	counter <= 0;
      else
	counter <= counter+1;
    end
  
  read_first_example
    read_first_example
      (
       .clk(clk),
       .reset(reset),
       .sync_wr_data(counter),
       .async_wr_data(counter),
       .sync_wr_addr(counter[8:0]),
       .sync_rd_addr(counter[8:0]),
       .async_wr_addr(counter[8:0]),
       .async_rd_addr(counter[8:0]),
       .sync_compare_value(counter2),
       .async_compare_value(counter1),
       .sync_compare(sync_compare),
       .async_compare(async_compare)
       );
  
  initial
    begin
      wait(!reset);
      repeat (1000) @(posedge clk);
      $finish;
    end

endmodule

Operation

I use a counter here to generate the RAM write and read addresses, and I also use it for RAM write data and output comparisons. You can see that both RAMs are given the same write and read address at the same time. You can also see that the compare value for the synchronous read (counter2) is delayed one cycle relative to the compare value for the asynchronous read (counter1).

Discussion

If you simulate the design you will see that the two comparison values are true throughout the simulation. However, if you perform a gate-level simulation you will discover that the asynchronous compare output will be false. Looking at the synthesis log file you will discover that there is no error or warning. You do, however, get this message:

INFO: [Synth 8-6430] The Block RAM "read_first_example/ram_async_reg" may get memory collision error if read and write address collide. Use attribute (* rw_addr_collision= "yes" *) to avoid collision.

But to my mind, we don't have that situation because the read address is the write address delayed by one clock cycle.

Now, here's the strange part. You can add the rw_addr_collision attribute to the ram register declaration like this.

  (* rw_addr_collision = "yes" *)
  (* ram_style = "block" *) reg [31:0] ram_async [511:0];

Synthesize again and the design will work. Vivado will now add bypass logic to the design much like in the synchronous FIFO design in my previous post. Even more curious is that you can add the attribute to the synchronous RAM declaration and it will still behave correctly. So it appears that Vivado is capable of generating correct logic; you just need to use the magic rw_addr_collision attribute.

Conclusion

I certainly think this is a Vivado synthesis bug. It should never generate logic that does not match the RTL without some kind of complaint. And further, it should generate bypass logic on the asynchronous style of RAM read unless you tell it not to.

Synchronous FIFO Redux

So, almost ten years ago to the day, I posted an article on implementing a synchronous FIFO. Well, take the read portion of that FIFO implementation with a grain of salt.

Asynchronous Read

To summarize, here is the portion of the FIFO which implements the memory.

  reg [width-1:0] mem [depth-1:0];
  always @(posedge clk)
    begin
      if (writing)
	mem[wr_ptr] <= wr_data;
      rd_ptr <= next_rd_ptr;
    end

  assign rd_data = mem[rd_ptr];

Here is a timing diagram of writing one piece of data to the FIFO and then immediately reading it out.

Looks great, doesn’t it?

The Problem

This code, however, does not work on Xilinx FPGAs from 6 series and above. Both XST and Vivado will happily implement the equivalent of this for the read logic. And by happily, I mean without error or warning.

always @(posedge clk)
  rd_data <= mem[rd_ptr];

As you and I can tell, that is not the same thing. And it causes the design in my old post to behave in subtly incorrect ways. You can see this when the FIFO is leaving the empty state.

The Solution

To solve this problem we need to implement a bypass register to hold the write data along with an output mux to select the bypass register when we are reading immediately following a write.

Here is the code which does the synchronous read from the RAM.

reg [width-1:0] rd_mem;
always @(posedge clk)
  rd_mem <= mem[next_rd_ptr];

Here is the bypass register, and the code to determine when the bypass register should be used instead of the RAM output.

reg [width-1:0] bypass_reg;
always @(posedge clk)
  bypass_reg <= wr_data;

reg use_bypass;
always @(posedge clk)
  use_bypass <= writing && wr_ptr == next_rd_ptr;

And here is the output MUX.

assign rd_data = use_bypass ? bypass_reg : rd_mem;
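
To convince yourself that the bypass really restores the original behavior, here is a small cycle-level model in C. This is only a sketch: the struct, the function names, and the depth of 16 are mine and not part of the FIFO design. It steps the registers the way the nonblocking assignments do, so you can compare three read styles: the RTL-style read, a plain registered read, and the registered read with the bypass mux.

```c
#include <stdint.h>
#include <stdbool.h>

#define DEPTH 16  /* hypothetical depth, chosen only for this model */

/* Register state of the FIFO memory, mirroring the Verilog names above. */
typedef struct {
    uint32_t mem[DEPTH];
    unsigned rd_ptr;      /* registered read pointer (original style) */
    uint32_t rd_mem;      /* registered RAM output (synchronous read) */
    uint32_t bypass_reg;  /* copy of the most recent wr_data */
    bool     use_bypass;  /* set when the read address matched a write */
} fifo_mem_t;

/* Advance the model by one rising clock edge. Every right-hand side is
 * sampled before the memory is updated, like nonblocking assignments. */
static void clock_edge(fifo_mem_t *s, bool writing, unsigned wr_ptr,
                       uint32_t wr_data, unsigned next_rd_ptr)
{
    s->rd_mem     = s->mem[next_rd_ptr];   /* sees the OLD contents */
    s->bypass_reg = wr_data;
    s->use_bypass = writing && (wr_ptr == next_rd_ptr);
    s->rd_ptr     = next_rd_ptr;
    if (writing)
        s->mem[wr_ptr] = wr_data;
}

/* What "assign rd_data = mem[rd_ptr]" produces in RTL simulation. */
static uint32_t rd_data_rtl(const fifo_mem_t *s)
{
    return s->mem[s->rd_ptr];
}

/* What a plain registered read produces: stale data on a collision. */
static uint32_t rd_data_plain(const fifo_mem_t *s)
{
    return s->rd_mem;
}

/* Registered read plus the bypass mux from the text. */
static uint32_t rd_data_bypassed(const fifo_mem_t *s)
{
    return s->use_bypass ? s->bypass_reg : s->rd_mem;
}
```

If address 3 holds 0x11 and you clock in a write of 0xAB to address 3 while next_rd_ptr is also 3, rd_data_rtl and rd_data_bypassed both return 0xAB, while rd_data_plain returns the stale 0x11.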

An Alternate Approach

The Vivado Synthesis Guide (ug901) describes, in the section called RW_ADDR_COLLISION on page 63, an attribute which allows the write data to take priority over the read data. If you synthesize the sync_fifo design with the synchronous_read parameter set to zero or one you will see that the same muxing logic is created either way. With synchronous_read set to one, the muxing is explicit. With it set to zero, the mux logic is implicit and Vivado will add it for you. It looks like it is safe to use either way. Unfortunately, the attribute is not supported in XST, so there you will need to use the explicit bypass logic with the synchronous read style.

Here is an example of how to set the attribute when you declare the RAM in Verilog.

(* rw_addr_collision = "yes" *)
reg [width-1:0] mem [depth-1:0];

Conclusion

I find it pretty shocking that both XST and Vivado generate incorrect logic without any error or warning. Clearly, this is a bug, and it makes me wonder what other constructs the tools are implementing incorrectly. You can download the complete synchronous FIFO design here.

ZYNQ FSBL – The Saga Continues

Building an FSBL for the ZC706 using Petalinux

Well, here is another blog post on how to build a modified FSBL for ZYNQ. Using the patch which I demonstrated how to make in the previous post, and a modified version of the fsbl_%.bbappend file which I received from the Xilinx Forum post regarding this, I was able to make a working FSBL with my patch. The modified fsbl_%.bbappend file is shown below.

# Force to use embeddedsw repository
EXTERNALXSCTSRC = ""
EXTERNALXSCTSRC_BUILD = ""

#Enable FSBL debug flags
YAML_COMPILER_FLAGS_append = " -DFSBL_DEBUG"

# Patch FSBL
SRC_URI_append += "file://0001-fsbl.patch"
FILESEXTRAPATHS_prepend := "${THISDIR}/files:"

Here are the steps used to create the FSBL.

First run petalinux-create to build a petalinux project. Point it at the appropriate BSP.

$ petalinux-create --type project --source /tools/xilinx/bsp/xilinx-zc706-v2017.4-final.bsp

Then copy the fsbl_%.bbappend file into the project.

$ cp fsbl_%.bbappend xilinx-zc706-2017.4/project-spec/meta-user/recipes-bsp/fsbl
$ cp 0001-fsbl.patch xilinx-zc706-2017.4/project-spec/meta-user/recipes-bsp/fsbl/files

Next run petalinux-build to make the bootloader.

$ petalinux-build --project xilinx-zc706-2017.4 -c bootloader

When all is said and done you will find the FSBL at xilinx-zc706-2017.4/images/linux/zynq_fsbl.elf.

This procedure works for 2017.4 and 2016.4.

Building from the git checkout

Alternatively you can build from the git checkout where you made the patch. This seems much simpler but it doesn’t work for me with petalinux 2016.4.

From the embeddedsw checkout directory cd to lib/sw_apps/zynq_fsbl/src and run the following command.

$ make BOARD=zc706

This should build the file fsbl.elf in the src directory.

When I try to build this for 2016.4 I get a bunch of errors. They differ depending on how I set the CC variable on the make command line. If I don’t set it I get errors about arm-xilinx-eabi-gcc: Command not found. If I set CC=arm-none-eabi-gcc then I get this:

arm-none-eabi-gcc    -c pcap.c -o pcap.o -I../misc//ps7_cortexa9_0/include -I.
In file included from pcap.c:96:0:
pcap.h:65:21: fatal error: xdevcfg.h: No such file or directory
compilation terminated.
Makefile:97: recipe for target 'pcap.o' failed

If anyone knows what to do about this I’m all ears.

Updating the First Stage Bootloader in Petalinux v2017.4

Overview

I am developing a prototype system that uses a lot of ZC706 boards (nine of them). I need each board to boot with a different Ethernet MAC and IP address. I wanted this to be configured using the four DIP switches on the board.

Approach

To do this you need to modify the first stage bootloader (FSBL) to read the dip switch values and then pass the result to U-Boot. There is no built-in mechanism to do this so you need to modify U-Boot as well.

The plan is to modify the fsbl_hooks.c file in the FSBL to read the DIP switch value. Then it will place a message in the on-chip RAM at address 0xfffffc00. This location is unused by U-Boot. We then get U-Boot to look at the message to set the Ethernet address. Simple? Not hardly.

Modify fsbl_hooks.c

I’m not going to go over the FPGA logic for this change. It’s really pretty simple. Hook up the dip switch inputs to a GPIO unit in the PL. See my tutorials if you don’t already know how to do this. Next we modify the FsblHookBeforeHandoff function. First we need to read the DIP switch value. My GPIO block is at address 0x41200000 and I wired the switches into bits 8-11. So we first set those bits to inputs by writing 0xf00 to address 0x41200004 and then we read the dip switch values from address 0x41200000. We need to shift the result right by 8 to align the switch input values.

Next we create a string for U-Boot to consume. This is in the form of a command setting the U-Boot ethaddr variable to the desired MAC address.

u32 FsblHookBeforeHandoff(void)
{
	u32 Status;
	u32 dip_sw;
	int i;
	char ethaddr[40];

	Status = XST_SUCCESS;

	/*
	 * User logic to be added here.
	 * Errors to be stored in the status variable and returned
	 */
	Xil_Out32(0x41200004,0xf00); /* Set DIP switch as inputs */
	dip_sw = Xil_In32(0x41200000)>>8; /* Read values */

	fsbl_printf(DEBUG_INFO,"In FsblHookBeforeHandoff function \r\n");
	sprintf(ethaddr,"ethaddr=%02x:%02x:%02x:%02x:%02x:%02x\n",0x00,0x0a,0x35,0x00,0x00,dip_sw);
	for (i=0; i<strlen(ethaddr)+1; i++)
	  Xil_Out8(0xfffffc00+i,ethaddr[i]);

	return (Status);
}
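
If you want to check the string formatting on your desktop before touching the FSBL, the formatting step can be pulled out into a plain C function. This is a sketch under my own assumptions: the function name format_ethaddr is made up, and the mask to four bits and the bounded snprintf are additions that the hook above does not have. The mask matters if any of the upper GPIO bits read back as ones, since %02x would then print more than two digits and U-Boot would get a malformed MAC address.

```c
#include <stdint.h>
#include <stdio.h>

/* Format the U-Boot "ethaddr=..." line from a raw GPIO register value.
 * Returns the string length (excluding the NUL), or -1 if buf is too small. */
static int format_ethaddr(char *buf, size_t size, uint32_t gpio_raw)
{
    /* The switches sit on bits 8-11; the mask is an addition in this sketch. */
    unsigned dip_sw = (gpio_raw >> 8) & 0xfu;
    int n = snprintf(buf, size, "ethaddr=%02x:%02x:%02x:%02x:%02x:%02x\n",
                     0x00u, 0x0au, 0x35u, 0x00u, 0x00u, dip_sw);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

With a raw register value of 0x500 this produces the line ethaddr=00:0a:35:00:00:05 followed by a newline, which is the text format that env import -t expects.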

Modify U-Boot

Next we need to get U-Boot to pay attention to the message. To do this we need to add the following to the project-name/project-spec/meta-user/recipes-bsp/u-boot/files/platform-top.h file.

#undef CONFIG_PREBOOT
#define CONFIG_PREBOOT	"echo U-BOOT for ${hostname};setenv preboot; echo;env import -t 0xFFFFFC00"

The trick is the env import -t 0xFFFFFC00 command. This tells U-Boot to import the string that the FSBL conveniently placed at that address.

That’s not so hard…

Great. That’s all we need to do, right? Child, you’ve got a thing or two to learn. So far things have been easy. But even though petalinux-create creates the file components/plnx_workspace/fsbl/fsbl/src/fsbl_hooks.c for you to modify, it actually ignores that file completely. Likewise you could make a copy of the ZC706 BSP file and change the copy in there and then use your modified BSP file. Hah, amateur. That file is also ignored. You can modify the file in the Petalinux install directory but then all your Petalinux builds will get this little change. Clearly not desirable.

What we need here is a patch file. We need to place a patch file in the components/plnx_workspace/fsbl/fsbl/src directory and then make a bbappend file that Petalinux will use to patch the fsbl_hooks.c file after it obtains the original file from god knows where but before it builds the FSBL.

Making the FSBL patch file

In order to build the patch you first need to check out the Xilinx embedded software git repository.

git clone https://github.com/Xilinx/embeddedsw.git

Now you need to check out the tag for the version of Petalinux you are interested in. Here I’m using 2017.4.

$ cd embeddedsw/
$ git checkout tags/xilinx-v2017.4

Next create a branch and check it out. It doesn’t matter what you call the branch. Here I call the branch fsbl_mods_2017.4

$ git branch fsbl_mods_2017.4
$ git checkout fsbl_mods_2017.4

Now edit the file and make the changes.

$ emacs lib/sw_apps/zynq_fsbl/src/fsbl_hooks.c

Now we are ready to make the patch file. First add the file to the branch, and then commit the change.

$ git add lib/sw_apps/zynq_fsbl/src/fsbl_hooks.c
$ git commit -m "Patch to set MAC address based on dip switches" -s

Now we can create the patch.

$ git format-patch -1

This will create a lovely, albeit verbosely named, patch file:

0001-Patch-to-set-MAC-address-based-on-dip-switches.patch

How cool is that?

Here are the contents of my patch file.

From 1bc864fc9d064fd57c0721e27ca04e348d594bd9 Mon Sep 17 00:00:00 2001
From: Pete Johnson <pete@beyond-circuits.com>
Date: Mon, 21 May 2018 17:02:50 -0700
Subject: [PATCH] Patch to set MAC address based on dip switches

Signed-off-by: Pete Johnson <pete@beyond-circuits.com>
---
 lib/sw_apps/zynq_fsbl/src/fsbl_hooks.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/lib/sw_apps/zynq_fsbl/src/fsbl_hooks.c b/lib/sw_apps/zynq_fsbl/src/fsbl_hooks.c
index 304a6db..d86522a 100644
--- a/lib/sw_apps/zynq_fsbl/src/fsbl_hooks.c
+++ b/lib/sw_apps/zynq_fsbl/src/fsbl_hooks.c
@@ -130,6 +130,9 @@ u32 FsblHookAfterBitstreamDload(void)
 u32 FsblHookBeforeHandoff(void)
 {
 	u32 Status;
+	u32 dip_sw;
+	int i;
+	char ethaddr[40];
 
 	Status = XST_SUCCESS;
 
@@ -137,7 +140,13 @@ u32 FsblHookBeforeHandoff(void)
 	 * User logic to be added here.
 	 * Errors to be stored in the status variable and returned
 	 */
+	Xil_Out32(0x41200004,0xf00); /* Set DIP switch as inputs */
+	dip_sw = Xil_In32(0x41200000)>>8; /* Read values */
+
 	fsbl_printf(DEBUG_INFO,"In FsblHookBeforeHandoff function \r\n");
+	sprintf(ethaddr,"ethaddr=%02x:%02x:%02x:%02x:%02x:%02x\n",0x00,0x0a,0x35,0x00,0x00,dip_sw);
+	for (i=0; i<strlen(ethaddr)+1; i++)
+	  Xil_Out8(0xfffffc00+i,ethaddr[i]);
 
 	return (Status);
 }
-- 
1.8.3.1

Adding the patch file to Petalinux

Now we have our magic patch file. Next we need to add it to our Petalinux project. First we copy the patch file to the Petalinux project and rename it to something a little more sane.

$ cp 0001-Patch-to-set-MAC-address-based-on-dip-switches.patch petalinux-project/project-spec/meta-user/recipes-bsp/fsbl/files/0001-fsbl.patch

Next we create a file called petalinux-project/project-spec/meta-user/recipes-bsp/fsbl/fsbl_%.bbappend with the following contents:

# Patch for FSBL
# Note: do_configure_prepend task section is required only for 2017.1 release
# Refer https://github.com/Xilinx/meta-xilinx-tools/blob/rel-v2017.2/classes/xsctbase.bbclass#L29-L35
 
do_configure() {
    if [ -d "${S}/patches" ]; then
       rm -rf ${S}/patches
    fi
 
    if [ -d "${S}/.pc" ]; then
       rm -rf ${S}/.pc
    fi
}
 
SRC_URI_append = " \
        file://0001-fsbl.patch \
        "
 
FILESEXTRAPATHS_prepend := "${THISDIR}/files:"

EXTERNALXSCTSRC = ""
EXTERNALXSCTSRC_BUILD = ""

This bbappend script is run during the bitbake Petalinux build and it will patch the appropriate fsbl_hooks.c file during the build.

But wait there’s more…

Now of course we still need to modify the platform-top.h file for U-Boot. Hilariously, I went down the same route for U-Boot only to find that there is no platform-top.h file in the git repository for U-Boot. However, there is one in the project-spec directory created by petalinux-config. Hmmm… fool me once…

But left with no other choice I tried it, and what do you know? The build does pay attention to this file and U-Boot seems to be built with the addition that we made.

Are we done yet?

Not hardly. As of this writing when I do a Petalinux build the version that is built will not boot. But if I compile the FSBL in the tree that I checked out with my changes I can use that FSBL and boot successfully and the MAC address will be set according to the dip switches. If anyone knows what is going on here I would sure appreciate some advice.

Conclusion

When all is said and done I can build a Petalinux which does what I want. But I don’t understand why modifying platform-top.h in the Petalinux directory works but modifying the fsbl_hooks.c does not. And I’d like to know why the FSBL which gets built with my petalinux build does not work. When I do a find while bitbake is making the FSBL I see the appropriately patched fsbl_hooks.c down in the bowels of the bitbake build. And if there are any (and I mean any) differences in the patch file or the bbappend file then I get errors when bitbake runs. And where did the platform-top.h file come from anyway? That doesn’t seem to be in the U-Boot tree at all.

Sometimes I think this may all be by design in order to keep embedded Linux consultants employed. Let me know if you have any insights.

GPIO with ZYNQ and PetaLinux

Accessing GPIO controllers is pretty straightforward with PetaLinux, but there are a few tricks you need to know.

Locating the GPIO controller

In the example FPGA I am using, there are two GPIO controllers in the programmable logic. These are at addresses 0x4120_0000 and 0x4121_0000. If you look in the pl.dtsi file in your PetaLinux project, in the directory subsystems/linux/configs/device-tree, you will see entries for the GPIO devices. There’s no need to modify the entire device tree.

If you make a PetaLinux build and boot it, you can look in the /sys/class/gpio directory.

root@pz-7015-2016-2:~# ls /sys/class/gpio/                       
export       gpiochip901  gpiochip902  gpiochip906  unexport

You can see that there is a gpiochip directory for each GPIO controller. The gpiochip901 and gpiochip902 directories correspond to the PL controllers that I added in my design. The gpiochip906 directory is for the GPIO controller in the PS.

How will you know which is which, though? Each directory contains a label file which tells you the device tree label for the controller. You can go ahead and look at the contents:

root@pz-7015-2016-2:~# cat /sys/class/gpio/gpiochip901/label 
/amba_pl/gpio@41210000
root@pz-7015-2016-2:~# cat /sys/class/gpio/gpiochip902/label 
/amba_pl/gpio@41200000
root@pz-7015-2016-2:~# cat /sys/class/gpio/gpiochip906/label 
zynq_gpio

Looking at it, you’ll see that gpiochip901 corresponds to my controller at 0x4121_0000 and gpiochip902 corresponds to the controller at 0x4120_0000. Gpiochip906 is different, and corresponds to the built-in controller on the ZYNQ. Why those numbers? In my FPGA, the controller behind gpiochip901 controls only a single GPIO bit, while the one behind gpiochip902 controls four bits. We can tell how many bits each controller controls by looking in the ngpio file for the controller.

root@pz-7015-2016-2:~# cat /sys/class/gpio/gpiochip901/ngpio 
1
root@pz-7015-2016-2:~# cat /sys/class/gpio/gpiochip902/ngpio
4
root@pz-7015-2016-2:~# cat /sys/class/gpio/gpiochip906/ngpio
118

It looks to me like the numbering starts at 901. Since that controller has only a single GPIO bit, the next controller is 902. That one has four bits, so the ZYNQ PS controller goes at 906, which has 118 bits.
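
Another way to read these numbers, which I have not confirmed against the kernel source for this release, is that the kernel hands out GPIO numbers from the top of a 1024-entry number space downward, with the PS controller registering first. Note that 906 + 118 = 1024, which fits. Here is a small C sketch of that allocation. The function name and the limit of 1024 are assumptions for illustration only.

```c
#include <stddef.h>

/* Assign sysfs GPIO base numbers from the top of the number space down,
 * in registration order. `top` is one past the highest GPIO number. */
static void assign_gpio_bases(const unsigned *ngpio, size_t count,
                              unsigned top, unsigned *base)
{
    for (size_t i = 0; i < count; i++) {
        top -= ngpio[i];   /* each chip takes the block just below the last */
        base[i] = top;
    }
}
```

Feeding it the sizes 118, 4, and 1 with a top of 1024 gives the bases 906, 902, and 901, which matches the gpiochip directories above.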

Enabling the GPIO bits

To access a GPIO bit, you need to enable the correct GPIO pin. You do that by writing to the export file in the /sys/class/gpio directory. Here is an example of enabling the LSB of my second controller:

root@pz-7015-2016-2:~# echo -n 902 > /sys/class/gpio/export 

Now if you look in the /sys/class/gpio directory, you will see a new directory created which allows you to control the individual GPIO pin.

root@pz-7015-2016-2:~# ls /sys/class/gpio
export       gpio902      gpiochip901  gpiochip902  gpiochip906  unexport

If you look in that directory you see a number of controls:

root@pz-7015-2016-2:~# ls /sys/class/gpio/gpio902
active_low  direction   power       subsystem   uevent      value

Accessing the GPIO bits

You can determine the GPIO direction by looking at the direction file. Since my GPIO pin is an output, it gives the value out.

root@pz-7015-2016-2:~# cat /sys/class/gpio/gpio902/direction
out

You can change the value to a 1 by writing to the value file.

root@pz-7015-2016-2:~# echo 1 > /sys/class/gpio/gpio902/value           
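
If you drive the pins from a C program instead of the shell, the only arithmetic involved is that the number you export is the chip's base number plus the bit offset within that chip. Here is a minimal sketch of building the value file path that way. The function name gpio_value_path is made up for this example, and it only builds the path; it does not touch the hardware.

```c
#include <stdio.h>

/* Build "/sys/class/gpio/gpio<N>/value" where N = chip base + bit offset.
 * Returns 0 on success, -1 if the offset is out of range or buf is too small. */
static int gpio_value_path(char *buf, size_t size,
                           unsigned chip_base, unsigned ngpio, unsigned offset)
{
    if (offset >= ngpio)
        return -1;
    int n = snprintf(buf, size, "/sys/class/gpio/gpio%u/value",
                     chip_base + offset);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For my four-bit controller, a base of 902 and an offset of 0 gives /sys/class/gpio/gpio902/value, the same file written by the echo command above. The pin still has to be exported first.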

Conclusion

So there you have it. The “official” way to access GPIO on PetaLinux.

SPI with PetaLinux on ZYNQ

Recently, I spent a lot of time trying to get SPI working on a PicoZed ZYNQ board under Linux. It was absolutely shocking how complicated this ended up being. One issue, I think, is that the device tree options differ depending on which version of PetaLinux you’re using. In this post, I’m going to document here how to do it with PetaLinux 2016.2.

Modify the device tree

First, you need to modify the system-top.dts file located in your PetaLinux project’s subsystems/linux/configs/device-tree directory. You need to add an entry that extends the existing entry for the SPI device. In the example, I am using spi0 on the processor subsystem. You can see the base definition for the SPI interface in the zynq-7000.dtsi include file in the same directory.

It’s important to note that PetaLinux will create an entry for the SPI device when you configure Linux; however, you won’t get a device file unless you add the entry for your particular SPI device. The trick is to add the SPI device information to the file system-top.dts. The device tree specification syntax allows you to make changes to the automatic entry for the SPI device by labeling a node, then overlaying additional information onto the labeled node in other parts of the device tree specification.

In our case, the processor built-in SPI devices are labeled spi0 and spi1. I wanted to use spi0, so I added an entry in the system-top.dts file to add to the spi0 definition. In the example below, I’ve added three devices.

&spi0 {
  is-decoded-cs = <0>;
  num-cs = <3>;
  status = "okay";
  spidev@0x00 {
    compatible = "spidev";
    spi-max-frequency = <1000000>;
    reg = <0>;
  };
  spidev@0x01 {
    compatible = "spidev";
    spi-max-frequency = <1000000>;
    reg = <1>;
  };
  spidev@0x02 {
    compatible = "spidev";
    spi-max-frequency = <1000000>;
    reg = <2>;
  };
};

Rebuild Linux and reboot your PicoZed board, and you can now see the device files.

root@pz-7015-2016-2:~# ls -l /dev/spi*
crw-rw----    1 root     root      153,   0 Jan  1 00:00 /dev/spidev32766.0
crw-rw----    1 root     root      153,   1 Jan  1 00:00 /dev/spidev32766.1
crw-rw----    1 root     root      153,   2 Jan  1 00:00 /dev/spidev32766.2

Testing the SPI interface

In order to test the SPI interface, I built an FPGA with the SPI ports marked for debug. This allows me to use the embedded logic analyzer to view the pin activity from Vivado. PetaLinux ships with a program to test the SPI interface called spidev_test. I compiled it with the following command:

arm-xilinx-linux-gnueabi-gcc -o spidev_test /tools/xilinx/petalinux-v2016.2-final/components/linux-kernel/xlnx-4.4/Documentation/spi/spidev_test.c

Then, I copied it to my board using ssh, configured the logic analyzer to capture SPI activity, and ran the following command:

root@pz-7015-2016-2:~# ./spidev_test -D /dev/spidev32766.0 --speed 10000000
spi mode: 0x0
bits per word: 8
max speed: 10000000 Hz (10000 KHz)
RX | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  | ................................

I could see the SPI pins wiggle in the logic analyzer view.

Conclusion

Anyway, I hope that this saves you some time getting SPI to work.

PetaLinux for the PicoZed

I recently ran into a SNAFU when trying to build a PetaLinux image for a PicoZed board on a rev 2 FMC carrier board. The instructions assume that you have built an FPGA image and exported it from Vivado. You can check out my tutorial on PetaLinux for more background information, just in case you’re unfamiliar with the process.

You’re going to need to make sure that you have the board support package for the PicoZed board and carrier board you are using, which you can download here. I like to keep the BSP in my Xilinx tool area, but you can store it anywhere you like. Be sure to unzip the download file.

Create the project

First, I source the settings file for PetaLinux. Don’t worry about the tftp warnings.

% . /tools/xilinx/petalinux-v2015.4-final/settings.sh

Now create your PetaLinux project. I created it as a sibling of the Vivado project directory.

% petalinux-create --type project --name pz_linux --source /tools/xilinx/pz_7015_2015_4.bsp

Configure the project

Next, you configure the project with the FPGA export. This sets the device tree for the build:

% petalinux-config --get-hw-description my_fpga/my_fpga.sdk --project pz_linux

Then, configure your filesystem:

% petalinux-config -c rootfs --project pz_linux

PetaLinux build fails

If we run petalinux-build right now, we run into some trouble:

% petalinux-build --project pz_linux
INFO: Checking component...
INFO: Generating make files and build linux
INFO: Generating make files for the subcomponents of linux
INFO: Building linux
[INFO ] pre-build linux/rootfs/fwupgrade
[INFO ] pre-build linux/rootfs/gpio-demo
[INFO ] pre-build linux/rootfs/httpd_content
[INFO ] pre-build linux/rootfs/iperf3
[INFO ] pre-build linux/rootfs/peekpoke
[INFO ] pre-build linux/rootfs/weaved
[INFO ] build system.dtb
[ERROR] ERROR (phandle_references): Reference to non-existent node or label "usb_phy0"
[ERROR] ERROR: Input tree has errors, aborting (use -f to force output)
[ERROR] make[1]: *** [system.dtb] Error 255
ERROR: Failed to build linux

Fixing the device tree

The problem is that the device tree was overwritten by the petalinux-config. To fix that, we need to add the following lines to the system-conf.dtsi file located in pz_linux/subsystems/linux/configs/device-tree. Place the text following the entry for memory:

usb_phy0:phy0 {
  compatible="ulpi-phy";
  #phy-cells = <0>;
  reg = <0xe0002000 0x1000>;
  view-port = <0x170>;
  drv-vbus;
};

With this change in place, we can go ahead and run petalinux-build again, and the build will complete. I’m not exactly sure why this procedure is required or how to patch the PetaLinux install to fix this. I will explore that later.

Packaging

After building, you will need to package the distribution.

% petalinux-package --boot --format BIN --project pz_linux --fsbl pz_linux/images/linux/zynq_fsbl.elf --fpga my_fpga/impl_1/my_fpga.bit --u-boot

Copy the BOOT.BIN and pz_linux/images/linux/image.ub files to your MicroSD card and boot your PicoZed board.

Let me know if you have any ideas why this problem exists or how you can fix it.

Getting the Log Base 2 Algorithm to Synthesize

My last post introduced an algorithm for finding the log base 2 of a fixed point number. However, it had a gotcha. It had to use some floating point functions to initialize a table, and even though it is not synthesizing floating point, ISE, Vivado, and Quartus II all refuse to synthesize the design. What should we do?

Perl Preprocessor to the Rescue

In an older blog post, I discuss a Verilog Preprocessor that I wrote years ago. In the old days of Verilog ’95, preprocessors like this were practically required. The language was missing lots of features, which made external solutions necessary, but most of those gaps were filled by Verilog 2001 and Verilog 2005. Now with SystemVerilog, things are even better. However, sometimes tools don’t support all of the language features. In the last blog post, we discovered that the FPGA synthesis tools can’t handle the floating point functions used in the ROM initialization code.

Computing the lookup table in Perl

What we’ll do is write the ROM initialization code in Perl and use the preprocessor to generate the table before the synthesis tool sees it.

The code which gives the synthesis tools fits is this:

  function real rlog2;
    input real x;
    begin
      rlog2 = $ln(x)/$ln(2);
    end
  endfunction

  reg [out_frac-1:0] lut[(1<<lut_precision)-1:0];
  integer i;
  initial
    begin : init_lut
      lut[0] = 0;
      for (i=1; i<1<<lut_precision; i=i+1)
        lut[i] = $rtoi(rlog2(1.0+$itor(i)/$itor(1<<lut_precision))*$itor(1<<out_frac)+0.5);
    end

We can essentially turn this code into Perl and embed it, so that the preprocessor will be able to generate it for us. Remember, the Perl code goes inside special Verilog comments.

First, we’re going to need to define some parameter values in Perl and their counterparts in Verilog. These define the maximum allowable depth and width of the lookup table.

  //@ my $max_lut_precision = 12;
  localparam max_lut_precision = $max_lut_precision;
  //@ my $max_lut_frac = 27;
  localparam max_lut_frac = $max_lut_frac;

Here is the Perl code to compute the lookup table values:

  /*@
   sub compute_lut_value {
     my $i = shift;
     return log(1.0+$i/(1<<$max_lut_precision))*(1<<$max_lut_frac)/log(2.0);
   }
   @*/
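
If you want to sanity check the table values outside of Perl or a simulator, the same formula is easy to reproduce in C. The sketch below is mine and not part of the design: the names log2_frac and lut_value are made up, and instead of calling the C library's log() it computes the logarithm by repeated squaring so that it has no math library dependency. It rounds the same way the Verilog does, by adding 0.5 and truncating.

```c
#include <stdint.h>

/* log2(x) for 1.0 <= x < 2.0, built one fractional bit at a time by
 * repeated squaring; the result is truncated to `bits` fractional bits. */
static double log2_frac(double x, int bits)
{
    double result = 0.0, weight = 0.5;
    for (int b = 0; b < bits; b++) {
        x *= x;                 /* squaring doubles the logarithm */
        if (x >= 2.0) {         /* so the next fractional bit is a 1 */
            x *= 0.5;
            result += weight;
        }
        weight *= 0.5;
    }
    return result;
}

/* Entry i of the table: round(log2(1 + i/2^precision) * 2^frac). */
static uint32_t lut_value(unsigned i, unsigned precision, unsigned frac)
{
    double x = 1.0 + (double)i / (double)(1u << precision);
    double scaled = log2_frac(x, 48) * (double)(1u << frac);
    return (uint32_t)(scaled + 0.5);
}
```

For a tiny table with a precision of 2 and 4 fractional bits, lut_value gives 0, 5, 9, and 13 for entries 0 through 3.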

Then, we’ll embed the lookup table inside a Verilog function.

  function [max_lut_frac-1:0] full_lut_table;
    input integer i;
    begin
      case (i)
	//@ for my $i (0..(1<<$max_lut_precision)-1) {
	//@ my $h = sprintf("${max_lut_frac}'h%x",int(compute_lut_value($i)+0.5));
	$i: full_lut_table = $h;
	//@ }
      endcase
    end
  endfunction

We’re also going to parallel the Perl code with a pure Verilog function, just like our local parameters.

  function [out_frac-1:0] compute_lut_value;
    input integer i;
    begin
      compute_lut_value = $rtoi(rlog2(1.0+$itor(i)/$itor(1<<lut_precision))*$itor(1<<out_frac)+0.5);
    end
  endfunction

But how do you parameterize it?

There happens to be one gotcha when working with preprocessors: you really don’t want to use them. Back in the Verilog ’95 days, my colleagues and I just used the preprocessor for every Verilog file. All of our parameterization was done there. We used a GNU Make flow, and it built all the Verilog files for us automatically. But with the advent of Verilog 2001 and SystemVerilog, a lot of things that we used the preprocessor for could be done within the language, and much better, too. One of the crufty things about using the preprocessor was that you needed to embed the parameter values in the module names. Otherwise, instances of the same module with different parameter values would have conflicting module names. In this case, we actually want the preprocessor to generate a parameterized Verilog module. We still want to use the normal Verilog parameter mechanism to control the width and depth of the lookup table.

To do this, we must generate a maximal table in the preprocessor, and then cut from that table, using Verilog, a subtable that has the desired width and depth based on our Verilog parameter values.

Here is the code to do that. If the depth or width (precision or fractional width) exceeds the maximal Perl values, then we just use the pure Verilog implementation. The code will not be synthesizable, but it will at least still work in simulation. If the parameter values are “in bounds” for the preprocessor-computed lookup table, then we cut our actual table from the Perl-generated lookup table, and the design will be synthesizable.

  reg [out_frac-1:0] lut[(1<<lut_precision)-1:0];
  integer i;
  generate
    // If the parameters are outside the bounds of the static lookup table then
    // compute the lookup table dynamically. This will not be synthesizable however
    // by most tools.
    if (lut_precision > max_lut_precision || out_frac > max_lut_frac)
      initial
	begin : init_lut_non_synth
	  lut[0] = 0;
	  for (i=1; i<1<<lut_precision; i=i+1)
	    begin
	      lut[i] = compute_lut_value(i);
	    end
	end
    else
      // The parameters are within bounds so we can use the precomputed table
      // and synthesize the design.
      initial
	begin : init_lut_synth
	  for (i=0; i<1<<lut_precision; i=i+1)
	    begin : loop
	      reg [max_lut_frac-1:0] table_value;
	      table_value = full_lut_table(i<<(max_lut_precision-lut_precision));
	      lut[i] = table_value[max_lut_frac-1:max_lut_frac-out_frac];
	    end
	end
  endgenerate
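
To make the cutting step concrete, here is a minimal sketch on a toy table. Everything in it is a stand-in: the module name lut_cut_example, the sizes (a maximal table of 8 entries with 8 fraction bits, cut down to 4 entries with 4 fraction bits) and the hand-written case statement, which only imitates what the preprocessor would generate. It is simulation-only code meant to show the address stride and the bit slice, not the real table.

```verilog
`timescale 1ns/1ns
// Toy illustration of cutting a sub-table out of a maximal table.
// All names and sizes here are hypothetical, chosen to keep the table small.
module lut_cut_example;
  localparam max_lut_precision = 3;  // maximal table has 2**3 = 8 entries
  localparam max_lut_frac = 8;       // each maximal entry has 8 fraction bits
  localparam lut_precision = 2;      // we only want 2**2 = 4 entries
  localparam out_frac = 4;           // and only 4 fraction bits per entry

  // Stand-in for the preprocessor output: round(log2(1+i/8)*256).
  function [max_lut_frac-1:0] full_lut_table;
    input integer i;
    begin
      case (i)
        0: full_lut_table = 8'd0;
        1: full_lut_table = 8'd44;
        2: full_lut_table = 8'd82;
        3: full_lut_table = 8'd118;
        4: full_lut_table = 8'd150;
        5: full_lut_table = 8'd179;
        6: full_lut_table = 8'd207;
        7: full_lut_table = 8'd232;
        default: full_lut_table = 8'd0;
      endcase
    end
  endfunction

  reg [out_frac-1:0] lut[(1<<lut_precision)-1:0];
  reg [max_lut_frac-1:0] table_value;
  integer i;
  initial
    begin
      for (i=0; i<(1<<lut_precision); i=i+1)
        begin
          // Step through the big table with a stride of 2 (entries 0, 2, 4, 6)...
          table_value = full_lut_table(i<<(max_lut_precision-lut_precision));
          // ...and keep only the top out_frac bits of each entry.
          lut[i] = table_value[max_lut_frac-1:max_lut_frac-out_frac];
          $display("lut[%0d] = %0d", i, lut[i]);
        end
    end
endmodule
```

With these toy values the four entries come out as 0, 5, 9 and 12. Note that slicing truncates rather than rounds: a directly computed and rounded 4-bit table would hold 13 in the last entry (log2(1.75)*16 is about 12.9), so a cut table can differ from a directly computed one by one least significant bit.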

We can then finish off the design, just like in Computing the Logarithm Base 2.

Conclusion

This may be a lot to take in. But the gist is that you build a table using the Perl preprocessor, which is large enough to use for all parameter values. Then in Verilog, you use the actual parameter values to cut out the portion of the pre-computed table that you need. This cutting out can be done during the elaboration or initialization stages of the synthesis. Of course, our job would be much easier if the synthesis tool developers got it into their heads that using floating point and math during elaboration or initialization does not necessitate synthesizing floating point logic.

If anyone knows of a software floating point library written purely in Verilog, please let me know. We could then use that to trick the synthesis tools into doing what we want.

Oh, and here’s a link to the complete file.

Detecting the rising edge of a short pulse

A reader is going through my ZedBoard tutorial and had some questions about detecting the rising edge of a pulse. The tutorial in question is using a ZedBoard to make a stopwatch. Kind of overkill in terms of hardware, but you have to start somewhere when you’re learning to code.

TRAN MINHHAI asked: what do you do when the rising edge might just be a pulse, and the pulse might last less than a single clock cycle? The answer is to use a flip-flop with the input signal going to an asynchronous set input. The data input is just zero, and the clock signal is the one we are synchronizing to.

Here is the code:


`timescale 1ns/1ns
module edge_detect
  (
   input clk,
   input btnl,
   output btnl_rise
   );

  reg btnl_held = 0;
  always @(posedge clk or posedge btnl)
    if (btnl)
      btnl_held <= 1;
    else
      btnl_held <= 0;

  reg [1:0] btnl_shift = 0;
  always @(posedge clk)
    btnl_shift <= {btnl_shift[0],btnl_held};
  
  assign btnl_rise = btnl_shift == 2'b01;
endmodule

I also wrote a little test to go with it. Notice how short little pulses can come anywhere with respect to the clock edge:


`timescale 1ns/1ns
module edge_detect_test;

  reg clk = 0;
  always #10 clk = ~clk;

  reg btnl = 0;

  wire btnl_rise;

  edge_detect edge_detect(clk,btnl,btnl_rise);

  initial
    begin
      $dumpvars(0);
      
      @(posedge clk)
	btnl <= 1;
      repeat (10) @(posedge clk);
      btnl <= 0;
      repeat (4) @(posedge clk);
      #10 btnl <= 1;
      #1 btnl <= 0;
      repeat (4) @(posedge clk);
      #19 btnl <= 1;
      #1 btnl <= 0;
      repeat (4) @(posedge clk);
      #20 btnl <= 1;
      #1 btnl <= 0;
      repeat (4) @(posedge clk);
      $finish;
    end
endmodule

The long pulse case

In the case of a long pulse, the design works just like it would without the asynchronous set flip-flop. Here’s a timing diagram:

The short pulse case

But if there is a short pulse, the asynchronous set flip-flop holds the input value until there is a clock edge.

If the pulse appears in the middle of the clock cycle, then the timing diagram looks like this:

If the pulse appears right before the clock edge, then the timing diagram looks like this:

Now you know how to synchronize the rising edge of a pulse even if you have a slow clock.

Computing the Logarithm Base 2

Algorithm

Computing the log base 2 of a whole number is easy: just use a priority encoder. But what about a fixed point number? That sounds a lot harder. But, like many things, there’s a trick to make things easier.

Consider the following equation:

\log_2{x} = \log_2 (2^n \cdot 2^{-n} \cdot x) = n + \log_2(2^{-n} \cdot x)

If we choose n such that 2^{-n}\cdot x\in [1,2) , then n is just the whole number portion of the log, and we can compute that with a priority encoder. Furthermore, we can compute the 2^{-n}\cdot x portion by barrel-shifting the original number. Finally, we can use a lookup table to compute the fractional bits of the result. The nice thing about this is that the lookup table can be much smaller, since we only need to store the values between 1 and 2.

Another way to think about this is that the logarithm curve on [2,4) is the same as the one on [1,2), just shifted up by 1 and stretched horizontally by a factor of 2. Likewise, the curve on [4,8) is the curve on [2,4) shifted up by another 1 and stretched by another factor of 2. The algorithm just takes advantage of this self-similarity.

Additionally, this algorithm only works on input values greater than or equal to one, so the output is never negative.
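
As a quick sanity check, here is the decomposition worked through for x = 10. The highest set bit of 10 (binary 1010) is bit 3, so n = 3:

\log_2{10} = 3 + \log_2(2^{-3} \cdot 10) = 3 + \log_2{1.25} \approx 3 + 0.3219 = 3.3219

The priority encoder supplies the 3, the barrel shifter produces the 1.25, and the lookup table supplies the 0.3219.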

Implementation

Now, let’s code up our Verilog module. We’ll start with the module declaration:

module log2
 #(
 parameter in_int = 16,
 parameter in_frac = 8,
 parameter out_int = $clog2(in_int),
 parameter out_frac = in_int+in_frac-out_int,
 parameter lut_precision = 6,
 parameter register_output_stage = 0
 )

Since the module is dealing with a fixed-point input and output value, we need to specify how many integer and fractional bits there are in the input and output values. The lut_precision parameter specifies the log base 2 of the number of entries in our lookup table, and the default setting will be 64 entries in the table. There is also a parameter which allows for an optional final output register stage.

Next come the module ports:

  (
   input clk,
   input [in_int+in_frac-1:0] din,
   input din_valid,             // data in is valid
   output din_ready,            // ready to receive data
   output reg [out_int+out_frac-1:0] dout,
   output reg dout_valid,       // data out is ready (valid)
   output reg dout_error        // data out is incorrect - input data was less than 1.0
   );

We have the clock clk and the input value din, along with a handshake input din_valid indicating that the din value is valid. There is an output signal din_ready indicating that the module is ready to accept an input value. There is, of course, the output value dout, and a valid signal dout_valid. Since it is possible to provide an illegal input between 0 and 1, there is also a dout_error signal indicating the output value is not valid due to an inadmissible input value.

Now, let’s look at the body of the module:

  assign din_ready = 1;

Ready? Please. We’re always ready.

Pipeline Stage 1

Next, we instantiate the recursive priority encoder from my previous blog post. This takes the integer portion of the input value and produces the integer portion of the output value. There is also a prienc_error output which indicates that none of the input bits were set. This means we were given an input value that was less than one.

  wire [$clog2(in_int)-1:0] prienc_out;
  wire prienc_error;
  
  priencr #(in_int)
  priencr
    (
     .decode(din[in_frac+:in_int]),
     .encode(prienc_out),
     .valid(prienc_error)
     );

Next, we have the stage one pipeline logic:

  reg [$clog2(in_int)-1:0] stage1_prienc_out;
  reg stage1_error;
  reg stage1_valid;
  reg [in_int+in_frac-1:0] stage1_din;
  
  always @(posedge clk)
    begin
      stage1_din <= din;
      stage1_prienc_out <= prienc_out;
      stage1_error <= prienc_error;
      stage1_valid <= din_valid;
    end

Pipeline Stage 2

Stage two of the pipeline is the barrel shift logic. This shifts the input value to the left, based on the priority encoder output. Things are flipped, though, so lower priority encoder output values cause a larger shift. We’ll also pipeline the other signals:

  reg [in_int+in_frac-1:0] stage2_barrel_out;
  reg [out_int-1:0] stage2_barrel_out_int;
  reg stage2_error;
  reg stage2_valid;
  always @(posedge clk)
    begin
      stage2_barrel_out <= stage1_din << (in_int-stage1_prienc_out-1);
      stage2_barrel_out_int <= stage1_prienc_out;
      stage2_error <= stage1_error;
      stage2_valid <= stage1_valid;
    end
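
To see what the shift does to a concrete value, here is a small simulation-only sketch with in_int = 4 and in_frac = 4. The module name barrel_shift_example and the loop-based highest_bit function are hypothetical stand-ins (the function plays the role of the priority encoder); only the shift expression is taken from the stage two code.

```verilog
`timescale 1ns/1ns
// Toy check of the normalization step with a 4.4 fixed-point input.
module barrel_shift_example;
  localparam in_int = 4;
  localparam in_frac = 4;

  // Hypothetical stand-in for the priority encoder: returns the index of
  // the highest set bit in the integer part (2 bits wide since in_int = 4).
  function [1:0] highest_bit;
    input [in_int-1:0] v;
    integer k;
    begin
      highest_bit = 0;
      for (k=0; k<in_int; k=k+1)
        if (v[k]) highest_bit = k;
    end
  endfunction

  reg [in_int+in_frac-1:0] din;
  reg [1:0] n;
  reg [in_int+in_frac-1:0] shifted;
  initial
    begin
      din = 8'b0010_1000;                     // 2.5 in 4.4 fixed point
      n = highest_bit(din[in_frac+:in_int]);  // integer part is 0010, so n = 1
      shifted = din << (in_int-n-1);          // same expression as stage two
      $display("n = %0d, shifted = %b", n, shifted);
    end
endmodule
```

This prints n = 1 and shifted = 10100000. Reading the result with the binary point just after the top bit gives 1.0100000, which is 1.25, and indeed 2.5 = 2^1 * 1.25. The leading one always lands in the top bit, which is why the lookup table only has to cover values between 1 and 2.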

Pipeline Stage 3

The third pipeline stage is the lookup table, which computes the fractional part of the output. We will use an initial block to fill the table. First, we define a function to take the floating point log base-2 of an input value, since Verilog does not have one built in. Remember the change-of-base rule from high school algebra?

  function real rlog2;
    input real x;
    begin
      rlog2 = $ln(x)/$ln(2);
    end
  endfunction
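
For reference, the rule the function relies on follows from writing x as a power of 2 and taking the natural log of both sides:

x = 2^{\log_2{x}} \implies \ln{x} = \log_2{x} \cdot \ln{2} \implies \log_2{x} = \frac{\ln{x}}{\ln{2}}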

Next we declare the table and fill it in an initial block:

  reg [out_frac-1:0] lut[(1<<lut_precision)-1:0];
  integer i;
  initial
    begin : init_lut
      lut[0] = 0;
      for (i=1; i<1<<lut_precision; i=i+1)
        lut[i] = $rtoi(rlog2(1.0+$itor(i)/$itor(1<<lut_precision))*$itor(1<<out_frac)+0.5);
    end

Now, we need to use the barrel shift output as an address to the lookup table. We also carry along some of the stage 2 results:

  reg [out_frac-1:0] stage3_lut_out;
  reg [out_int-1:0] stage3_out_int;
  reg stage3_error;
  reg stage3_valid;
  always @(posedge clk)
    begin
      stage3_out_int <= stage2_barrel_out_int;
      stage3_lut_out <= lut[stage2_barrel_out[in_int+in_frac-2-:lut_precision]];
      stage3_error <= stage2_error;
      stage3_valid <= stage2_valid;
    end
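
The following is a rough, simulation-only sketch that strings the three steps together without any pipelining and compares the result against the floating point value. It assumes the default parameters from the module declaration; the module name log2_reference_check and the simple loop used in place of the priority encoder are hypothetical, so treat it as a sanity check of the indexing rather than as part of the design.

```verilog
`timescale 1ns/1ns
// Unpipelined walk through encode, shift and lookup for one input value.
module log2_reference_check;
  localparam in_int = 16;
  localparam in_frac = 8;
  localparam out_int = 4;            // $clog2(in_int)
  localparam out_frac = 20;          // in_int+in_frac-out_int
  localparam lut_precision = 6;

  function real rlog2;
    input real x;
    begin
      rlog2 = $ln(x)/$ln(2);
    end
  endfunction

  reg [out_frac-1:0] lut[(1<<lut_precision)-1:0];
  reg [in_int+in_frac-1:0] din, shifted;
  reg [out_int-1:0] n;
  reg [out_int+out_frac-1:0] dout;
  real got, expected;
  integer i, k;

  initial
    begin
      // Fill the table the same way the stage three code does.
      for (i=0; i<(1<<lut_precision); i=i+1)
        lut[i] = $rtoi(rlog2(1.0+$itor(i)/$itor(1<<lut_precision))*$itor(1<<out_frac)+0.5);

      din = 24'd10 << in_frac;       // the value 10.0 in 16.8 fixed point

      // Stand-in for the priority encoder: highest set bit of the integer part.
      n = 0;
      for (k=0; k<in_int; k=k+1)
        if (din[in_frac+k]) n = k;

      // Normalize, then use the bits just below the leading one as the address.
      shifted = din << (in_int-n-1);
      dout = {n, lut[shifted[in_int+in_frac-2-:lut_precision]]};

      got = $itor(dout)/$itor(1<<out_frac);
      expected = rlog2(10.0);
      $display("got %f, expected %f", got, expected);
    end
endmodule
```

For 10.0 the address works out to 16, which is the table entry for log2(1.25), so both printed values should be close to 3.3219. Inputs whose normalized value has set bits below the lut_precision address bits are effectively truncated to the nearest lower table entry, so their results will be less accurate.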

Finally we have the code for the optional stage 4 pipeline. This is controlled by the register_output_stage parameter:

  generate
    if (register_output_stage)
      begin
        always @(posedge clk) dout <= {stage3_out_int,stage3_lut_out};
        always @(posedge clk) dout_error <= stage3_error;
        always @(posedge clk) dout_valid <= stage3_valid;
      end
    else
      begin
        always @(*) dout = {stage3_out_int,stage3_lut_out};
        always @(*) dout_error = stage3_error;
        always @(*) dout_valid = stage3_valid;
      end
  endgenerate

Caveats

So there you have it: a module that computes log base 2 for fixed point inputs. However, there’s still one tiny problem: the design is not synthesizable by XST or Vivado. I haven’t tried other tools, but they may have issues with it as well.

The issue is that the tools don’t seem to implement the built-in Verilog floating point functions during elaboration. Essentially, the synthesis tool needs to run the initial block in order to know how to populate the lookup table. In general, the tools can do this. However, if they catch wind of a float value, they tuck their tail between their legs and run.

My workaround for this is to use the Perl preprocessor I describe in a previous blog post. But that’s a topic for another time.