RAM address conflicts and a Vivado synthesis bug

Introduction

My post yesterday got me digging into some issues with Vivado and block RAM usage. Vivado has a synthesis bug: it generates incorrect logic when you use the so-called “asynchronous” style of reading from a block RAM. You can read my previous post about placing bypass logic to emulate the asynchronous read style with a block RAM.

A quick note, though: Xilinx calls this style of RAM read asynchronous, but it is only asynchronous if the read address is combinational. If you register the read address, the access is really synchronous.

Also, note that Xilinx will surely say that this is not a bug. The Vivado synthesis user’s guide says that in this case simulation won’t match synthesized results. I would argue that that is precisely the definition of a synthesis bug, and Vivado should raise an error, or at least a warning, in this case. In fact, if you try to do a truly asynchronous read on a RAM, Vivado will report that it cannot use a block RAM and will use a distributed RAM instead.
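For reference, a truly asynchronous read looks something like this (a minimal sketch with made-up signal names, not part of the example below): the read address goes straight to the memory with no register anywhere in the path.

  // Minimal sketch: a truly asynchronous read. The read address is
  // combinational, so this cannot map to a block RAM, and Vivado
  // falls back to distributed RAM despite the ram_style request.
  (* ram_style = "block" *) reg [31:0] ram [511:0];
  always @(posedge clk)
    ram[wr_addr] <= wr_data;

  assign rd_data = ram[rd_addr]; // rd_addr is not registered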

An example

Here is a rather contrived example of a module with two RAMs: one using the Xilinx “synchronous” read style and one using the “asynchronous” read style.

Example Module

`timescale 1ns/1ns
module read_first_example
  (
   input clk,
   input reset,
   input [31:0] sync_wr_data,
   input [31:0] async_wr_data,
   input [8:0] sync_wr_addr,
   input [8:0] sync_rd_addr,
   input [8:0] async_wr_addr,
   input [8:0] async_rd_addr,
   input [31:0] sync_compare_value,
   input [31:0] async_compare_value,
   output sync_compare,
   output async_compare
   );

  reg [31:0] ram_sync_out;

  // Register both read addresses. With a registered address, even the
  // "asynchronous" read below is really a synchronous access.
  reg [8:0] sync_rd_addr_q, async_rd_addr_q;
  always @(posedge clk)
    sync_rd_addr_q <= sync_rd_addr;

  always @(posedge clk)
    async_rd_addr_q <= async_rd_addr;
  
  // Xilinx "synchronous" read style: the RAM output is registered,
  // so read data appears two cycles after sync_rd_addr.
  (* ram_style = "block" *) reg [31:0] ram_sync [511:0];
  always @(posedge clk)
    ram_sync[sync_wr_addr] <= sync_wr_data;

  always @(posedge clk)
    ram_sync_out <= ram_sync[sync_rd_addr_q];

  assign sync_compare = ram_sync_out == sync_compare_value;

  // Xilinx "asynchronous" read style: the RAM output is combinational,
  // so read data appears one cycle after async_rd_addr.
  (* ram_style = "block" *) reg [31:0] ram_async [511:0];
  always @(posedge clk)
    ram_async[async_wr_addr] <= async_wr_data;

  wire [31:0] ram_async_out;
  assign ram_async_out = ram_async[async_rd_addr_q];

  assign async_compare = ram_async_out == async_compare_value;

endmodule

Example Test Driver

And here is some test code that uses the module.

`timescale 1ns/1ns
module read_first_example_test;

  reg clk = 1;
  always #5 clk = ~clk;
  
  reg reset = 1;
  initial begin repeat(10) @(posedge clk); reset <= 0; end
  
  wire sync_compare, async_compare;

  reg [31:0] counter, counter1, counter2;
  
  always @(posedge clk)
    begin
      counter1 <= counter;
      counter2 <= counter1;
      if (reset)
        counter <= 0;
      else
        counter <= counter+1;
    end
  
  read_first_example
    read_first_example
      (
       .clk(clk),
       .reset(reset),
       .sync_wr_data(counter),
       .async_wr_data(counter),
       .sync_wr_addr(counter[8:0]),
       .sync_rd_addr(counter[8:0]),
       .async_wr_addr(counter[8:0]),
       .async_rd_addr(counter[8:0]),
       .sync_compare_value(counter2),
       .async_compare_value(counter1),
       .sync_compare(sync_compare),
       .async_compare(async_compare)
       );
  
  initial
    begin
      wait(!reset);
      repeat (1000) @(posedge clk);
      $finish;
    end

endmodule

Operation

I use a counter here to generate the RAM write and read addresses, and I also use it for the RAM write data and the output comparisons. You can see that both RAMs write to and read from the same address at the same time. Note that the compare value for the synchronous read (counter2) is delayed one cycle more than the compare value for the asynchronous read (counter1), matching the extra cycle of latency from the registered RAM output.

Discussion

If you simulate the design, you will see that both compare outputs are true throughout the simulation. However, if you perform a gate-level simulation, you will discover that the asynchronous compare output is false. Looking at the synthesis log, you will find no error or warning. You do, however, get this message:

INFO: [Synth 8-6430] The Block RAM "read_first_example/ram_async_reg" may get memory collision error if read and write address collide. Use attribute (* rw_addr_collision= "yes" *) to avoid collision.

But to my mind, we don't have that situation, because the read address is the write address delayed by one clock cycle; the two never collide.

Now, here's the strange part. You can add the rw_addr_collision attribute to the RAM register declaration like this.

  (* rw_addr_collision = "yes" *)
  (* ram_style = "block" *) reg [31:0] ram_async [511:0];

Synthesize again and the design works. Vivado now adds bypass logic to the design, much like the synchronous FIFO design in my previous post. Even more curious, you can add the attribute to the synchronous RAM declaration and it still behaves correctly. So Vivado is capable of generating correct logic; you just need to use the magic rw_addr_collision attribute.
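To make that concrete, the logic Vivado infers is conceptually equivalent to something like this (my sketch, reusing the names from the example module, not Vivado's actual netlist): the read becomes a synchronous BRAM read, and a mux forwards the write data when a write hits the address being read.

  // Conceptual equivalent of the inferred bypass (my sketch).
  reg [31:0] ram_data_q, wr_data_q;
  reg        collision_q;
  always @(posedge clk)
    begin
      ram_data_q  <= ram_async[async_rd_addr];       // synchronous BRAM read
      wr_data_q   <= async_wr_data;                  // hold the write data
      collision_q <= async_wr_addr == async_rd_addr; // write hits the read
    end

  assign ram_async_out = collision_q ? wr_data_q : ram_data_q;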

Conclusion

I certainly think this is a Vivado synthesis bug. It should never generate logic that does not match the RTL without some kind of complaint. And further, it should generate bypass logic on the asynchronous style of RAM read unless you tell it not to.

Synchronous FIFO Redux

So, almost ten years ago to the day, I posted an article on implementing a synchronous FIFO. Well, take the read portion of that FIFO implementation with a grain of salt.

Asynchronous Read

To summarize, here is the portion of the FIFO which implements the memory.

  reg [width-1:0] mem [depth-1:0];
  always @(posedge clk)
    begin
      if (writing)
        mem[wr_ptr] <= wr_data;
      rd_ptr <= next_rd_ptr;
    end

  assign rd_data = mem[rd_ptr];

Here is a timing diagram of writing one piece of data to the FIFO and then immediately reading it out.
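Drawn as text (my reconstruction from the code above), with the word D0 written in cycle 1 and read out, through the asynchronous read, in cycle 2:

  cycle    :  1    2    3
  writing  :  1    0    0
  wr_data  :  D0   --   --
  reading  :  0    1    0
  rd_ptr   :  0    0    1
  rd_data  :  xx   D0   --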

Looks great, doesn’t it?

The Problem

This code, however, does not work on Xilinx FPGAs from the 6 series onward. Both XST and Vivado will happily implement the equivalent of the following for the read logic. And by happily, I mean without error or warning.

always @(posedge clk)
  rd_data <= mem[rd_ptr];

As you and I can tell, that is not the same thing. And it causes the design in my old post to behave in subtly incorrect ways. You can see this when the FIFO is leaving the empty state: the read and the write hit the same address on the same clock edge, and the registered read returns the old memory contents instead of the word just written.

The Solution

To solve this problem, we need a bypass register to hold the write data, along with an output mux that selects the bypass register when a read immediately follows a write to the same address.

Here is the code which does the synchronous read from the RAM.

  reg [width-1:0] rd_mem;
  always @(posedge clk)
    rd_mem <= mem[next_rd_ptr];

Here is the bypass register, and the code to determine when the bypass register should be used instead of the RAM output.

reg [width-1:0] bypass_reg;
always @(posedge clk)
  bypass_reg <= wr_data;

reg use_bypass;
always @(posedge clk)
  use_bypass <= writing && wr_ptr == next_rd_ptr;

And here is the output MUX.

assign rd_data = use_bypass ? bypass_reg : rd_mem;

An Alternate Approach

Looking at the Vivado Synthesis Guide (UG901), the section called RW_ADDR_COLLISION on page 63 describes an attribute which allows the write data to take priority over the read data. If you synthesize the sync_fifo design with the synchronous_read parameter set to zero or to one, you will see that the same muxing logic is created either way. With synchronous_read set to one, the muxing is explicit; with it set to zero, the mux logic is implicit and Vivado adds it for you. It looks like it is safe to use either approach. Unfortunately, the attribute is not supported in XST, so there you will need to use the explicit bypass logic with the synchronous read style.

Here is an example of how to set the attribute when you declare the RAM in Verilog.

(* rw_addr_collision = "yes" *)
reg [width-1:0] mem [depth-1:0];
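For completeness, here is a sketch of how the synchronous_read parameter can select between the two styles. This is my reconstruction using the names from the snippets above, not necessarily the exact code in the downloadable design.

  // Sketch: selecting the read style with a synchronous_read parameter.
  generate
    if (synchronous_read)
      begin : g_sync_read
        // Explicit bypass, as developed in The Solution above.
        reg [width-1:0] rd_mem, bypass_reg;
        reg             use_bypass;
        always @(posedge clk)
          begin
            rd_mem     <= mem[next_rd_ptr];
            bypass_reg <= wr_data;
            use_bypass <= writing && wr_ptr == next_rd_ptr;
          end
        assign rd_data = use_bypass ? bypass_reg : rd_mem;
      end
    else
      begin : g_async_read
        // "Asynchronous" read; the rw_addr_collision attribute on mem
        // lets Vivado insert the equivalent bypass itself.
        assign rd_data = mem[rd_ptr];
      end
  endgenerate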

Conclusion

I actually find it pretty shocking that both XST and Vivado generate incorrect logic without any error or warning. Clearly this is a bug, and it makes me wonder what other constructs they implement incorrectly. You can download the complete synchronous FIFO design here.