Synchronous FIFO

I do a lot of logic design for Xilinx FPGAs, and I really like the tools that they use. Ironically, I hate their GUIs, but since I always run their tools from the command line using GNU Make, that’s not a big drawback– except when it comes to their CoreGen tool. I really hate that tool. It;s virtually impossible to run it from a command line, because it reads all its input from an XML file. The only way to make the XML file is from their GUI. You can modify the XML and run the tool again, but it won’t document the XML format, and the format can change from release to release. Another problem with CoreGen is that the modules it generates are not parameterized. Say you make a FIFO which is 31 bits wide. If a parameter changes, you need to fire up a GUI and point your mouse and click to build another FIFO. I’m sorry, but this is just wrong.

Now I suppose this just can’t be helped in some cases (like an ethernet MAC), but for many of the modules that CoreGen builds, you’re better off writing the module from scratch. You get a clean and portable design, and you don’t need to mess around with a separate simulation model. In general, I avoid CoreGen like the plague.

Declaration


`timescale 1ns/1ns
module sync_fifo
  #(
    parameter depth = 32,
    parameter width = 32,
    parameter log2_depth = log2(depth),
    parameter log2_depthp1 = log2(depth+1)
    )
  (
   input clk,
   input reset,
   input wr_enable,
   input rd_enable,
   output reg empty,
   output reg full,
   output [width-1:0] rd_data,
   input [width-1:0] wr_data,
   output reg [log2_depthp1-1:0] count
   );

The module essentially takes two parameters the depth and the width of the FIFO. The other two parameters are caused to two different Xilinx XST bugs. The first has to do with constant functions. XST doesn’t treat expressions involving parameters and constant functions as constant expressions unless the expression is in the right-hand side of a parameter declaration. The second bug is that XST does not support local parameters, so they can’t be declared with localparam. The ports contain all the usual suspects, and also include a count of the number of entries in the FIFO. Note that the width of the count may be larger than the width of the memory address. For example, in the default case of 32 bits, the memory address is 5 bits wide because it must span the range 0..31, but the count will be 6 bits wide because it needs to span the range 0..32 since there can be 32 entries in the FIFO.

Local functions

These two functions are used in the design, so I list them here. The first one returns the ceiling of the log base two of its input.


  function integer log2;
    input [31:0] value;
    begin
      value = value-1;
      for (log2=0; value>0; log2=log2+1)
	value = value>>1;
    end
  endfunction

This next one is used to increment the read and write pointers. Note that we can’t just rely on the pointer wrapping, because we don’t assume the FIFO is a power of two deep. If the FIFO is a power of two in depth, then the synthesis tool will optimize out the additional logic. Therefore, with this design you get the best of both worlds.

The read and write pointers

Next, we’ll update the read and write pointers. First I declare some signals which indicate whether the FIFO is reading or writing. It’s not just a matter of looking at the enable!


  wire writing = wr_enable && (rd_enable || !full);
  wire reading = rd_enable && !empty;

Here is the code for the read pointer. I don’t usually use a combinational next signal, but in this case, we’re going to need to, since then, we can have the synthesis tool infer a read-first style RAM. If you want to use a Xilinx block RAM, this is a necessity.


  reg [log2_depth-1:0] next_rd_ptr;
  always @(*)
    if (reset)
      next_rd_ptr = 0;
    else if (reading)
      next_rd_ptr = increment(rd_ptr);
    else
      next_rd_ptr = rd_ptr;

The write pointer doesn’t need the next variable, but I use the same style for symmetry.


  reg [log2_depth-1:0] wr_ptr;
  reg [log2_depth-1:0] next_wr_ptr;
  always @(*)
    if (reset)
      next_wr_ptr = 0;
    else if (writing)
      next_wr_ptr = increment(wr_ptr);
    else
      next_wr_ptr = wr_ptr;

  always @(posedge clk)
    wr_ptr <= next_wr_ptr;

The status and count outputs

I just use a counter to keep track of the reads and writes.


  always @(posedge clk)
    if (reset)
      count <= 0;
    else if (writing && !reading)
      count <= count+1;
    else if (reading && !writing)
      count <= count-1;

Empty and full are pretty simple. We need to know a couple things, though. First, they don’t depend on count. In many applications, you don’t need to know how many entries there are in the FIFO. In these cases, you simply leave the count output disconnected and the synthesis tool will remove the counter. If full or empty used the counter, then the logic cannot be removed. Also, I experimented with using unregistered values and the design was larger.


  always @(posedge clk)
    if (reset)
      empty <= 1;
    else if (reading && next_wr_ptr == next_rd_ptr && !full)
      empty <= 1;
    else
      if (writing && !reading)
	empty <= 0;
  
  always @(posedge clk)
    if (reset)
      full <= 0;
    else if (writing && next_wr_ptr == next_rd_ptr)
      full <= 1;
    else if (reading && !writing)
      full <= 0;

The memory

Finally, let’s get to the memory. Again, if you use CoreGen for your memories, you would need to generate another module for this and instantiate it here. But why would you? It takes less code to write the memory than to instantiate what CoreGen would have produced. Note that we need to infer a read-first memory. This is where the next_rd_ptr signal is needed.


  reg [width-1:0] mem [depth-1:0];
  always @(posedge clk)
    begin
      if (writing)
	mem[wr_ptr] <= wr_data;
      rd_ptr <= next_rd_ptr;
    end

  assign rd_data = mem[rd_ptr];

Wrap-up

So that’s about it. We now have simple RTL synchronous FIFO that synthesizes to a nice compact implementation, is portable across vendors, and simulates nicely. And best of all? No GUI!

I’ve attached the whole design here in case you want to use it. Just keep the copyright header intact, and of course, there is no warranty.

18 thoughts on “Synchronous FIFO”

sean on June 14, 2010 at 2:06 am said:

Hi – nice implementation – just to point out that you have assigned to rd_ptr from 2 separate blocks – just a typo i would say?
admin on April 13, 2011 at 3:44 pm said:

Good catch. yes the first block where it was assigned should not have been there. I have changed the text. Thanks.
Rohan on April 18, 2011 at 10:02 am said:

@admin Very nice code. I’m doing a project where I’m trying to create a queue data-structure to handle the reception and transmission of packets in a vehicular Ad-hoc network by using an FPGA. I’m quite a novice at embedded programming, and I was wondering whether this code would be able to to compile onto an Altera FPGA (using Quartus). At the moment, I’m using ModelSim to compile and test my code, and I’m supposed to synthesize it onto an FPGA eventually.
admin on April 18, 2011 at 10:12 am said:

This should compile into Altera. I have not tried it though. One of the advantages of using inferred memories is that they are more vendor independent. Try it and let me know how it goes.
Rohan on April 18, 2011 at 10:43 am said:

@admin Thanks for the speedy response. Do you have any suggestions for resources I could utilize for writing the test-bench and figuring out how to customize your code to fit my needs? Thanks!
Akhil Kumar on November 26, 2012 at 4:29 am said:

Hi Admin,

Thanx for the code. But I have some questions:

1. How will the design work if following case occur?

-> Suppose only write events occurred and no read event occurred.
-> Now FIFO is full
-> One read happens means from Ist location of FIFO
-> Now first location is empty but write pointer is at its maximum value
-> Now one write event occurs as full is zero
-> But as in code there is no logic to initialize the pointers to zero (they can be set to zero only when reset is asserted). How the written data will be stored at the first location as it is empty? Is maximum_value_counter + 1 = 0?

2. When reset is asserted next_rd_ptr and next_wr_ptr are initialized to zero. Now suppose a write access occurs, will next_wr_ptr get incremented at the same clock on which access has been requested or will it get incremented on the next clock?
admin on November 26, 2012 at 8:46 am said:

So, for case 1. If you reset the FIFO and then fill it up, the write pointer will point to where the read pointer points. In this case zero. The increment function will increment the write pointer modulo the size of the FIFO. Do if the FIFO depth is n, after n writes with no reads after a reset, the write pointer will be zero. Once a read occurs, the next write will be to address zero. In other words, the code to set the pointer back to zero is in the increment function.

For the second case, the next_rd_ptr and next_wr_ptr values are not clocked at all. They are combinational values so update whenever the inputs to the always block update. The net effect is that the rd_ptr and wr_ptr values update on the clock where the read or write signal is received.
Akhil Kumar on December 1, 2012 at 1:13 am said:

Admin, thanx for your reply. Is this code synthesizable? Can we declare memory such that as you declared in RTL? or Do we need to instantiate a separate memory model in the FIFO?
Akhil Kumar on December 2, 2012 at 2:26 am said:

Admin, do the above code support simultaneous write and read?
admin on December 3, 2012 at 7:24 am said:

Yes, the code is synthesizable. You need a synthesis flow which can infer memories. That is available for all FPGA vendors, and with many ASIC vendors when using a behavioral synthesis tool. The FIFO also supports simultaneous reads and writes.
Bijoy on February 6, 2014 at 9:22 pm said:

Hi Admin

You said the ‘The increment function will increment the write pointer modulo the size of the FIFO’

But to ‘increment’ function you are not passing the size of FIFO, so how does it wrap the write or read pointer to module the size ?
Bijoy
admin on February 6, 2014 at 10:41 pm said:

Sorry, the increment function is not shown in the text of the post. I should fix that. You can see it in the link that lets you download the code.

To answer your question, the function can use the depth module parameter even though it is not passed to the function.
Sowmya on April 1, 2014 at 11:28 am said:

Hi,
I tried to simulate this code in Xilinx ISE.
When clicked on ‘simulate behavior model’, I’m getting
ERROR: no Simulation Engine found in the Transform Input that matches sync_fifotbw_isim_beh.exe
Please help 🙁
admin on April 1, 2014 at 1:30 pm said:

Sorry, I have never use Isim or ISE on Windows, so I don’t have much to offer. You could try to simulate it using Icarus Verilog. If you do you need to remove the log2 function and instead use the Verilog $clog2 builtin function. Icarus Verilog does not support constant functions.
walrus1998headshothippo on January 14, 2016 at 4:48 am said:

Thanks for sharing 🙂
Shaun Heims on January 20, 2016 at 1:16 pm said:

Nice Posting. Appreciate it!
TRAN MINHHAI on March 26, 2016 at 10:51 pm said:

Thank you!!!
subashri on April 6, 2016 at 3:03 am said:

Thank you… it helps me well…

Beyond Circuits

Thoughts on logic design that goes beyond circuits.