Getting the Log Base 2 Algorithm to Synthesize

My last post introduced an algorithm for finding the log base 2 of a fixed-point number. However, it had a gotcha: it used floating-point functions to initialize a table. Even though the design does not synthesize any floating-point logic, ISE, Vivado, and Quartus II all refuse to synthesize it. What should we do?

Perl Preprocessor to the Rescue

In an older blog post, I discussed a Verilog preprocessor that I wrote years ago. In the old days of Verilog ’95, preprocessors like this were practically required. The language was missing many features, which made external solutions necessary, but most of those gaps were filled by Verilog 2001 and Verilog 2005. Now with SystemVerilog, things are even better. However, sometimes tools don’t support all of the language features. In the last blog post, we discovered that the FPGA synthesis tools can’t handle the floating-point functions used in the ROM initialization code.

Computing the lookup table in Perl

What we’ll do is write the ROM initialization code in Perl and use the preprocessor to generate the table before the synthesis tool sees it.

The code which gives the synthesis tools fits is this:

  function real rlog2;
    input real x;
    begin
      rlog2 = $ln(x)/$ln(2);
    end
  endfunction

  reg [out_frac-1:0] lut[(1<<lut_precision)-1:0];
  integer i;
  initial
    begin : init_lut
      lut[0] = 0;
      for (i=1; i<1<<lut_precision; i=i+1)
        lut[i] = $rtoi(rlog2(1.0+$itor(i)/$itor(1<<lut_precision))*$itor(1<<out_frac)+0.5);
    end
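To make the arithmetic concrete, here is a minimal Python sketch of what that initial block computes. The function name `lut_values` and the small parameter values in the demo are my own choices for illustration; they are not part of the design.

```python
import math

def lut_values(lut_precision, out_frac):
    """Model of the Verilog initial block: entry i holds
    log2(1 + i / 2**lut_precision), scaled by 2**out_frac and rounded."""
    depth = 1 << lut_precision
    scale = 1 << out_frac
    # int(x + 0.5) mirrors the $rtoi(... + 0.5) rounding in the Verilog;
    # all values are non-negative, so truncation toward zero is a floor.
    return [int(math.log2(1.0 + i / depth) * scale + 0.5) for i in range(depth)]

if __name__ == "__main__":
    # Hypothetical small parameters, chosen only to keep the printout short.
    for i, value in enumerate(lut_values(3, 8)):
        print(f"lut[{i}] = {value}")
```

Entry 0 is always zero because log2(1) = 0, which is why the Verilog sets `lut[0]` explicitly and starts the loop at 1.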

We can essentially turn this code into Perl and embed it, so that the preprocessor will be able to generate it for us. Remember, the Perl code goes inside special Verilog comments.

First, we’re going to need to define some parameter values in Perl and their counterparts in Verilog. These define the maximum allowable depth and width of the lookup table.

  //@ my $max_lut_precision = 12;
  localparam max_lut_precision = $max_lut_precision;
  //@ my $max_lut_frac = 27;
  localparam max_lut_frac = $max_lut_frac;

Here is the Perl code to compute the lookup table values:

  /*@
   sub compute_lut_value {
     my $i = shift;
     return log(1.0+$i/(1<<$max_lut_precision))*(1<<$max_lut_frac)/log(2.0);
   }
   @*/

Then, we’ll embed the lookup table inside a Verilog function.

  function [max_lut_frac-1:0] full_lut_table;
    input integer i;
    begin
      case (i)
	//@ for my $i (0..(1<<$max_lut_precision)-1) {
	//@ my $h = sprintf("${max_lut_frac}'h%x",int(compute_lut_value($i)+0.5));
	$i: full_lut_table = $h;
	//@ }
      endcase
    end
  endfunction
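For readers who have not used the preprocessor, the following Python sketch imitates what the embedded Perl loop is meant to emit: one case item per entry of the maximal table. The function name `case_lines` is hypothetical, and the exact text the real preprocessor produces is an assumption on my part; only the 12 and 27 come from the parameters above.

```python
import math

def case_lines(max_lut_precision, max_lut_frac):
    """Yield Verilog case items shaped like the ones the Perl loop generates."""
    depth = 1 << max_lut_precision
    scale = 1 << max_lut_frac
    for i in range(depth):
        # Same rounding as int(compute_lut_value($i) + 0.5) in the Perl.
        value = int(math.log2(1.0 + i / depth) * scale + 0.5)
        yield f"{i}: full_lut_table = {max_lut_frac}'h{value:x};"

if __name__ == "__main__":
    lines = list(case_lines(12, 27))
    print(len(lines))   # one case item per table entry
    print(lines[0])
    print(lines[-1])
```

With the values 12 and 27, this produces 4096 case items, the first of which is `0: full_lut_table = 27'h0;`. The synthesis tool only ever sees these constants, never the logarithm.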

We’re also going to parallel the Perl code with a pure Verilog function, just like our local parameters.

  function [out_frac-1:0] compute_lut_value;
    input integer i;
    begin
      compute_lut_value = $rtoi(rlog2(1.0+$itor(i)/$itor(1<<lut_precision))*$itor(1<<out_frac)+0.5);
    end
  endfunction

But how do you parameterize it?

There happens to be one gotcha when working with preprocessors: you really don’t want to use them. Back in the Verilog ’95 days, my colleagues and I just used the preprocessor for every Verilog file. All of our parameterization was done there. We used a GNU Make flow, and it built all the Verilog files for us automatically. But with the advent of Verilog 2001 and SystemVerilog, a lot of things that we used the preprocessor for could be done within the language, and much better, too.

One of the crufty things about using the preprocessor was that you needed to embed the parameter values in the module names. Otherwise, modules generated with different parameter values would collide on the same module name. In this case, we actually want the preprocessor to generate a parameterized Verilog module, so that we can still use the normal Verilog parameter mechanism to control the width and depth of the lookup table.

To do this, we must generate a maximal table in the preprocessor, and then cut from that table, using Verilog, a subtable that has the desired width and depth based on our Verilog parameter values.

Here is the code to do that. If the depth or width (the precision or the fractional width) exceeds the maximal Perl value, we fall back to the pure Verilog implementation. The code will then not be synthesizable, but it will at least still work in simulation. If the parameter values are “in bounds” for the preprocessor-computed lookup table, we cut our actual table from the Perl-generated lookup table, and the design will be synthesizable.

  reg [out_frac-1:0] lut[(1<<lut_precision)-1:0];
  integer i;
  generate
    // If the parameters are outside the bounds of the static lookup table then
    // compute the lookup table dynamically. This will not be synthesizable however
    // by most tools.
    if (lut_precision > max_lut_precision || out_frac > max_lut_frac)
      initial
	begin : init_lut_non_synth
	  lut[0] = 0;
	  for (i=1; i<1<<lut_precision; i=i+1)
	    begin
	      lut[i] = compute_lut_value(i);
	    end
	end
    else
      // The parameters are within bounds so we can use the precomputed table
      // and synthesize the design.
      initial
	begin : init_lut_synth
	  for (i=0; i<1<<lut_precision; i=i+1)
	    begin : loop
	      reg [max_lut_frac-1:0] table_value;
	      table_value = full_lut_table(i<<(max_lut_precision-lut_precision));
	      lut[i] = table_value[max_lut_frac-1:max_lut_frac-out_frac];
	    end
	end
  endgenerate
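The cutting step in `init_lut_synth` can be checked with a small Python model. This is a sketch under my own assumptions: the names `cut_table` and `direct_table` and the demo parameters 8 and 16 are hypothetical, while the maximal sizes 12 and 27 are the ones defined earlier. It strides through the full table by `1 << (max_lut_precision - lut_precision)` and keeps the top `out_frac` bits of each entry, then compares the result with computing the smaller table directly.

```python
import math

MAX_LUT_PRECISION = 12  # mirrors $max_lut_precision
MAX_LUT_FRAC = 27       # mirrors $max_lut_frac

def rounded_log2(i, precision, frac):
    """log2(1 + i / 2**precision), scaled by 2**frac and rounded."""
    return int(math.log2(1.0 + i / (1 << precision)) * (1 << frac) + 0.5)

FULL_TABLE = [rounded_log2(i, MAX_LUT_PRECISION, MAX_LUT_FRAC)
              for i in range(1 << MAX_LUT_PRECISION)]

def cut_table(lut_precision, out_frac):
    """Model of init_lut_synth: subsample the full table, keep the top bits."""
    assert lut_precision <= MAX_LUT_PRECISION and out_frac <= MAX_LUT_FRAC
    stride_shift = MAX_LUT_PRECISION - lut_precision
    drop_bits = MAX_LUT_FRAC - out_frac
    return [FULL_TABLE[i << stride_shift] >> drop_bits
            for i in range(1 << lut_precision)]

def direct_table(lut_precision, out_frac):
    """Model of init_lut_non_synth: compute each entry from scratch."""
    return [rounded_log2(i, lut_precision, out_frac)
            for i in range(1 << lut_precision)]

if __name__ == "__main__":
    cut = cut_table(8, 16)
    direct = direct_table(8, 16)
    # Largest disagreement between the two tables, in output LSBs.
    print(max(abs(a - b) for a, b in zip(cut, direct)))
```

One thing the model makes visible: the part-select truncates the 27-bit value rather than rounding it, so an entry of the cut table can differ from the directly computed one by one least significant bit. At the maximal parameter values the two tables are identical.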

We can then finish off the design, just like in Computing the Logarithm Base 2.

Conclusion

This may be a lot to take in. But the gist is that you build a table using the Perl preprocessor, which is large enough to use for all parameter values. Then in Verilog, you use the actual parameter values to cut out the portion of the pre-computed table that you need. This cutting out can be done during the elaboration or initialization stages of the synthesis. Of course, our job would be much easier if the synthesis tool developers got it into their heads that using floating point and math during elaboration or initialization does not necessitate synthesizing floating point logic.

If anyone knows of a software floating point library written purely in Verilog, please let me know. We could then use that to trick the synthesis tools into doing what we want.

Oh, and here’s a link to the complete file.

Detecting the rising edge of a short pulse

A reader is going through my ZedBoard tutorial and had some questions about detecting the rising edge of a pulse. The tutorial in question is using a ZedBoard to make a stopwatch. Kind of overkill in terms of hardware, but you have to start somewhere when you’re learning to code.

TRAN MINHHAI asked: what do you do when the rising edge might just be a pulse, and the pulse might last less than a single clock cycle? The answer is to use a flip-flop with the input signal going to an asynchronous set input. The data input is just zero, and the clock signal is the one we are synchronizing to.

Here is the code:


`timescale 1ns/1ns
module edge_detect
  (
   input clk,
   input btnl,
   output btnl_rise
   );

  reg btnl_held = 0;
  always @(posedge clk or posedge btnl)
    if (btnl)
      btnl_held <= 1;
    else
      btnl_held <= 0;

  reg [1:0] btnl_shift = 0;
  always @(posedge clk)
    btnl_shift <= {btnl_shift[0],btnl_held};
  
  assign btnl_rise = btnl_shift == 2'b01;
endmodule

I also wrote a little test to go with it. Notice how short little pulses can come anywhere with respect to the clock edge:


`timescale 1ns/1ns
module edge_detect_test;

  reg clk = 0;
  always #10 clk = ~clk;

  reg btnl = 0;

  wire btnl_rise;

  edge_detect edge_detect(clk,btnl,btnl_rise);

  initial
    begin
      $dumpvars(0);
      
      @(posedge clk)
	btnl <= 1;
      repeat (10) @(posedge clk);
      btnl <= 0;
      repeat (4) @(posedge clk);
      #10 btnl <= 1;
      #1 btnl <= 0;
      repeat (4) @(posedge clk);
      #19 btnl <= 1;
      #1 btnl <= 0;
      repeat (4) @(posedge clk);
      #20 btnl <= 1;
      #1 btnl <= 0;
      repeat (4) @(posedge clk);
      $finish;
    end
endmodule

The long pulse case

In the case of a long pulse, the design works just like it would without the asynchronous set flip-flop. Here’s a timing diagram:

The short pulse case

But if there is a short pulse, the asynchronous set flip-flop holds the input value until there is a clock edge.
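The effect of the hold flop can also be checked with a rough Python model on a 1 ns grid. This is only a behavioral sketch: the function name `rise_cycles` and its `use_hold` switch are my own inventions for comparison, the clock matches the test bench (rising edges at 10 ns, 30 ns, 50 ns, and so on), and real-world effects such as metastability are ignored.

```python
def rise_cycles(pulses, use_hold=True, sim_ns=200, half_period=10):
    """Behavioral model of edge_detect on a 1 ns grid.

    pulses is a list of (start_ns, end_ns) intervals during which btnl is high.
    Returns the rising clock edge times after which btnl_rise is asserted.
    """
    held = 0    # models btnl_held
    shift = 0   # models the 2-bit btnl_shift
    result = []
    for t in range(sim_ns):
        btnl = any(start <= t < end for start, end in pulses)
        if t % (2 * half_period) == half_period:   # rising clock edge
            # Both registers update from their pre-edge values (nonblocking).
            shift, held = ((shift << 1) | held) & 0b11, int(btnl)
            if shift == 0b01:
                result.append(t)
        elif btnl and use_hold:
            held = 1   # asynchronous set: capture the pulse between edges
    return result

if __name__ == "__main__":
    short_pulse = [(15, 16)]   # 1 ns pulse in the middle of a clock cycle
    print(rise_cycles(short_pulse))                  # with the hold flop
    print(rise_cycles(short_pulse, use_hold=False))  # plain sampling flop
```

For the 1 ns pulse at 15 ns, the model reports a single detection at the 30 ns clock edge when the hold flop is present, and no detection at all when the input is simply sampled on the clock.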

If the pulse appears in the middle of the clock cycle, then the timing diagram looks like this:

If the pulse appears right before the clock edge, then the timing diagram looks like this:

Now you know how to synchronize the rising edge of a pulse even if you have a slow clock.