Algorithm

Computing the log base 2 of a whole number is easy: just use a priority encoder. But what about a fixed point number? That sounds a lot harder. But, like many things, there’s a trick to make things easier.

Consider the following equation:

$\log_2{x} = \log_2 (2^n \cdot 2^{-n} \cdot x) = n + \log_2(2^{-n} \cdot x)$

If we choose $n$ such that $2^{-n}\cdot x\in [1,2)$ , then $n$ is just the whole number portion of the log. and we can compute that with a priority encoder. Furthermore, we can compute the $2^{-n}\cdot x$ portion by barrel-shifting the original number. Finally, we can use a lookup table to compute the lower bits. The nice thing about this is that the lookup table can be much smaller, since we only need to store the values between 1 and 2.

Another way to think about this is that the logarithm curve from [2,4) is the same as the one from [1,2), just shifted up by 2 and scaled horizontally by 2. Likewise the curve from [4,8) is scaled the same way. The algorithm is just taking advantage of this symmetry.

Oh– additionally, this algorithm only works on input values greater or equal to one, so the input and output values are always positive.

Implementation

Now, let’s code up our Verilog module. We’ll start with the module declaration:

module log2
 #(
 parameter in_int = 16,
 parameter in_frac = 8,
 parameter out_int = $clog2(in_int),
 parameter out_frac = in_int+in_frac-out_int,
 parameter lut_precision = 6,
 parameter register_output_stage = 0
 )

Since the module is dealing with a fixed-point input and output value, we need to specify how many integer and fractional bits there are in the input and output values. The lut_precision parameter specifies the log base 2 of the number of entries in our lookup table, and the default setting will be 64 entries in the table. There is also a parameter which allows for an optional final output register stage.

Next come the module ports:

  (
   input clk,
   input [in_int+in_frac-1:0] din,
   input din_valid,             // data in is valid
   output din_ready,            // ready to receive data
   output reg [out_int+out_frac-1:0] dout,
   output reg dout_valid,       // data out is ready (valid)
   output reg dout_error        // data out is incorrect - input data was less than 1.0
   );

We have the clock clk and the input value din. A handshake input indicating that the din value is valid. There is an output signal din_ready indicating that the module is ready to accept an input value. There is, of course, the output value dout, and a valid signal dout_valid. Since it is possible to provide an illegal input between 0 and 1, there is also a dout_error signal indicating the output value is not valid due to an inadmissible input value.
Now, let’s look at the body of the module:

  assign din_ready = 1;

Ready? Please. We’re always ready.

Pipeline Stage 1

Next, we instantiate the recursive priority encoder from my previous blog post. This takes the integer portion of the input value and produces the integer portion of the output value. There is also a prienc_error output which indicates that none of the input bits were set. This means we were given an input value that was less than one.

  wire [$clog2(in_int)-1:0] prienc_out;
  wire prienc_error;
  
  priencr #(in_int)
  priencr
    (
     .decode(din[in_frac+:in_int]),
     .encode(prienc_out),
     .valid(prienc_error)
     );

Next, we have the stage one pipeline logic:

  reg [$clog2(in_int)-1:0] stage1_prienc_out;
  reg stage1_error;
  reg stage1_valid;
  reg [in_int+in_frac-1:0] stage1_din;
  
  always @(posedge clk)
    begin
      stage1_din <= din;
      stage1_prienc_out <= prienc_out;
      stage1_error <= prienc_error;
      stage1_valid <= din_valid;
    end

Pipeline Stage 2

Stage two of the pipeline is the barrel shift logic. This shifts the input value to the left, based on the priority encoder output. Things are flipped, though, so lower priority encoder output values cause a larger shift. We’ll also pipeline the other signals:

  reg [in_int+in_frac-1:0] stage2_barrel_out;
  reg [out_int-1:0] stage2_barrel_out_int;
  reg stage2_error;
  reg stage2_valid;
  always @(posedge clk)
    begin
      stage2_barrel_out <= stage1_din << (in_int-stage1_prienc_out-1);
      stage2_barrel_out_int <= stage1_prienc_out;
      stage2_error <= stage1_error;
      stage2_valid <= stage1_valid;
    end

Pipeline Stage 3

The third pipeline stage is the lookup table, which computes the fractional part of the output. We will use an initial block to fill the table. First, we define a function to take the floating point log base-2 of an input value, since Verilog does not have one built in. Remember this rule from high school algebra?

  function real rlog2;
    input real x;
    begin
      rlog2 = $ln(x)/$ln(2);
    end
  endfunction

Next we declare the table and fill it in an initial block:

  reg [out_frac-1:0] lut[(1<<lut_precision)-1:0];
  integer i;
  initial
    begin : init_lut
      lut[0] = 0;
      for (i=1; i<1<<lut_precision; i=i+1)
        lut[i] = $rtoi(rlog2(1.0+$itor(i)/$itor(1<<lut_precision))*$itor(1<<out_frac)+0.5);
    end

Now, we need to use the barrel shift output as an address to the lookup table. We also carry along some of the stage 2 results:

  reg [out_frac-1:0] stage3_lut_out;
  reg [out_int-1:0] stage3_out_int;
  reg stage3_error;
  reg stage3_valid;
  always @(posedge clk)
    begin
      stage3_out_int <= stage2_barrel_out_int;
      stage3_lut_out <= lut[stage2_barrel_out[in_int+in_frac-2-:lut_precision]];
      stage3_error <= stage2_error;
      stage3_valid <= stage2_valid;
    end

Finally we have the code for the optional stage 4 pipeline. This is controlled by the register_output_stage parameter:

  generate
    if (register_output_stage)
      begin
        always @(posedge clk) dout = {stage3_out_int,stage3_lut_out};
        always @(posedge clk) dout_error = stage3_error;
        always @(posedge clk) dout_valid = stage3_valid;
      end
    else
      begin
        always @(*) dout = {stage3_out_int,stage3_lut_out};
        always @(*) dout_error = stage3_error;
        always @(*) dout_valid = stage3_valid;
      end
    endgenerate

Caveats

So there you have it: a module that computes log base 2 for fixed point inputs. However, there ‘s still one tiny problem: the design is not synthesizable by XST or Vivado. I haven’t tried other tools, but they may have issues with it as well.

The issue is that the tools don’t seem to implement the built-in Verilog floating point functions during elaboration. Essentially, the synthesis tool needs to run the initial block in order to know how to populate the lookup table. In general, the tools can do this. However, if they catch wind of a float value, they tuck their tail between their legs and run.

My workaround for this is to use the Perl preprocessor I describe in a previous blog post. But that’s a topic for another time.

Background

My FPGA builds are done on a server in the datacenter, which is helpful for a number of reasons, not the least of which is that it takes a lot of CPU, and the server has plenty of CPU power. However, it’s a pain to transfer the design files over to a local machine for programming and debugging. Here are instructions that you can use to run just a JTAG server on the lab machine, and connect to from the server machine.

Lab Machine

Install

You should install the Quartus II tools on the lab machine. There’s got to be a download for only the programming tools, but I haven’t found it yet. Download and install only the minimal amount. The machine needs to run the port mapper and xinetd. You may need to install those with yum or whatever other package manager your distribution requires.

Starting the Server

In the following, I assume that the environment variable $qII points to your Quartus II install directory.

You need to make sure that a file /etc/jtagd/jtagd.conf exists and has the following line in it:

Password = “mysecret”

Without that, the jtag daemon will not listen for remote machines.

First check that the JTAG software can see the board. Run the following command and verify that you get the correct output…

[pete@localhost ~]$ $qII/quartus/bin/jtagconfig
 1) CV SoCKit [1-1]
 02D020DD 5CSEBA6(.|ES)/5CSEMA6/..
 4BA00477 SOCVHPS

Then, run the following to start the server. You need to do this as root:

[pete@localhost ~]$ sudo $qII/quartus/bin/jtagd
[pete@localhost ~]$ sudo $qII/quartus/bin/jtagconfig --enableremote mysecret

Remote Machine

On your remote machine you should start up Quartus II, and:

Select Tools->Programmer. This will start the programmer window.
You should then click the Hardware Setup… button.
Click on the JTAG Settings tab.
Click the Add Server… button.
Enter the IP address of your server. Enter the password, greywolf.
The Hardware Setup window should indicate the server IP address and the connection status should be OK.
Click the Close button.
You should then see the IP address of the lab machine to the right of the Hardware Setup… button.

You should now be able to use the JTAG device from the remote machine.

Pro Tips

If your lab machine doesn’t have a public IP address– for instance, because it is a VM using NAT running on a host with a VPN connection to your server– you will need to set up an SSH tunnel to the lab machine.

The JTAG server runs on port 1309. You can use ssh to tunnel the connection.

From your VPN machine ssh to the remote machine with the following command:

[pete@localhost ~]$ ssh -L 1309:192.168.1.2:1309 user@myserver.mydomain.com

In this example, your guest machine has an IP address of 192.168.1.2, and the remote server is myserver.mydomain.com.

Then, you would use your VPN machine IP address as the address of the lab machine when connecting to the JTAG server.

Beyond Circuits

Thoughts on logic design that goes beyond circuits.

Monthly Archives: October 2015

Computing the Logarithm Base 2