简体   繁体   中英

How to fix long compilation for Verilog HDL in quartus

I've been trying to create a counting sort algorithm using Verilog HDL, but when I tried to compile this iteration of it, Quartus started to compile it for a really long time. I can't figure out what is the issue.

module sort(reset, clk, data_in0,data_in1,data_in2,data_in3,data_in4,data_in5,data_in6,data_in7,data_in8,data_in9, data_out0, data_out1, data_out2, data_out3, data_out4, data_out5, data_out6, data_out7, data_out8, data_out9);

input wire reset, clk;

input wire [1:0] data_in0;

input wire [1:0] data_in1;

input wire [1:0] data_in2;

input wire [1:0] data_in3;

input wire [1:0] data_in4;

input wire [1:0] data_in5;

input wire [1:0] data_in6;

input wire [1:0] data_in7;

input wire [1:0] data_in8;

input wire [1:0] data_in9;

output reg [1:0] data_out0;

output reg [1:0] data_out1;

output reg [1:0] data_out2;

output reg [1:0] data_out3;

output reg [1:0] data_out4;

output reg [1:0] data_out5;

output reg [1:0] data_out6;

output reg [1:0] data_out7;

output reg [1:0] data_out8;

output reg [1:0] data_out9;

reg [1:0] mem [9:0];


reg[9:0] buff [3:0];
integer i,k,j,f,s;

always@ (posedge clk)

begin

    for(i=0; i<4; i=i+1)
    buff[i]<=0;
    if (reset == 1) begin

    for (i = 0; i < 10; i = i + 1) mem[i]<=0;
    s=0;
    f=0;


end

else begin
if (f==0)begin
mem [0] <= data_in0;
mem[1]<=data_in1;

mem[2]<=data_in2;

mem[3]<=data_in3;

mem[4]<=data_in4;

mem[5]<=data_in5;

mem[6]<=data_in6;

mem[7]<=data_in7;

mem[8]<=data_in8;

mem[9]<=data_in9;
f=1;
end
 for( i = 0; i <10 ; i=i+1)
begin
    buff[mem[i]]<=buff[mem[i]]+1;
end
if(s==0) begin
k<=0;
    for( i = 0; i <4 ; i=i+1)
    begin
        for( j = 0; j < 10 ; j = j +1)
        begin
            if(j<buff[i])
            begin
                mem[k]<=i;
                k<=k+1;
            end
        end
    end

end     s=1;    

data_out0 = mem[0];

data_out1 = mem[1];

data_out2 = mem[2];

data_out3 = mem[3];

data_out4 = mem[4];

data_out5 = mem[5];

data_out6 = mem[6];

data_out7 = mem[7];

data_out8 = mem[8];

data_out9 = mem[9];

end

end

endmodule

It takes a really long time to pass the Analysis and Synthesis section. I assume it is due to mistakes in this code or the wrong use of operators, but I can't understand where it is exactly.

for loops in Verilog don't work the way you seem to expect. This is not going to execute step by step, but the synthesis tool will try to unroll the loops, and as everything is contained within an always @(posedge clk) , it will do all unrolled statements in a single clock cycle. Rethink your module using state machines to achieve sequentiality.

This is an example of how a FSM-based solution for your problem would be. It can be vastly improved, though, but it's just a starting (and hopefully working) point.

For start, I've changed your module interface. Discrete inputs can be used, but as the algorithm uses indexes to run over the entire input domain, I've assume two external memories: one with the input data and another one which will hold the output data. The module drives the corresponding address bus for both memories, as well as the write enable signal, and data buses. There is also a busy signal so the rest of the system knows that the module has not yet finished sorting data. Finally, I've sorted 16 numbers instead of 10.

Internally, I've used a memory element, count , as the vector that holds the histogram of the input data. As this memory is tiny, I've used as four independent registers. This allows me to use more than one element of "count" in the same clock cycle, as in count[3] <= count[3] + count[2] + count[1] + count[0];

The version of the algorithm I've used is from Wikipedia: https://en.wikipedia.org/wiki/Counting_sort

function countingSort(array, k) is
  count ← new array of k zeros
  for i = 1 to length(array) do
    count[array[i]] ← count[array[i]] + 1
  for i = 2 to k do
    count[i] ← count[i] + count[i - 1]
  for i = length(array) downto 1 do
    output[count[array[i]]] ← array[i]
    count[array[i]] ← count[array[i]] - 1
  return output

And this is the Verilog module:

module sort (
  input wire clk,
  input wire reset,
  output reg [3:0] addr_data_in,
  input wire [1:0] data_in,
  output reg [3:0] addr_data_out,
  output reg [1:0] data_out,
  output reg write_data_out_strobe,
  output reg busy
);

/*
function countingSort(array, k) is
  count ← new array of k zeros
  for i = 1 to length(array) do
    count[array[i]] ← count[array[i]] + 1
  for i = 2 to k do
    count[i] ← count[i] + count[i - 1]
  for i = length(array) downto 1 do
    output[count[array[i]]] ← array[i]
    count[array[i]] ← count[array[i]] - 1
  return output
*/
  localparam
    ZERO         = 3'd0,
    MAKEHIST1    = 3'd1,
    MAKEHIST2    = 3'd2,
    PREFIXSUM    = 3'd3,
    PLACEOUTPUT1 = 3'd4,
    PLACEOUTPUT2 = 3'd5,
    IDLE         = 3'd7
    ;

  reg [4:0] count[0:3];
  reg [2:0] state = IDLE;
  reg [1:0] data;

  always @(posedge clk) begin
    if (reset == 1'b1) begin
      state <= ZERO;
      write_data_out_strobe <= 1'b0;
      busy <= 1'b1;
    end
    else begin
      case (state)
        ZERO:
        //count ← new array of k zeros
          begin
            count[0] <= 4'd0;
            count[1] <= 4'd0;
            count[2] <= 4'd0;
            count[3] <= 4'd0;
            addr_data_in <= 4'd0;
            state <= MAKEHIST1;
          end
        MAKEHIST1:
        //for i = 1 to length(array) do
        //  count[array[i]] ← count[array[i]] + 1
          begin
            data <= data_in;
            addr_data_in <= addr_data_in + 4'd1;
            state <= MAKEHIST2;
          end
        MAKEHIST2:
          begin
            count[data] <= count[data] + 4'd1;
            if (addr_data_in == 4'd0)
              state <= PREFIXSUM;
            else
              state <= MAKEHIST1;
          end
        PREFIXSUM:
        //for i = 2 to k do
        //  count[i] ← count[i] + count[i - 1]
          begin
            count[1] <= count[1] + count[0];
            count[2] <= count[2] + count[1] + count[0];
            count[3] <= count[3] + count[2] + count[1] + count[0];
            addr_data_in <= 4'd15;
            state <= PLACEOUTPUT1;
          end
        PLACEOUTPUT1:
        //for i = length(array) downto 1 do
        //  output[count[array[i]]] ← array[i]
        //  count[array[i]] ← count[array[i]] - 1
          begin
            data <= data_in;
            addr_data_in <= addr_data_in - 4'd1;
            write_data_out_strobe <= 1'b0;
            state <= PLACEOUTPUT2;
          end
        PLACEOUTPUT2:
          begin
            addr_data_out <= count[data] - 5'd1;
            data_out <= data;
            count[data] <= count[data] - 4'd1;
            write_data_out_strobe <= 1'b1;
            if (addr_data_in == 4'd15)
              state <= IDLE;
            else
              state <= PLACEOUTPUT1;
          end
        IDLE:
          begin
            write_data_out_strobe <= 1'b0;
            busy <= 1'b0;
          end
      endcase
    end  // of else
  end  // of always
endmodule

You can see that because of the way I'm using count, this will surely generate lots of muxes and decoders, just because I'm using a register value as the address for count[] in some places. However, I think this will synthesize much faster. Yosis can make it in a couple of seconds, FYI.

Besides, here you have a test bench for the above module:

module tb_counting_sort;
  reg clk, reset;
  wire [3:0] addr_data_in, addr_data_out;
  wire [1:0] data_in,data_out;
  wire write_data_out_strobe, busy;

  sort uut (
    .clk(clk),
    .reset(reset),
    .addr_data_in(addr_data_in),
    .data_in(data_in),
    .addr_data_out(addr_data_out),
    .data_out(data_out),
    .write_data_out_strobe(write_data_out_strobe),
    .busy(busy)
  );

  reg [1:0] vector_in[0:15];
  reg [1:0] vector_out[0:15];
  assign data_in = vector_in[addr_data_in];
  always @(posedge clk)
    if (write_data_out_strobe == 1'b1)
      vector_out[addr_data_out] <= data_out;

  integer i;
  initial begin
    vector_in[0]  = 2'd2;
    vector_in[1]  = 2'd1;
    vector_in[2]  = 2'd0;
    vector_in[3]  = 2'd0;
    vector_in[4]  = 2'd3;
    vector_in[5]  = 2'd1;
    vector_in[6]  = 2'd0;
    vector_in[7]  = 2'd2;
    vector_in[8]  = 2'd1;
    vector_in[9]  = 2'd1;
    vector_in[10] = 2'd3;
    vector_in[11] = 2'd3;
    vector_in[12] = 2'd3;
    vector_in[13] = 2'd2;
    vector_in[14] = 2'd1;
    vector_in[15] = 2'd0;

    reset = 1'b1;
    clk = 1'b0;
    repeat (2) 
      @(posedge clk);
    reset = 1'b0;

    @(negedge busy);
    for (i=0;i<16;i=i+1)
      $write ("%1d ", vector_out[i]);
    $display("");
    $finish;
  end

  always begin
    clk = #5 ~clk;
  end
endmodule

Both modules can be viewed, simulated or synthesized at EDAPlayground, here: https://www.edaplayground.com/x/6GLj

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM