rSqrt

Compute reciprocal square-root operation and simulate with latency

Libraries:
HDL Coder

Description

The rSqrt block performs the reciprocal square-root operation on the input data signal. The block has control signals that indicate whether the input and output data are valid. You can also specify the number of iterations of the algorithm and the latency strategy.

To use this block in your Simulink® model, open the HDLMathLib library by entering this command in the MATLAB® Command Window:

open_system("HDLMathLib")

You can simulate the rSqrt block with latency. For more information, see Latency Considerations.

Examples

Implement rSqrt Block with Control Signals

Implement the control-signal-based reciprocal square-root block and use it to generate HDL code.

Ports

Input

dataIn — Input data signal
scalar | vector

Input signal to calculate the reciprocal square root, specified as a scalar or vector.

validIn — Whether input control signal is valid
scalar

Input control signal that indicates whether the input signal is valid, specified as a scalar.

Data Types: Boolean

Output

dataOut — Output data signal
scalar | vector

Output signal that is the reciprocal square root of the input signal, returned as a scalar or vector.

validOut — Whether output control signal is valid
scalar

Output control signal that indicates whether the output signal is valid, returned as a scalar.

Data Types: Boolean
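
The relationship between the data and valid ports can be pictured with a small behavioral model. The following Python sketch is illustrative only: it assumes that validOut is validIn delayed by the block latency and that dataOut is meaningful only while validOut is true. The function name `simulate_valid_pipeline` and the use of floating-point arithmetic are assumptions of the sketch, not part of the block.

```python
from collections import deque
import math


def simulate_valid_pipeline(samples, latency):
    """Behavioral sketch: delay (data, valid) pairs by `latency` steps.

    samples -- list of (data_in, valid_in) pairs, one per time step
    latency -- nonnegative integer delay (assumed to match the block latency)
    Returns a list of (data_out, valid_out) pairs, one per time step.
    """
    # Pre-fill the delay line with invalid entries.
    pipe = deque([(0.0, False)] * latency)
    outputs = []
    for data_in, valid_in in samples:
        if valid_in:
            # Only valid samples are processed; data_in must be positive.
            result = (1.0 / math.sqrt(data_in), True)
        else:
            result = (0.0, False)
        pipe.append(result)
        outputs.append(pipe.popleft())
    return outputs


stimulus = [(4.0, True), (0.0, False), (16.0, True), (0.0, False), (0.0, False)]
for step, (data_out, valid_out) in enumerate(simulate_valid_pipeline(stimulus, 2)):
    print(step, data_out, valid_out)
```

With a latency of 2, the result for the sample presented at step 0 appears at step 2, and the invalid input at step 1 produces an invalid output at step 3.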

Parameters

Architecture — Architecture used
`RecipSqrtNewtonSingleRate` (default)

Select the architecture for the rSqrt block.

Programmatic Use

Block Parameter: architecture

Type: character vector

Values: RecipSqrtNewtonSingleRate

Default: 'RecipSqrtNewtonSingleRate'

Number of iterations — Number of iterations used
`3` (default) | `Integer`

Specify the number of iterations for the rSqrt algorithm.

Programmatic Use

Block Parameter: numOfIterations

Type: character vector

Values: Integer values

Default: '3'

Latency strategy — Latency strategy
`Max` (default) | `Min` | `Custom` | `Custom(PerIteration)` | `Zero`

Specify whether to use minimum, maximum, custom, or zero latency. For more information, see Latency Strategy.

To use custom latency for the block, set the Latency strategy to Custom and enter the latency value in the Custom latency field.

You can also control the number of pipeline stages for the iterative algorithm. To customize the latency for the iterative algorithm, set the Latency strategy to Custom(PerIteration) and enter the number of iterations per pipeline stage in the IterationsPerPipeline field. (since R2025a)

Programmatic Use

Block Parameter: latencyMode

Type: character vector

Values: 'Max' | 'Min' | 'Custom' | 'Custom(PerIteration)' | 'Zero'

Default: 'Max'

Custom latency — Specify the custom latency value
0 (default)

When you set Latency strategy to Custom, use this parameter to specify the custom latency value. The latency must be a nonnegative integer in the range [0, L], where L is the maximum latency value of the rSqrt block. For more information, see CustomLatency.

Dependency

To use this parameter, set Latency strategy to Custom.

Programmatic Use

Block Parameter: customLatencyValue

Type: Integer

Values: 0 to Max latency

Default: 0

IterationsPerPipeline — Iterations per pipeline
`1` (default) | `positive integer`

Since R2025a

Specify the number of iterations to perform in each pipeline stage of the algorithm.

Dependency

To enable this parameter, set Latency strategy to Custom(PerIteration).

Programmatic Use

Block Parameter: iterationsPerPipelineValue

Type: Integer

Values: Positive integer

Default: 1

Output data type — Select the output data type of the block
`Inherit: Inherit via internal rule` (default) | `Inherit: Same as first input` | `Inherit: Inherit via back propagation` | `int8` | `uint8` | `int16` | `uint16` | `int32` | `uint32` | `int64` | `uint64` | `fixdt(1,16)` | `fixdt(1,16,0)` | `<data type expression>`

Specify the output data type. The data type can be inherited or specified directly.

Programmatic Use

Block Parameter: OutDataTypeStr

Type: character vector

Values: 'Inherit: Inherit via internal rule' | 'Inherit: Inherit via back propagation'

Default: 'Inherit: Inherit via internal rule'

Saturate on integer overflow — Choose the behavior when integer overflow occurs
`off` (default) | `on`

Action	Reasons for Taking This Action	What Happens for Overflows	Example
Select this check box.	Your model has possible overflow, and you want explicit saturation protection in the generated code.	Overflows saturate to either the minimum or maximum value that the data type can represent.	The maximum value that the `int8` (signed, 8-bit integer) data type can represent is 127. Any block operation result greater than this maximum value causes overflow of the 8-bit integer. With the check box selected, the block output saturates at 127. Similarly, the block output saturates at a minimum output value of -128.
Do not select this check box.	You want to optimize efficiency of your generated code. You want to avoid overspecifying how a block handles out-of-range signals. For more information, see Troubleshoot Signal Range Errors.	Overflows wrap to the value that is representable by the data type.	The maximum value that the `int8` (signed, 8-bit integer) data type can represent is 127. Any block operation result greater than this maximum value causes overflow of the 8-bit integer. With the check box cleared, the software interprets the overflow-causing value as `int8`, which can produce an unintended result. For example, a block result of 130 (binary 1000 0010) expressed as `int8`, is -126.

When you select this check box, saturation applies to every internal operation on the block, not just the output or result. Usually, the code generation process can detect when overflow is not possible. In this case, the code generator does not produce saturation code.
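
The two overflow behaviors in the table can be reproduced with a short Python sketch. It is a minimal illustration of saturating versus wrapping a value into a signed two's-complement range; the helper names `saturate` and `wrap` are hypothetical and are not part of any MathWorks API.

```python
def saturate(value, bits=8):
    """Clamp an integer to the range of a signed `bits`-wide type."""
    lowest = -(1 << (bits - 1))        # -128 for int8
    highest = (1 << (bits - 1)) - 1    # 127 for int8
    return max(lowest, min(highest, value))


def wrap(value, bits=8):
    """Keep only the low `bits` bits and reinterpret them as signed."""
    masked = value & ((1 << bits) - 1)
    if masked >= (1 << (bits - 1)):
        masked -= 1 << bits
    return masked


print(saturate(130))   # check box selected
print(wrap(130))       # check box cleared
```

For the value 130 used in the table, `saturate` returns 127 and `wrap` returns -126, because the bit pattern 1000 0010 has its sign bit set when read as `int8`.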

Programmatic Use

Block Parameter: SaturateOnIntegerOverflow

Type: character vector

Values: 'off' | 'on'

Default: 'off'

Integer rounding mode — Rounding mode for fixed-point operations
`Floor` (default) | `Ceiling` | `Convergent` | `Nearest` | `Round` | `Simplest` | `Zero`

Specify the rounding mode for fixed-point operations. For more information, see Rounding Modes.

Programmatic Use

Block Parameter: RndMeth

Type: character vector

Values: 'Ceiling' | 'Convergent' | 'Floor' | 'Nearest' | 'Round' | 'Simplest' | 'Zero'

Default: 'Floor'
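
The rounding modes differ mainly in how they treat values that fall exactly halfway between two representable numbers. The following Python sketch is a simplified illustration that rounds to whole numbers, that is, in units of one least significant bit. The dictionary name `ROUNDING_MODES` is hypothetical, and `Simplest` is omitted because it is assumed to select among the other behaviors rather than define a distinct rule.

```python
import math

# Each entry maps a rounding mode name to a function that rounds a
# real value to an integer (one LSB = 1 in this sketch).
ROUNDING_MODES = {
    "Floor": math.floor,                       # toward negative infinity
    "Ceiling": math.ceil,                      # toward positive infinity
    "Zero": math.trunc,                        # toward zero
    "Nearest": lambda x: math.floor(x + 0.5),  # ties toward positive infinity
    # Ties away from zero.
    "Round": lambda x: int(math.copysign(math.floor(abs(x) + 0.5), x)),
    "Convergent": round,                       # ties to the nearest even integer
}

for value in (-2.5, -1.5, 1.5, 2.5):
    results = {name: mode(value) for name, mode in ROUNDING_MODES.items()}
    print(value, results)
```

The tie values -2.5 and 2.5 are chosen because they are exactly representable in binary floating point, so the differences between `Nearest`, `Round`, and `Convergent` are visible without floating-point noise.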

Algorithms

Latency Considerations

The rSqrt block is a masked subsystem that contains the LumpLatency MATLAB Function block. The subsystem uses this MATLAB Function block to compute the latency based on the Number of iterations parameter. To view the function that computes the latency of the block, open the LumpLatency block in the masked subsystem. To look under the mask, click the ⇩ icon on the block.

This table shows how the block calculates the latency based on the setting of the Latency strategy parameter:

Latency Strategy	Latency Value (L)
`Max`	Uses maximum latency by using the equation L = (N * 4) + 5, where N is the value of the Number of iterations parameter.
`Min`	Uses minimum latency by using the equation L = 2 + `ceil`(((N * 4) - 1) / 3)
`Custom`	Specifies a custom latency value. To specify the latency, enter a value between zero and the maximum latency in the Custom latency parameter. For more information, see Custom latency.
`Custom(PerIteration)`	Use this setting to control the pipeline stages for the iterative algorithm. Specify the number of iterations per pipeline stage by using the IterationsPerPipeline parameter. The block uses the equation L = 1 + `ceil`((N * 4) / K), where K is the value of the IterationsPerPipeline parameter.
`Zero`	The latency of the block is `0`.
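
The equations in this table can be evaluated with a few lines of Python. The sketch below is not the LumpLatency function from the block; it is a hypothetical helper, `rsqrt_latency`, that transcribes the table so that you can check a latency value by hand. The argument names are assumptions made for the example.

```python
def _ceil_div(numerator, denominator):
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-numerator // denominator)


def rsqrt_latency(strategy, num_iterations,
                  custom_latency=0, iterations_per_pipeline=1):
    """Return the latency value L for a given Latency strategy setting."""
    n = num_iterations
    max_latency = n * 4 + 5
    if strategy == "Max":
        return max_latency
    if strategy == "Min":
        return 2 + _ceil_div(n * 4 - 1, 3)
    if strategy == "Custom":
        # The custom value must lie between zero and the maximum latency.
        if not 0 <= custom_latency <= max_latency:
            raise ValueError(f"custom latency must be in [0, {max_latency}]")
        return custom_latency
    if strategy == "Custom(PerIteration)":
        if iterations_per_pipeline < 1:
            raise ValueError("iterations per pipeline must be a positive integer")
        return 1 + _ceil_div(n * 4, iterations_per_pipeline)
    if strategy == "Zero":
        return 0
    raise ValueError(f"unknown latency strategy: {strategy!r}")


for strategy in ("Max", "Min", "Custom(PerIteration)", "Zero"):
    print(strategy, rsqrt_latency(strategy, 3))
```

With the default of three iterations, the helper returns 17 for `Max` and 6 for `Min`.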

Pipeline Customization

The rSqrt block uses pipelined architectures to implement the Newton-Raphson-based reciprocal square-root algorithm. By default, the block uses the maximum latency, which depends on the Number of iterations parameter, and performs a single iteration per pipeline stage. For example, if you set Number of iterations to 15, the latency of the block is (15 * 4) + 5 = 65, based on the maximum latency equation in Latency Considerations. When you increase the number of iterations, the latency of the block also increases.
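
To see why more iterations improve accuracy, it can help to run the textbook Newton-Raphson recurrence for the reciprocal square root, y ← y * (1.5 - 0.5 * x * y²), in floating point. The Python sketch below is an illustration under stated assumptions, not the block's internal implementation: the initial guess derived from the binary exponent and the function name `rsqrt_newton` are choices made for this example, and the block's fixed-point arithmetic is not modeled.

```python
import math


def rsqrt_newton(x, num_iterations=3):
    """Approximate 1/sqrt(x) with the textbook Newton-Raphson recurrence."""
    if x <= 0:
        raise ValueError("x must be positive")
    # Assumed initial guess: a power of two chosen from the binary
    # exponent of x, which keeps the recurrence inside its convergence
    # region for every positive finite x.
    _, exponent = math.frexp(x)
    y = 2.0 ** (-(exponent // 2))
    for _ in range(num_iterations):
        y = y * (1.5 - 0.5 * x * y * y)
    return y


for iterations in (1, 3, 6):
    approx = rsqrt_newton(2.0, iterations)
    exact = 1.0 / math.sqrt(2.0)
    print(iterations, approx, abs(approx - exact) / exact)
```

Each additional iteration roughly squares the relative error once the estimate is close, which is the trade-off against the extra latency that each iteration adds.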

You can customize the latency for the iterative algorithm by setting Latency strategy to Custom(PerIteration), which lets you control the number of iterations per pipeline stage. For example, if you set Number of iterations to 15 and you want the block to perform the iterations in three pipeline stages, set IterationsPerPipeline to 5. With the Custom(PerIteration) latency strategy, the latency of the block drops from 65 to 1 + ceil((15 * 4) / 5) = 13.
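
The effect of different IterationsPerPipeline values can be tabulated with a short Python sketch. The latency uses the Custom(PerIteration) equation from Latency Considerations. The stage count ceil(N / K) is inferred from the example above (15 iterations at 5 per stage giving three stages) and is an assumption of this sketch, as is the helper name `per_iteration_tradeoff`.

```python
def per_iteration_tradeoff(num_iterations, iterations_per_pipeline):
    """Return (pipeline stages, latency) for the Custom(PerIteration) strategy."""
    n, k = num_iterations, iterations_per_pipeline
    if k < 1:
        raise ValueError("iterations per pipeline must be a positive integer")
    stages = -(-n // k)              # ceil(N / K), inferred from the example
    latency = 1 + -(-(n * 4) // k)   # L = 1 + ceil((N * 4) / K)
    return stages, latency


for k in (1, 3, 5, 15):
    stages, latency = per_iteration_tradeoff(15, k)
    print(f"K={k:2d}: {stages:2d} stage(s), latency {latency}")
```

Larger values of K shorten the latency but place more iterations between pipeline registers, so the choice is a balance between latency and the achievable clock frequency.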

Extended Capabilities

HDL Code Generation
Generate VHDL, Verilog and SystemVerilog code for FPGA and ASIC designs using HDL Coder™.

The block supports HDL code generation using HDL Coder™. HDL Coder provides additional configuration options that affect HDL implementation and synthesized logic.

HDL Architecture

Architecture	Description
`Module` (default)	Generate code for the subsystem and the blocks within the subsystem.
`BlackBox`	Generate a black box interface. The generated HDL code includes only the input/output port definitions for the subsystem. Therefore, you can use a subsystem in your model to generate an interface to existing, manually written HDL code. The black-box interface generation for subsystems is similar to the Model block interface generation without the clock signals.
`No HDL`	Remove the subsystem from the generated code. You can still use the subsystem in simulation, but it is treated as a “no-op” in the HDL code.

HDL Block Properties

General
AdaptivePipelining	Automatic pipeline insertion based on the synthesis tool, target frequency, and multiplier word-lengths. The default is `inherit`. See also AdaptivePipelining.
BalanceDelays	Detects introduction of new delays along one path and inserts matching delays on the other paths. The default is `inherit`. See also BalanceDelays.
ClockRatePipelining	Insert pipeline registers at a faster clock rate instead of the slower data rate. The default is `inherit`. See also ClockRatePipelining.
ConstrainedOutputPipeline	Number of registers to place at the outputs by moving existing delays within your design. Distributed pipelining does not redistribute these registers. The default is `0`. For more details, see ConstrainedOutputPipeline.
DistributedPipelining	Pipeline register distribution, or register retiming. The default is `inherit`. See also DistributedPipelining.
FlattenHierarchy	Remove subsystem hierarchy from generated HDL code. The default is `inherit`. See also FlattenHierarchy.
InputPipeline	Number of input pipeline stages to insert in the generated code. Distributed pipelining and constrained output pipelining can move these registers. The default is `0`. For more details, see InputPipeline.
OutputPipeline	Number of output pipeline stages to insert in the generated code. Distributed pipelining and constrained output pipelining can move these registers. The default is `0`. For more details, see OutputPipeline.
SharingFactor	Number of functionally equivalent resources to map to a single shared resource. The default is 0. See also Resource Sharing.
StreamingFactor	Number of parallel data paths, or vectors, that are time multiplexed to transform into serial, scalar data paths. The default is 0, which implements fully parallel data paths. See also Streaming.
SynthesisAttributes	Specifies the synthesis attributes for the blocks and block output signals in the model. The generated HDL code contains these attributes. For more information, see SynthesisAttributes.

Target Specification

This block cannot be the DUT, so the block property settings in the Target Specification tab are ignored.

Limitations

The block does not support vector inputs.
The block does not support bus inputs.
The block cannot be used in a Synchronous Subsystem.
The block does not support the resource sharing optimization.

Version History

Introduced in R2020b

R2026a: DSPStyle HDL block property has been removed

The DSPStyle HDL block property has been removed. To specify synthesis attributes for multiplier mapping, use the SynthesisAttributes HDL block property instead.

R2026a: Specify synthesis attributes for blocks and block output signals

Use the SynthesisAttributes HDL block property to specify the synthesis attributes for the block and its output signals. HDL Coder includes these attributes in the generated HDL code.

R2025a: Control number of pipeline stages per iteration

You can control the pipeline stages for iterative algorithms by setting the Latency strategy parameter to Custom(PerIteration), then specifying the number of iterations per pipeline stage by using the IterationsPerPipeline parameter. Use this setting to control the pipeline stages in the generated code and optimize the design for speed and resource utilization.
