2D-3D image registration - COMPLETE 6D ESTIMATION

Question

0 votos

Hello everyone,

I am simulating a 2D–3D image registration pipeline for 6D motion estimation (3D translation + rotation) from two 2D X-ray projections, based on the following paper:

[1] Fu & Kuduvalli, “A fast, accurate, and automatic 2D–3D image registration for image-guided cranial radiosurgery,” Med. Phys., 2008.

The pipeline consists of:

Generating a 3D CT volume with known fiducials
Applying a known 6D rigid transformation to the CT
Generating two 2D projections (A and B)
Performing accurate 2D–2D registration between each X-ray and its corresponding DRR
Estimating the final 6D transformation using geometric back-projection / analytical decomposition (as described in the paper)

Problem Description

The 2D image registrations are accurate and stable (x, y, in-plane rotation, and roll are recovered correctly for both projections).
However, the final 6D estimation shows significant error, especially in:
Out-of-plane translation
Out-of-plane rotations

Because the 2D registrations behave as expected, I suspect the issue is in the geometric back-projection / 2D-to-3D decomposition step, rather than in the similarity metric or optimization.

Specifically, I am unsure whether:

My implementation of the analytical back-projection equations is correct
The projection geometry (scaling, sign conventions, coordinate frames) is consistent between the two views
Additional geometric constraints or assumptions are required for observability

What I Am Looking For

I would greatly appreciate any insights or suggestions on:

Correct implementation of 2D-to-3D geometric back-projection for dual-view registration
Common pitfalls in coordinate systems, scaling factors, or sign conventions
Whether a cone-beam vs. parallel-beam assumption materially affects the 6D solution
Any recommended validation or sanity checks for this type of pipeline

I have attached the MATLAB code below for reference.

Thank you very much for your time and support.

Reference

[1] Fu, D., & Kuduvalli, G. (2008). A fast, accurate, and automatic 2D–3D image registration for image-guided cranial radiosurgery. Medical Physics, 35(5), 2180–2194.

0 comentarios
Mostrar -2 comentarios más antiguos Ocultar -2 comentarios más antiguos

Iniciar sesión para comentar.

Iniciar sesión para responder a esta pregunta.

Follow Question

Answer 1

Matt J el 18 de Dic. de 2025

Editada: Matt J el 19 de Dic. de 2025

0 votos

I haven't read the paper, but I don't see how the proposed method would ever work as you describe it. 6D motion simply does not propagate into 2D projections in a way that 2D-2D registration (step 4) can model, whether cone beam or parallel beam. Additionally, registering the DRR's to each projection independetly ignores the coupled effects of 3D motion on each projection.

Since you say the CT subject contains fiducials, it seems to me like the best approach would be to find the fiducials in the projections and then triangulate their 3D positions stereoscopically. (In fact, I vaguely wonder if that's the true purpose of the 2D-2D registration -- to help you search for the fiducial projections in 2D.)

To do the triangulation, the Computer Vision Toolbox has a command triangulate which may be helpful. Once you've triangulated, the problem reduces to a 3D-3D point registration, which you can do with,

https://www.mathworks.com/matlabcentral/fileexchange/26186-absolute-orientation-horn-s-method?s_tid=prof_contriblnk

13 comentarios
Mostrar 11 comentarios más antiguos Ocultar 11 comentarios más antiguos

payam samadi el 19 de Dic. de 2025

Editada: payam samadi el 19 de Dic. de 2025

Thank you very much for your detailed and thoughtful comment.

Let me clarify my actual goal and the context of this work, since I think there may be some misunderstanding.

My objective is not a pure 2D–2D or feature-based reconstruction problem, but rather a 2D–3D image registration pipeline for estimating a 6D rigid transformation (3D translation + rotation) between a 3D CT volume and two 2D X-ray projections.

The workflow is explicitly inspired by Fu & Kuduvalli (2008), where the idea is to decompose the 6D motion into in-plane and out-of-plane components using dual projections.

In my setup:

I start from a known 3D CT volume

I generate synthetic 2D projections (AP and lateral)

I then estimate motion between CT-derived DRRs and the measured projections

So the 2D–2D registrations are not the final goal, but an intermediate step in a 2D–3D registration framework.

Regarding your point that “6D motion does not propagate into 2D projections in a way that 2D–2D registration can model” — I fully agree in the general case.

However, the Fu & Kuduvalli approach relies on:

Known imaging geometry

Small residual motion assumptions

Cone-beam perspective

Analytical coupling between dual views

Under these assumptions, certain combinations of in-plane shifts and rotations from two projections can be analytically mapped back to approximate 3D translations and rotations. This is exactly the part I am currently struggling with — specifically the geometric back-projection / decomposition step.

You also mentioned that independently registering each DRR to each projection ignores the coupled nature of 3D motion.

That is a very valid concern, and I suspect this may indeed be one of the weak points of my current implementation, especially if the forward projection model is not fully consistent.

Regarding fiducials:

ُIn real situation, some images have and some not. So, I want to test both situations. therefore, in this work my intention is to:

Study and reproduce a fiducial-less, intensity-based 2D–3D registration method

Understand the limitations of decomposing 6D motion from dual projections without explicit point correspondences

So while triangulation is an excellent practical solution, it is somewhat outside the scope of what I am trying to validate here.

Finally, to your last implicit question:

I actually have a separate, reliable implementation that independently estimates, for each projection (A and B), the in-plane parameters:

(xA,yA,thetaA),(xB,yB,thetaB)

My main question (and the reason for this post) is exactly this:

Given accurate per-projection in-plane motion estimates, is it theoretically and practically valid to recover a full 6D rigid transformation from these parameters alone, and under what geometric assumptions does this become observable?

At the moment, I strongly suspect that the issue lies in:

Either an incorrect geometric back-projection model

Or insufficient observability due to projection geometry (e.g., parallel beam, limited parallax, or symmetry)

Any insight into this specific coupling step would be extremely helpful.

Thank you again for your valuable feedback — I appreciate you taking the time to look at this problem

Matt J el 19 de Dic. de 2025

Editada: Matt J el 19 de Dic. de 2025

Given accurate per-projection in-plane motion estimates, is it theoretically and practically valid to recover a full 6D rigid transformation from these parameters alone, and under what geometric assumptions does this become observable?

Doesn't the paper tell you that? If it is possible, it is certainly not possible in a single, analytical step. It would have to be done iteratively. Possibly, the process you described is meant to be wrapped in an iterative loop? I.e., once you have the in-plane motion estimates, you are meant to somehow update the estimate of the 6D pose and recompute the DRRs?

To prove that it is not possible analytically, I offer the following example:

Imagine the CT subject is a centered cylinder with a single fiducial at the surface, centered in the AP view. Now let's rotate the cylinder by a small angle (3 degrees) about the scanner axis and see what the in-plane, per projection motion shows. The lateral projection will be almost unchanged, so its DRR registration will probably compute zero in-plane motion, effectively.

The AP view will see a slightly larger in-plane shift in the position of the fiducial, but the projection of the background cylinder will remain largely unchanged. Because most AP projection pixels remain unchanged, a rigid in-plane registration will yield unpredictable results, but it will assume some motion to account for the shift in the fiducial projection. A key point, though, is that the amount of the motion computed will depend on the size and density of the fiducial.

But because the computed in-plane motion depends on the size and density of the fiducial, it is impossible to use the motion parameters alone to back out the 3D rotation analytically. The same 3D motion can induce different projection registration results, depending on the composition of the phantom.

payam samadi el 21 de Dic. de 2025

Ver19.m

Perform 3D -2D image registration between the planning CT and the X-ray image

1. generate a digital reconstruction image (Digitally Reconstructed Radiographs, DRR) from 3D CT or CBCT data. These two sets of DDRs act as 2D reference images; One set is for Projection A, another for orthogonal Projection B.

2. Transform the 3D data space and generate the corresponding DRR image (in-plane and out-of-plane rotations in the 2D projections)

3. Multi-Phase Registration Framework;

Phase 1: Initial In-Plane Estimate (Goal to find in-plane transformation for each projection),

Phase 2: Initial Roll Estimate (goal to find the best-matching roll angle from the pre-computed DRR set), Phase 3: Iterative Refinement (goal to refine both the in-plane transformation and the roll angle to high accuracy)

Note that the similarity test value reaches the minimum value and converges (similarity measure was then performed between the DRR images and the X-Ray images).

4. 3D Pose Calculation

After both Projection A and Projection B have completed their independent registrations, their results are combined.

I developed this code, but I am still failing to simulate this article. Can you help me?

For more testm you can chages these number

ground_truth.translation = [35, -20, 19]; % Translation in mm [SI, LR, AP]

ground_truth.rotation = [3, -2.5, 1.5]; % Rotation in degrees [roll, pitch, yaw]

payam samadi el 22 de Dic. de 2025

Dear Matt

According to the article, the in the first phase, the in-plane transformation parameters (x,y, theta) are initially estimated using the nominal DRR image, which these three parameters are quickly searched via multiresolution matching, using the sum of squared difference (SSD) as the similarity measure.

Note, the desired pixel accuracy for the translations and the half-degree accuracy for the in-plane rotation are achieved.

In the second phase, the roll is separately searched in one dimension based on the approximate results of (x,y, theta) obtained in the first phase. The optimized pattern intensity is used as a similarity measure to determine the reference DRR image corresponding to a roll.

Note, the search space during this phase is the full search range of roll angles, sampled at 1° intervals for the initial estimation.

The third phase is an iterative process. The in-plane transformation and the roll are refined iteratively, using pattern intensity as the similarity measure for increased accuracy.

Note, the iteration process has two steps, the first consisting of the refinement of the previously estimated in-plane transformation via steep descent minimization, followed by the refinement of the roll using one-dimensional interpolation. These two steps are repeated for a specified number of iterations. The final registration results give the in-plane transformation and the roll.

The "calculatePatternIntensity" function is used for Similarity measure.

Search Method

Multiresolution matching is a fast search method designed to estimate the initial in-plane transformation parameters that will be refined in subsequent search phases.

First, three image resolutions are defined. The original x-ray images are used as the highest resolution images. The lower resolution images are generated by subsampling the original x-ray images by half in each dimension. Then the lowest resolution images are created by further subsampling by half in each dimension.

The basic idea of multiresolution matching is to match the images at each resolution level successively, starting with the lowest resolution. The results at the lower resolution serve as rough estimates of the in-plane transformation parameters x,y, theta. The output at the lower level is then passed on to the subsequent higher resolution level. The parameters x,y, theta are refined using the higher resolution images. In the final matching results, the accuracy of translations depends on the spatial resolution of the highest resolution images. The accuracy of in-plane rotation depends on the sampling intervals of the in-plane rotated DRR reference images.

Note, there is a risk that accompanies the use of multiresolution matching. It is possible that the estimate at lower levels may converge to local optima far from the global optima. In this case, the result of the further matching at subsequent higher resolution levels will most likely not converge to the global optima. To overcome this risk, multiple candidates of estimation are used. A sorted list of candidates with a best match from the lower level is passed to the higher resolution level. The more candidate displacements are used, the more reliable the estimates become. The best candidates are ranked by their SSD values.

Based on these infromation and two codes (Ver23 or Ver24), where do you think I made mistake?

payam samadi el 1 de En. de 2026

Ver45.zip

Dear Matt

I wanted to share an update after my recent attempt with the code (Ver45).

Line 64 loads the 3D CT volume (CT_13.nii.gz).

According to the article, I must generate a set of orthogonal X-ray projections (DRRs) from the 3D CT by rotating the volume around the Superior–Inferior (SI) axis. This corresponds to a roll rotation, implemented as rotation about the X-axis (as defined in the paper’s coordinate system). The roll range is −5° to +5° (step size = 1°, total of 11 angles).

This rotation and DRR generation process is triggered at line 136 using the function “generateAccurateDRRSets”.

Inside that function, the “generateDRR” function is called to produce orthogonal projections for each rotation angle. This results in 22 reference DRR datasets:

• 11 DRRs for Reference Set A

• 11 DRRs for Reference Set B

These two DRR sets simulate two detector panels (A and B like the Figure 1) and are used as the reference images for registration workflow described in the article.

6D Motion Simulation (Lines 152–184)

A full 6-degree-of-freedom (6D) movement is applied to the CT volume:

• Translations: X, Y, Z (mm)

• Rotations: Roll, Pitch, Yaw (degrees)

This transformation simulates patient/target displacement and generates a shifted X-ray projection for each detector (A and B).

Registration Step (Lines 187–192)

For each projection (A and B), the function “registerOneProjection” is used to estimate four parameters from each single 2D image:

• x → in-plane translation (lateral)

• y → in-plane translation (vertical)

• θ → in-plane rotation

• roll (θx) → rotation around SI axis

These four parameters are the required outputs before multi-view consistency is evaluated.

Note, the function "generateXrayWithShift" does not alter the CT volume. Essentially, this function is solely used to generate X-ray projections. Consequently, the outputs (xray_A_shift and xray_B_shift) produced between lines 170 and 181 have been moved (applying true movement based on Equations 1-3).

Consistency Check (Equations 9–10) — Lines 195–201

A consistency validation step is performed between the results from detector A and B.

This follows the mathematical conditions from Equations (9) and (10) of the paper.

If A and B do not agree (in the sense defined by the article), the solution is rejected or flagged as unstable.

Final 6D Estimation (Equation 4) - Lines 203-211

Once consistency is confirmed, the parameters from both projections are combined using Equation (4) from the article to compute the final 6D estimate.

This produces the estimated patient/object transformation in all degrees of freedom.

=====================================================

Known Issue: Roll Estimation Inaccuracy

The workflow generates outputs for multiple samples (e.g., 154–164), and most degrees of freedom behave correctly.

For example,

ground_truth.translation = [-20, 20, -2.75]; % y, z, x % Translation in mm [LR, AP, SI]

ground_truth.rotation = [-1, 5, -2]; % Rotation in degrees [roll(X), pitch(Y), yaw(Z)]

However, the final estimated roll (θx) is consistently inaccurate. The issue is traced to the current implementation of “registerOneProjection”.

=== Final Estimated 3D Motion ===

Tx (SI) = -2.725 mm, Ty (LR) = -20.003 mm, Tz (AP) = 20.003 mm

Roll (θx) = -0.024°, Pitch (θy) = 5.004°, Yaw (θz) = -1.990°

This function does not reliably recover true roll values, which leads to:

• incorrect roll output per projection.

• inconsistency violations in Equations (9–10).

I would greatly appreciate any help you can provide.

Matt J el 1 de En. de 2026

Editada: Matt J el 1 de En. de 2026

I need to be clear. Nobody is going to read the full article you have posted. It's the kind of thing no one has time for. And, without reading the article, there is no way for forum participants to know why the novel algorithm that it proposes even should work in the first place (or at least why it should work in the mind of the article's author).

What you need to do is identify a piece of the algorithm that everyone can agree should work, even without reading the article, but which doesn't work, according to a coded demonstration that you will present. The community then has a chance of helping you understand why that section is not working.

payam samadi el 2 de En. de 2026

Dear Matt

Thank you for your response and your time.

First, I have to say that this code (according to the article) has three parts:

Phase 1: Initial In-Plane Estimate (Goal to find in-plane transformation for each projection [x,y, theta]),

Phase 2: Initial Roll Estimate (goal to find the best-matching roll angle from the pre-computed DRR set [roll part]),

Phase 3: Iterative Refinement (goal to refine both the in-plane transformation and the roll angle to high accuracy).

To test Phase 1, please reviewed lines 143-148, which call the "validateRegistrationAccuracy" function. Inside this function, I included some test cases applied to the X-ray images (reference images). The results were calculated and compared with the true test cases, using an error metric. For this test, I assumed no Roll for simplicity. The results passed accurately, allowing me to proceed to the next phases (2 and 3) to estimate the true roll, which I am currently having problems with.

Iniciar sesión para comentar.

2D-3D image registration - COMPLETE 6D ESTIMATION

0 comentarios
Mostrar -2 comentarios más antiguos Ocultar -2 comentarios más antiguos

Respuestas (1)

13 comentarios
Mostrar 11 comentarios más antiguos Ocultar 11 comentarios más antiguos

Categorías

Etiquetas

Community Treasure Hunt

2D-3D image registration - COMPLETE 6D ESTIMATION

0 comentarios Mostrar -2 comentarios más antiguos Ocultar -2 comentarios más antiguos

Respuestas (1)

13 comentarios Mostrar 11 comentarios más antiguos Ocultar 11 comentarios más antiguos

Categorías

Etiquetas

Ver también

Community Treasure Hunt

0 comentarios
Mostrar -2 comentarios más antiguos Ocultar -2 comentarios más antiguos

13 comentarios
Mostrar 11 comentarios más antiguos Ocultar 11 comentarios más antiguos