2D-3D image registration - COMPLETE 6D ESTIMATION

Hello everyone,
I am simulating a 2D–3D image registration pipeline for 6D motion estimation (3D translation + rotation) from two 2D X-ray projections, based on the following paper:
[1] Fu & Kuduvalli, “A fast, accurate, and automatic 2D–3D image registration for image-guided cranial radiosurgery,” Med. Phys., 2008.
The pipeline consists of:
  1. Generating a 3D CT volume with known fiducials
  2. Applying a known 6D rigid transformation to the CT
  3. Generating two 2D projections (A and B)
  4. Performing accurate 2D–2D registration between each X-ray and its corresponding DRR
  5. Estimating the final 6D transformation using geometric back-projection / analytical decomposition (as described in the paper)
Problem Description
  • The 2D image registrations are accurate and stable (x, y, in-plane rotation, and roll are recovered correctly for both projections).
  • However, the final 6D estimation shows significant error, especially in:
  • Out-of-plane translation
  • Out-of-plane rotations
Because the 2D registrations behave as expected, I suspect the issue is in the geometric back-projection / 2D-to-3D decomposition step, rather than in the similarity metric or optimization.
Specifically, I am unsure whether:
  • My implementation of the analytical back-projection equations is correct
  • The projection geometry (scaling, sign conventions, coordinate frames) is consistent between the two views
  • Additional geometric constraints or assumptions are required for observability
What I Am Looking For
I would greatly appreciate any insights or suggestions on:
  • Correct implementation of 2D-to-3D geometric back-projection for dual-view registration
  • Common pitfalls in coordinate systems, scaling factors, or sign conventions
  • Whether a cone-beam vs. parallel-beam assumption materially affects the 6D solution
  • Any recommended validation or sanity checks for this type of pipeline
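On the last bullet, one sanity check that catches most scaling and sign problems is to push a single point through the projector and compare against the closed-form pinhole model. Below is a NumPy sketch under an assumed geometry (source on the +y axis at distance SAD from the isocentre, detector plane perpendicular to y at source-to-detector distance SDD; the numbers are illustrative, not from the paper):

```python
import numpy as np

# Assumed geometry (illustrative, not from the paper): source on the +y axis
# at distance SAD from the isocentre, detector plane perpendicular to y at
# source-to-detector distance SDD.
SAD, SDD = 1000.0, 1500.0                      # mm

def project_point(p):
    """Pinhole projection of a 3D point p = (x, y, z) onto the detector."""
    mag = SDD / (SAD - p[1])                   # magnification at the point's depth
    return np.array([p[0] * mag, p[2] * mag])  # detector coordinates (u, v)

p = np.array([10.0, 50.0, -20.0])
u, v = project_point(p)

# A 1 mm in-plane shift must move the image by exactly the magnification;
# a depth (y) shift only rescales, which is one reason out-of-plane
# translation is weakly observable from a single view.
u2, _ = project_point(p + np.array([1.0, 0.0, 0.0]))
print(u2 - u)                                  # SDD / (SAD - 50) ~ 1.579
```

Running the same check for points displaced along the view axis makes the weak observability explicit: only the magnification changes.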
I have attached the MATLAB code below for reference.
Thank you very much for your time and support.
Reference
[1] Fu, D., & Kuduvalli, G. (2008). A fast, accurate, and automatic 2D–3D image registration for image-guided cranial radiosurgery. Medical Physics, 35(5), 2180–2194.

Answers (1)

Matt J
Matt J on 18 Dec 2025
Edited: Matt J on 19 Dec 2025


I haven't read the paper, but I don't see how the proposed method would ever work as you describe it. 6D motion simply does not propagate into 2D projections in a way that 2D-2D registration (step 4) can model, whether cone beam or parallel beam. Additionally, registering the DRRs to each projection independently ignores the coupled effects of 3D motion on each projection.
Since you say the CT subject contains fiducials, it seems to me like the best approach would be to find the fiducials in the projections and then triangulate their 3D positions stereoscopically. (In fact, I vaguely wonder if that's the true purpose of the 2D-2D registration -- to help you search for the fiducial projections in 2D.)
To do the triangulation, the Computer Vision Toolbox has a command triangulate which may be helpful. Once you've triangulated, the problem reduces to a 3D-3D point registration, which you can do with a standard absolute orientation solver.
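To make that route concrete, here is a NumPy sketch of the whole round trip under idealized assumptions (parallel beam, exactly orthogonal views, noiseless fiducials with known correspondences): triangulate the fiducial positions from the two projections, then solve the resulting 3D-3D point registration with a Kabsch / absolute-orientation step.

```python
import numpy as np

def kabsch(P, Q):
    """Rigid (R, t) minimizing ||R @ P + t - Q|| over 3xN point sets."""
    cP, cQ = P.mean(axis=1, keepdims=True), Q.mean(axis=1, keepdims=True)
    U, _, Vt = np.linalg.svd((Q - cQ) @ (P - cP).T)
    R = U @ np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))]) @ Vt  # no reflection
    return R, cQ - R @ cP

# Ground-truth motion: small yaw plus a translation (illustrative values)
ang = np.deg2rad(3.0)
R_true = np.array([[np.cos(ang), -np.sin(ang), 0.0],
                   [np.sin(ang),  np.cos(ang), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([[35.0], [-20.0], [19.0]])

P = np.random.default_rng(0).uniform(-50.0, 50.0, (3, 6))  # 6 fiducials, mm
moved = R_true @ P + t_true

# Idealized orthogonal parallel-beam views: AP sees (x, z), lateral (y, z)
ap, lat = moved[[0, 2], :], moved[[1, 2], :]

# Stereoscopic triangulation is trivial for this geometry: x from AP,
# y from lateral, z seen by both views (averaged)
Q = np.vstack([ap[0], lat[0], 0.5 * (ap[1] + lat[1])])

R_est, t_est = kabsch(P, Q)
print(np.allclose(R_est, R_true), np.allclose(t_est, t_true))  # True True
```

With real cone-beam views the triangulation step would use the calibrated projection matrices (e.g. the toolbox triangulate), but the Kabsch step is unchanged.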

13 comments

payam samadi
payam samadi on 19 Dec 2025
Edited: payam samadi on 19 Dec 2025
Thank you very much for your detailed and thoughtful comment.
Let me clarify my actual goal and the context of this work, since I think there may be some misunderstanding.
My objective is not a pure 2D–2D or feature-based reconstruction problem, but rather a 2D–3D image registration pipeline for estimating a 6D rigid transformation (3D translation + rotation) between a 3D CT volume and two 2D X-ray projections.
The workflow is explicitly inspired by Fu & Kuduvalli (2008), where the idea is to decompose the 6D motion into in-plane and out-of-plane components using dual projections.
In my setup:
I start from a known 3D CT volume
I generate synthetic 2D projections (AP and lateral)
I then estimate motion between CT-derived DRRs and the measured projections
So the 2D–2D registrations are not the final goal, but an intermediate step in a 2D–3D registration framework.
Regarding your point that “6D motion does not propagate into 2D projections in a way that 2D–2D registration can model” — I fully agree in the general case.
However, the Fu & Kuduvalli approach relies on:
Known imaging geometry
Small residual motion assumptions
Cone-beam perspective
Analytical coupling between dual views
Under these assumptions, certain combinations of in-plane shifts and rotations from two projections can be analytically mapped back to approximate 3D translations and rotations. This is exactly the part I am currently struggling with — specifically the geometric back-projection / decomposition step.
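For the translational component the mapping is indeed simple to see. A toy NumPy sketch under assumed ideal conditions (parallel beam, exactly orthogonal detectors sharing the SI axis, taken as z here):

```python
import numpy as np

# Assumed toy geometry: detector A (AP) images the (x, z) plane, detector B
# (lateral) images the (y, z) plane, z being the shared SI axis.
t3d = np.array([35.0, -20.0, 19.0])      # true 3D translation (x, y, z), mm

xA, yA = t3d[0], t3d[2]                  # in-plane shift view A would report
xB, yB = t3d[1], t3d[2]                  # in-plane shift view B would report

# Geometric back-projection of the translation component:
t_est = np.array([xA, xB, 0.5 * (yA + yB)])   # z is seen by both views
print(t_est)                                  # matches t3d in this noiseless case
```

Any disagreement between yA and yB is also a cheap two-view consistency check. Under cone beam, each in-plane shift is additionally scaled by the view magnification, which must be divided out first; it is the rotational part that does not decouple this cleanly.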
You also mentioned that independently registering each DRR to each projection ignores the coupled nature of 3D motion.
That is a very valid concern, and I suspect this may indeed be one of the weak points of my current implementation, especially if the forward projection model is not fully consistent.
Regarding fiducials:
In real situations, some images contain fiducials and some do not, so I want to test both cases. Therefore, in this work my intention is to:
Study and reproduce a fiducial-less, intensity-based 2D–3D registration method
Understand the limitations of decomposing 6D motion from dual projections without explicit point correspondences
So while triangulation is an excellent practical solution, it is somewhat outside the scope of what I am trying to validate here.
Finally, to your last implicit question:
I actually have a separate, reliable implementation that independently estimates, for each projection (A and B), the in-plane parameters:
(xA, yA, thetaA), (xB, yB, thetaB)
My main question (and the reason for this post) is exactly this:
Given accurate per-projection in-plane motion estimates, is it theoretically and practically valid to recover a full 6D rigid transformation from these parameters alone, and under what geometric assumptions does this become observable?
At the moment, I strongly suspect that the issue lies in:
Either an incorrect geometric back-projection model
Or insufficient observability due to projection geometry (e.g., parallel beam, limited parallax, or symmetry)
Any insight into this specific coupling step would be extremely helpful.
Thank you again for your valuable feedback — I appreciate you taking the time to look at this problem
Matt J
Matt J on 19 Dec 2025
Edited: Matt J on 19 Dec 2025
Given accurate per-projection in-plane motion estimates, is it theoretically and practically valid to recover a full 6D rigid transformation from these parameters alone, and under what geometric assumptions does this become observable?
Doesn't the paper tell you that? If it is possible, it is certainly not possible in a single, analytical step. It would have to be done iteratively. Possibly, the process you described is meant to be wrapped in an iterative loop? I.e., once you have the in-plane motion estimates, you are meant to somehow update the estimate of the 6D pose and recompute the DRRs?
To prove that it is not possible analytically, I offer the following example:
Imagine the CT subject is a centered cylinder with a single fiducial at the surface, centered in the AP view. Now let's rotate the cylinder by a small angle (3 degrees) about the scanner axis and see what the in-plane, per projection motion shows. The lateral projection will be almost unchanged, so its DRR registration will probably compute zero in-plane motion, effectively.
The AP view will see a slightly larger in-plane shift in the position of the fiducial, but the projection of the background cylinder will remain largely unchanged. Because most AP projection pixels remain unchanged, a rigid in-plane registration will yield unpredictable results, but it will assume some motion to account for the shift in the fiducial projection. A key point, though, is that the amount of the motion computed will depend on the size and density of the fiducial.
But because the computed in-plane motion depends on the size and density of the fiducial, it is impossible to use the motion parameters alone to back out the 3D rotation analytically. The same 3D motion can induce different projection registration results, depending on the composition of the phantom.
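The content-dependence is easy to demonstrate numerically. In a toy parallel-beam AP view imaging the (x, z) plane, with z the scanner axis, the same 3-degree rotation moves the fiducial projection by an amount proportional to the fiducial's depth, while the centered cylinder itself projects identically:

```python
import numpy as np

theta = np.deg2rad(3.0)                        # small rotation about z
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])

shifts = []
for r in (40.0, 80.0):                         # two cylinder radii (mm)
    fid = np.array([0.0, r, 100.0])            # fiducial centered in the AP view
    shifts.append((Rz @ fid)[0] - fid[0])      # in-plane x shift seen by AP

print(shifts)   # about -2.09 mm vs -4.19 mm: same 3D motion, different 2D motion
```

So two phantoms undergoing the identical 3D rotation produce different in-plane registration results, which is exactly why the in-plane parameters alone cannot identify the rotation.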
payam samadi
payam samadi on 20 Dec 2025
In general, I agree with you: a full 6D rigid transformation cannot be recovered analytically and uniquely from per-projection in-plane motion parameters alone. The mapping is content-dependent and many-to-one.
However, under restricted assumptions (small residual motion, calibrated cone-beam geometry, sufficient parallax, and a rigid object with rich structure), the problem can become locally observable in an iterative framework. In this case, the per-projection in-plane motion estimates can be used to incrementally update the 3D pose, regenerate DRRs, and repeat until convergence.
This is how I currently interpret the Fu & Kuduvalli approach: not as a closed-form solution, but as a locally valid, approximation-based iterative method.
If you think this interpretation is reasonable, I would really appreciate any guidance on how best to formulate or implement this iterative update step (code example) in practice.
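As a toy sketch of that interpretation (not the paper's actual update rule), the loop can be exercised end-to-end with NumPy under strong simplifying assumptions: parallel beam, orthogonal views, labeled point features standing in for intensity DRRs, and pitch as the only rotation (rotation about the SI axis is exactly the hard out-of-plane case). Each pass registers the two views in-plane, back-projects the results into a 3D pose increment, recomposes the pose, and regenerates the projections:

```python
import numpy as np

def project(pts, view):
    """Parallel-beam projection: view A images (x, z), view B images (y, z)."""
    return pts[[0, 2]] if view == "A" else pts[[1, 2]]

def rigid2d_fit(P, Q):
    """In-plane (theta, t) best mapping 2xN points P onto Q (2D Kabsch)."""
    cP, cQ = P.mean(1, keepdims=True), Q.mean(1, keepdims=True)
    U, _, Vt = np.linalg.svd((Q - cQ) @ (P - cP).T)
    R = U @ np.diag([1.0, np.sign(np.linalg.det(U @ Vt))]) @ Vt
    return np.arctan2(R[1, 0], R[0, 0]), (cQ - R @ cP).ravel()

def pose(ry, t):
    """Rotation by ry about y (pitch), followed by translation t."""
    c, s = np.cos(ry), np.sin(ry)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]]), np.reshape(t, (3, 1))

pts = np.random.default_rng(1).uniform(-40, 40, (3, 8))      # "fiducial" cloud
R_true, t_true = pose(np.deg2rad(4.0), [5.0, -3.0, 2.0])     # unknown motion
xray = {v: project(R_true @ pts + t_true, v) for v in "AB"}  # measured views

ry, t = 0.0, np.zeros(3)                                     # pose estimate
for _ in range(3):
    R, tc = pose(ry, t)
    drr = {v: project(R @ pts + tc, v) for v in "AB"}        # current "DRRs"
    thA, (dxA, dyA) = rigid2d_fit(drr["A"], xray["A"])       # per-view 2D-2D
    thB, (dxB, dyB) = rigid2d_fit(drr["B"], xray["B"])       # registration
    # back-project the in-plane results into a 3D increment, then repeat
    t = t + np.array([dxA, dxB, 0.5 * (dyA + dyB)])
    ry = ry - thA        # AP in-plane rotation equals minus the pitch here

print(np.rad2deg(ry), t)   # converges to 4 deg and [5, -3, 2]
```

With point features and a parallel beam this converges in a couple of iterations; with intensity DRRs each rigid2d_fit call becomes an image registration, the increments are only approximate, and convergence is local at best.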
payam samadi
payam samadi on 20 Dec 2025
Additionally, if I want to simulate this article correctly, which specific machine details and code (for example, reconstruction and projection algorithms) would I need to have in full?
Matt J
Matt J on 20 Dec 2025
I would really appreciate any guidance on how best to formulate or implement this iterative update step
How does the paper say to do it?
payam samadi
payam samadi on 21 Dec 2025
Perform 3D-2D image registration between the planning CT and the X-ray images:
1. Generate Digitally Reconstructed Radiographs (DRRs) from the 3D CT or CBCT data. These two sets of DRRs act as 2D reference images: one set for Projection A, another for the orthogonal Projection B.
2. Transform the 3D data space and generate the corresponding DRR image (in-plane and out-of-plane rotations in the 2D projections)
3. Multi-phase registration framework:
Phase 1: Initial in-plane estimate (goal: find the in-plane transformation for each projection).
Phase 2: Initial roll estimate (goal: find the best-matching roll angle from the pre-computed DRR set).
Phase 3: Iterative refinement (goal: refine both the in-plane transformation and the roll angle to high accuracy).
Note that iteration continues until the similarity measure (computed between the DRR images and the X-ray images) reaches its minimum and converges.
4. 3D Pose Calculation
After both Projection A and Projection B have completed their independent registrations, their results are combined.
I developed this code, but I am still failing to simulate this article. Can you help me?
For more tests, you can change these numbers:
ground_truth.translation = [35, -20, 19]; % Translation in mm [SI, LR, AP]
ground_truth.rotation = [3, -2.5, 1.5]; % Rotation in degrees [roll, pitch, yaw]
Matt J
Matt J on 21 Dec 2025
Looking at registerSingleProjection(), it appears that it is not really a standard 2D-2D in-plane registration as your initial description suggests,
4. Performing accurate 2D–2D registration between each X-ray and its corresponding DRR
Had it been, you would be using imregister() or imregtform(). You also have an out-of-plane roll parameter, whose estimation is part of the novel content of the paper, proposed by the authors.
That being the case, you are probably barking up the wrong tree by looking for troubleshooting in this forum, because your difficulty is not a MATLAB issue. It is a problem with the authors' idea.
payam samadi
payam samadi on 22 Dec 2025
Edited: payam samadi on 22 Dec 2025
Dear Matt
Thank you for your comments. I added multiresolution in-plane estimation for Phase 1 and modified Phases 2 and 3, based on the article. Still, the code fails at the 2D-2D image registration and therefore at the 2D-3D image registration. I developed two versions (vers.23.m and ver.24.m), which are attached here.
Would you please help me step by step to fix this problem and simulate the article?
payam samadi
payam samadi on 22 Dec 2025
Dear Matt
According to the article, in the first phase the in-plane transformation parameters (x, y, theta) are initially estimated using the nominal DRR image; these three parameters are quickly searched via multiresolution matching, using the sum of squared differences (SSD) as the similarity measure.
Note, the desired pixel accuracy for the translations and the half-degree accuracy for the in-plane rotation are achieved.
In the second phase, the roll is separately searched in one dimension based on the approximate results of (x,y, theta) obtained in the first phase. The optimized pattern intensity is used as a similarity measure to determine the reference DRR image corresponding to a roll.
Note, the search space during this phase is the full search range of roll angles, sampled at 1° intervals for the initial estimation.
The third phase is an iterative process. The in-plane transformation and the roll are refined iteratively, using pattern intensity as the similarity measure for increased accuracy.
Note, the iteration process has two steps, the first consisting of the refinement of the previously estimated in-plane transformation via steep descent minimization, followed by the refinement of the roll using one-dimensional interpolation. These two steps are repeated for a specified number of iterations. The final registration results give the in-plane transformation and the roll.
The "calculatePatternIntensity" function is used for Similarity measure.
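For reference, here is a NumPy sketch of the standard pattern-intensity measure (Weese et al.), computed on the difference image over a circular neighbourhood; the constants sigma = 10 and radius = 3 pixels are the commonly quoted defaults and should be treated as assumptions here:

```python
import numpy as np

def pattern_intensity(xray, drr, sigma=10.0, radius=3):
    """Pattern intensity of the difference image (higher means flatter,
    i.e. a better match); the neighbourhood is a disc of the given radius."""
    diff = xray.astype(float) - drr.astype(float)
    total = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if (dx, dy) == (0, 0) or dx * dx + dy * dy > radius * radius:
                continue
            # np.roll wraps at the borders; a real implementation would
            # restrict the sums to valid pixels instead
            shifted = np.roll(np.roll(diff, dy, axis=0), dx, axis=1)
            total += np.sum(sigma**2 / (sigma**2 + (diff - shifted) ** 2))
    return float(total)

rng = np.random.default_rng(0)
img = rng.uniform(0.0, 100.0, (64, 64))
# A perfectly matched pair must score higher than a misaligned one:
print(pattern_intensity(img, img) > pattern_intensity(img, np.roll(img, 5, axis=1)))
```

A quick way to validate your own calculatePatternIntensity is exactly that final check: the metric must be maximal at zero misalignment and fall off as the DRR is shifted.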
Search Method
Multiresolution matching is a fast search method designed to estimate the initial in-plane transformation parameters that will be refined in subsequent search phases.
First, three image resolutions are defined. The original x-ray images are used as the highest resolution images. The lower resolution images are generated by subsampling the original x-ray images by half in each dimension. Then the lowest resolution images are created by further subsampling by half in each dimension.
The basic idea of multiresolution matching is to match the images at each resolution level successively, starting with the lowest resolution. The results at the lower resolution serve as rough estimates of the in-plane transformation parameters x,y, theta. The output at the lower level is then passed on to the subsequent higher resolution level. The parameters x,y, theta are refined using the higher resolution images. In the final matching results, the accuracy of translations depends on the spatial resolution of the highest resolution images. The accuracy of in-plane rotation depends on the sampling intervals of the in-plane rotated DRR reference images.
Note, there is a risk that accompanies the use of multiresolution matching. It is possible that the estimate at lower levels may converge to local optima far from the global optima. In this case, the result of the further matching at subsequent higher resolution levels will most likely not converge to the global optima. To overcome this risk, multiple candidates of estimation are used. A sorted list of candidates with a best match from the lower level is passed to the higher resolution level. The more candidate displacements are used, the more reliable the estimates become. The best candidates are ranked by their SSD values.
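That candidate-carrying multiresolution search is mechanically simple. A NumPy sketch, translation-only for brevity (the theta dimension would be one more loop over the pre-rotated DRR set), with wrap-around shifts via np.roll and brute-force SSD:

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences similarity (lower is better)."""
    return float(np.sum((a - b) ** 2))

def shift_img(img, dx, dy):
    return np.roll(np.roll(img, dy, axis=0), dx, axis=1)   # wrap-around sketch

def multires_translation(ref, mov, levels=3, search=8, n_cand=3):
    """Coarse-to-fine SSD search for the (dx, dy) aligning mov onto ref.

    A sorted list of the n_cand best candidates is promoted from each level
    to the next, reducing the risk of locking onto a coarse-level local
    optimum."""
    pyr = [(ref, mov)]
    for _ in range(levels - 1):
        r, m = pyr[-1]
        pyr.append((r[::2, ::2], m[::2, ::2]))             # subsample by half
    cands = [(0, 0)]
    for lvl in range(levels - 1, -1, -1):
        r, m = pyr[lvl]
        full = lvl == levels - 1                           # full search only
        radius = search if full else 1                     # at the coarsest level
        seeds = {(0, 0)} if full else {(2 * x, 2 * y) for x, y in cands}
        scored = sorted(
            (ssd(r, shift_img(m, sx + dx, sy + dy)), (sx + dx, sy + dy))
            for sx, sy in seeds
            for dx in range(-radius, radius + 1)
            for dy in range(-radius, radius + 1))
        cands = [d for _, d in scored[:n_cand]]            # best SSD first
    return cands[0]

rng = np.random.default_rng(0)
img = rng.uniform(0.0, 100.0, (64, 64))
print(multires_translation(img, shift_img(img, 8, -4)))    # recovers (-8, 4)
```

A useful unit test for your own implementation is exactly this round trip: apply a known shift and require the search to return its inverse, before moving on to rotations and roll.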
Based on this information and the two attached codes (Ver23 and Ver24), where do you think I made a mistake?
payam samadi
payam samadi on 22 Dec 2025
Dear Matt
I created another version (Ver26.m), and this time, I tested the model's performance under the conditions of (x, y, theta), assuming no roll for simplicity (lines 559-581).
The test cases are as follows:
```
test_cases = [
-5, 0, -2;
0, 3, 1;
3, -2, 0;
-2, 4, 1;
4, 1, -1;
];
```
I believe the code is capable of estimating the true 2D-2D results (x, y, theta), but it struggles to accurately estimate the true roll. Do you agree with this assessment?
I apologize for posting so many pieces of code, but I need to complete this simulation, which is very important to me. I would greatly appreciate any help you can provide.
Dear Matt
I wanted to share an update after my recent attempt with the code (Ver45).
Line 64 loads the 3D CT volume (CT_13.nii.gz).
According to the article, I must generate a set of orthogonal X-ray projections (DRRs) from the 3D CT by rotating the volume around the Superior–Inferior (SI) axis. This corresponds to a roll rotation, implemented as rotation about the X-axis (as defined in the paper’s coordinate system). The roll range is −5° to +5° (step size = 1°, total of 11 angles).
This rotation and DRR generation process is triggered at line 136 using the function “generateAccurateDRRSets”.
Inside that function, the “generateDRR” function is called to produce orthogonal projections for each rotation angle. This results in 22 reference DRR datasets:
• 11 DRRs for Reference Set A
• 11 DRRs for Reference Set B
These two DRR sets simulate two detector panels (A and B like the Figure 1) and are used as the reference images for registration workflow described in the article.
6D Motion Simulation (Lines 152–184)
A full 6-degree-of-freedom (6D) movement is applied to the CT volume:
• Translations: X, Y, Z (mm)
• Rotations: Roll, Pitch, Yaw (degrees)
This transformation simulates patient/target displacement and generates a shifted X-ray projection for each detector (A and B).
Registration Step (Lines 187–192)
For each projection (A and B), the function “registerOneProjection” is used to estimate four parameters from each single 2D image:
• x → in-plane translation (lateral)
• y → in-plane translation (vertical)
• θ → in-plane rotation
• roll (θx) → rotation around SI axis
These four parameters are the required outputs before multi-view consistency is evaluated.
Note, the function "generateXrayWithShift" does not alter the CT volume; it is solely used to generate X-ray projections. Consequently, the outputs (xray_A_shift and xray_B_shift) produced between lines 170 and 181 already contain the true movement (applied according to Equations 1-3).
Consistency Check (Equations 9–10) — Lines 195–201
A consistency validation step is performed between the results from detector A and B.
This follows the mathematical conditions from Equations (9) and (10) of the paper.
If A and B do not agree (in the sense defined by the article), the solution is rejected or flagged as unstable.
Final 6D Estimation (Equation 4) - Lines 203-211
Once consistency is confirmed, the parameters from both projections are combined using Equation (4) from the article to compute the final 6D estimate.
This produces the estimated patient/object transformation in all degrees of freedom.
=====================================================
Known Issue: Roll Estimation Inaccuracy
The workflow generates outputs for multiple samples (e.g., 154–164), and most degrees of freedom behave correctly.
For example,
ground_truth.translation = [-20, 20, -2.75]; % y, z, x % Translation in mm [LR, AP, SI]
ground_truth.rotation = [-1, 5, -2]; % Rotation in degrees [roll(X), pitch(Y), yaw(Z)]
However, the final estimated roll (θx) is consistently inaccurate. The issue is traced to the current implementation of “registerOneProjection”.
=== Final Estimated 3D Motion ===
Tx (SI) = -2.725 mm, Ty (LR) = -20.003 mm, Tz (AP) = 20.003 mm
Roll (θx) = -0.024°, Pitch (θy) = 5.004°, Yaw (θz) = -1.990°
This function does not reliably recover true roll values, which leads to:
• incorrect roll output per projection.
• inconsistency violations in Equations (9–10).
I would greatly appreciate any help you can provide.
Matt J
Matt J on 1 Jan 2026
Edited: Matt J on 1 Jan 2026
I need to be clear. Nobody is going to read the full article you have posted. It's the kind of thing no one has time for. And, without reading the article, there is no way for forum participants to know why the novel algorithm it proposes should even work in the first place (or at least why it should work in the mind of the article's author).
What you need to do is identify a piece of the algorithm that everyone can agree should work, even without reading the article, but which doesn't work, according to a coded demonstration that you will present. The community then has a chance of helping you understand why that section is not working.
Dear Matt
Thank you for your response and your time.
First, I have to say that this code (according to the article) has three parts:
Phase 1: Initial In-Plane Estimate (Goal to find in-plane transformation for each projection [x,y, theta]),
Phase 2: Initial Roll Estimate (goal to find the best-matching roll angle from the pre-computed DRR set [roll part]),
Phase 3: Iterative Refinement (goal to refine both the in-plane transformation and the roll angle to high accuracy).
To test Phase 1, please review lines 143-148, which call the "validateRegistrationAccuracy" function. Inside this function, I included some test cases applied to the X-ray images (reference images). The results were calculated and compared with the true test cases using an error metric. For this test, I assumed no roll for simplicity. The results passed accurately, allowing me to proceed to the next phases (2 and 3) to estimate the true roll, which I am currently having problems with.


Asked: 18 Dec 2025
Commented: 2 Jan 2026
