Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 7-Day Trial for You or Your Team.

Learn More →

MonoTracker: Monocular-Based Fully Automatic Registration and Real-Time Tracking Method for Neurosurgical Robots

MonoTracker: Monocular-Based Fully Automatic Registration and Real-Time Tracking Method for... Robot-assisted surgery has become an indispensable component in modern neurosurgical procedures. However, existing registration methods for neurosurgical robots often rely on high-end hardware and involve prolonged or unstable registration times, limiting their applicability in dynamic and time-sensitive intraoperative settings. This paper proposes a novel fully automatic monocular-based registration and real-time tracking method. First, dedi- cated fiducials are designed, and an automatic preoperative and intraoperative detection method for these fiducials is introduced. Second, a geometric representation of the fiducials is constructed based on a 2D KD-Tree. Through a two-stage optimization process, the depth of 2D fiducials is estimated, and 2D-3D correspondences are established to achieve monocular registration. This approach enables fully automatic intraoperative registration using only a sin- gle optical camera. Finally, a six-degree-of-freedom visual servo control strategy inspired by the mass-spring-damper system is proposed. By integrating artificial potential field and admittance control, the strategy ensures real-time responsiveness and stable tracking. Experimental results demonstrate that the proposed method achieves a registra- tion time of 0.23 s per instance with an average error of 0.58 mm. Additionally, the motion performance of the control strategy has been validated. Preliminary experiments verify the effectiveness of MonoTracker in dynamic tracking scenarios. This method holds promise for enhancing the adaptability of neurosurgical robots and offers significant clinical application potential. Keywords Neurosurgical robot, Automatic detection, Monocular registration, Visual servo control posing a significant threat to global health [1, 2]. In 1 Introduction recent years, robotic-assisted surgery has entered the Neurosurgical diseases, such as cerebrovascular dis- field of neurosurgery. With its precise stereotactic posi - orders, intracranial tumors, and neurological dysfunc- tioning and stable instrument handling, it has improved tions, have extremely high mortality and disability rates, treatment outcomes and increased surgical efficiency, establishing a new paradigm for innovation in neuro- *Correspondence: surgical procedures [3, 4]. In clinical settings, the stand- Diansheng Chen [email protected] ard workflow of robot-assisted neurosurgery is typically School of Mechanical Engineering and Automation, Beihang University, structured into four sequential phases: (1) preoperative Beijing 100191, China 2 path planning based on patient-specific imaging data, Hunan Intelligent Rehabilitation Robot and Auxiliary Equipment Engineering Technology Research Center, Changsha 410004, China (2) intraoperative registration to establish the robot’s School of General Engineering, Beihang University, Beijing 100191, initial posture, (3) real-time tracking to compensate for China 4 intraoperative patient motion, and (4) surgical execution School of Astronautics, Beihang University, Beijing 100191, China Tongji Medical College, Huazhong University of Science and Technology, guided by robotic assistance. Among these, steps (2) and Wuhan 430030, China (3) remain major challenges, particularly under dynamic © The Author(s) 2025. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/. Chen et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 2 of 27 intraoperative conditions. Specifically, unexpected inci - Additionally, RGB-D and NIR cameras typically have dents may cause intraoperative movement of the patient’s strict working space requirements, which can increase head [3, 5]. This necessitates a prolonged re-registration intraoperative uncertainty in complex clinical environ- and repositioning of the robot, which can compromise ments and with patient head movement. Previous studies the optimal timing of the surgical procedure. Therefore, have shown that, after intrinsic calibration and distor- developing an accurate and rapid registration method tion correction, the reprojection error of optical cameras while enabling real-time robotic posture adjustment is can be controlled within sub-pixel levels, providing a crucial for enhancing the adaptability of neurosurgical theoretical basis for achieving high-precision monocu- robots [6]. lar registration [20, 21]. Moreover, the "what-you-see-is- During the preoperative phase, imaging modalities what-you-get" nature of optical cameras not only enables such as computed tomography (CT) are employed to a larger effective working space but also offers surgeons identify the target puncture site and plan the optimal sur- a more intuitive interactive interface. Additionally, owing gical trajectory. To enable robotic execution, it is essential to their significant cost advantage, monocular camera- to establish an accurate spatial correspondence between based intraoperative registration presents a promising the preoperative plan and the intraoperative anatomical research direction. environment. This alignment is achieved through spa - Conversely, the selection of reference features is crucial tial registration, which matches the preoperative images for ensuring the stability and reliability of the real-time with the real-time physical position of the patient [7, 8]. registration process. For marker-less image-to-patient Based on the type of optical sensor, neurosurgical spatial registration methods, existing algorithms have been registration methods can be categorized into those using proven to be either too inefficient or too inaccurate [7, monocular cameras [9], depth cameras [10, 11], and near- 9, 22]. For marker-based registration methods, marker infrared cameras [12]. Meng et  al. proposed an auto- types include fiducial spheres and encoded patterns (e.g., matic registration method using a monocular camera, ArUco) [13, 19, 23, 24]. Although encoded patterns can which was applied to neurosurgical robots. This method be detected and localized in visible light images, accu- employs multi-view stereo vision to reconstruct 3D fea- rately identifying them in medical imaging remains a tures, but single registration times can reach several min- challenge. Moreover, encoded patterns are often attached utes [9]. Su et al. used an RGB-D camera to extract facial to planar holders, making it difficult to achieve spatial features of patients and completed registration through distribution in the surgical area. Their relatively large size ICP, with registration times around 3s and errors within also makes them unsuitable for dynamic intraoperative 1−2 mm [10]. Near-infrared optical navigation systems, environments. Therefore, the methods based on fiducial such as Polaris Vega VT from Northern Digital Inc., are spheres still dominate in clinical settings [13, 19]. For ® ® highly popular in clinical surgeries owing to their stable instance, in the guidelines for the StealthStation S7 marker recognition accuracy [12]. However, during sur- treatment, skin-adhesive markers are used as fiducial gery, surgeons need to manually select fiducial markers, spheres [25]. and manual registration typically takes several minutes to After completing spatial registration, the tracking con- meet surgical precision requirements [13]. In summary, trol of surgical robots is mostly completed by surgeons intraoperative registration technology has made certain via remote control [26–28]. However, this process is progress. Nonetheless, achieving precise, rapid, and reli- strenuous and inefficient for Surgeons. They should focus able intraoperative registration to compensate for patient more on specific surgical operations rather than repeti - head movement remains a significant challenge [3]. tive adjustments of the surgical robot [29]. Autonomous Recent studies have begun to explore the trade-off positioning of surgical robots to the desired posture between registration accuracy and efficiency. On one typically involves common path planning methods such hand, the choice of optical sensor fundamentally influ - as graph search methods [30], numerical methods [31], ences the type and quality of the collected data, as well and sampling-based methods [32, 33]. However, these as the size of the effective working volume [14–16]. Some methods seldom consider pose tracking [34]. The artifi - studies have achieved intraoperative real-time registra- cial potential field (APF) method is an efficient local path tion using RGB-D and near-infrared (NIR) cameras [17– planning algorithm that constructs virtual potential fields 19]. However, owing to the inherent errors of current in the robot’s operational space or joint space. It enables mainstream depth cameras (typically within 2%), real- the robot to avoid obstacles and rapidly track targets, time registration based on RGB-D cameras often fails with a simple structure and strong repeatability [35, 36]. to meet clinical accuracy requirements [17]. Addition- Hao et  al. combined APF with an original dual neural ally, NIR cameras require custom fiducial points and are network to propose an improved path planning method expensive, making them impractical for clinical use [19]. that avoids local minima and enhances path execution Chen  et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 3 of 27 precision. Experimental validation has proven its effec - a practical solution that can be readily integrated into tiveness for autonomous spine surgery [34]. However, the surgical workflow without significant overhead or this method has not been integrated with visual modules interruption. and only enables single-instance path planning. There - The remainder of this paper is organized as follows: fore, as neurosurgical robots advance towards greater Section  2.1 describes the neurosurgical robotic system intelligence and automation, designing a safe and reliable and its workflow; Section  2.2 delves into the monocular- visual servo tracking control method to accommodate based automatic registration method; Section  2.3 dis- dynamic changes remains of great significance. cusses the 6D APFF admittance for visual servo control. In summary, to improve the efficiency and safety of The experimental validation is primarily conducted in neurosurgical robots, we propose MonoTracker, a fully Section  3, with the results discussed in Section  4. Sec- automatic registration and tracking framework based tion  5 concludes the paper and provides an outlook on on a monocular camera. Unlike existing methods that future research. primarily focus on registration, MonoTracker provides a unified solution for automatic registration and real-time 2 Materials and Methods tracking. It emphasizes the robustness and accuracy of 2.1 Overall Framework the online registration process while integrating visual This study is based on a neurosurgical robot with servo-based tracking control strategies for robotic execu- enhanced autonomous capabilities, comprising three tion. The main contributions of this work are summa - main parts as shown in Figure 1(a): (A) the robotic execu- rized as follows: tion subsystem, (B) the visual navigation subsystem, and (C) other intelligent surgical subsystems that assist in (1) We propose a fully automatic monocular-based performing surgical tasks. The execution subsystem con - registration and tracking framework, which inte- sists of a UR10 robotic arm and an autonomous punctur- grates preoperative fiducial identification, intraop - ing device. The navigation subsystem can utilize various erative monocular registration, and robotic visual solutions, including monocular cameras, depth cameras, servo tracking. and NIR cameras. (2) An automatic monocular registration method is In clinical surgeries, the operational procedure of the developed, incorporating a two-stage optimization neurosurgical robot can be divided into five steps, as strategy to establish accurate 2D-3D mappings and shown in Figure  1(d): preoperative puncture path plan- recover depth information. This enables precise and ning, intraoperative patient registration, automatic posi- efficient spatial registration using only a single opti - tioning by the robot, autonomous bone drilling by the cal camera. robot, and surgical manipulation by the surgeon. (3) A novel 6D visual servo control method based on an Artificial Potential Field Force (APFF) admit - Step I: Preoperative puncture path planning. tance model is introduced, inspired by the mass- Based on imaging data such as CT scans, the robot spring-damper system. This approach ensures automatically identifies fiducials in the preoperative smooth and stable pose tracking, thereby improv- imaging space, which will be discussed in detail later. ing the safety and robustness of neurosurgical robot The surgeon determines the target location within control. the skull and plans an optimal puncture path to mini- (4) The effectiveness of the proposed methods is vali - mize iatrogenic injury to the patient [37, 38]. dated through extensive experiments, demonstrat- Step II: Intraoperative patient registration. To map ing the accuracy and efficiency of the registration the planned path to the surgical space, the navigation algorithm as well as the motion performance of the subsystem must automatically recognize fiducials tracking control strategy. and align the coordinates between the preoperative imaging space and intraoperative navigation space Compared with traditional manual registration meth- [39]. Unlike conventional surgical robots [40, 41], ods that often require several minutes and may interrupt this study considers intraoperative dynamic factors, surgical flow, MonoTracker completes registration in such as patient head repositioning during surgery. By 0.23 s, enabling near real-time feedback. Although NIR- achieving real-time registration while ensuring high based methods offer slightly faster performance (~0.1 registration accuracy, the proposed method enhances s), their limited working volume significantly reduces the adaptability of the robotic system. their adaptability in dynamic and complex surgical envi- Step III: Automatic positioning. The execution sub - ronments. MonoTracker achieves a favorable trade-off system autonomously adjusts its pose to ensure that between accuracy, efficiency, and system cost, offering the puncturing device reaches the preoperatively Chen et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 4 of 27 Figure 1 Neurosurgical robot system and surgical procedure: (a) Intraoperative robot space, (b) Preoperative image space, (c) Planned puncture path, (d) Neurosurgical procedure workflow 2.2.1 A utomatic Detection of Fiducials planned target posture. During real-time registra- During the perspective transformation process of tion, the subsystem must be capable of automatically monocular imaging, fiducials lose depth and geometric adapting its position while safely and reliably track- features. Therefore, a key challenge is to automatically ing the target posture. However, research in this area detect these fiducials preoperatively and intraoperatively, remains limited. ensuring their accurate correspondence. This paper Step IV: Autonomous bone drilling. Once the punc- designs a fiducial sphere, as illustrated in Figure  3. Its turing device reaches the target pose, it can initiate customized spherical features are easily recognizable in the bone drilling procedure. During this phase, the the 3D medical imaging space. The spherical character - robot will monitor the force feedback from the drill- istics ensure that, regardless of the intraoperative camera ing to ensure the safety and effectiveness of the sur - posture, fiducials appear as regular circles in 2D images. gery. The center of a fiducial in the medical imaging space S Step V: Surgical operation. Once the robot com- med and the center of the corresponding circle in the camera pletes drilling and establishes a surgical pathway, image space S can be established as effective corre - the surgeon can proceed with surgical procedures, cam sponding points, with a specific proof provided here. assisted by intelligent surgical subsystems. Assuming point P is the center of the spherical fidu - 3d cial, its coordinates in the camera coordinate system C In the above process, Step II and Step III are essential are (a, b, c) . Based on the pinhole imaging principle, the and form the foundation for subsequent surgical opera- imaging of the fiducial forms at the intersection of two tions. This paper will focus on researching MonoTracker, opposing cones, Cone and Cone , with the vertex at specifically targeting these pivotal aspects. ABC CDE C(0, 0, 0) . The cone Cone can be represented in C as CDE c follows: 2.2 M onocular‑based Automatic Registration Monocular cameras do not introduce additional hard- (ax + by + cz) 2 2 2 x + y + z = , (1) ware errors and provide a larger effective working space, 2 2 l − R making them more suitable for intraoperative automatic −−→ registration. This paper proposes a monocular automatic where l represents CP  . By introducing a cutting plane 3d registration method, consisting of three key steps: (1) at z = z , an elliptical plane Ellipse is generated, C 0 EF automatic fiducial detection; (2) accurate correspond - whose center is P (x , y , z ): 2d 2d 2d 2d ence estimation; and (3) depth recovery to accomplish automatic registration, as illustrated in Figure 2. Chen  et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 5 of 27 Figure 2 Monocular-based automatic registration method Figure 3 Spherical fiducial design and monocular imaging principle (ax+by+cz) bx − ay = 0, 2 2 2 x + y + z = , (3) l −R 1 (2) z = . z = z . By simultaneously solving Eqs. (2) and (3), the 3D The shortest distance from P to the line OP , 3d 2d coordinates of points E and F can be determined. At denoted as dis , can characterize the error between err this point, the center of the ellipse, P (x , y , z ) , i s 2d 2d 2d 2d the center of the fiducial in S and the circular center med given by: in S as effective corresponding points. Therefore, it is cam reasonable to assume z = 1/b . Under this assumption, the equation for the line where the endpoints of Ellipse EF are located is: Chen et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 6 of 27 1 c leverages 2D KD-Tree features in conjunction with Par- x = (x + x ) = , 2d e f  2 2 c −R ticle Swarm Optimization (PSO) to facilitate rapid and 1 ac y = (y + y ) = , 2d e f (4) 2 2 b(c −R ) stable estimation of the 2D-3D correspondence of fidu - z = . 2d cials. PSO is a global optimization algorithm suitable for solving problems where the optimal solution lies Because the line OP passes through the origin, using 2d within a multidimensional parameter space [42]. the formula for the distance from a point to a line, dis err Assuming a 3D point set in medical imaging space can be expressed as: 1 2 3 n ={P , P , P , ..., P } and a 2D point set in pixel vir vir vir vir vir 1 2 3 n 2 2 a + b space  ={P , P , P , ..., P } , the specific steps pix 2 pix pix pix pix dis = R . err (5) 2 2 2 2 2 2 of the method are described in Algorithm 1. c (a + b ) + (c − R ) Because R << c , the equation can be further simplified Algorithm 1 Fiducial correspondence estimation via 2D KD-Tree and PSO to: 2 2 l − c dis = R , err (6) cl 2 2 2 2 where l = a + b + c , and owing to factors such as the camera’s field of view, the observation distance c is typically several times larger than a and b, and R is on a different order of magnitude compared to a, b, and c, making dis generally much smaller than the tolerance err error e. Thus, this completes the proof of the fundamen - tal principles of monocular registration. For automatic detection of preoperative fiducials, owing to significant physical property differences between fiducials and patient bones or tissues, initial extraction of the fiducial model can be accomplished by setting an appropriate grayscale threshold. Considering the uncertainties associated with threshold segmenta- tion, outlier detection techniques are employed to miti- gate noise interference. Subsequently, Density-based Spatial Clustering of Applications with Noise (DBSCAN) clustering is utilized to discern the features of individual fiducials. The center P of each fiducial is determined by vir calculating the centroid of the point cloud set. In terms of intraoperative automatic detection, a dedi- cated dataset has been created to facilitate instance segmentation of fiducials under monocular camera observation. We have enhanced the state-of-the-art YOLOv8 instance segmentation model to suit our spe- cific requirements. Throughout the surgical procedure, the robot executes real-time instance segmentation on multiple fiducials, calculating the average of the pixel set for each specific fiducial. This process effectively extracts the central P of the circular fiducial. pix 2.2.2 Estimation of Correspondence Relationships Upon successful automatic detection of preopera- tive and intraoperative fiducials, a critical challenge For the 2D point set  , an unordered set of pix remains: estimating the correspondence between these points can be transformed into an ordered set two sets. This paper proposes a novel method that 1 2 n S ={P , P , ..., P } by employing the 2D KD-Tree pix pix pix pix Chen  et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 7 of 27 construction rules. At this point, geometric features thereby enhancing exploration. The inertia weight ̟ is introduced to control the exploration capability of the F ={V , V , ..., V } can be constructed, where V is 1 1 2 n−1 i particles. Thus, the correspondence between the 2D and the vector established between two adjacent points: 3D fiducial markers is successfully established. i+1 i V = P − P . i (7) pix pix 2.2.3 Estimation of C orrespondence Relationships Additionally, under the projection in the direction of Furthermore, to achieve high-precision registration, a unit vector F = (a , b , c ) , the set  is projected V V V V vir it is essential to reconstruct the depth information of onto the plane a x + b y + c z = 0 through the ori- V V V the 2D point set in S , thereby acquiring the com- pix 1 2 n gin, generating the point set S ={P , P , ..., P } . xyz xyz xyz xyz plete 3D geometric features of the fiducials in the sur - Because the vector F can be obtained by rotating the 1 2 n gical space. Let  ={Q , Q , ..., Q } denote an vir vir vir vir unit vector n (0, 0, 1) around the X and Y axes by α and ordered set of 3D points in the medical imaging space β , resp e ctively , S can be further transformed to the xyz 1 2 n and  ={Q , Q , ..., Q } denote the correspond- pix pix pix pix XOY plane and rotated around the vector n by the angle ing set of 2D points in the pixel space, where each 1 2 n γ , generating a 2D point set S ={P , P , ..., P }. xoy xoy xoy xoy i i pair (Q , Q ) forms a known 3D-2D correspond- vir pix ence. Furthermore, assume an ordered 3D point set  2 2 α = arctan(b a + c ),  V V V 1 2 n ={Q , Q , ..., Q } in C : cam c � cam cam cam β = arctan(a c ), V V (8)   i i � �  α 0 c 0 P = M(α) · M(β) · P , i x xyz vir  Q  pix   i i z = 0 β c 0 Q , i y (11) cam P = M(γ ) · P , xoy xyz 1 0 0 1 0 where M(α) , M(β) , and M(γ ) are the rotation where α , β , c , and c are the camera intrinsic parame- x y matrices for α , β , and γ . Following the construc- ters. Thus, by uniquely determining the depth distances tion rules of the 2D KD-Tree, the geometric features [z , z , ..., z ] in C , the position information for all points ′ ′ ′ 1 2 n c F ={V , V , ..., V } is constructed. There exists a 2 1 2 n−1 in  can be established. To calculate z , a loss function cam i set of variables [α, β , γ ] , such that the geometric differ - is constructed as follows: ence |F − F | → 0 . Moreover, the established point sets 1 2 S and S correspond to each other. This paper uses an pix xoy min S(z , ..., z ) 1 n z ,...,z ∈D 1 n optimization method to solve for the variables [α, β , γ ] , n−1 constructing a loss function F(α, β , γ) as follows:  (12) i i+1 i i+1 2 = (Q − Q  −Q − Q ) , vir cam cam vir min F(α, β, γ ) i=1 α∈A,β∈B,γ ∈C i+1 n−1 i i i+1 (9) where Q − Q and Q − Q represent the vir cam cam 2 vir = V − V , A, B, C = [−π, π], distances between adjacent points in the ordered point i=1 sets. When the loss function is minimized, it indicates that the geometric distances between the fiducial points when F(α, β, γ) → min , [α, β , γ ] can be determined as in C , at that depth distance are consistent with those in the optimal solution. At this point, the correspondence C . Generally, the depth information of all fiducials in the between S and S can be established from  and pix xoy vir v surgical space is correctly reconstructed. . This paper employs the PSO algorithm to solve this pix To boost the computational efficiency and success rate optimization problem. In each iteration, the particles’ of the two-stage optimization problems associated with velocities and positions are updated based on the follow- correspondence estimation and depth recovery, this ing equations: paper introduces an integrated optimization framework, t+1 t t t v = ̟ v + c r (pbest − x ) + c r (gbest − x ), 1 1 i 2 2 depicted in Figure 4. i i i i t+1 t+1 x = x + v , In this setup, the loop arrows ①, ②, ③, and ④ rep- i i i (10) resent the process where, upon determining that the t t optimization results meet the requirements, the results where v and x represent the velocity and position of the i i from the ith round of optimization are fed into the (i+1) ith particle at iteration t , resp e ctively . c and c are the 1 2 th round as the initial values for optimization variables. cognitive and social coefficients, governing the attrac - By appropriately setting the key parameters {p, m, t} and tion towards the individual best pbest and the global best f , the computational efficiency and success rate of gbest . r and r serve as stochastic weights that modu- 1 2 optimization can be significantly improved. Additionally, late the influence of personal and global best positions, Chen et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 8 of 27 Figure 4 Two-stage optimization for correspondence estimation and depth recovery 2.3 Visual Servo Control with APFF Admittance an extra PSO optimization module is implemented in the 2.3.1 Mo deling of Robotics Kinematics first stage of optimization to further improve the suc - The robotic execution subsystem primarily consists of cess rate without substantially reducing computational two components: the UR10 robotic arm and an autono- efficiency. Once the depth recovery of the point set cam mous puncture device. To achieve visual servo tracking is completed, it can be combined with the set  , and vir control of the robot, it is essential to conduct a kinematic registration can be performed using the Singular Value analysis of the robot’s operational entity. This paper Decomposition (SVD) method. Ultimately, the spatial employs the Product of Exponentials (POE) method transformation matrix T from C to C is obtained. Thus, v c for kinematic modeling of the robot, as illustrated in this section completes the introduction and theoretical Figure  5. The UR10 is a high-precision, high-load, col - derivation process of the monocular-based automatic laborative robotic arm with six degrees of freedom. registration method. The autonomous puncture device, a custom-developed electromechanical system, incorporates two degrees of Figure 5 Kinematic modeling of the robotic execution subsystem based on the POE method Chen  et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 9 of 27 freedom-linear feed and rotational cutting. Consider- u Th s, through the POE method, the kinematic mod - ing that the autonomous puncture device does not move eling and analysis of the robotic execution subsystem are during the robot’s registration tracking process, this completed. paper focuses on kinematic modeling of the six motion joints of the robotic arm. 2.3.2 V isual Servo Control Herein, ω represents the unit vector of the joint rota- APF method is an efficient local path planning algorithm, tion axis, r denotes a point on the axis, and v signifies i i yet it frequently encounters local minima issues and is the linear component in the spatial screw, and it is estab- typically applied to end-effector position constraints, lished that: complicating the concurrent consideration of its attitude. v =−ω × r = rˆ ω , i i i i i Utilizing this method to ensure safe and efficient robotic (13) ξ = (ω ; v ), i i i tracking during surgical procedures presents a consider- able challenge. To address this, the paper introduces a where ξ determines the unit spatial screw axis for joint i. 6D APFF admittance-based visual servo control method, By combining ξ with the exponential map, the corre- enhancing tracking efficiency while improving safety and sponding spatial transformation matrix can be obtained. reliability. u Th s, the kinematic model of the robot is established as Initially, potential field force modeling is conducted for follows: the surgical environment. Within this environment, the desired location exerts an attractive force on the robot’s ˆ ˆ ˆ ˆ ˆ ˆ t t ξ θ ξ θ ξ θ ξ θ ξ θ ξ θ t 1 1 2 2 3 3 4 4 5 5 6 6 T = h (θ) = e e e e e e h (0), b b b end-effector, whereas obstacles generate repulsive forces. (14) The attractive potential field integrates quadratic and coni - where T is the transformation matrix from the robot’s b cal potential fields, with the attractive force F (q) being att,i (17) base coordinate system C to the tool coordinate system defined as:  � � � �  −ζ (o (q) − o (q )), if o (q) − o (q ) ≤ d, � � i i i i i f f � � F (q) = o (q)−o (q ) att ,i i i � � � �  −dζ , if o (q) − o (q ) > d. i � i i �  � � �o (q)−o (q )� i i In the ith iteration, ζ represents the coefficient of the t t C , h (0) is the initial pose of T , ξ is the skew-symmet- b b attractive potential field, with d serving as the distance ric matrix of ξ , and θ − θ are the joint angles rotating 1 6 threshold where the potential field shifts from conical to about ξ − ξ , pre-set according to the actual safe work- 1 6 parabolic. The terms o (q) and o (q ) denote the robot’s i i S f space. Additionally, the spatial Jacobian matrix J (θ ) , current and desired positions during the ith iteration. For which converts the robot joint angular velocities into the the repulsive potential field, the repulsion becomes infinite end-effector space velocity, can be written as: as the robot’s position approaches a boundary. Conversely,   the repulsive force diminishes to zero once the robot is   ˙ beyond a certain threshold distance. The repulsive force S S ′ ′ ′ 2   V = J (θ)θ = (ξ , ξ , ..., ξ ) , (15) 1 2 6   F (q) is given as follows: ... rep,i 1 1 1 n ( − ) �ρ , if ρ ≤ ρ , i i i 0 ρ ρ i 0 ρ F (q) = i (18) rep,i 0, if ρ >ρ , i 0 S ′ ′ ′ J (θ) = (ξ , ξ , ..., ξ ),  1 2 6 ξ = Ad ξ , (Ex) i i (16) where ρ is the shortest distance from the robot’s posi-  ˆ ˆ ˆ θ ξ θ ξ θ ξ 1 1 2 2 i−1 i−1 tion at o (q) to any obstacle, and �ρ is the gradient Ex = e e ...e . i i of the distance field. n and ρ represent the repulsive i 0 potential field coefficient and the distance threshold for F (q) . Particularly, the patient’s head acts as an obsta- rep,i cle in the robot tracking process and is a critical safety consideration. By utilizing fiducials, a spherical surface is constructed through the random sample consen- sus to envelop the patient’s head. This sphere is consid - ered a typical intraoperative obstacle, exerting a virtual Chen et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 10 of 27 Figure 6 Visual servo control based on 6D APFF admittance repulsive force on the robot’s end-effector. With this, the puncture needle, is obtained. The calculation of F is mid,i modeling of APF in the robot’s workspace is completed. shown in Eq. (21). To address the challenges of applying the APF method to B B B F = F + F . path planning, this paper first introduces a 6D APFF admit - (21) mid,i top,i rem,i tance-based method, inspired by the mass-spring-damping The tracking process under the guidance of APF can be system, as illustrated in Figure 6. The end-effector needle is modeled as a mass-spring-damping system. Here, the vir- modeled as a line segment defined by the tip point P , the top tual mass of the puncture needle is denoted as M . The remote point P , and the midpoint P . rem mid difference S between the target pose Tar and the In the spatial coordinate system C , the force screws i S mid,i S S current pose Pose of the puncture needle forms a vir- F and F represent the forces acting on P and top top,i rem,i tual spring, with a spring constant K . Finally, the velocity P under the APF, respectively. It is assumed that C is rem S damping coefficient of the system is denoted as B: aligned with C .  � � B B B S S (22) MV + BV + KS = F . r × f mid,i mid,i mid,i mid,i S top,i top,i F = , top,i S top,i B In this model, V and V represent the veloc- mid,i mid,i (19) � �  S S  ity screw and the acceleration screw, respectively, of the r × f  S rem,i rem,i F = , rem,i S midpoint P on the puncture needle. Under the action f mid rem,i of F , the mass term M generates acceleration for mid,i S S where r and r denote the 3D coordinates of the movement of the puncture needle, addressing the top,i rem,i S S P and P in C . Corresp ondingly , f and f discontinuity in acceleration often seen in traditional top rem S top,i rem,i represent the forces acting on P and P under the APF methods where APF forces are directly mapped top rem APF. Additionally, in the body coordinate system C , it is to tracking velocities. The damping coefficient B helps established that: prevent excessively high tracking velocities, whereas the spring constant K serves an inspirational role, guid- B T S F = Ad F , top,i T top,i SB ing the puncture needle to escape from local minima. It (20) B T S is important to note that because motion screws cannot F = Ad F . rem,i T rem,i SB be directly subtracted, the transformation matrix from B B The adjoint transformation matrix Ad from C to C S B Pose (T ) → Tar (T ) is the exponential map of the SB i i i tar S S is used to describe the 6D force screws F and F motion screw S . This involves the following relation: top,i rem,i mid,i in C , enabling their transformation into the force screws B −1 B B B mid,i (23) F and F in C . Ultimately, the 6D force screw (T ) T = e . B i tar top,i rem,i F , representing the force of the APF acting on the mid,i The process of solving for S is denoted by mid,i Tar ≃ Pose . Finally, the iterative solution is obtained i i Chen  et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 11 of 27 through numerical integration. Assuming the control the target pose is primarily governed by attractive forces. period is ∆t, the following relation holds: Therefore, F can be reformulated as: mid,i  � � B B −1 B B B  ˙ F = K S , V = M F − BV − KS , e (26)  mid,i mid,i mid,i mid,i mid,i mid,i  B B V = V + V × �t, mid,i+1 mid,i mid,i where K is the equivalent stiffness of APF forces, and it Pose = Pose + V × �t, B B i+1 i  mid,i+1 is independent of S . Consequently, when S ≈ 0 , mid,i mid,i  B S = Tar ≃ Pose ,  i+1 i+1 mid,i+1 Eq. (22) can be rewritten as: F = APF(Pose ). i+1 mid,i+1 B K + K B B ¨ ˙ (24) S + S + S = 0. (27) mid,i mid,i mid,i M M We have also introduced the concept of multi-segment B In this case, the natural frequency ω and damping variable stiffness. When S < ε , as the tracking is mid,i ratio ζ can be defined as: about to be completed, the stiffness K is set to a higher value, whereas the potential field force is set to zero. This ω = (K + K ) M, n e approach can further accelerate convergence and reduce  (28) ζ = B 2 M(K + K ). oscillation. In this context, APF(∗) denotes the APF force function. During the iteration process, the output veloc- where ω and ζ are key determinants of the dynamic ity screw V is mapped to the robotic joint angular mid,i+1 characteristics of a second-order system. To further velocities θ through the velocity Jacobian matrix J : exe B { } analyze the influence of parameters M ,B,K ,K on the controller performance, the sensitivity is defined as −1 B θ = (J ) V exe B (25) mid,i+1 follows: Ultimately, θ is executed by the robot’s lower-level exe ∂f x S = · . controller, and the entire robotic tracking process is illus- x (29) ∂x f trated in Figure  7. Thus, the design and theoretical deri - vation of the visual servo control method based on 6D By jointly analyzing Eq. (28) and Eq. (29), the sensitivity APFF admittance are completed. of ω and ζ to {M ,B,K ,K } can be obtained, as listed in n e Table 1. 2.3.3 Parametric Sensitivity Analyses The proposed controller constitutes a typical discrete- time, nonlinear closed-loop system, in which the param- Table 1 Sensitivity of dynamic indices to controller parameters eters M , B , and K play a critical role in determining the Indice M B K control performance of the robot. To support the rational selection and tuning of these parameters, a sensitivity ω −0.5 0 +0.5 +0.5 analysis is performed to theoretically assess their influ - ζ −0.5 +1 −0.5 −0.5 ence on the system’s dynamic behavior. The tracking of Figure 7 Visual servo control process flow diagram Chen et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 12 of 27 robustness of the proposed algorithm, several distrac- tor markers were also randomly attached to the head phantom. The placement of the customized fiducials followed three key principles: (1) Randomness: Fidu- cials should be placed without fixed patterns to avoid location-specific biases and to test the generalizability of the algorithm. (2) Dispersion: Fiducials should be spatially distributed to prevent self-occlusion caused by overly compact arrangements, ensuring visibility robustness. (3) Visibility: The placement ensured that all fiducials remained within the camera’s field of view, reflecting perceptual completeness. Figure 8 Overview of the visual servoing experimental setup Figure 9 Design and placement of fiducials 3 Experiments and Results In this section, preliminary experiments were conducted, primarily including: automatic detection of fiducials, monocular-based registration, and visual servo control with 6D APFF admittance. The overall experimental plat - form consists of a robotic execution subsystem, a punc- ture device, and a vision navigation subsystem built with an HD video camera of NDI Polaris Vega VT, as shown in Figure  8. NDI Polaris Vega VT can achieve a meas- urement accuracy of up to 0.12 mm within the effective workspace. It is widely used in clinical surgeries with surgical robots [43, 44]. The host PC is equipped with an Intel Core i9-12900H CPU at 2.5 GHz, offering high-per - formance processing capabilities. It has 32 GB of RAM and is fitted with an Nvidia GeForce RTX 3060 GPU, suitable for handling graphics-intensive tasks. Moreover, the accuracy verification of monocular registration is car - ried out by the IR sensors of NDI. 3.1 A utomatic Fiducial Detection Experiments Figure 10 Comparison between manual and automatic fiducial The customized fiducials and their physical place - annotation in the preoperative stage ment are illustrated in Figure 9. To further evaluate the Chen  et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 13 of 27 Table 2 Quantitative analysis of preoperative fiducial detection i Example 1 Example 2 X (mm) Y (mm) Z (mm) X (mm) Y (mm) Z (mm) d (mm) X (mm) Y (mm) Z (mm) X  (mm) Y  (mm) Z  (mm) d (mm) err err err err err err 1 −80.71 107.84 493.16 0.19 0.20 0.08 0.29 4.15 174.37 517.68 0.14 0.10 0.32 0.36 2 −81.00 107.49 493.44 0.10 0.15 0.20 0.27 3.98 174.01 517.89 0.03 0.26 0.11 0.29 3 −80.68 107.32 493.29 0.22 0.32 0.05 0.39 4.05 174.37 518.24 0.04 0.10 0.24 0.26 4 −80.96 107.81 492.96 0.06 0.17 0.28 0.33 3.92 174.37 517.89 0.09 0.10 0.11 0.17 5 −81.19 107.81 492.98 0.29 0.17 0.26 0.43 3.99 174.37 518.10 0.02 0.10 0.10 0.14 6 −80.71 107.72 493.47 0.19 0.08 0.23 0.31 4.16 174.28 518.11 0.15 0.01 0.11 0.18 7 −81.03 107.48 493.36 0.13 0.16 0.12 0.24 3.84 174.13 518.09 0.17 0.14 0.09 0.24 Mean −80.89 107.64 493.24 0.17 0.18 0.17 0.32 4.01 174.27 518.00 0.09 0.12 0.15 0.24 Auto −80.80 107.68 493.32 0.09 0.04 0.08 0.13 4.05 174.31 518.06 0.04 0.04 0.06 0.08 Chen et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 14 of 27 The framework proposed in this paper implements extraction of all pixel points corresponding to the fidu - an efficient detection of fiducials in S preoperatively, cial spheres in the image coordinate system. By calculat- med using a combination of thresholding and clustering ing the mean coordinates of these pixels, the center of methods. To validate the performance, the experimental each fiducial sphere can be determined, thereby complet - setup is as follows: Seven testers, after learning the basic ing the automatic detection process. For this study, the operations and marking methods of the Slicer software, baseline model used is YOLOv8x-seg (a variant within manually annotated an example containing nine fiducials. YOLOv8, specifically designed for instance segmentation The results and time cost for marking were recorded, tasks). During the training process, the evaluation results with each tester denoted as User-i. The proposed detec - on the validation set are shown in Figure 11(a). The focus tion method also performed the detection automatically, is particularly on the mask (M) accuracy and recall rate, recording the results and the time cost. The results are both of which eventually approach 1 with increasing iter- shown in Figure 10. ations, indicating that the model’s performance meets the Considering the clarity of the image, two marking requirements. Additionally, as shown in Figure 11(b), the results are provided. Black spheres represent the mark- model exhibits stable segmentation performance in pre- ings made by the testers, red spheres represent the mean dicting fiducials, even in the presence of distractor mark - of these black sphere markings, and blue spheres indicate ers. For videos with a resolution of 1280x720, the model the results of automatic detection. The red sphere, rep - processes at about 10 frames/s. resenting the mean position of all manual annotations, reflects the consensus region among annotators and can 3.2 Monocular‑based Registration Experiments be assumed as an ideal reference location. The dispersed This section primarily aims to validate the accuracy and distribution of black spheres illustrates inconsistencies efficiency of the monocular registration method. First, in manual labeling, highlighting the presence of annota- the accuracy of the fiducial localization during the reg - tion uncertainty and potential errors. The automatically istration process is evaluated. The experimental setup detected point is found to be in close proximity to the is as follows: (1) Using a single optical camera based on mean annotation, qualitatively demonstrating the reli- NDI, the 3D spatial coordinates of the fiducials in the ability and effectiveness of the proposed detection algo - surgical environment are estimated through correspond- rithm. A quantitative analysis is further performed, as ing estimation and depth recovery, and aligned to the listed in Table 2. NDI’s infrared coordinate system C ; (2) An NDI posi- In this context, X , Y , and Z represent the devi- tioning probe is employed to directly determine the 3D err err err ations of manually annotated points from the mean coordinates of the fiducials in C ; (3) Precision validation point along the X, Y, and Z axes, respectively, whereas experiments are conducted under six different positional d denotes the Euclidean distance between them. It is relationships between the cranial models and NDI. To observed that the manual annotation errors range from evaluate the impact of fiducial quantity on the method, 0.14 mm to 0.43 mm, further indicating the inherent a sensitivity analysis experiment is conducted; (4) Five uncertainty associated with manual labeling. Compared fiducials are used for registration, whereas the remaining to the average manual annotation errors (0.32 mm and fiducials are utilized to evaluate the registration accuracy; 0.24 mm), the proposed method achieves significantly (5) The registration efficiency is assessed under the same lower average errors of 0.13 mm and 0.08 mm, cor- hardware conditions. i i ∗ ∗ ∗ responding to reductions of 59.4% and 66.7%, thereby Suppose p (x , y , z ) and p (x , y , z ) represent i i i mon ndi i i i demonstrating a substantial improvement in localization the ith pair of points obtained from (1) and (2) in a single accuracy. Additionally, a comparison of time consump- experiment. The localization accuracy of the proposed tion is listed in Table 3. method is evaluated using the Euclidean Distance Error: The time taken by different testers varied significantly, i i with the fastest at 9.3 min and the slowest at 14.6 min, d = p − p . (30) mon ndi averaging 1.4 min per fiducial. Conversely, the automatic i=1 detection of nine fiducials took a total of only 0.1 s. In Additionally, the registration accuracy in the validation summary, the proposed automatic detection method has experiments should be evaluated with the Target Regis- clearly reached human-level accuracy and is more effi - tration Error (TRE). Suppose the verification fiducial on cient, saving a significant amount of valuable time in the the image is denoted as p , and the corresponding fidu - treatment of acute patients. pix The intraoperative automatic detection of fiducial cial in the surgical space obtained by NDI is p : ndi markers is primarily achieved through instance segmen- tation. Specifically, instance segmentation enables the Chen  et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 15 of 27 Table 3 Preoperative fiducial marking time consumption i 1 2 3 4 5 6 7 Auto Time 13.9 min 14.6 min 12.8 min 9.3 min 12.5 min 11.9 min 14.5 min <0.1 s provides richer spatial constraints during optimiza- ndi c i i TRE = T T p − p . (31) c p pix ndi tion, leading to more reliable alignment outcomes. It i=1 is worth noting that in the Test_2 scenario, the method ndi c still achieved high localization accuracy even with a Here, T and T represent the spatial transformation c p limited number of fiducials. This should be considered matrices from C to C and from the pixel coordinate sys- c i a case-specific anomaly rather than a generalizable tem to C . m denotes the number of validation fiducials. outcome. Therefore, to ensure consistent and reliable The Fiducial Registration Error (FRE) represents the reg - performance, it is recommended to utilize a greater istration error of the point set involved in the registration number of fiducials. after the registration is completed, and its calculation for- The results of validation experiments for the registra - mula is consistent with that of the TRE. The experimental tion are shown in Figure 14 and Table 4. results validating the spatial localization accuracy of the As shown in the figure, each of the validation experi - proposed method are shown in Figure 12. ments involved five registration fiducials and four valida - As depicted in the accompanying figure, each of tion fiducials. It can be observed that the mean source the six validation experiments for localization accu- registration error across the six experiments is 0.347 racy involved nine fiducials. It can be observed that, mm, indicating that the registration process was suc- excluding outliers, the X, Y, and Z coordinate errors cessfully completed. The average errors at the validation of the majority in each experiment are less than 1 mm. reference points were 0.74, 0.35, 0.56, 0.69, 0.67, and 0.48 Additionally, the mean and median errors of d are also mm, respectively, with a mean error of 0.581 mm and a below 1 mm. It is important to note that the fiducials median of 0.571 mm. These results meet the expected used in this experiment are spherical, requiring the use clinical registration requirement, which specifies that the of an NDI positioning probe to collect multiple surface median registration error should be within 1 mm. The points of the spherical reference, and the sphere center minimum average error was 0.282 mm, the maximum is obtained through sphere fitting. This may introduce was 0.902 mm, and the standard deviation was 0.245 some errors. Therefore, we conclude that the spatial mm, further demonstrating the stability of the proposed localization accuracy of the proposed method meets method. The results of time consumption experiments the requirements for clinical surgery. are shown in Figure 15. We evaluated the localization performance under It can be observed that the average time taken from varying fiducial configurations, specifically examining fiducial detection to the completion of spatial registra - the effects of fiducial quantity and spatial arrangement. tion across the six experiments is 0.228 s, with minimal The experimental design involved three test scenarios fluctuation. Among the processes, fiducial detection in which fiducials were progressively occluded, as illus - takes the longest time, averaging 0.132 s, which accounts trated in Figure  13(a). Notably, the specific markers for approximately 58% of the total time. This is because subjected to occlusion differed across scenarios, ena - higher-resolution images were used to improve the bling a preliminary assessment of how fiducial place - localization accuracy of the fiducials on the 2D image. ment influences localization accuracy. The results are Next, depth recovery and correspondence estimation, presented in Figure 13(b). both solved using optimized methods, took an average As illustrated in Figure  13(b), when the number of of only 0.058 and 0.033 s, respectively, demonstrating fiducials exceeds 4, the localization error consistently good real-time performance. Finally, the time for spatial remains below 1 mm. Additionally, the small vari- registration was negligible. In summary, the proposed ance observed in these cases indicates the stability of monocular registration algorithm satisfies the real-time the proposed method. However, when the number of requirements of clinical surgical robot registration and fiducials falls below 4, the localization accuracy pro - tracking. gressively deteriorates, accompanied by an increase in variance, suggesting a decline in the method’s robust- ness. This phenomenon can be interpreted from a feature-based perspective: a larger number of fiducials Chen et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 16 of 27 (a) (b) Figure 11 Intraoperative automatic fiducial detection: (a) Visualization of key metrics during the training process of YOLOv8x-seg. The plots include the precision, recall, and mean average precision (mAP) for object detection (B) and instance segmentation (M); (b) Visualization of the predicted results of fiducial markers by YOLOv8x-seg Chen  et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 17 of 27 3.3 Visual Servo Control Experiments Following the completion of the registration experiment, a tracking control experiment was conducted using the obtained registration results. In this experiment, the operator held a patient head model with attached fidu - cials, simulating various movements to replicate ran- dom intraoperative disturbances of the patient’s head. To evaluate our method’s performance, three methods were compared: the APF-velocity-based visual servo con- trol method (APF-V), the APF-acceleration-based visual servo control method without multi-segment variable stiffness (APF-A-I), and the method proposed in this paper, named APF-A-II. In Section  2.3.3, the sensitivity of the controller’s dynamic performance to the control parameters has been Figure 12 Validation experiments of the localization accuracy derived. Furthermore, based on relevant studies [45, 46] (a) (b) Figure 13 Impact of fiducial quantity on performance: (a) Experiment setup, (b) Localization error under different fiducial configurations Chen et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 18 of 27 Figure 14 Validation experiments of the registration accuracy Table 4 Registration errors Exp FRE (mm) TRE (mm) TRE TRE TRE (mm) Time (s) min max std (mm) (mm) 1 0.413 0.336 1.088 0.320 0.736 0.222 2 0.294 0.202 0.510 0.109 0.355 0.240 3 0.471 0.261 0.846 0.209 0.559 0.224 4 0.228 0.439 0.832 0.152 0.692 0.225 5 0.273 0.194 1.491 0.524 0.666 0.216 6 0.404 0.258 0.646 0.156 0.482 0.244 Mean 0.347 0.282 0.902 0.245 0.581 0.228 M = diag(0.05, 0.05, 0.05, 0.5, 0.5, 0.5), B = diag(3, 3, 3, 30, 30, 30), (32) K = diag(10, 10, 10, 10, 10, 10), K = diag(120, 120, 120, 300, 300, 300), where K and K represent the front and rear stages of 1 2 the multi-segment variable stiffness, respectively. To fur - ther evaluate the effectiveness of the set parameters, a simulation experiment was conducted, as shown in Fig- ure 16. In the experiment, M , B , and K were respectively set to {0.1, 1, 10}, {0.75, 1, 1.25}, and {0.1, 1, 10} relative to their set values for comparison. It is evident that the selected values of B and K achieve Figure 15 Time consumption of complete registration pipeline a more favorable trade-off between overshoot and response time. Notably, reducing the value of M gener- ally enhances the system’s dynamic performance, par- and with appropriate adjustments, the controller param- ticularly in terms of transient behavior. Nevertheless, this eter is set as: enhancement is typically accompanied by abrupt velocity Chen  et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 19 of 27 Figure 16 Sensitivity analysis of M , B , and K variations (the increased slope of x(t) ). In the context of speed may lead to fiducial blur, thereby degrading recog - surgical robotics, such rapid transitions may introduce nition performance. This resembles clinical conditions, potential safety hazards. The introducing M ensures where patient movement during surgery is typically slow velocity continuity, thereby alleviating this issue and fur- and of small amplitude owing to the effects of anesthesia. ther validating the stability of the proposed model. The experimental procedure is illustrated in Figure 17. A 16-second head motion experiment was conducted, At each second, the head motion status, camera recog- during which the head phantom was manually moved by nition results, and the corresponding tracking response the operator across four distinct positions to qualitatively were recorded. As illustrated in the figure, MonoTracker validate the effectiveness of the proposed method. In this consistently detected the fiducial markers even dur - setup, the head motion was intentionally kept relatively ing head movement and successfully completed spatial slow to mimic realistic patient movements. Excessive registration. Once the preoperative planning pose was Chen et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 20 of 27 Figure 17 Visual servoing tracking process with APF-A-II transformed into the robot’s coordinate frame, the sur- under complex conditions. The tracking process is illus - gical robot commenced tracking based on the proposed trated in Figure  18. To verify the advantages of the pro- method. The results demonstrate that the robot main - posed method, a comparative analysis is conducted from tained good real-time tracking performance. four perspectives: task reachability, motion smooth- The head movement disturbance was set at three lev - ness, environmental adaptability, and real-time tracking els: low, medium, and high speeds, to comprehensively performance. analyze the tracking performance of the three methods Chen  et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 21 of 27 Figure 18 Head tracking experiments with three methods (1) Task Reachability: As shown in Figure  18, the resulting in higher tracking accuracy compared to first column illustrates the desired pose of the track - APF-V. Additionally, the smoothness of the motion ing target, whereas the second, third, and fourth col- trajectory is enhanced, particularly in reducing sud- umns display the tracking motion trajectories of the den changes in acceleration as tracking approaches three algorithms under low, medium, and high-speed completion. This improvement is mainly due to movements, respectively. It can be observed that all mapping the potential field force to acceleration, three methods ultimately manage to track the target. making the robot’s motion more controllable. By However, APF-V and APF-A-I exhibit some oscilla- comparing the motion trajectories in Figure 18 and tory adjustments near the end of the tracking pro- the motion speeds in Figure  21, it can be observed cess. The proposed method, APF-A-II, achieves rapid that APF-A-II further enhances trajectory tracking convergence. This is primarily due to the approach accuracy and motion smoothness. Even when the of converting pose errors into 6D motion twists, desired trajectory changes abruptly, the smoothness which enables quick target tracking without causing of the robot’s tracking motion is still maintained, oscillations typically caused by tracking overshoot. greatly improving the safety of surgical robots in Additionally, the tracking trajectory of the proposed clinical applications. method shows the highest similarity to the desired (3) Environmental Adaptability: Considering the trajectory, indicating superior tracking accuracy dur- uncertainty of patient head movement during clinical ing the visual servo process. This is particularly sig - surgery, the robot must be capable of high-precision nificant for surgical robots, where high tracking pre - tracking at various speeds while maintaining smooth cision is crucial. motion trajectories. However, under different target (2) Motion Smoothness: As shown in Figures  19, movement speeds, owing to the limitations of the controller’s hyperparameters, it is difficult to bal 20, and 21, APF-A-I introduces a guiding term, - Chen et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 22 of 27 Figure 19 Motion performance analysis of APF-V ance high-precision tracking with trajectory smooth- ness. For APF-A-II, the introduction of the multi- segment variable stiffness method enables rapid 4 Discussion tracking and obstacle avoidance using 6D APF forces This paper proposes a fully automatic registration and when the tracking error exceeds a threshold, with tracking control framework based on a monocular cam- low stiffness for heuristic guidance. When the track - era. First, custom fiducials are designed, and automatic ing error is below the threshold, APF forces are no detection methods are developed to facilitate the auto- longer needed, and rapid tracking of the desired pose matic extraction of fiducials in C and C . By constructing v c is achieved through virtual spring forces under high 2D KD-Tree features and utilizing a two-stage optimiza- stiffness. tion method, the 2D-3D correspondence estimation and (4) Tracking Real‑ Time Performance: As shown in depth recovery are accomplished, followed by registra- Figure  22, the average computation times per itera- tion through SVD. The mass-spring-damping system tion for APF-V, APF-A-I, and APF-A-II are 0.0389, model is improved, and a visual servo control strategy 0.0388, and 0.0393 s, respectively. Therefore, the based on 6D APFF admittance is designed. Integrating improvements introduced in the proposed method the outputs of monocular registration, the robot’s pose is only slightly increase the computation time, without adjusted in real time, achieving dynamic tracking of the adding significant overhead. This ensures that the head movement. The experiments confirm the accuracy, real-time performance of the tracking motion control efficiency, and motion performance. is maintained. Chen  et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 23 of 27 Figure 20 Motion performance analysis of APF-A-I As listed in Table  5, this study compares the pro- several minutes. To our knowledge, there has been no posed registration method with some advanced meth- prior report of using a single RGB camera to achieve ods. Here, the variables primarily include the type intraoperative registration in seconds for neurosurgi- of surgery, whether markers are used, and the type of cal robotics. NIR-based registration offers the highest visual sensor. The method based on fiducial mark - accuracy, but owing to the costly markers and limited ers is abbreviated as MB (Marker-Based), whereas the workspace, its clinical application is subject to more method without fiducial markers is abbreviated as ML stringent conditions. The proposed method, which is (Marker-Less). Clearly, registration methods based based on monocular, balances registration accuracy on markers mostly achieve an accuracy within 1 mm. and efficiency and reduces hardware costs and system However, those marker-less methods often have errors complexity, holds significant clinical importance. greater than 2.5 mm, which does not meet the clinical However, because MonoTracker relies on geomet- requirements of neurosurgical operations. Therefore, ric features of fiducials for correspondence estimation, registration methods based on markers will continue its registration may fail when fiducials are occluded. In to be widely used in clinics for a considerable length fact, the information loss problem is an inherent draw- of time. In terms of visual sensor types, RGB-D cam- back in mono-modality surgical navigation systems eras are extensively used by marker-less methods, but and a common challenge faced by researchers [49]. their accuracy has reached a plateau. Meng et  al. [9] MonoTracker identifies the number of fiducials to detect first achieved intraoperative registration using a sin - potential occlusions and alerts the surgeon accordingly, gle RGB camera. However, owing to the use of multi- thereby ensuring surgical safety. Recently, several stud- view 3D reconstruction, a single registration can take ies have explored the methods of multi-sensor fusion Chen et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 24 of 27 Figure 21 Motion performance analysis of APF-A-II [50], robot-assisted tracking [43], and deep learning planning techniques based on APF by representing techniques [51] to alleviate information loss. Heunis the target pose as a 6D force acting on the end of the et al. employed eight infrared cameras in a large capture robotic arm. Combined with a mass-spring-damping solution to avoid dynamic obstacles, which may lead to model, a 6D APFF visual servo controller is designed, substantial costs [50]. Conversely, monocular cameras ultimately achieving good motion performance. In this are lower-cost, more compact, and offer a wider field of paper, the key parameters M , B , and K are determined view, making them more advantageous for multi-device empirically to suit general tasks for the surgical robot deployment in surgical environments. This provides system. However, owing to space limitations, the paper a feasible solution for MonoTracker to address occlu- does not delve into complex clinical environments with sion issues in clinical settings. Moreover, incorporating dynamic obstacles. Essentially, the nature of this visual temporal information by performing feature matching servo control method is to map potential field forces to between consecutive frames enables the identification of accelerations, enhancing the controller’s performance occluded fiducials, thereby providing an effective solu - in terms of velocity smoothness. Therefore, this method tion to the problem of partial occlusion. This also repre - can be generalized and applied to similar visual track- sents a promising direction for future research. ing tasks. In the field of visual servo control, the paper pre - In conclusion, this work, aimed at tracking the patient’s sents a tracking control method based on APFF admit- head during neurosurgical operations, has designed a tance. This method improves upon traditional path monocular-based automatic registration and tracking Chen  et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 25 of 27 Figure 22 Comparison of computation time for three methods Table 5 Comparison of advanced registration methods Ref. Procedure Sensor TRE (mm) Time (s) Note [20] Brain NIR 0.7 0.1 MB [47] Brain NIR 0.45 − MB [9] Brain RGB 1.39 >40 MB [48] Orthopedic RGB-D 2.74 0.2 ML [11] Brain RGB-D <3 <10 ML [12] Spine RGB-D 2.7 1.5 ML Ours Brain RGB 0.58 0.23 MB control framework, and has conducted preliminary work integrating monocular-based registration and exploration and validation theoretically and experimen- visual servo control. tally. Although some issues remain unresolved, it holds (2) The proposed registration module employs a 2D tremendous potential for future applications: (1) Com- KD-Tree-based feature extraction method and a pleting dynamic tracking of the patient’s head intraopera- two-stage optimization strategy to establish robust tively enhances the adaptability of neurosurgical robots 2D-3D correspondences and recover monocular and reduces the workload for surgeons. (2) The fully depth information. The transformation is estimated automatic real-time registration method lays the ground- via SVD, enabling efficient and accurate alignment. work for deeper integration of Mixed Reality (MR) and (3) A 6D APFF, inspired by the mass-spring-damper surgical robots. (3) The visual servo control method pro - system, is developed to ensure compliant motion posed provides a new approach for visual tracking tasks and stability during tracking. The strategy enhances with similar requirements. system responsiveness while maintaining trajectory smoothness. 5 Conclusions (4) Experimental validation demonstrates that the proposed registration method achieves clinically acceptable accuracy and computational efficiency. (1) To address the challenge of puncture planning The visual servo controller provides stable and deviation induced by intraoperative patient head smooth tracking performance, reducing the cogni- movement in neurosurgical procedures, this study tive and operational load on the surgeon. presents MonoTracker, a fully automated frame- Chen et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 26 of 27 [7] K Machetanz, F Grimm, S Wang, et al. Patient-to-robot registration: The (5) The MonoTracker framework shows strong fate of robot-assisted stereotaxy. The International Journal of Medical potential for application in various surgical robot Robotics and Computer Assisted Surgery, 2021, 17(5): 2288. navigation scenarios. Future work will focus on [8] A Fomenko, D Serletis. Robotic stereotaxy in cranial neurosurgery: A qualitative systematic review. Neurosurgery, 2018, 83(4): 642–650. addressing fiducial loss in the camera’s field of view [9] F Meng, F Zhai, B Zeng, et al. An automatic markerless registration through multi-view monocular tracking strate- method for neurosurgical robotics based on an optical camera. Inter- gies. Additionally, the integration of this method national Journal of Computer Assisted Radiology and Surgery, 2018, 13: 253–265. into Augmented Reality (AR)/Virtual Reality (VR)- [10] Y Su, Y Sun, M Hosny, et al. Facial landmark-guided surface matching for based surgical navigation systems will be explored image-to-patient registration with an RGB-D camera. The International to enhance surgical immersion and human-robot Journal of Medical Robotics and Computer Assisted Surgery, 2022, 18(3): interaction. [11] F Liebmann, M Atzigen, D Stütz, et al. Automatic registration with con- tinuous pose updates for marker-less surgical navigation in spine surgery. Medical Image Analysis, 2024, 91: 103027. Acknowledgements [12] G Fattori, A J Lomax, D C Weber, et al. Technical assessment of the NDI Not applicable. Polaris Vega optical tracking system. Radiation Oncology, 2021, 16(1): 1–4. [13] A I Omara, M Wang, Y F Fan, et al. Anatomical landmarks for point-match- Authors’ Contributions ing registration in image-guided neurosurgery. The International Journal Kai Chen was in charge of the idea conception, algorithm implementation, of Medical Robotics and Computer Assisted Surgery, 2014, 10(1): 55–64. data collection, and manuscript writing. Diansheng Chen was in charge of [14] G A Puerto-Souza, J A Cadeddu, G L Mariottini. Toward long-term and the idea conception, study oversight/supervision, manuscript review. Ruijie accurate augmented-reality for monocular endoscopic videos. IEEE Zhang was in charge of the algorithm implementation, data collection, manu- Transactions on Biomedical Engineering, 2014, 61(10): 2609–2620. script editing. Cai Meng was in charge of the study oversight/supervision, [15] F Liebmann, D Stütz, D Suter, et al. Spinedepth: A multi-modal data col- manuscript editing and review. Zhouping Tang was in charge of the study lection approach for automatic labelling and intraoperative spinal shape oversight/supervision, manuscript review. All authors read and approved the reconstruction based on RGB-D data. Journal of Imaging, 2021, 7(9): 164. final manuscript. [16] X S Hu, N Wagley, A T Rioboo, et al. Photogrammetry-based stereoscopic optode registration method for functional near-infrared spectroscopy. Funding Journal of Biomedical Optics, 2020, 25(9): 095001. Supported by National Natural Science Foundation of China (Grant No. [17] S Kim, H An, M Song, et al. Automated marker-less patient-to-preopera- 92148206). tive medical image registration approach using RGB-D images and facial landmarks for potential use in computer-aided surgical navigation of the Data Availability paranasal sinus. Proceedings of the Computer Graphics International Confer- The datasets used and analysed during the current study are available from ence, Shanghai, China, 2023: 135–145. the corresponding author on reasonable request. [18] L X Liang. Precise iterative closest point algorithm for RGB-D data registra- tion with noise and outliers. Neurocomputing, 2020, 399: 361–368. [19] Q Lin, R Yang, K Cai, et al. Real-time automatic registration in optical surgi- Declarations cal navigation. Infrared Physics & Technology, 2016, 76: 375–385. [20] Y Xu, F Gao, H Ren, et al. An iterative distortion compensation algorithm Competing Interests for camera calibration based on phase target. Sensors, 2017, 17(6): 1188. The authors declare no competing financial interests. [21] H Liu, J Fu, M He, et al. GWM-view: gradient-weighted multi-view calibra- tion method for machining robot positioning. Robotics and Computer- Integrated Manufacturing, 2023, 83: 102560. Received: 28 February 2025 Revised: 9 July 2025 Accepted: 21 July 2025 [22] A Taleb, C Guigou, S Leclerc, et al. Image-to-patient registration in computer-assisted surgery of head and neck: State-of-the-art, perspec- tives, and challenges. Journal of Clinical Medicine, 2023, 12(16): 5398. [23] A Martin-Gomez, H Li, T Song, et al. STTAR: surgical tool tracking using off-the-shelf augmented reality head-mounted displays. IEEE Transactions References on Visualization and Computer Graphics, 2023. [1] T Vos, S S Lim, C Abbafati, et al. Global burden of 369 diseases and injuries [24] J Zhang, Z Yang, S Jiang, et al. A spatial registration method based on in 204 countries and territories, 1990–2019: A systematic analysis for 2D–3D registration for an augmented reality spinal surgery navigation the global burden of disease study 2019. The Lancet, 2020, 396(10258): system. The International Journal of Medical Robotics and Computer Assisted 1204–1222. Surgery, 2024, 20(1): e2612. [2] V L Feigin, T Vos, F Alahdab, et al. Burden of neurological disorders across [25] M T Holland, K Mansfield, A Mitchell, et al. Hidden error in optical stereo - the US from 1990–2017: A global burden of disease study. JAMA Neurol., tactic navigation systems and strategy to maximize accuracy. Stereotactic 2021, 78(2): 165–176. and Functional Neurosurgery, 2021, 99(5): 369–376. [3] C Faria, W Erlhagen, M Rito, et al. Review of robotic technology for [26] Y Wang, W Wang, Y Cai, et al. A guiding and positioning motion strategy stereotactic neurosurgery. IEEE Reviews in Biomedical Engineering, 2015, 8: based on a new conical virtual fixture for robot-assisted oral surgery. 125–137. Machines, 2022, 11(1): 3. [4] Z Wu, D Chen, C Pan, et al. Surgical robotics for intracerebral hemorrhage [27] H Su, W Qi, C Yang, et al. Deep neural network approach in robot tool treatment: State of the art and future directions. Annals of Biomedical dynamics identification for bilateral teleoperation. IEEE Robotics and Engineering, 2023, 51(9): 1933–1941. Automation Letters, 2020, 5(2): 2943–2949. [5] T Haidegger, Z Benyo, K Peter. Patient motion tracking in the presence of [28] T Haidegger, S Speidel, D Stoyanov, et al. Robot-assisted minimally inva- measurement errors. Proceedings of the 2009 Annual International Confer- sive surgery—surgical robotics in the data age. Proceedings of the IEEE, ence of the IEEE Engineering in Medicine and Biology Society, Minneapolis, 2022, 110(7): 835–846. USA, September 3–6, 2009: 5563–5566. [29] S Dinesh, U K Sahu, D Sahu, et al. Review on sensors and components [6] G Z Yang, J Cambias, K Cleary, et al. Medical robotics—regulatory, ethical, used in robotic surgery: recent advances and new challenges. IEEE Access, and legal considerations for increasing levels of autonomy. Science Robot- 2023, 11: 140722–140739. ics, 2017, 2(4): 8638. [30] S Niyaz, A Kuntz, O Salzman, et al. Following surgical trajectories with concentric tube robots via nearest-neighbor graphs. Proceedings of the Chen  et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 27 of 27 2018 International Symposium on Experimental Robotics, Buenos Aires, Kai Chen born in 1995, is currently a PhD candidate at School of Argentina, 2020: 3–13. Mechanical Engineering and Automation, Beihang University, China. His [31] W Park, Y Wang, G S Chirikjian. The path-of-probability algorithm for steer- research interests include surgical robotics and medical image pro- ing and feedback control of flexible needles. The International Journal of cessing. E-mail: [email protected]. Robotics Research, 2010, 29(7): 813–830. [32] A Segato, V Pieri, A Favaro, et al. Automated steerable path planning for Diansheng Chen born in 1969, is currently a professor at School of deep brain stimulation safeguarding fiber tracts and deep gray matter Mechanical Engineering and Automation, Beihang University, China. He nuclei. Frontiers in Robotics and AI, 2019, 6: 70. received his PhD degree from Jilin University, China, in 2003. E-mail: [33] A Hong, Q Boehler, R Moser, et al. 3D path planning for flexible needle steering in neurosurgery. The International Journal of Medical Robotics and [email protected]. Computer Assisted Surgery, 2019, 15(4): 1998. [34] L Hao, D Liu, S Du, et al. An improved path planning algorithm based on Ruijie Zhang born in 2000, is currently a master candidate at artificial potential field and primal-dual neural network for surgical robot. School of General Engineering, Beihang University, China. He received Computer Methods and Programs in Biomedicine, 2022, 227: 107202. his bachelor’s degree from Beihang University, China, in 2023. [35] S O Park, M C Lee, J Kim. Trajectory planning with collision avoidance for redundant robots using Jacobian and artificial potential field-based real- Cai Meng born in 1977, is currently an associate professor at School time inverse kinematics. International Journal of Control, Automation and of Astronautics, Beihang University, China. He received his PhD degree Systems, 2020, 18: 2095–2107. from Beihang University, China, in 2004. E-mail: [email protected]. [36] B Kovács, G Szayer, F Tajti, et al. A novel potential field method for path planning of mobile robots by adapting animal motion attributes. Robotics and Autonomous Systems, 2016, 82: 24–34. Zhouping Tang born in 1969, is currently a professor at Tongji Med- [37] L He, Y Meng, J Zhong, et al. Preoperative path planning algorithm for ical College, Huazhong University of Science and Technology, China. He lung puncture biopsy based on path constraint and multidimensional received his PhD degree from Tongji Medical College, Huazhong Uni- space distance optimization. Biomedical Signal Processing and Control, versity of Science and Technology, China, in 2004. 2023, 80: 104304. [38] G Tong, X Wang, H Jiang, et al. A deep learning model for automatic segmentation of intraparenchymal and intraventricular hemorrhage for catheter puncture path planning. IEEE Journal of Biomedical and Health Informatics, 2023. [39] J Han, J Davids, H Ashrafian, et al. A systematic review of robotic surgery: From supervised paradigms to fully autonomous robotic approaches. The International Journal of Medical Robotics and Computer Assisted Surgery, 2022, 18(2): 2358. [40] F Xu, H Jin, X Yang, et al. Improved accuracy using a modified registration method of ROSA in deep brain stimulation surgery. Neurosurgical Focus, 2018, 45(2): 18. [41] H Yasin, H J Hoff, I Blümcke, et al. Experience with 102 frameless stereo - tactic biopsies using the Neuromate robotic device. World Neurosurgery, 2019, 123: 450–456. [42] D Tian, Q Xu, X Yao, et al. Diversity-guided particle swarm optimization with multi-level learning strategy. Swarm and Evolutionary Computation, 2024, 86: 101533. [43] J Han, M Luo, Y You, et al. Optimization scheme for online viewpoint planning of active optical navigation system in orthopedic surgeries. IEEE Transactions on Instrumentation and Measurement, 2023, 72: 1–13. [44] L Chen, L Ma, F Zhang, et al. An intelligent tracking system for surgical instruments in complex surgical environment. Expert Systems with Appli- cations, 2023, 230: 120743. [45] Y Wang, W Wang, Y Cai, et al. Preliminary study of a new macro‐micro robot system for dental implant surgery: Design, development and con- trol. The International Journal of Medical Robotics and Computer Assisted Surgery, 2024, 20(1): e2614. [46] J Wang, C Lu, Y Lv, et al. Task space compliant control and six-dimensional force regulation toward automated robotic ultrasound imaging. IEEE Transactions on Automation Science and Engineering, 2023. [47] F Suligoj, M Švaco, B Jerbić, et al. Automated marker localization in the planning phase of robotic neurosurgery. IEEE Access, 2017, 5: 12265–12274. [48] H Liu, F R Y Baena. Automatic markerless registration and tracking of the bone for computer-assisted orthopaedic surgery. IEEE Access, 2020, 8: 42010–42020. [49] L Xu, H Zhang, J Wang, et al. Information loss challenges in surgical navigation systems: From information fusion to AI-based approaches. Information Fusion, 2023, 92: 13–36. [50] C M Heunis, B F Barata, G P Furtado, et al. Collaborative surgical robots: optical tracking during endovascular operations. IEEE Robotics & Automa- tion Magazine, 2020, 27(3): 29–44. [51] S Tukra, H J Marcus, S Giannarou. See-through vision with unsupervised scene occlusion reconstruction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 44(7): 3779–3790. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Chinese Journal of Mechanical Engineering Springer Journals

MonoTracker: Monocular-Based Fully Automatic Registration and Real-Time Tracking Method for Neurosurgical Robots

Loading next page...
 
/lp/springer-journals/monotracker-monocular-based-fully-automatic-registration-and-real-time-nG7F3neASJ

References (45)

Publisher
Springer Journals
Copyright
Copyright © The Author(s) 2025
ISSN
1000-9345
eISSN
2192-8258
DOI
10.1186/s10033-025-01334-3
Publisher site
See Article on Publisher Site

Abstract

Robot-assisted surgery has become an indispensable component in modern neurosurgical procedures. However, existing registration methods for neurosurgical robots often rely on high-end hardware and involve prolonged or unstable registration times, limiting their applicability in dynamic and time-sensitive intraoperative settings. This paper proposes a novel fully automatic monocular-based registration and real-time tracking method. First, dedi- cated fiducials are designed, and an automatic preoperative and intraoperative detection method for these fiducials is introduced. Second, a geometric representation of the fiducials is constructed based on a 2D KD-Tree. Through a two-stage optimization process, the depth of 2D fiducials is estimated, and 2D-3D correspondences are established to achieve monocular registration. This approach enables fully automatic intraoperative registration using only a sin- gle optical camera. Finally, a six-degree-of-freedom visual servo control strategy inspired by the mass-spring-damper system is proposed. By integrating artificial potential field and admittance control, the strategy ensures real-time responsiveness and stable tracking. Experimental results demonstrate that the proposed method achieves a registra- tion time of 0.23 s per instance with an average error of 0.58 mm. Additionally, the motion performance of the control strategy has been validated. Preliminary experiments verify the effectiveness of MonoTracker in dynamic tracking scenarios. This method holds promise for enhancing the adaptability of neurosurgical robots and offers significant clinical application potential. Keywords Neurosurgical robot, Automatic detection, Monocular registration, Visual servo control posing a significant threat to global health [1, 2]. In 1 Introduction recent years, robotic-assisted surgery has entered the Neurosurgical diseases, such as cerebrovascular dis- field of neurosurgery. With its precise stereotactic posi - orders, intracranial tumors, and neurological dysfunc- tioning and stable instrument handling, it has improved tions, have extremely high mortality and disability rates, treatment outcomes and increased surgical efficiency, establishing a new paradigm for innovation in neuro- *Correspondence: surgical procedures [3, 4]. In clinical settings, the stand- Diansheng Chen [email protected] ard workflow of robot-assisted neurosurgery is typically School of Mechanical Engineering and Automation, Beihang University, structured into four sequential phases: (1) preoperative Beijing 100191, China 2 path planning based on patient-specific imaging data, Hunan Intelligent Rehabilitation Robot and Auxiliary Equipment Engineering Technology Research Center, Changsha 410004, China (2) intraoperative registration to establish the robot’s School of General Engineering, Beihang University, Beijing 100191, initial posture, (3) real-time tracking to compensate for China 4 intraoperative patient motion, and (4) surgical execution School of Astronautics, Beihang University, Beijing 100191, China Tongji Medical College, Huazhong University of Science and Technology, guided by robotic assistance. Among these, steps (2) and Wuhan 430030, China (3) remain major challenges, particularly under dynamic © The Author(s) 2025. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/. Chen et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 2 of 27 intraoperative conditions. Specifically, unexpected inci - Additionally, RGB-D and NIR cameras typically have dents may cause intraoperative movement of the patient’s strict working space requirements, which can increase head [3, 5]. This necessitates a prolonged re-registration intraoperative uncertainty in complex clinical environ- and repositioning of the robot, which can compromise ments and with patient head movement. Previous studies the optimal timing of the surgical procedure. Therefore, have shown that, after intrinsic calibration and distor- developing an accurate and rapid registration method tion correction, the reprojection error of optical cameras while enabling real-time robotic posture adjustment is can be controlled within sub-pixel levels, providing a crucial for enhancing the adaptability of neurosurgical theoretical basis for achieving high-precision monocu- robots [6]. lar registration [20, 21]. Moreover, the "what-you-see-is- During the preoperative phase, imaging modalities what-you-get" nature of optical cameras not only enables such as computed tomography (CT) are employed to a larger effective working space but also offers surgeons identify the target puncture site and plan the optimal sur- a more intuitive interactive interface. Additionally, owing gical trajectory. To enable robotic execution, it is essential to their significant cost advantage, monocular camera- to establish an accurate spatial correspondence between based intraoperative registration presents a promising the preoperative plan and the intraoperative anatomical research direction. environment. This alignment is achieved through spa - Conversely, the selection of reference features is crucial tial registration, which matches the preoperative images for ensuring the stability and reliability of the real-time with the real-time physical position of the patient [7, 8]. registration process. For marker-less image-to-patient Based on the type of optical sensor, neurosurgical spatial registration methods, existing algorithms have been registration methods can be categorized into those using proven to be either too inefficient or too inaccurate [7, monocular cameras [9], depth cameras [10, 11], and near- 9, 22]. For marker-based registration methods, marker infrared cameras [12]. Meng et  al. proposed an auto- types include fiducial spheres and encoded patterns (e.g., matic registration method using a monocular camera, ArUco) [13, 19, 23, 24]. Although encoded patterns can which was applied to neurosurgical robots. This method be detected and localized in visible light images, accu- employs multi-view stereo vision to reconstruct 3D fea- rately identifying them in medical imaging remains a tures, but single registration times can reach several min- challenge. Moreover, encoded patterns are often attached utes [9]. Su et al. used an RGB-D camera to extract facial to planar holders, making it difficult to achieve spatial features of patients and completed registration through distribution in the surgical area. Their relatively large size ICP, with registration times around 3s and errors within also makes them unsuitable for dynamic intraoperative 1−2 mm [10]. Near-infrared optical navigation systems, environments. Therefore, the methods based on fiducial such as Polaris Vega VT from Northern Digital Inc., are spheres still dominate in clinical settings [13, 19]. For ® ® highly popular in clinical surgeries owing to their stable instance, in the guidelines for the StealthStation S7 marker recognition accuracy [12]. However, during sur- treatment, skin-adhesive markers are used as fiducial gery, surgeons need to manually select fiducial markers, spheres [25]. and manual registration typically takes several minutes to After completing spatial registration, the tracking con- meet surgical precision requirements [13]. In summary, trol of surgical robots is mostly completed by surgeons intraoperative registration technology has made certain via remote control [26–28]. However, this process is progress. Nonetheless, achieving precise, rapid, and reli- strenuous and inefficient for Surgeons. They should focus able intraoperative registration to compensate for patient more on specific surgical operations rather than repeti - head movement remains a significant challenge [3]. tive adjustments of the surgical robot [29]. Autonomous Recent studies have begun to explore the trade-off positioning of surgical robots to the desired posture between registration accuracy and efficiency. On one typically involves common path planning methods such hand, the choice of optical sensor fundamentally influ - as graph search methods [30], numerical methods [31], ences the type and quality of the collected data, as well and sampling-based methods [32, 33]. However, these as the size of the effective working volume [14–16]. Some methods seldom consider pose tracking [34]. The artifi - studies have achieved intraoperative real-time registra- cial potential field (APF) method is an efficient local path tion using RGB-D and near-infrared (NIR) cameras [17– planning algorithm that constructs virtual potential fields 19]. However, owing to the inherent errors of current in the robot’s operational space or joint space. It enables mainstream depth cameras (typically within 2%), real- the robot to avoid obstacles and rapidly track targets, time registration based on RGB-D cameras often fails with a simple structure and strong repeatability [35, 36]. to meet clinical accuracy requirements [17]. Addition- Hao et  al. combined APF with an original dual neural ally, NIR cameras require custom fiducial points and are network to propose an improved path planning method expensive, making them impractical for clinical use [19]. that avoids local minima and enhances path execution Chen  et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 3 of 27 precision. Experimental validation has proven its effec - a practical solution that can be readily integrated into tiveness for autonomous spine surgery [34]. However, the surgical workflow without significant overhead or this method has not been integrated with visual modules interruption. and only enables single-instance path planning. There - The remainder of this paper is organized as follows: fore, as neurosurgical robots advance towards greater Section  2.1 describes the neurosurgical robotic system intelligence and automation, designing a safe and reliable and its workflow; Section  2.2 delves into the monocular- visual servo tracking control method to accommodate based automatic registration method; Section  2.3 dis- dynamic changes remains of great significance. cusses the 6D APFF admittance for visual servo control. In summary, to improve the efficiency and safety of The experimental validation is primarily conducted in neurosurgical robots, we propose MonoTracker, a fully Section  3, with the results discussed in Section  4. Sec- automatic registration and tracking framework based tion  5 concludes the paper and provides an outlook on on a monocular camera. Unlike existing methods that future research. primarily focus on registration, MonoTracker provides a unified solution for automatic registration and real-time 2 Materials and Methods tracking. It emphasizes the robustness and accuracy of 2.1 Overall Framework the online registration process while integrating visual This study is based on a neurosurgical robot with servo-based tracking control strategies for robotic execu- enhanced autonomous capabilities, comprising three tion. The main contributions of this work are summa - main parts as shown in Figure 1(a): (A) the robotic execu- rized as follows: tion subsystem, (B) the visual navigation subsystem, and (C) other intelligent surgical subsystems that assist in (1) We propose a fully automatic monocular-based performing surgical tasks. The execution subsystem con - registration and tracking framework, which inte- sists of a UR10 robotic arm and an autonomous punctur- grates preoperative fiducial identification, intraop - ing device. The navigation subsystem can utilize various erative monocular registration, and robotic visual solutions, including monocular cameras, depth cameras, servo tracking. and NIR cameras. (2) An automatic monocular registration method is In clinical surgeries, the operational procedure of the developed, incorporating a two-stage optimization neurosurgical robot can be divided into five steps, as strategy to establish accurate 2D-3D mappings and shown in Figure  1(d): preoperative puncture path plan- recover depth information. This enables precise and ning, intraoperative patient registration, automatic posi- efficient spatial registration using only a single opti - tioning by the robot, autonomous bone drilling by the cal camera. robot, and surgical manipulation by the surgeon. (3) A novel 6D visual servo control method based on an Artificial Potential Field Force (APFF) admit - Step I: Preoperative puncture path planning. tance model is introduced, inspired by the mass- Based on imaging data such as CT scans, the robot spring-damper system. This approach ensures automatically identifies fiducials in the preoperative smooth and stable pose tracking, thereby improv- imaging space, which will be discussed in detail later. ing the safety and robustness of neurosurgical robot The surgeon determines the target location within control. the skull and plans an optimal puncture path to mini- (4) The effectiveness of the proposed methods is vali - mize iatrogenic injury to the patient [37, 38]. dated through extensive experiments, demonstrat- Step II: Intraoperative patient registration. To map ing the accuracy and efficiency of the registration the planned path to the surgical space, the navigation algorithm as well as the motion performance of the subsystem must automatically recognize fiducials tracking control strategy. and align the coordinates between the preoperative imaging space and intraoperative navigation space Compared with traditional manual registration meth- [39]. Unlike conventional surgical robots [40, 41], ods that often require several minutes and may interrupt this study considers intraoperative dynamic factors, surgical flow, MonoTracker completes registration in such as patient head repositioning during surgery. By 0.23 s, enabling near real-time feedback. Although NIR- achieving real-time registration while ensuring high based methods offer slightly faster performance (~0.1 registration accuracy, the proposed method enhances s), their limited working volume significantly reduces the adaptability of the robotic system. their adaptability in dynamic and complex surgical envi- Step III: Automatic positioning. The execution sub - ronments. MonoTracker achieves a favorable trade-off system autonomously adjusts its pose to ensure that between accuracy, efficiency, and system cost, offering the puncturing device reaches the preoperatively Chen et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 4 of 27 Figure 1 Neurosurgical robot system and surgical procedure: (a) Intraoperative robot space, (b) Preoperative image space, (c) Planned puncture path, (d) Neurosurgical procedure workflow 2.2.1 A utomatic Detection of Fiducials planned target posture. During real-time registra- During the perspective transformation process of tion, the subsystem must be capable of automatically monocular imaging, fiducials lose depth and geometric adapting its position while safely and reliably track- features. Therefore, a key challenge is to automatically ing the target posture. However, research in this area detect these fiducials preoperatively and intraoperatively, remains limited. ensuring their accurate correspondence. This paper Step IV: Autonomous bone drilling. Once the punc- designs a fiducial sphere, as illustrated in Figure  3. Its turing device reaches the target pose, it can initiate customized spherical features are easily recognizable in the bone drilling procedure. During this phase, the the 3D medical imaging space. The spherical character - robot will monitor the force feedback from the drill- istics ensure that, regardless of the intraoperative camera ing to ensure the safety and effectiveness of the sur - posture, fiducials appear as regular circles in 2D images. gery. The center of a fiducial in the medical imaging space S Step V: Surgical operation. Once the robot com- med and the center of the corresponding circle in the camera pletes drilling and establishes a surgical pathway, image space S can be established as effective corre - the surgeon can proceed with surgical procedures, cam sponding points, with a specific proof provided here. assisted by intelligent surgical subsystems. Assuming point P is the center of the spherical fidu - 3d cial, its coordinates in the camera coordinate system C In the above process, Step II and Step III are essential are (a, b, c) . Based on the pinhole imaging principle, the and form the foundation for subsequent surgical opera- imaging of the fiducial forms at the intersection of two tions. This paper will focus on researching MonoTracker, opposing cones, Cone and Cone , with the vertex at specifically targeting these pivotal aspects. ABC CDE C(0, 0, 0) . The cone Cone can be represented in C as CDE c follows: 2.2 M onocular‑based Automatic Registration Monocular cameras do not introduce additional hard- (ax + by + cz) 2 2 2 x + y + z = , (1) ware errors and provide a larger effective working space, 2 2 l − R making them more suitable for intraoperative automatic −−→ registration. This paper proposes a monocular automatic where l represents CP  . By introducing a cutting plane 3d registration method, consisting of three key steps: (1) at z = z , an elliptical plane Ellipse is generated, C 0 EF automatic fiducial detection; (2) accurate correspond - whose center is P (x , y , z ): 2d 2d 2d 2d ence estimation; and (3) depth recovery to accomplish automatic registration, as illustrated in Figure 2. Chen  et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 5 of 27 Figure 2 Monocular-based automatic registration method Figure 3 Spherical fiducial design and monocular imaging principle (ax+by+cz) bx − ay = 0, 2 2 2 x + y + z = , (3) l −R 1 (2) z = . z = z . By simultaneously solving Eqs. (2) and (3), the 3D The shortest distance from P to the line OP , 3d 2d coordinates of points E and F can be determined. At denoted as dis , can characterize the error between err this point, the center of the ellipse, P (x , y , z ) , i s 2d 2d 2d 2d the center of the fiducial in S and the circular center med given by: in S as effective corresponding points. Therefore, it is cam reasonable to assume z = 1/b . Under this assumption, the equation for the line where the endpoints of Ellipse EF are located is: Chen et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 6 of 27 1 c leverages 2D KD-Tree features in conjunction with Par- x = (x + x ) = , 2d e f  2 2 c −R ticle Swarm Optimization (PSO) to facilitate rapid and 1 ac y = (y + y ) = , 2d e f (4) 2 2 b(c −R ) stable estimation of the 2D-3D correspondence of fidu - z = . 2d cials. PSO is a global optimization algorithm suitable for solving problems where the optimal solution lies Because the line OP passes through the origin, using 2d within a multidimensional parameter space [42]. the formula for the distance from a point to a line, dis err Assuming a 3D point set in medical imaging space can be expressed as: 1 2 3 n ={P , P , P , ..., P } and a 2D point set in pixel vir vir vir vir vir 1 2 3 n 2 2 a + b space  ={P , P , P , ..., P } , the specific steps pix 2 pix pix pix pix dis = R . err (5) 2 2 2 2 2 2 of the method are described in Algorithm 1. c (a + b ) + (c − R ) Because R << c , the equation can be further simplified Algorithm 1 Fiducial correspondence estimation via 2D KD-Tree and PSO to: 2 2 l − c dis = R , err (6) cl 2 2 2 2 where l = a + b + c , and owing to factors such as the camera’s field of view, the observation distance c is typically several times larger than a and b, and R is on a different order of magnitude compared to a, b, and c, making dis generally much smaller than the tolerance err error e. Thus, this completes the proof of the fundamen - tal principles of monocular registration. For automatic detection of preoperative fiducials, owing to significant physical property differences between fiducials and patient bones or tissues, initial extraction of the fiducial model can be accomplished by setting an appropriate grayscale threshold. Considering the uncertainties associated with threshold segmenta- tion, outlier detection techniques are employed to miti- gate noise interference. Subsequently, Density-based Spatial Clustering of Applications with Noise (DBSCAN) clustering is utilized to discern the features of individual fiducials. The center P of each fiducial is determined by vir calculating the centroid of the point cloud set. In terms of intraoperative automatic detection, a dedi- cated dataset has been created to facilitate instance segmentation of fiducials under monocular camera observation. We have enhanced the state-of-the-art YOLOv8 instance segmentation model to suit our spe- cific requirements. Throughout the surgical procedure, the robot executes real-time instance segmentation on multiple fiducials, calculating the average of the pixel set for each specific fiducial. This process effectively extracts the central P of the circular fiducial. pix 2.2.2 Estimation of Correspondence Relationships Upon successful automatic detection of preopera- tive and intraoperative fiducials, a critical challenge For the 2D point set  , an unordered set of pix remains: estimating the correspondence between these points can be transformed into an ordered set two sets. This paper proposes a novel method that 1 2 n S ={P , P , ..., P } by employing the 2D KD-Tree pix pix pix pix Chen  et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 7 of 27 construction rules. At this point, geometric features thereby enhancing exploration. The inertia weight ̟ is introduced to control the exploration capability of the F ={V , V , ..., V } can be constructed, where V is 1 1 2 n−1 i particles. Thus, the correspondence between the 2D and the vector established between two adjacent points: 3D fiducial markers is successfully established. i+1 i V = P − P . i (7) pix pix 2.2.3 Estimation of C orrespondence Relationships Additionally, under the projection in the direction of Furthermore, to achieve high-precision registration, a unit vector F = (a , b , c ) , the set  is projected V V V V vir it is essential to reconstruct the depth information of onto the plane a x + b y + c z = 0 through the ori- V V V the 2D point set in S , thereby acquiring the com- pix 1 2 n gin, generating the point set S ={P , P , ..., P } . xyz xyz xyz xyz plete 3D geometric features of the fiducials in the sur - Because the vector F can be obtained by rotating the 1 2 n gical space. Let  ={Q , Q , ..., Q } denote an vir vir vir vir unit vector n (0, 0, 1) around the X and Y axes by α and ordered set of 3D points in the medical imaging space β , resp e ctively , S can be further transformed to the xyz 1 2 n and  ={Q , Q , ..., Q } denote the correspond- pix pix pix pix XOY plane and rotated around the vector n by the angle ing set of 2D points in the pixel space, where each 1 2 n γ , generating a 2D point set S ={P , P , ..., P }. xoy xoy xoy xoy i i pair (Q , Q ) forms a known 3D-2D correspond- vir pix ence. Furthermore, assume an ordered 3D point set  2 2 α = arctan(b a + c ),  V V V 1 2 n ={Q , Q , ..., Q } in C : cam c � cam cam cam β = arctan(a c ), V V (8)   i i � �  α 0 c 0 P = M(α) · M(β) · P , i x xyz vir  Q  pix   i i z = 0 β c 0 Q , i y (11) cam P = M(γ ) · P , xoy xyz 1 0 0 1 0 where M(α) , M(β) , and M(γ ) are the rotation where α , β , c , and c are the camera intrinsic parame- x y matrices for α , β , and γ . Following the construc- ters. Thus, by uniquely determining the depth distances tion rules of the 2D KD-Tree, the geometric features [z , z , ..., z ] in C , the position information for all points ′ ′ ′ 1 2 n c F ={V , V , ..., V } is constructed. There exists a 2 1 2 n−1 in  can be established. To calculate z , a loss function cam i set of variables [α, β , γ ] , such that the geometric differ - is constructed as follows: ence |F − F | → 0 . Moreover, the established point sets 1 2 S and S correspond to each other. This paper uses an pix xoy min S(z , ..., z ) 1 n z ,...,z ∈D 1 n optimization method to solve for the variables [α, β , γ ] , n−1 constructing a loss function F(α, β , γ) as follows:  (12) i i+1 i i+1 2 = (Q − Q  −Q − Q ) , vir cam cam vir min F(α, β, γ ) i=1 α∈A,β∈B,γ ∈C i+1 n−1 i i i+1 (9) where Q − Q and Q − Q represent the vir cam cam 2 vir = V − V , A, B, C = [−π, π], distances between adjacent points in the ordered point i=1 sets. When the loss function is minimized, it indicates that the geometric distances between the fiducial points when F(α, β, γ) → min , [α, β , γ ] can be determined as in C , at that depth distance are consistent with those in the optimal solution. At this point, the correspondence C . Generally, the depth information of all fiducials in the between S and S can be established from  and pix xoy vir v surgical space is correctly reconstructed. . This paper employs the PSO algorithm to solve this pix To boost the computational efficiency and success rate optimization problem. In each iteration, the particles’ of the two-stage optimization problems associated with velocities and positions are updated based on the follow- correspondence estimation and depth recovery, this ing equations: paper introduces an integrated optimization framework, t+1 t t t v = ̟ v + c r (pbest − x ) + c r (gbest − x ), 1 1 i 2 2 depicted in Figure 4. i i i i t+1 t+1 x = x + v , In this setup, the loop arrows ①, ②, ③, and ④ rep- i i i (10) resent the process where, upon determining that the t t optimization results meet the requirements, the results where v and x represent the velocity and position of the i i from the ith round of optimization are fed into the (i+1) ith particle at iteration t , resp e ctively . c and c are the 1 2 th round as the initial values for optimization variables. cognitive and social coefficients, governing the attrac - By appropriately setting the key parameters {p, m, t} and tion towards the individual best pbest and the global best f , the computational efficiency and success rate of gbest . r and r serve as stochastic weights that modu- 1 2 optimization can be significantly improved. Additionally, late the influence of personal and global best positions, Chen et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 8 of 27 Figure 4 Two-stage optimization for correspondence estimation and depth recovery 2.3 Visual Servo Control with APFF Admittance an extra PSO optimization module is implemented in the 2.3.1 Mo deling of Robotics Kinematics first stage of optimization to further improve the suc - The robotic execution subsystem primarily consists of cess rate without substantially reducing computational two components: the UR10 robotic arm and an autono- efficiency. Once the depth recovery of the point set cam mous puncture device. To achieve visual servo tracking is completed, it can be combined with the set  , and vir control of the robot, it is essential to conduct a kinematic registration can be performed using the Singular Value analysis of the robot’s operational entity. This paper Decomposition (SVD) method. Ultimately, the spatial employs the Product of Exponentials (POE) method transformation matrix T from C to C is obtained. Thus, v c for kinematic modeling of the robot, as illustrated in this section completes the introduction and theoretical Figure  5. The UR10 is a high-precision, high-load, col - derivation process of the monocular-based automatic laborative robotic arm with six degrees of freedom. registration method. The autonomous puncture device, a custom-developed electromechanical system, incorporates two degrees of Figure 5 Kinematic modeling of the robotic execution subsystem based on the POE method Chen  et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 9 of 27 freedom-linear feed and rotational cutting. Consider- u Th s, through the POE method, the kinematic mod - ing that the autonomous puncture device does not move eling and analysis of the robotic execution subsystem are during the robot’s registration tracking process, this completed. paper focuses on kinematic modeling of the six motion joints of the robotic arm. 2.3.2 V isual Servo Control Herein, ω represents the unit vector of the joint rota- APF method is an efficient local path planning algorithm, tion axis, r denotes a point on the axis, and v signifies i i yet it frequently encounters local minima issues and is the linear component in the spatial screw, and it is estab- typically applied to end-effector position constraints, lished that: complicating the concurrent consideration of its attitude. v =−ω × r = rˆ ω , i i i i i Utilizing this method to ensure safe and efficient robotic (13) ξ = (ω ; v ), i i i tracking during surgical procedures presents a consider- able challenge. To address this, the paper introduces a where ξ determines the unit spatial screw axis for joint i. 6D APFF admittance-based visual servo control method, By combining ξ with the exponential map, the corre- enhancing tracking efficiency while improving safety and sponding spatial transformation matrix can be obtained. reliability. u Th s, the kinematic model of the robot is established as Initially, potential field force modeling is conducted for follows: the surgical environment. Within this environment, the desired location exerts an attractive force on the robot’s ˆ ˆ ˆ ˆ ˆ ˆ t t ξ θ ξ θ ξ θ ξ θ ξ θ ξ θ t 1 1 2 2 3 3 4 4 5 5 6 6 T = h (θ) = e e e e e e h (0), b b b end-effector, whereas obstacles generate repulsive forces. (14) The attractive potential field integrates quadratic and coni - where T is the transformation matrix from the robot’s b cal potential fields, with the attractive force F (q) being att,i (17) base coordinate system C to the tool coordinate system defined as:  � � � �  −ζ (o (q) − o (q )), if o (q) − o (q ) ≤ d, � � i i i i i f f � � F (q) = o (q)−o (q ) att ,i i i � � � �  −dζ , if o (q) − o (q ) > d. i � i i �  � � �o (q)−o (q )� i i In the ith iteration, ζ represents the coefficient of the t t C , h (0) is the initial pose of T , ξ is the skew-symmet- b b attractive potential field, with d serving as the distance ric matrix of ξ , and θ − θ are the joint angles rotating 1 6 threshold where the potential field shifts from conical to about ξ − ξ , pre-set according to the actual safe work- 1 6 parabolic. The terms o (q) and o (q ) denote the robot’s i i S f space. Additionally, the spatial Jacobian matrix J (θ ) , current and desired positions during the ith iteration. For which converts the robot joint angular velocities into the the repulsive potential field, the repulsion becomes infinite end-effector space velocity, can be written as: as the robot’s position approaches a boundary. Conversely,   the repulsive force diminishes to zero once the robot is   ˙ beyond a certain threshold distance. The repulsive force S S ′ ′ ′ 2   V = J (θ)θ = (ξ , ξ , ..., ξ ) , (15) 1 2 6   F (q) is given as follows: ... rep,i 1 1 1 n ( − ) �ρ , if ρ ≤ ρ , i i i 0 ρ ρ i 0 ρ F (q) = i (18) rep,i 0, if ρ >ρ , i 0 S ′ ′ ′ J (θ) = (ξ , ξ , ..., ξ ),  1 2 6 ξ = Ad ξ , (Ex) i i (16) where ρ is the shortest distance from the robot’s posi-  ˆ ˆ ˆ θ ξ θ ξ θ ξ 1 1 2 2 i−1 i−1 tion at o (q) to any obstacle, and �ρ is the gradient Ex = e e ...e . i i of the distance field. n and ρ represent the repulsive i 0 potential field coefficient and the distance threshold for F (q) . Particularly, the patient’s head acts as an obsta- rep,i cle in the robot tracking process and is a critical safety consideration. By utilizing fiducials, a spherical surface is constructed through the random sample consen- sus to envelop the patient’s head. This sphere is consid - ered a typical intraoperative obstacle, exerting a virtual Chen et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 10 of 27 Figure 6 Visual servo control based on 6D APFF admittance repulsive force on the robot’s end-effector. With this, the puncture needle, is obtained. The calculation of F is mid,i modeling of APF in the robot’s workspace is completed. shown in Eq. (21). To address the challenges of applying the APF method to B B B F = F + F . path planning, this paper first introduces a 6D APFF admit - (21) mid,i top,i rem,i tance-based method, inspired by the mass-spring-damping The tracking process under the guidance of APF can be system, as illustrated in Figure 6. The end-effector needle is modeled as a mass-spring-damping system. Here, the vir- modeled as a line segment defined by the tip point P , the top tual mass of the puncture needle is denoted as M . The remote point P , and the midpoint P . rem mid difference S between the target pose Tar and the In the spatial coordinate system C , the force screws i S mid,i S S current pose Pose of the puncture needle forms a vir- F and F represent the forces acting on P and top top,i rem,i tual spring, with a spring constant K . Finally, the velocity P under the APF, respectively. It is assumed that C is rem S damping coefficient of the system is denoted as B: aligned with C .  � � B B B S S (22) MV + BV + KS = F . r × f mid,i mid,i mid,i mid,i S top,i top,i F = , top,i S top,i B In this model, V and V represent the veloc- mid,i mid,i (19) � �  S S  ity screw and the acceleration screw, respectively, of the r × f  S rem,i rem,i F = , rem,i S midpoint P on the puncture needle. Under the action f mid rem,i of F , the mass term M generates acceleration for mid,i S S where r and r denote the 3D coordinates of the movement of the puncture needle, addressing the top,i rem,i S S P and P in C . Corresp ondingly , f and f discontinuity in acceleration often seen in traditional top rem S top,i rem,i represent the forces acting on P and P under the APF methods where APF forces are directly mapped top rem APF. Additionally, in the body coordinate system C , it is to tracking velocities. The damping coefficient B helps established that: prevent excessively high tracking velocities, whereas the spring constant K serves an inspirational role, guid- B T S F = Ad F , top,i T top,i SB ing the puncture needle to escape from local minima. It (20) B T S is important to note that because motion screws cannot F = Ad F . rem,i T rem,i SB be directly subtracted, the transformation matrix from B B The adjoint transformation matrix Ad from C to C S B Pose (T ) → Tar (T ) is the exponential map of the SB i i i tar S S is used to describe the 6D force screws F and F motion screw S . This involves the following relation: top,i rem,i mid,i in C , enabling their transformation into the force screws B −1 B B B mid,i (23) F and F in C . Ultimately, the 6D force screw (T ) T = e . B i tar top,i rem,i F , representing the force of the APF acting on the mid,i The process of solving for S is denoted by mid,i Tar ≃ Pose . Finally, the iterative solution is obtained i i Chen  et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 11 of 27 through numerical integration. Assuming the control the target pose is primarily governed by attractive forces. period is ∆t, the following relation holds: Therefore, F can be reformulated as: mid,i  � � B B −1 B B B  ˙ F = K S , V = M F − BV − KS , e (26)  mid,i mid,i mid,i mid,i mid,i mid,i  B B V = V + V × �t, mid,i+1 mid,i mid,i where K is the equivalent stiffness of APF forces, and it Pose = Pose + V × �t, B B i+1 i  mid,i+1 is independent of S . Consequently, when S ≈ 0 , mid,i mid,i  B S = Tar ≃ Pose ,  i+1 i+1 mid,i+1 Eq. (22) can be rewritten as: F = APF(Pose ). i+1 mid,i+1 B K + K B B ¨ ˙ (24) S + S + S = 0. (27) mid,i mid,i mid,i M M We have also introduced the concept of multi-segment B In this case, the natural frequency ω and damping variable stiffness. When S < ε , as the tracking is mid,i ratio ζ can be defined as: about to be completed, the stiffness K is set to a higher value, whereas the potential field force is set to zero. This ω = (K + K ) M, n e approach can further accelerate convergence and reduce  (28) ζ = B 2 M(K + K ). oscillation. In this context, APF(∗) denotes the APF force function. During the iteration process, the output veloc- where ω and ζ are key determinants of the dynamic ity screw V is mapped to the robotic joint angular mid,i+1 characteristics of a second-order system. To further velocities θ through the velocity Jacobian matrix J : exe B { } analyze the influence of parameters M ,B,K ,K on the controller performance, the sensitivity is defined as −1 B θ = (J ) V exe B (25) mid,i+1 follows: Ultimately, θ is executed by the robot’s lower-level exe ∂f x S = · . controller, and the entire robotic tracking process is illus- x (29) ∂x f trated in Figure  7. Thus, the design and theoretical deri - vation of the visual servo control method based on 6D By jointly analyzing Eq. (28) and Eq. (29), the sensitivity APFF admittance are completed. of ω and ζ to {M ,B,K ,K } can be obtained, as listed in n e Table 1. 2.3.3 Parametric Sensitivity Analyses The proposed controller constitutes a typical discrete- time, nonlinear closed-loop system, in which the param- Table 1 Sensitivity of dynamic indices to controller parameters eters M , B , and K play a critical role in determining the Indice M B K control performance of the robot. To support the rational selection and tuning of these parameters, a sensitivity ω −0.5 0 +0.5 +0.5 analysis is performed to theoretically assess their influ - ζ −0.5 +1 −0.5 −0.5 ence on the system’s dynamic behavior. The tracking of Figure 7 Visual servo control process flow diagram Chen et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 12 of 27 robustness of the proposed algorithm, several distrac- tor markers were also randomly attached to the head phantom. The placement of the customized fiducials followed three key principles: (1) Randomness: Fidu- cials should be placed without fixed patterns to avoid location-specific biases and to test the generalizability of the algorithm. (2) Dispersion: Fiducials should be spatially distributed to prevent self-occlusion caused by overly compact arrangements, ensuring visibility robustness. (3) Visibility: The placement ensured that all fiducials remained within the camera’s field of view, reflecting perceptual completeness. Figure 8 Overview of the visual servoing experimental setup Figure 9 Design and placement of fiducials 3 Experiments and Results In this section, preliminary experiments were conducted, primarily including: automatic detection of fiducials, monocular-based registration, and visual servo control with 6D APFF admittance. The overall experimental plat - form consists of a robotic execution subsystem, a punc- ture device, and a vision navigation subsystem built with an HD video camera of NDI Polaris Vega VT, as shown in Figure  8. NDI Polaris Vega VT can achieve a meas- urement accuracy of up to 0.12 mm within the effective workspace. It is widely used in clinical surgeries with surgical robots [43, 44]. The host PC is equipped with an Intel Core i9-12900H CPU at 2.5 GHz, offering high-per - formance processing capabilities. It has 32 GB of RAM and is fitted with an Nvidia GeForce RTX 3060 GPU, suitable for handling graphics-intensive tasks. Moreover, the accuracy verification of monocular registration is car - ried out by the IR sensors of NDI. 3.1 A utomatic Fiducial Detection Experiments Figure 10 Comparison between manual and automatic fiducial The customized fiducials and their physical place - annotation in the preoperative stage ment are illustrated in Figure 9. To further evaluate the Chen  et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 13 of 27 Table 2 Quantitative analysis of preoperative fiducial detection i Example 1 Example 2 X (mm) Y (mm) Z (mm) X (mm) Y (mm) Z (mm) d (mm) X (mm) Y (mm) Z (mm) X  (mm) Y  (mm) Z  (mm) d (mm) err err err err err err 1 −80.71 107.84 493.16 0.19 0.20 0.08 0.29 4.15 174.37 517.68 0.14 0.10 0.32 0.36 2 −81.00 107.49 493.44 0.10 0.15 0.20 0.27 3.98 174.01 517.89 0.03 0.26 0.11 0.29 3 −80.68 107.32 493.29 0.22 0.32 0.05 0.39 4.05 174.37 518.24 0.04 0.10 0.24 0.26 4 −80.96 107.81 492.96 0.06 0.17 0.28 0.33 3.92 174.37 517.89 0.09 0.10 0.11 0.17 5 −81.19 107.81 492.98 0.29 0.17 0.26 0.43 3.99 174.37 518.10 0.02 0.10 0.10 0.14 6 −80.71 107.72 493.47 0.19 0.08 0.23 0.31 4.16 174.28 518.11 0.15 0.01 0.11 0.18 7 −81.03 107.48 493.36 0.13 0.16 0.12 0.24 3.84 174.13 518.09 0.17 0.14 0.09 0.24 Mean −80.89 107.64 493.24 0.17 0.18 0.17 0.32 4.01 174.27 518.00 0.09 0.12 0.15 0.24 Auto −80.80 107.68 493.32 0.09 0.04 0.08 0.13 4.05 174.31 518.06 0.04 0.04 0.06 0.08 Chen et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 14 of 27 The framework proposed in this paper implements extraction of all pixel points corresponding to the fidu - an efficient detection of fiducials in S preoperatively, cial spheres in the image coordinate system. By calculat- med using a combination of thresholding and clustering ing the mean coordinates of these pixels, the center of methods. To validate the performance, the experimental each fiducial sphere can be determined, thereby complet - setup is as follows: Seven testers, after learning the basic ing the automatic detection process. For this study, the operations and marking methods of the Slicer software, baseline model used is YOLOv8x-seg (a variant within manually annotated an example containing nine fiducials. YOLOv8, specifically designed for instance segmentation The results and time cost for marking were recorded, tasks). During the training process, the evaluation results with each tester denoted as User-i. The proposed detec - on the validation set are shown in Figure 11(a). The focus tion method also performed the detection automatically, is particularly on the mask (M) accuracy and recall rate, recording the results and the time cost. The results are both of which eventually approach 1 with increasing iter- shown in Figure 10. ations, indicating that the model’s performance meets the Considering the clarity of the image, two marking requirements. Additionally, as shown in Figure 11(b), the results are provided. Black spheres represent the mark- model exhibits stable segmentation performance in pre- ings made by the testers, red spheres represent the mean dicting fiducials, even in the presence of distractor mark - of these black sphere markings, and blue spheres indicate ers. For videos with a resolution of 1280x720, the model the results of automatic detection. The red sphere, rep - processes at about 10 frames/s. resenting the mean position of all manual annotations, reflects the consensus region among annotators and can 3.2 Monocular‑based Registration Experiments be assumed as an ideal reference location. The dispersed This section primarily aims to validate the accuracy and distribution of black spheres illustrates inconsistencies efficiency of the monocular registration method. First, in manual labeling, highlighting the presence of annota- the accuracy of the fiducial localization during the reg - tion uncertainty and potential errors. The automatically istration process is evaluated. The experimental setup detected point is found to be in close proximity to the is as follows: (1) Using a single optical camera based on mean annotation, qualitatively demonstrating the reli- NDI, the 3D spatial coordinates of the fiducials in the ability and effectiveness of the proposed detection algo - surgical environment are estimated through correspond- rithm. A quantitative analysis is further performed, as ing estimation and depth recovery, and aligned to the listed in Table 2. NDI’s infrared coordinate system C ; (2) An NDI posi- In this context, X , Y , and Z represent the devi- tioning probe is employed to directly determine the 3D err err err ations of manually annotated points from the mean coordinates of the fiducials in C ; (3) Precision validation point along the X, Y, and Z axes, respectively, whereas experiments are conducted under six different positional d denotes the Euclidean distance between them. It is relationships between the cranial models and NDI. To observed that the manual annotation errors range from evaluate the impact of fiducial quantity on the method, 0.14 mm to 0.43 mm, further indicating the inherent a sensitivity analysis experiment is conducted; (4) Five uncertainty associated with manual labeling. Compared fiducials are used for registration, whereas the remaining to the average manual annotation errors (0.32 mm and fiducials are utilized to evaluate the registration accuracy; 0.24 mm), the proposed method achieves significantly (5) The registration efficiency is assessed under the same lower average errors of 0.13 mm and 0.08 mm, cor- hardware conditions. i i ∗ ∗ ∗ responding to reductions of 59.4% and 66.7%, thereby Suppose p (x , y , z ) and p (x , y , z ) represent i i i mon ndi i i i demonstrating a substantial improvement in localization the ith pair of points obtained from (1) and (2) in a single accuracy. Additionally, a comparison of time consump- experiment. The localization accuracy of the proposed tion is listed in Table 3. method is evaluated using the Euclidean Distance Error: The time taken by different testers varied significantly, i i with the fastest at 9.3 min and the slowest at 14.6 min, d = p − p . (30) mon ndi averaging 1.4 min per fiducial. Conversely, the automatic i=1 detection of nine fiducials took a total of only 0.1 s. In Additionally, the registration accuracy in the validation summary, the proposed automatic detection method has experiments should be evaluated with the Target Regis- clearly reached human-level accuracy and is more effi - tration Error (TRE). Suppose the verification fiducial on cient, saving a significant amount of valuable time in the the image is denoted as p , and the corresponding fidu - treatment of acute patients. pix The intraoperative automatic detection of fiducial cial in the surgical space obtained by NDI is p : ndi markers is primarily achieved through instance segmen- tation. Specifically, instance segmentation enables the Chen  et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 15 of 27 Table 3 Preoperative fiducial marking time consumption i 1 2 3 4 5 6 7 Auto Time 13.9 min 14.6 min 12.8 min 9.3 min 12.5 min 11.9 min 14.5 min <0.1 s provides richer spatial constraints during optimiza- ndi c i i TRE = T T p − p . (31) c p pix ndi tion, leading to more reliable alignment outcomes. It i=1 is worth noting that in the Test_2 scenario, the method ndi c still achieved high localization accuracy even with a Here, T and T represent the spatial transformation c p limited number of fiducials. This should be considered matrices from C to C and from the pixel coordinate sys- c i a case-specific anomaly rather than a generalizable tem to C . m denotes the number of validation fiducials. outcome. Therefore, to ensure consistent and reliable The Fiducial Registration Error (FRE) represents the reg - performance, it is recommended to utilize a greater istration error of the point set involved in the registration number of fiducials. after the registration is completed, and its calculation for- The results of validation experiments for the registra - mula is consistent with that of the TRE. The experimental tion are shown in Figure 14 and Table 4. results validating the spatial localization accuracy of the As shown in the figure, each of the validation experi - proposed method are shown in Figure 12. ments involved five registration fiducials and four valida - As depicted in the accompanying figure, each of tion fiducials. It can be observed that the mean source the six validation experiments for localization accu- registration error across the six experiments is 0.347 racy involved nine fiducials. It can be observed that, mm, indicating that the registration process was suc- excluding outliers, the X, Y, and Z coordinate errors cessfully completed. The average errors at the validation of the majority in each experiment are less than 1 mm. reference points were 0.74, 0.35, 0.56, 0.69, 0.67, and 0.48 Additionally, the mean and median errors of d are also mm, respectively, with a mean error of 0.581 mm and a below 1 mm. It is important to note that the fiducials median of 0.571 mm. These results meet the expected used in this experiment are spherical, requiring the use clinical registration requirement, which specifies that the of an NDI positioning probe to collect multiple surface median registration error should be within 1 mm. The points of the spherical reference, and the sphere center minimum average error was 0.282 mm, the maximum is obtained through sphere fitting. This may introduce was 0.902 mm, and the standard deviation was 0.245 some errors. Therefore, we conclude that the spatial mm, further demonstrating the stability of the proposed localization accuracy of the proposed method meets method. The results of time consumption experiments the requirements for clinical surgery. are shown in Figure 15. We evaluated the localization performance under It can be observed that the average time taken from varying fiducial configurations, specifically examining fiducial detection to the completion of spatial registra - the effects of fiducial quantity and spatial arrangement. tion across the six experiments is 0.228 s, with minimal The experimental design involved three test scenarios fluctuation. Among the processes, fiducial detection in which fiducials were progressively occluded, as illus - takes the longest time, averaging 0.132 s, which accounts trated in Figure  13(a). Notably, the specific markers for approximately 58% of the total time. This is because subjected to occlusion differed across scenarios, ena - higher-resolution images were used to improve the bling a preliminary assessment of how fiducial place - localization accuracy of the fiducials on the 2D image. ment influences localization accuracy. The results are Next, depth recovery and correspondence estimation, presented in Figure 13(b). both solved using optimized methods, took an average As illustrated in Figure  13(b), when the number of of only 0.058 and 0.033 s, respectively, demonstrating fiducials exceeds 4, the localization error consistently good real-time performance. Finally, the time for spatial remains below 1 mm. Additionally, the small vari- registration was negligible. In summary, the proposed ance observed in these cases indicates the stability of monocular registration algorithm satisfies the real-time the proposed method. However, when the number of requirements of clinical surgical robot registration and fiducials falls below 4, the localization accuracy pro - tracking. gressively deteriorates, accompanied by an increase in variance, suggesting a decline in the method’s robust- ness. This phenomenon can be interpreted from a feature-based perspective: a larger number of fiducials Chen et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 16 of 27 (a) (b) Figure 11 Intraoperative automatic fiducial detection: (a) Visualization of key metrics during the training process of YOLOv8x-seg. The plots include the precision, recall, and mean average precision (mAP) for object detection (B) and instance segmentation (M); (b) Visualization of the predicted results of fiducial markers by YOLOv8x-seg Chen  et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 17 of 27 3.3 Visual Servo Control Experiments Following the completion of the registration experiment, a tracking control experiment was conducted using the obtained registration results. In this experiment, the operator held a patient head model with attached fidu - cials, simulating various movements to replicate ran- dom intraoperative disturbances of the patient’s head. To evaluate our method’s performance, three methods were compared: the APF-velocity-based visual servo con- trol method (APF-V), the APF-acceleration-based visual servo control method without multi-segment variable stiffness (APF-A-I), and the method proposed in this paper, named APF-A-II. In Section  2.3.3, the sensitivity of the controller’s dynamic performance to the control parameters has been Figure 12 Validation experiments of the localization accuracy derived. Furthermore, based on relevant studies [45, 46] (a) (b) Figure 13 Impact of fiducial quantity on performance: (a) Experiment setup, (b) Localization error under different fiducial configurations Chen et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 18 of 27 Figure 14 Validation experiments of the registration accuracy Table 4 Registration errors Exp FRE (mm) TRE (mm) TRE TRE TRE (mm) Time (s) min max std (mm) (mm) 1 0.413 0.336 1.088 0.320 0.736 0.222 2 0.294 0.202 0.510 0.109 0.355 0.240 3 0.471 0.261 0.846 0.209 0.559 0.224 4 0.228 0.439 0.832 0.152 0.692 0.225 5 0.273 0.194 1.491 0.524 0.666 0.216 6 0.404 0.258 0.646 0.156 0.482 0.244 Mean 0.347 0.282 0.902 0.245 0.581 0.228 M = diag(0.05, 0.05, 0.05, 0.5, 0.5, 0.5), B = diag(3, 3, 3, 30, 30, 30), (32) K = diag(10, 10, 10, 10, 10, 10), K = diag(120, 120, 120, 300, 300, 300), where K and K represent the front and rear stages of 1 2 the multi-segment variable stiffness, respectively. To fur - ther evaluate the effectiveness of the set parameters, a simulation experiment was conducted, as shown in Fig- ure 16. In the experiment, M , B , and K were respectively set to {0.1, 1, 10}, {0.75, 1, 1.25}, and {0.1, 1, 10} relative to their set values for comparison. It is evident that the selected values of B and K achieve Figure 15 Time consumption of complete registration pipeline a more favorable trade-off between overshoot and response time. Notably, reducing the value of M gener- ally enhances the system’s dynamic performance, par- and with appropriate adjustments, the controller param- ticularly in terms of transient behavior. Nevertheless, this eter is set as: enhancement is typically accompanied by abrupt velocity Chen  et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 19 of 27 Figure 16 Sensitivity analysis of M , B , and K variations (the increased slope of x(t) ). In the context of speed may lead to fiducial blur, thereby degrading recog - surgical robotics, such rapid transitions may introduce nition performance. This resembles clinical conditions, potential safety hazards. The introducing M ensures where patient movement during surgery is typically slow velocity continuity, thereby alleviating this issue and fur- and of small amplitude owing to the effects of anesthesia. ther validating the stability of the proposed model. The experimental procedure is illustrated in Figure 17. A 16-second head motion experiment was conducted, At each second, the head motion status, camera recog- during which the head phantom was manually moved by nition results, and the corresponding tracking response the operator across four distinct positions to qualitatively were recorded. As illustrated in the figure, MonoTracker validate the effectiveness of the proposed method. In this consistently detected the fiducial markers even dur - setup, the head motion was intentionally kept relatively ing head movement and successfully completed spatial slow to mimic realistic patient movements. Excessive registration. Once the preoperative planning pose was Chen et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 20 of 27 Figure 17 Visual servoing tracking process with APF-A-II transformed into the robot’s coordinate frame, the sur- under complex conditions. The tracking process is illus - gical robot commenced tracking based on the proposed trated in Figure  18. To verify the advantages of the pro- method. The results demonstrate that the robot main - posed method, a comparative analysis is conducted from tained good real-time tracking performance. four perspectives: task reachability, motion smooth- The head movement disturbance was set at three lev - ness, environmental adaptability, and real-time tracking els: low, medium, and high speeds, to comprehensively performance. analyze the tracking performance of the three methods Chen  et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 21 of 27 Figure 18 Head tracking experiments with three methods (1) Task Reachability: As shown in Figure  18, the resulting in higher tracking accuracy compared to first column illustrates the desired pose of the track - APF-V. Additionally, the smoothness of the motion ing target, whereas the second, third, and fourth col- trajectory is enhanced, particularly in reducing sud- umns display the tracking motion trajectories of the den changes in acceleration as tracking approaches three algorithms under low, medium, and high-speed completion. This improvement is mainly due to movements, respectively. It can be observed that all mapping the potential field force to acceleration, three methods ultimately manage to track the target. making the robot’s motion more controllable. By However, APF-V and APF-A-I exhibit some oscilla- comparing the motion trajectories in Figure 18 and tory adjustments near the end of the tracking pro- the motion speeds in Figure  21, it can be observed cess. The proposed method, APF-A-II, achieves rapid that APF-A-II further enhances trajectory tracking convergence. This is primarily due to the approach accuracy and motion smoothness. Even when the of converting pose errors into 6D motion twists, desired trajectory changes abruptly, the smoothness which enables quick target tracking without causing of the robot’s tracking motion is still maintained, oscillations typically caused by tracking overshoot. greatly improving the safety of surgical robots in Additionally, the tracking trajectory of the proposed clinical applications. method shows the highest similarity to the desired (3) Environmental Adaptability: Considering the trajectory, indicating superior tracking accuracy dur- uncertainty of patient head movement during clinical ing the visual servo process. This is particularly sig - surgery, the robot must be capable of high-precision nificant for surgical robots, where high tracking pre - tracking at various speeds while maintaining smooth cision is crucial. motion trajectories. However, under different target (2) Motion Smoothness: As shown in Figures  19, movement speeds, owing to the limitations of the controller’s hyperparameters, it is difficult to bal 20, and 21, APF-A-I introduces a guiding term, - Chen et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 22 of 27 Figure 19 Motion performance analysis of APF-V ance high-precision tracking with trajectory smooth- ness. For APF-A-II, the introduction of the multi- segment variable stiffness method enables rapid 4 Discussion tracking and obstacle avoidance using 6D APF forces This paper proposes a fully automatic registration and when the tracking error exceeds a threshold, with tracking control framework based on a monocular cam- low stiffness for heuristic guidance. When the track - era. First, custom fiducials are designed, and automatic ing error is below the threshold, APF forces are no detection methods are developed to facilitate the auto- longer needed, and rapid tracking of the desired pose matic extraction of fiducials in C and C . By constructing v c is achieved through virtual spring forces under high 2D KD-Tree features and utilizing a two-stage optimiza- stiffness. tion method, the 2D-3D correspondence estimation and (4) Tracking Real‑ Time Performance: As shown in depth recovery are accomplished, followed by registra- Figure  22, the average computation times per itera- tion through SVD. The mass-spring-damping system tion for APF-V, APF-A-I, and APF-A-II are 0.0389, model is improved, and a visual servo control strategy 0.0388, and 0.0393 s, respectively. Therefore, the based on 6D APFF admittance is designed. Integrating improvements introduced in the proposed method the outputs of monocular registration, the robot’s pose is only slightly increase the computation time, without adjusted in real time, achieving dynamic tracking of the adding significant overhead. This ensures that the head movement. The experiments confirm the accuracy, real-time performance of the tracking motion control efficiency, and motion performance. is maintained. Chen  et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 23 of 27 Figure 20 Motion performance analysis of APF-A-I As listed in Table  5, this study compares the pro- several minutes. To our knowledge, there has been no posed registration method with some advanced meth- prior report of using a single RGB camera to achieve ods. Here, the variables primarily include the type intraoperative registration in seconds for neurosurgi- of surgery, whether markers are used, and the type of cal robotics. NIR-based registration offers the highest visual sensor. The method based on fiducial mark - accuracy, but owing to the costly markers and limited ers is abbreviated as MB (Marker-Based), whereas the workspace, its clinical application is subject to more method without fiducial markers is abbreviated as ML stringent conditions. The proposed method, which is (Marker-Less). Clearly, registration methods based based on monocular, balances registration accuracy on markers mostly achieve an accuracy within 1 mm. and efficiency and reduces hardware costs and system However, those marker-less methods often have errors complexity, holds significant clinical importance. greater than 2.5 mm, which does not meet the clinical However, because MonoTracker relies on geomet- requirements of neurosurgical operations. Therefore, ric features of fiducials for correspondence estimation, registration methods based on markers will continue its registration may fail when fiducials are occluded. In to be widely used in clinics for a considerable length fact, the information loss problem is an inherent draw- of time. In terms of visual sensor types, RGB-D cam- back in mono-modality surgical navigation systems eras are extensively used by marker-less methods, but and a common challenge faced by researchers [49]. their accuracy has reached a plateau. Meng et  al. [9] MonoTracker identifies the number of fiducials to detect first achieved intraoperative registration using a sin - potential occlusions and alerts the surgeon accordingly, gle RGB camera. However, owing to the use of multi- thereby ensuring surgical safety. Recently, several stud- view 3D reconstruction, a single registration can take ies have explored the methods of multi-sensor fusion Chen et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 24 of 27 Figure 21 Motion performance analysis of APF-A-II [50], robot-assisted tracking [43], and deep learning planning techniques based on APF by representing techniques [51] to alleviate information loss. Heunis the target pose as a 6D force acting on the end of the et al. employed eight infrared cameras in a large capture robotic arm. Combined with a mass-spring-damping solution to avoid dynamic obstacles, which may lead to model, a 6D APFF visual servo controller is designed, substantial costs [50]. Conversely, monocular cameras ultimately achieving good motion performance. In this are lower-cost, more compact, and offer a wider field of paper, the key parameters M , B , and K are determined view, making them more advantageous for multi-device empirically to suit general tasks for the surgical robot deployment in surgical environments. This provides system. However, owing to space limitations, the paper a feasible solution for MonoTracker to address occlu- does not delve into complex clinical environments with sion issues in clinical settings. Moreover, incorporating dynamic obstacles. Essentially, the nature of this visual temporal information by performing feature matching servo control method is to map potential field forces to between consecutive frames enables the identification of accelerations, enhancing the controller’s performance occluded fiducials, thereby providing an effective solu - in terms of velocity smoothness. Therefore, this method tion to the problem of partial occlusion. This also repre - can be generalized and applied to similar visual track- sents a promising direction for future research. ing tasks. In the field of visual servo control, the paper pre - In conclusion, this work, aimed at tracking the patient’s sents a tracking control method based on APFF admit- head during neurosurgical operations, has designed a tance. This method improves upon traditional path monocular-based automatic registration and tracking Chen  et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 25 of 27 Figure 22 Comparison of computation time for three methods Table 5 Comparison of advanced registration methods Ref. Procedure Sensor TRE (mm) Time (s) Note [20] Brain NIR 0.7 0.1 MB [47] Brain NIR 0.45 − MB [9] Brain RGB 1.39 >40 MB [48] Orthopedic RGB-D 2.74 0.2 ML [11] Brain RGB-D <3 <10 ML [12] Spine RGB-D 2.7 1.5 ML Ours Brain RGB 0.58 0.23 MB control framework, and has conducted preliminary work integrating monocular-based registration and exploration and validation theoretically and experimen- visual servo control. tally. Although some issues remain unresolved, it holds (2) The proposed registration module employs a 2D tremendous potential for future applications: (1) Com- KD-Tree-based feature extraction method and a pleting dynamic tracking of the patient’s head intraopera- two-stage optimization strategy to establish robust tively enhances the adaptability of neurosurgical robots 2D-3D correspondences and recover monocular and reduces the workload for surgeons. (2) The fully depth information. The transformation is estimated automatic real-time registration method lays the ground- via SVD, enabling efficient and accurate alignment. work for deeper integration of Mixed Reality (MR) and (3) A 6D APFF, inspired by the mass-spring-damper surgical robots. (3) The visual servo control method pro - system, is developed to ensure compliant motion posed provides a new approach for visual tracking tasks and stability during tracking. The strategy enhances with similar requirements. system responsiveness while maintaining trajectory smoothness. 5 Conclusions (4) Experimental validation demonstrates that the proposed registration method achieves clinically acceptable accuracy and computational efficiency. (1) To address the challenge of puncture planning The visual servo controller provides stable and deviation induced by intraoperative patient head smooth tracking performance, reducing the cogni- movement in neurosurgical procedures, this study tive and operational load on the surgeon. presents MonoTracker, a fully automated frame- Chen et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 26 of 27 [7] K Machetanz, F Grimm, S Wang, et al. Patient-to-robot registration: The (5) The MonoTracker framework shows strong fate of robot-assisted stereotaxy. The International Journal of Medical potential for application in various surgical robot Robotics and Computer Assisted Surgery, 2021, 17(5): 2288. navigation scenarios. Future work will focus on [8] A Fomenko, D Serletis. Robotic stereotaxy in cranial neurosurgery: A qualitative systematic review. Neurosurgery, 2018, 83(4): 642–650. addressing fiducial loss in the camera’s field of view [9] F Meng, F Zhai, B Zeng, et al. An automatic markerless registration through multi-view monocular tracking strate- method for neurosurgical robotics based on an optical camera. Inter- gies. Additionally, the integration of this method national Journal of Computer Assisted Radiology and Surgery, 2018, 13: 253–265. into Augmented Reality (AR)/Virtual Reality (VR)- [10] Y Su, Y Sun, M Hosny, et al. Facial landmark-guided surface matching for based surgical navigation systems will be explored image-to-patient registration with an RGB-D camera. The International to enhance surgical immersion and human-robot Journal of Medical Robotics and Computer Assisted Surgery, 2022, 18(3): interaction. [11] F Liebmann, M Atzigen, D Stütz, et al. Automatic registration with con- tinuous pose updates for marker-less surgical navigation in spine surgery. Medical Image Analysis, 2024, 91: 103027. Acknowledgements [12] G Fattori, A J Lomax, D C Weber, et al. Technical assessment of the NDI Not applicable. Polaris Vega optical tracking system. Radiation Oncology, 2021, 16(1): 1–4. [13] A I Omara, M Wang, Y F Fan, et al. Anatomical landmarks for point-match- Authors’ Contributions ing registration in image-guided neurosurgery. The International Journal Kai Chen was in charge of the idea conception, algorithm implementation, of Medical Robotics and Computer Assisted Surgery, 2014, 10(1): 55–64. data collection, and manuscript writing. Diansheng Chen was in charge of [14] G A Puerto-Souza, J A Cadeddu, G L Mariottini. Toward long-term and the idea conception, study oversight/supervision, manuscript review. Ruijie accurate augmented-reality for monocular endoscopic videos. IEEE Zhang was in charge of the algorithm implementation, data collection, manu- Transactions on Biomedical Engineering, 2014, 61(10): 2609–2620. script editing. Cai Meng was in charge of the study oversight/supervision, [15] F Liebmann, D Stütz, D Suter, et al. Spinedepth: A multi-modal data col- manuscript editing and review. Zhouping Tang was in charge of the study lection approach for automatic labelling and intraoperative spinal shape oversight/supervision, manuscript review. All authors read and approved the reconstruction based on RGB-D data. Journal of Imaging, 2021, 7(9): 164. final manuscript. [16] X S Hu, N Wagley, A T Rioboo, et al. Photogrammetry-based stereoscopic optode registration method for functional near-infrared spectroscopy. Funding Journal of Biomedical Optics, 2020, 25(9): 095001. Supported by National Natural Science Foundation of China (Grant No. [17] S Kim, H An, M Song, et al. Automated marker-less patient-to-preopera- 92148206). tive medical image registration approach using RGB-D images and facial landmarks for potential use in computer-aided surgical navigation of the Data Availability paranasal sinus. Proceedings of the Computer Graphics International Confer- The datasets used and analysed during the current study are available from ence, Shanghai, China, 2023: 135–145. the corresponding author on reasonable request. [18] L X Liang. Precise iterative closest point algorithm for RGB-D data registra- tion with noise and outliers. Neurocomputing, 2020, 399: 361–368. [19] Q Lin, R Yang, K Cai, et al. Real-time automatic registration in optical surgi- Declarations cal navigation. Infrared Physics & Technology, 2016, 76: 375–385. [20] Y Xu, F Gao, H Ren, et al. An iterative distortion compensation algorithm Competing Interests for camera calibration based on phase target. Sensors, 2017, 17(6): 1188. The authors declare no competing financial interests. [21] H Liu, J Fu, M He, et al. GWM-view: gradient-weighted multi-view calibra- tion method for machining robot positioning. Robotics and Computer- Integrated Manufacturing, 2023, 83: 102560. Received: 28 February 2025 Revised: 9 July 2025 Accepted: 21 July 2025 [22] A Taleb, C Guigou, S Leclerc, et al. Image-to-patient registration in computer-assisted surgery of head and neck: State-of-the-art, perspec- tives, and challenges. Journal of Clinical Medicine, 2023, 12(16): 5398. [23] A Martin-Gomez, H Li, T Song, et al. STTAR: surgical tool tracking using off-the-shelf augmented reality head-mounted displays. IEEE Transactions References on Visualization and Computer Graphics, 2023. [1] T Vos, S S Lim, C Abbafati, et al. Global burden of 369 diseases and injuries [24] J Zhang, Z Yang, S Jiang, et al. A spatial registration method based on in 204 countries and territories, 1990–2019: A systematic analysis for 2D–3D registration for an augmented reality spinal surgery navigation the global burden of disease study 2019. The Lancet, 2020, 396(10258): system. The International Journal of Medical Robotics and Computer Assisted 1204–1222. Surgery, 2024, 20(1): e2612. [2] V L Feigin, T Vos, F Alahdab, et al. Burden of neurological disorders across [25] M T Holland, K Mansfield, A Mitchell, et al. Hidden error in optical stereo - the US from 1990–2017: A global burden of disease study. JAMA Neurol., tactic navigation systems and strategy to maximize accuracy. Stereotactic 2021, 78(2): 165–176. and Functional Neurosurgery, 2021, 99(5): 369–376. [3] C Faria, W Erlhagen, M Rito, et al. Review of robotic technology for [26] Y Wang, W Wang, Y Cai, et al. A guiding and positioning motion strategy stereotactic neurosurgery. IEEE Reviews in Biomedical Engineering, 2015, 8: based on a new conical virtual fixture for robot-assisted oral surgery. 125–137. Machines, 2022, 11(1): 3. [4] Z Wu, D Chen, C Pan, et al. Surgical robotics for intracerebral hemorrhage [27] H Su, W Qi, C Yang, et al. Deep neural network approach in robot tool treatment: State of the art and future directions. Annals of Biomedical dynamics identification for bilateral teleoperation. IEEE Robotics and Engineering, 2023, 51(9): 1933–1941. Automation Letters, 2020, 5(2): 2943–2949. [5] T Haidegger, Z Benyo, K Peter. Patient motion tracking in the presence of [28] T Haidegger, S Speidel, D Stoyanov, et al. Robot-assisted minimally inva- measurement errors. Proceedings of the 2009 Annual International Confer- sive surgery—surgical robotics in the data age. Proceedings of the IEEE, ence of the IEEE Engineering in Medicine and Biology Society, Minneapolis, 2022, 110(7): 835–846. USA, September 3–6, 2009: 5563–5566. [29] S Dinesh, U K Sahu, D Sahu, et al. Review on sensors and components [6] G Z Yang, J Cambias, K Cleary, et al. Medical robotics—regulatory, ethical, used in robotic surgery: recent advances and new challenges. IEEE Access, and legal considerations for increasing levels of autonomy. Science Robot- 2023, 11: 140722–140739. ics, 2017, 2(4): 8638. [30] S Niyaz, A Kuntz, O Salzman, et al. Following surgical trajectories with concentric tube robots via nearest-neighbor graphs. Proceedings of the Chen  et al. Chinese Journal of Mechanical Engineering (2025) 38:168 Page 27 of 27 2018 International Symposium on Experimental Robotics, Buenos Aires, Kai Chen born in 1995, is currently a PhD candidate at School of Argentina, 2020: 3–13. Mechanical Engineering and Automation, Beihang University, China. His [31] W Park, Y Wang, G S Chirikjian. The path-of-probability algorithm for steer- research interests include surgical robotics and medical image pro- ing and feedback control of flexible needles. The International Journal of cessing. E-mail: [email protected]. Robotics Research, 2010, 29(7): 813–830. [32] A Segato, V Pieri, A Favaro, et al. Automated steerable path planning for Diansheng Chen born in 1969, is currently a professor at School of deep brain stimulation safeguarding fiber tracts and deep gray matter Mechanical Engineering and Automation, Beihang University, China. He nuclei. Frontiers in Robotics and AI, 2019, 6: 70. received his PhD degree from Jilin University, China, in 2003. E-mail: [33] A Hong, Q Boehler, R Moser, et al. 3D path planning for flexible needle steering in neurosurgery. The International Journal of Medical Robotics and [email protected]. Computer Assisted Surgery, 2019, 15(4): 1998. [34] L Hao, D Liu, S Du, et al. An improved path planning algorithm based on Ruijie Zhang born in 2000, is currently a master candidate at artificial potential field and primal-dual neural network for surgical robot. School of General Engineering, Beihang University, China. He received Computer Methods and Programs in Biomedicine, 2022, 227: 107202. his bachelor’s degree from Beihang University, China, in 2023. [35] S O Park, M C Lee, J Kim. Trajectory planning with collision avoidance for redundant robots using Jacobian and artificial potential field-based real- Cai Meng born in 1977, is currently an associate professor at School time inverse kinematics. International Journal of Control, Automation and of Astronautics, Beihang University, China. He received his PhD degree Systems, 2020, 18: 2095–2107. from Beihang University, China, in 2004. E-mail: [email protected]. [36] B Kovács, G Szayer, F Tajti, et al. A novel potential field method for path planning of mobile robots by adapting animal motion attributes. Robotics and Autonomous Systems, 2016, 82: 24–34. Zhouping Tang born in 1969, is currently a professor at Tongji Med- [37] L He, Y Meng, J Zhong, et al. Preoperative path planning algorithm for ical College, Huazhong University of Science and Technology, China. He lung puncture biopsy based on path constraint and multidimensional received his PhD degree from Tongji Medical College, Huazhong Uni- space distance optimization. Biomedical Signal Processing and Control, versity of Science and Technology, China, in 2004. 2023, 80: 104304. [38] G Tong, X Wang, H Jiang, et al. A deep learning model for automatic segmentation of intraparenchymal and intraventricular hemorrhage for catheter puncture path planning. IEEE Journal of Biomedical and Health Informatics, 2023. [39] J Han, J Davids, H Ashrafian, et al. A systematic review of robotic surgery: From supervised paradigms to fully autonomous robotic approaches. The International Journal of Medical Robotics and Computer Assisted Surgery, 2022, 18(2): 2358. [40] F Xu, H Jin, X Yang, et al. Improved accuracy using a modified registration method of ROSA in deep brain stimulation surgery. Neurosurgical Focus, 2018, 45(2): 18. [41] H Yasin, H J Hoff, I Blümcke, et al. Experience with 102 frameless stereo - tactic biopsies using the Neuromate robotic device. World Neurosurgery, 2019, 123: 450–456. [42] D Tian, Q Xu, X Yao, et al. Diversity-guided particle swarm optimization with multi-level learning strategy. Swarm and Evolutionary Computation, 2024, 86: 101533. [43] J Han, M Luo, Y You, et al. Optimization scheme for online viewpoint planning of active optical navigation system in orthopedic surgeries. IEEE Transactions on Instrumentation and Measurement, 2023, 72: 1–13. [44] L Chen, L Ma, F Zhang, et al. An intelligent tracking system for surgical instruments in complex surgical environment. Expert Systems with Appli- cations, 2023, 230: 120743. [45] Y Wang, W Wang, Y Cai, et al. Preliminary study of a new macro‐micro robot system for dental implant surgery: Design, development and con- trol. The International Journal of Medical Robotics and Computer Assisted Surgery, 2024, 20(1): e2614. [46] J Wang, C Lu, Y Lv, et al. Task space compliant control and six-dimensional force regulation toward automated robotic ultrasound imaging. IEEE Transactions on Automation Science and Engineering, 2023. [47] F Suligoj, M Švaco, B Jerbić, et al. Automated marker localization in the planning phase of robotic neurosurgery. IEEE Access, 2017, 5: 12265–12274. [48] H Liu, F R Y Baena. Automatic markerless registration and tracking of the bone for computer-assisted orthopaedic surgery. IEEE Access, 2020, 8: 42010–42020. [49] L Xu, H Zhang, J Wang, et al. Information loss challenges in surgical navigation systems: From information fusion to AI-based approaches. Information Fusion, 2023, 92: 13–36. [50] C M Heunis, B F Barata, G P Furtado, et al. Collaborative surgical robots: optical tracking during endovascular operations. IEEE Robotics & Automa- tion Magazine, 2020, 27(3): 29–44. [51] S Tukra, H J Marcus, S Giannarou. See-through vision with unsupervised scene occlusion reconstruction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 44(7): 3779–3790.

Journal

Chinese Journal of Mechanical EngineeringSpringer Journals

Published: Sep 3, 2025

Keywords: Neurosurgical robot; Automatic detection; Monocular registration; Visual servo control

There are no references for this article.