Estimating depth and localizing the endoscope in a surgical environment is critical for many tasks, such as intra-operative registration, augmented reality, and surgical automation. Monocular self-supervised depth and pose estimation methods can estimate depth and camera pose without requiring labels. However, how these methods perform in the presence of tissue deformation as the endoscope moves through the lumen is not known. In this project, we therefore evaluate the effect of adding two modules on depth and pose estimation accuracy: TransUNet and an optical flow module. Optical flow can capture image intensity changes in the scene caused by deformation, and TransUNet can potentially capture temporal correlations between image frames to yield better pose and depth predictions. The project uses open-source datasets and publicly available GitHub code.
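At the core of self-supervised depth and pose training is a photometric reprojection loss: pixels of the target frame are back-projected with the predicted depth, re-projected into a neighboring frame with the predicted relative pose, and the intensity difference penalized. The minimal NumPy sketch below illustrates this idea with nearest-neighbor sampling; the function names (`backproject`, `project`, `photometric_loss`) and the pinhole intrinsics `K` are illustrative assumptions, not the project's actual implementation.

```python
import numpy as np

def backproject(depth, K):
    # Lift each pixel to a 3D point in the camera frame using its depth.
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1).astype(float)
    rays = np.linalg.inv(K) @ pix          # unit-depth rays, 3 x N
    return rays * depth.reshape(1, -1)     # scaled by depth, 3 x N

def project(points, K, R, t):
    # Transform points into the source camera frame and project to pixels.
    p = K @ (R @ points + t[:, None])
    return p[:2] / np.clip(p[2:], 1e-6, None)

def photometric_loss(target, source, depth, K, R, t):
    # L1 difference between the target frame and the source frame
    # warped into the target view (nearest-neighbor sampling for brevity).
    h, w = depth.shape
    uv = project(backproject(depth, K), K, R, t)
    u = np.clip(np.round(uv[0]).astype(int), 0, w - 1)
    v = np.clip(np.round(uv[1]).astype(int), 0, h - 1)
    warped = source[v, u].reshape(h, w)
    return np.abs(warped - target).mean()
```

With an identity relative pose the warp is the identity mapping, so the loss between a frame and itself is zero; deformation violates this static-scene assumption, which is what the optical flow module is meant to help capture.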
Next steps: creating a 3D mesh from the generated depth values.
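One common way to turn a dense depth map into a mesh is to back-project each pixel to a 3D vertex and triangulate the pixel grid with two triangles per quad. The sketch below shows this under assumed pinhole intrinsics `K`; the helper name `depth_to_mesh` is hypothetical and not part of the project's codebase.

```python
import numpy as np

def depth_to_mesh(depth, K):
    """Triangulate a depth map into a grid mesh (illustrative helper).

    Returns (vertices, faces): N x 3 camera-frame points and
    M x 3 vertex-index triangles.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1).astype(float)
    verts = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)  # 3 x N

    # Two triangles per pixel quad, indexing into the flattened grid.
    idx = np.arange(h * w).reshape(h, w)
    a, b = idx[:-1, :-1].ravel(), idx[:-1, 1:].ravel()
    c, d = idx[1:, :-1].ravel(), idx[1:, 1:].ravel()
    faces = np.concatenate(
        [np.stack([a, b, c], axis=1), np.stack([b, d, c], axis=1)], axis=0
    )
    return verts.T, faces
```

In practice one would also drop triangles spanning large depth discontinuities and export the result with a library such as Open3D, but the grid triangulation above is the basic step.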
Left: ground truth. Right: the 3D depth prediction (purple to yellow: farther to closer).
Hugging Face link: https://huggingface.co/spaces/mkalia/DepthPoseEstimation
Simply upload an image and predict.