We present a novel method to supervise 3D Gaussian Splatting
(3DGS) scenes using optical tactile sensors.
Optical tactile sensors have become widely used in robotics for
manipulation and object representation; however, raw optical
tactile sensor data is unsuitable for directly supervising a
3DGS scene. Our representation leverages a
Gaussian Process Implicit Surface to implicitly represent the
object, combining many touches into a unified representation
with uncertainty.
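As a minimal sketch of this component (in notation of our own choosing, not necessarily the paper's), a Gaussian Process Implicit Surface models an implicit function whose zero level set is the object surface: contact points gathered from touches provide observations $y_i \approx 0$, and the standard GP regression posterior gives both a surface estimate and a pointwise uncertainty at any query point $x_*$:
\[
\mu(x_*) = k_*^{\top}\left(K + \sigma_n^{2} I\right)^{-1}\mathbf{y},
\qquad
\sigma^{2}(x_*) = k(x_*, x_*) - k_*^{\top}\left(K + \sigma_n^{2} I\right)^{-1} k_*,
\]
where $K$ is the kernel matrix over all touch points, $k_*$ the kernel vector between the touch points and $x_*$, and $\sigma_n^{2}$ the observation noise. The surface is the zero level set of $\mu$, and $\sigma^{2}(x)$ is the uncertainty that accumulating additional touches progressively reduces near the contacted regions.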
We merge this model with a monocular depth estimation network
that is aligned in a two-stage process: first coarsely aligned
with a depth camera, then finely adjusted to match our touch
data.
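As an illustrative sketch of the coarse stage (our symbols, not the paper's exact formulation), the monocular prediction $D_{\mathrm{mono}}$ can be brought into rough metric agreement with the depth camera $D_{\mathrm{cam}}$ by fitting a global scale and shift over pixels where the camera returns valid depth:
\[
(s^{*}, t^{*}) = \arg\min_{s,\,t} \sum_{p \in \Omega_{\mathrm{valid}}} \left( s\, D_{\mathrm{mono}}(p) + t - D_{\mathrm{cam}}(p) \right)^{2}.
\]
The fine stage then adjusts the coarsely aligned depth to agree with the touch-derived surface, anchoring the fused prediction to contact measurements in regions where the depth camera is unreliable.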
For every input color image, our method produces a
corresponding fused depth and uncertainty map. Utilizing this
additional information, we propose a new loss function, a
variance-weighted depth-supervised loss.
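One plausible form of this loss (a sketch in our own notation; the paper's exact weighting may differ) scales each pixel's depth residual by the inverse of the fused variance, so that well-constrained regions, such as those near touches, contribute most strongly to the gradient:
\[
\mathcal{L}_{\mathrm{depth}} = \frac{1}{|\mathcal{P}|} \sum_{p \in \mathcal{P}} \frac{\left( \hat{D}(p) - D_{\mathrm{fused}}(p) \right)^{2}}{\sigma^{2}(p) + \epsilon},
\]
where $\hat{D}$ is the depth rendered from the 3DGS scene, $D_{\mathrm{fused}}$ and $\sigma^{2}$ are the fused depth and uncertainty maps, and $\epsilon$ is a small constant for numerical stability.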
We leverage the DenseTact optical tactile sensor and RealSense
RGB-D camera, deploying both on a Kinova Gen3 robot, to show that combining touch and vision in this
manner leads to quantitatively and qualitatively better
results than vision or touch alone for few-view scene
synthesis on opaque as well as reflective and transparent
objects. Our method is highlighted below, where prior methods fail to reconstruct the geometry of a mirror.