The point is, they aren't increasing the accuracy or resolution of the Kinect itself. In their paper they mention mounting a 15-megapixel DSLR with a rotating polarizer on top of the Kinect. That camera needs three images (one per polarizer orientation) to compute a normal map.
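For context, the three polarizer orientations are typically enough to recover the linear-polarization cues (angle and degree of polarization) that drive a normal-map estimate. A minimal sketch, assuming the three orientations are 0°, 45°, and 90° (the specific angles are my assumption, not stated here):

```python
import numpy as np

def polarization_cues(i0, i45, i90):
    """Recover linear-polarization Stokes cues from intensities seen
    through a polarizer at 0, 45 and 90 degrees (hypothetical angles)."""
    s0 = i0 + i90                       # total intensity
    s1 = i0 - i90                       # 0/90-degree component
    s2 = 2.0 * i45 - i0 - i90           # 45/135-degree component
    aolp = 0.5 * np.arctan2(s2, s1)     # angle of linear polarization
    dolp = np.sqrt(s1**2 + s2**2) / np.maximum(s0, 1e-12)  # degree of polarization
    return aolp, dolp
```

Shape-from-polarization methods then map the angle of polarization to the surface normal's azimuth and the degree of polarization to its zenith angle; this sketch only covers the image-to-cues step.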
They then fuse the Kinect depth data with that normal map to refine the geometry. But you can build a structured-light scanner that also needs only three images, has no moving parts, and would probably give even better results.
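To make the three-image structured-light claim concrete: classic three-step phase shifting projects three sinusoidal fringe patterns offset by 120° and recovers a wrapped phase per pixel, which (after unwrapping and calibration) yields depth. A minimal sketch of the phase-recovery step, under those standard assumptions:

```python
import numpy as np

def three_step_phase(i1, i2, i3):
    """Wrapped phase from three fringe images with -120/0/+120 degree shifts:
    i_k = a + b * cos(phi + shift_k). Standard three-step formula."""
    return np.arctan2(np.sqrt(3.0) * (i1 - i3), 2.0 * i2 - i1 - i3)
```

The inputs can be full camera frames (NumPy arrays), so the whole scene's phase comes out of one vectorized call; phase unwrapping and projector-camera triangulation are the remaining (omitted) steps.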
They are making a very, very unfair comparison.