Common Sense: Notes on Panoptic Segmentation in Computer Vision

Tuesday, August 18, 2020

Notes on Panoptic Segmentation in Computer Vision

I am currently studying some research papers on panoptic segmentation in computer vision, which seems to be a hot topic at the moment. Unifying semantic segmentation and instance segmentation, two tasks that have so far been treated as a problematic dichotomy, is certainly a promising approach.

However, I find panoptic segmentation so far to be not very convincing:

It is still an approach that depends heavily on supervised learning
The current dichotomy of semantic segmentation and instance segmentation is probably artificial and unnecessary
It operates on the pixel level for analysis and processing, while humans and most likely animals never look at an image at pixel level
The researchers define and use terms like "things" and "stuff" to distinguish segmentation approaches, and they also base metrics etc. on these terms. This sounds jovial, but it is rather ambiguous and not very scientific or convincing. It seems to indicate either that the researchers in this area do not yet have a good understanding or that their approach is a dead end
I am also not sure whether questions about foreground vs. background, depth issues, or overlapping and/or discontinuous objects in images have been addressed sufficiently or successfully
Then there is the issue of void labels or void pixels. If the total number of void pixels (pixels that are e.g. ambiguous, corrupted, etc.) in the image is relatively small, or if they are truly not too relevant for the image segmentation task at hand, then this may be acceptable.
The major, primary datasets used for training and evaluation are reported to have only a pixel annotation coverage between 89% and 98%. Does this raise any doubts?
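
To make the void-pixel and annotation-coverage points above concrete, here is a minimal sketch of how such a coverage figure could be computed for a single label map. The void value 255, the function name annotation_coverage, and the toy label map are assumptions made purely for illustration; they are not taken from the paper or from any particular dataset.

```python
# Hypothetical label value marking unannotated / ambiguous pixels (an assumption).
VOID = 255


def annotation_coverage(label_map, void_label=VOID):
    """Return the fraction of pixels that carry a non-void label."""
    total = 0
    labeled = 0
    for row in label_map:
        for label in row:
            total += 1
            if label != void_label:
                labeled += 1
    if total == 0:
        raise ValueError("label map is empty")
    return labeled / total


# Toy 4x5 label map: 0 and 1 are class ids, 255 marks void pixels.
toy_labels = [
    [0, 0, 1, 1, 255],
    [0, 0, 1, 1, 1],
    [0, 255, 1, 1, 1],
    [0, 0, 0, 1, 1],
]

coverage = annotation_coverage(toy_labels)
print(f"coverage: {coverage:.0%}, void: {1 - coverage:.0%}")
```

In this toy case 2 of 20 pixels are void, so coverage is 90%, which is near the lower end of the 89% to 98% range mentioned above. Whether the missing pixels matter depends on where they fall, which a single percentage does not show.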

CVPR 2019: Panoptic Segmentation by Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, Piotr Dollar (at least two of these authors are preeminent researchers in computer vision)
