This is certainly a major effort to create a foundation model for image segmentation and a corresponding high-quality dataset! It is quite an impressive and comprehensive piece of work!
However, in my opinion, there were still too many human annotators involved in creating the masks and in the process of quality assurance of these masks.
The results of their Segment Anything Model on downstream tasks such as edge detection seem to be quite good, given that the model was not trained or fine-tuned for this task.
These researchers from Meta/Facebook tried so hard to be politically correct and to signal their virtue that it is cringeworthy. This paper almost reads more like a study in sociology than one in computer vision:
- I would roughly estimate that almost 10% of their 117 references fall into this category.
- They tried to be as diverse as possible, both geographically and by income level, in choosing the 11 million images of their dataset. However, Russia (ranked first by image count) and Thailand (ranked second) account for somewhat over 800,000 and 700,000 images, respectively (with about 700,000 from the U.S. as well). Africa, with a population of over 1.2 billion, contributes only about 300,000 images. Very curious! The authors state in all seriousness: "We note that the top-three countries are from different parts of the world." Some AI PhDs have a strange mind! This is what happens when diversity becomes an obsessive fetish! In my opinion, this effort was a waste of resources! Thorough deduplication and a large variety of image sources would have sufficed for now! Even using only images from the U.S. and Europe would have been good enough for now!
- The authors inferred the geographic location of an image from its caption. That is amateurish! As the authors themselves note, "there are ambiguities and potential for biases with this method (e.g., “Georgia” may refer to the country or the US state)".
- They even used the "More Inclusive Annotations for People (MIAP)" dataset. Let's not speculate about how much bias this may have introduced. Plus, they "use a proprietary dataset that contains annotations for the perceived Fitzpatrick skin [tone] type". Further, the authors "extend the analysis to segmenting clothing where we find an indication of bias across perceived gender presentation."
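To put the image counts quoted above into proportion, here is a minimal sketch that converts them into shares of the dataset. The numbers are only the rough, rounded figures mentioned in this post (not exact values from the paper), and the names `approx_counts` and `share_percent` are made up for this illustration.

```python
# Approximate image counts as quoted in this post (rounded, not exact paper values).
TOTAL_IMAGES = 11_000_000
approx_counts = {
    "Russia": 800_000,
    "Thailand": 700_000,
    "U.S.": 700_000,
    "Africa": 300_000,
}


def share_percent(count, total=TOTAL_IMAGES):
    """Return `count` as a percentage of `total`."""
    return 100.0 * count / total


# Share of the whole dataset per region, in percent.
shares = {region: share_percent(n) for region, n in approx_counts.items()}

for region, pct in shares.items():
    print(f"{region}: {pct:.1f}% of the dataset")
```

With these rounded inputs, the three largest countries together come to roughly 20% of all images, while the whole of Africa stays below 3%.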
Figure 7: Estimated geographic distribution of SA-1B images