The Synset Boulevard dataset is designed for the task of vehicle make and model recognition (VMMR) and is, to the best of our knowledge, the first entirely synthetically generated large-scale VMMR image dataset.
By simulating image data rather than manually annotating real data, it is intended to mitigate common challenges of state-of-the-art VMMR datasets, namely bias, human error, privacy concerns, and the difficulty of providing systematic updates. On the other hand, providing and using synthetic data introduces challenges of its own, such as potential domain gaps and less pronounced intra-class variance.
The dataset was generated using path tracing and physically based, data-driven models. It contains 32,400 independent images (each rendered with different imaging simulations and with and without masked license plates, for a total of 259,200 images) of 162 vehicle models from 43 makes, all depicted in front view. It is split into 8 sub-datasets to investigate the influence of optical and imaging effects on classification performance.
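
The figures quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that uses only the numbers stated in the text. The variable names are illustrative. Reading the number of variants per independent image as the 8 sub-datasets is an assumption, as is treating the per-model count as an average rather than a guaranteed class balance.

```python
# Figures quoted in the dataset description.
independent_images = 32_400
total_images = 259_200
vehicle_models = 162
makes = 43

# Number of rendered variants per independent image.
# Assumption: these variants correspond to the 8 sub-datasets.
variants_per_image = total_images // independent_images
assert variants_per_image * independent_images == total_images

# Average number of independent images per vehicle model.
# Assumption: this is an average; the text does not state that classes are balanced.
images_per_model = independent_images / vehicle_models

# Average number of vehicle models per make.
models_per_make = vehicle_models / makes

print(f"variants per independent image: {variants_per_image}")
print(f"independent images per model (average): {images_per_model:.0f}")
print(f"models per make (average): {models_per_make:.2f}")
```

Under these assumptions, the total is exactly divisible by the number of independent images, and the independent images divide evenly across the 162 vehicle models.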