Unsplash, a website that enables anyone to share high-quality images under a Creative Commons Zero license, has announced the release of what it says is the ‘most complete high-quality open image dataset ever.’ The dataset contains more than 2,000,000 images, according to Unsplash, which sourced the images from more than 200,000 photographers around the world.
An image dataset is a collection of images that can be downloaded as a single batch, typically along with relevant details such as EXIF data and location information. In this case, Unsplash says that its dataset includes AI- and community-generated keywords for the images, landmark details when relevant, image categories and subcategories, download stats, the number of image views, groupings of images, user-generated collections and ‘keyword-image conversions in search results.’
All data included with the dataset is anonymized and private, with the only exception being attribution to photographers. Unsplash says that it sourced the data from ‘hundreds of millions [of] searches across a nearly unlimited number of uses and contexts.’
The ‘complete’ nature of this dataset distinguishes it from other open-source image datasets, which Unsplash notes often suffer from problems that limit their usefulness, such as reliance on mass image labeling by third parties, low-quality images and size limitations.
In its present form, the dataset is 16GB in size, but Unsplash says that it will continue updating the dataset with additional images and fields as its online library grows.
The dataset is available to download from a dedicated portal on the Unsplash website, where two download options are available: the full high-quality 16GB dataset, which is offered only for non-commercial use, and a ‘Lite’ version that is only 550MB and available for both non-commercial and commercial use.
The full dataset contains more than 2,000,000 images, 5,000,000 keywords and 250,000,000 searches. The ‘Lite’ data is limited to 25,000 images and keywords, as well as 1,000,000 searches. Whereas the Lite dataset is available for anyone to download, the full dataset requires users to request permission to download.
The company requires certain details from users as part of the request, including name, email and the intended use of the data. In addition to the dedicated download website, Unsplash has published the related documentation on GitHub.
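For readers curious about what working with the keyword data might look like, here is a minimal sketch. It assumes the keywords are distributed as a tab-separated file with one row per photo and keyword pair; the column names photo_id and keyword and the sample rows are hypothetical placeholders, so check the published documentation for the actual layout before relying on them.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a downloaded keywords file.
# The column names (photo_id, keyword) are assumptions for illustration.
SAMPLE_TSV = (
    "photo_id\tkeyword\n"
    "abc123\tmountain\n"
    "abc123\tsnow\n"
    "xyz789\tmountain\n"
)


def count_keywords(tsv_file):
    """Count how many rows mention each keyword in a tab-separated file."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return Counter(row["keyword"] for row in reader)


# With a real download, pass an opened file instead of the in-memory sample.
counts = count_keywords(io.StringIO(SAMPLE_TSV))
for keyword, total in counts.most_common():
    print(keyword, total)
```

Reading the file row by row keeps memory use low, which may matter given the size of the full download.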
Unsplash remains as controversial as it is popular. The website has been integrated into a number of services, including Adobe, Trello, Wix, Medium, Facebook and thousands of other platforms. The service is distinguished from other free photo platforms by the high quality of the images it offers to the public under a CC0 license, which permits both non-commercial and commercial use.
Professional photographers have criticized the platform for undermining the profession, and the photographers who contribute images for devaluing their own work, among other things. Back in 2017, Unsplash founder Mikael Cho attempted to address these concerns in a blog post, stating, ‘We didn’t start Unsplash to reinvent an industry. We started Unsplash because we thought it might be useful.’