A useful feature for ensuring your Asset Bank is a single source of truth, without duplicates, is the inbuilt duplicate detection system. On upload, files that already exist as assets are identified and you can choose to delete the duplicates.
Duplicate detection is on by default. To confirm that it is enabled for your Asset Bank, please contact the Customer Support Team.
To switch it on or off, change the duplicate-file-check-on-upload setting in the ApplicationSettings.properties file.
The method that Asset Bank uses to detect duplicates can also be changed, using the duplicate-asset-check-type setting. Each duplicate detection method is explained briefly below:
- data = on upload, an identifying hash is generated for each file so that any identical files that are uploaded are detected. This finds identical files even if their filenames differ.
- perceptual = on upload, a 'perceptual hash' is created for each image, forming a 'fingerprint' that is used when comparing files. This means it can match other images even if they have a different size, name or file extension. With this setting active, non-image files (such as videos and PDFs) are handled as in the 'data' setting above.
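The 'data' method can be illustrated with a minimal Python sketch. It assumes SHA-256 as the hash and uses the hypothetical helper names content_hash and find_data_duplicates; the algorithm Asset Bank actually uses is not stated here, so treat this as an illustration of the idea rather than the product's implementation.

```python
import hashlib
from typing import Dict, List


def content_hash(data: bytes) -> str:
    """Return a hex digest that depends only on the file's bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def find_data_duplicates(existing: Dict[str, bytes], upload: bytes) -> List[str]:
    """Return the names of existing assets whose content is byte-identical to the upload."""
    target = content_hash(upload)
    return [name for name, data in existing.items() if content_hash(data) == target]


# Hypothetical stored assets: filename -> file content.
assets = {
    "logo-final.png": b"\x89PNG example bytes",
    "banner.jpg": b"different bytes",
}

# The same bytes uploaded under any other filename still match the stored asset,
# because the filename never enters the hash.
matches = find_data_duplicates(assets, b"\x89PNG example bytes")
```

Because only the bytes are hashed, a single changed byte (for example after re-saving an image) produces a different hash, which is the case the 'perceptual' method is intended to cover.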
Legacy settings (due to be removed)
- filename = uploaded file(s) will match any assets with the same original filename as the uploaded file
- filesize = uploaded file(s) will match any assets with the same file size in bytes as the uploaded file
- both = uploaded file(s) will match any assets that have the same original filename AND the same file size in bytes as the uploaded file
- threshold = uploaded file(s) will match any assets that have the same file size in bytes AND (are smaller than duplicate-asset-size-threshold in size OR have the same original filename)
- either = uploaded file(s) will match any assets that have the same original filename OR the same file size in bytes as the uploaded file
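The legacy rules above are simple boolean combinations, which the following Python sketch spells out. FileInfo and legacy_match are hypothetical names, and the sketch assumes that duplicate-asset-size-threshold is expressed in bytes, which is not confirmed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    filename: str  # original filename
    size: int      # file size in bytes


def legacy_match(check_type: str, upload: FileInfo, asset: FileInfo,
                 size_threshold: int = 0) -> bool:
    """Return True if the upload counts as a duplicate of the asset under a legacy rule.

    size_threshold stands in for duplicate-asset-size-threshold (assumed to be bytes).
    """
    same_name = upload.filename == asset.filename
    same_size = upload.size == asset.size
    if check_type == "filename":
        return same_name
    if check_type == "filesize":
        return same_size
    if check_type == "both":
        return same_name and same_size
    if check_type == "either":
        return same_name or same_size
    if check_type == "threshold":
        # Same size AND (smaller than the threshold OR same filename).
        return same_size and (asset.size < size_threshold or same_name)
    raise ValueError(f"unknown legacy check type: {check_type}")


upload = FileInfo("holiday.jpg", 1000)
asset = FileInfo("IMG_0042.jpg", 1000)  # same size, different name
results = {t: legacy_match(t, upload, asset, size_threshold=2000)
           for t in ("filename", "filesize", "both", "either", "threshold")}
```

In this example only the file size agrees, so 'filesize', 'either' and 'threshold' (because the file is under the assumed 2000-byte threshold) report a match, while 'filename' and 'both' do not.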
Perceptual Hash Settings
Asset Bank has the capability to match assets on a perceptual basis by examining the actual image. This involves analysing the file on upload and is controlled with the following setting:
There is a small overhead involved in generating the perceptual hash.
Perceptual hash matching is based on how an image looks, so it can identify images that have been resized (including changes to the aspect ratio) or recoloured (greyscale images will match colour ones and vice versa). It will not match a cropped image to its uncropped original.
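To show why a fingerprint based on appearance survives resizing, here is a minimal Python sketch of one well-known perceptual hash, the 'average hash'. This is an assumption chosen for illustration: the algorithm Asset Bank uses is not described here. The sketch works on plain greyscale pixel grids, and the names shrink and average_hash are hypothetical.

```python
from typing import List

Grid = List[List[float]]  # greyscale pixel values, one inner list per row


def shrink(pixels: Grid, size: int = 8) -> Grid:
    """Downscale to size x size by averaging blocks (image must be at least size x size)."""
    h, w = len(pixels), len(pixels[0])
    out = []
    for r in range(size):
        r0, r1 = r * h // size, (r + 1) * h // size
        row = []
        for c in range(size):
            c0, c1 = c * w // size, (c + 1) * w // size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(pixels: Grid) -> int:
    """Return a 64-bit fingerprint: one bit per cell, set if the cell is brighter than the mean."""
    flat = [v for row in shrink(pixels) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


# An 8x8 diagonal gradient, the same picture stretched to 16 rows x 24 columns
# (aspect ratio changed), and an unrelated picture (the inverted gradient).
original = [[(x + y) * 16 for x in range(8)] for y in range(8)]
stretched = [[original[y // 2][x // 3] for x in range(24)] for y in range(16)]
inverted = [[224 - v for v in row] for row in original]

same_after_resize = average_hash(original) == average_hash(stretched)
differs_from_unrelated = average_hash(original) != average_hash(inverted)
```

Both comparisons evaluate to True here. The fingerprint records only the coarse pattern of light and dark areas, so the pixel dimensions, filename and file format do not affect it, whereas cropping changes that pattern itself.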
The sensitivity is set with the duplicate-asset-similarity-tolerance setting. This can be a number from 1 to 64, where:
- 1 is the least tolerant of differences (and may fail to identify images that have been resized or had their colours changed).
- 5 tolerates a few differences, so matches are still likely to be very similar images.
- values above 10 are likely to match unrelated images incorrectly.
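One plausible reading of the 1 to 64 range is that the tolerance is the maximum number of bits that may differ between two 64-bit fingerprints (the Hamming distance). That interpretation is an assumption, not something confirmed here, and the function names below are hypothetical. Under it, the comparison could be sketched in Python as:

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_perceptual_duplicate(a: int, b: int, tolerance: int) -> bool:
    """Treat two fingerprints as a match if at most `tolerance` bits differ (assumed rule)."""
    if not 1 <= tolerance <= 64:
        raise ValueError("tolerance must be between 1 and 64")
    return hamming_distance(a, b) <= tolerance


stored = 0xFFFF0000FFFF0000           # hypothetical fingerprint of an existing asset
near_copy = stored ^ 0b111            # differs in 3 bits, e.g. after resizing
unrelated = stored ^ 0xFFFFFFFF       # differs in 32 bits

strict = is_perceptual_duplicate(stored, near_copy, tolerance=1)
moderate = is_perceptual_duplicate(stored, near_copy, tolerance=5)
loose = is_perceptual_duplicate(stored, unrelated, tolerance=10)
```

With these made-up fingerprints the near copy is missed at tolerance 1 and caught at tolerance 5, while the unrelated image is still rejected at tolerance 10. Raising the tolerance further would eventually accept it, which is the false-match risk described above.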
Regenerate hashes for existing assets
If duplicate-file-check-on-upload=true and duplicate-asset-check-type is set to perceptual or data, any existing assets that were uploaded before these settings were activated will not yet have hashes, so their hashes need to be regenerated. This function can be found at:
Admin > System > Developer > Housekeeping Tools > Regenerate perceptual image hashes
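As a quick check of whether this applies to you, the condition above can be sketched in Python against the text of a settings file. The setting names and values are the ones mentioned in this article, but the parser is a simplified, hypothetical helper that only handles plain key=value lines and only looks at values that are set explicitly.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def hashes_may_need_regenerating(settings: Dict[str, str]) -> bool:
    """True when duplicate checking is on and uses a hash-based method (data or perceptual)."""
    enabled = settings.get("duplicate-file-check-on-upload", "").lower() == "true"
    hash_based = settings.get("duplicate-asset-check-type", "") in {"data", "perceptual"}
    return enabled and hash_based


# Example settings text, using only values mentioned in this article.
example = """
duplicate-file-check-on-upload=true
duplicate-asset-check-type=perceptual
duplicate-asset-similarity-tolerance=5
"""

needs_regeneration = hashes_may_need_regenerating(parse_properties(example))
```

If the result is True and assets were uploaded before these values were set, run the housekeeping tool above so that older assets can take part in duplicate matching.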