A useful feature for managing the number of assets in your Asset Bank and reducing your storage usage is the inbuilt duplicate detection system. On upload, files that already exist as assets are identified and you can choose to edit the existing assets instead of uploading the duplicates.
Duplicate detection is on by default. To confirm that it is enabled for your Asset Bank, or to switch it off, check the duplicate-file-check-on-upload setting in the ApplicationSetting.properties file: a value of true enables detection. If you host Asset Bank with us, contact the Customer Support Team to check this for you.
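As an illustration of the check described above, the following Python sketch parses a properties-style text and reports whether duplicate detection is switched on. The helper names (read_properties, duplicate_check_enabled) and the sample content are hypothetical, and treating a missing key as "true" is an assumption based only on the statement that detection is on by default.

```python
def read_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def duplicate_check_enabled(settings):
    # Assumption: a missing key means the default applies, and the default is "true".
    return settings.get("duplicate-file-check-on-upload", "true").lower() == "true"


# Hypothetical excerpt; a real file will contain many other settings.
sample = """
# duplicate detection
duplicate-file-check-on-upload=true
duplicate-asset-check-type=perceptual
"""

settings = read_properties(sample)
print("Duplicate check enabled:", duplicate_check_enabled(settings))
print("Check type:", settings.get("duplicate-asset-check-type"))
```

To run this against a real installation, replace the sample string with the contents of your own properties file.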
You can also change the method that Asset Bank uses to detect duplicates with the duplicate-asset-check-type setting. Below is a brief explanation of each duplicate detection method:
- filename = uploaded file(s) will match any assets with the same original filename as the uploaded file
- data = on upload a unique identifying hash is generated for each file so that any identical files that are uploaded are detected
- perceptual = on upload a 'perceptual hash' is created for each image. This forms a 'fingerprint' that is used to identify whether a newly uploaded image is already in Asset Bank (even if it is a resized version). Non-image files are treated as in the 'data' setting, with a unique hash generated.
- filesize = uploaded file(s) will match any assets with the same file size in bytes as the uploaded file
- both = uploaded file(s) will match any assets that have the same original filename AND the same file size in bytes as the uploaded file
- threshold = uploaded file(s) will match any assets that have the same file size in bytes AND (are smaller than duplicate-asset-size-threshold in size OR have the same original filename)
- either = uploaded file(s) will match any assets that have the same original filename OR the same file size in bytes as the uploaded file
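The filename and file size rules in the list above can be written as a single predicate. The sketch below is only an illustration of the documented logic, not Asset Bank's implementation: the FileInfo class and is_duplicate function are hypothetical names, size_threshold stands in for duplicate-asset-size-threshold, and exact, case-sensitive filename comparison is an assumption.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    filename: str  # original filename
    size: int      # file size in bytes


def is_duplicate(upload, asset, check_type, size_threshold=0):
    """Apply one of the metadata-based rules to an uploaded file and an existing asset."""
    same_name = upload.filename == asset.filename  # assumption: exact comparison
    same_size = upload.size == asset.size
    if check_type == "filename":
        return same_name
    if check_type == "filesize":
        return same_size
    if check_type == "both":
        return same_name and same_size
    if check_type == "either":
        return same_name or same_size
    if check_type == "threshold":
        # Same size AND (smaller than the threshold OR same original filename).
        return same_size and (upload.size < size_threshold or same_name)
    raise ValueError(f"rule not covered by this sketch: {check_type}")


existing = FileInfo("logo.png", 2048)
upload = FileInfo("logo-copy.png", 2048)

for rule in ("filename", "filesize", "both", "either"):
    print(rule, is_duplicate(upload, existing, rule))
print("threshold (1024)", is_duplicate(upload, existing, "threshold", size_threshold=1024))
print("threshold (4096)", is_duplicate(upload, existing, "threshold", size_threshold=4096))
```

In this example the two files share a size but not a name, so filesize and either report a match while filename and both do not. The threshold rule matches only when the shared size is below the configured threshold.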
Perceptual Hash Settings
Asset Bank can match assets on a perceptual basis by examining the actual image. This involves analysing the file on upload and, as described above, is selected by setting duplicate-asset-check-type to perceptual.
Generating the perceptual hash adds a small overhead to each upload. If the file is not an image, a unique digital fingerprint is generated instead.
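The idea of a unique digital fingerprint can be illustrated with a cryptographic hash of the file's bytes. The source does not say which hash algorithm Asset Bank uses, so SHA-256 here is an assumption, and content_hash is a hypothetical helper. Reading in chunks shows why the overhead grows with file size: every byte has to be read once.

```python
import hashlib
import io


def content_hash(stream, chunk_size=65536):
    """Return a hex digest of everything readable from a binary stream."""
    digest = hashlib.sha256()  # assumption: the real algorithm is not documented here
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


original = b"example file contents" * 1000
copy = bytes(original)              # identical bytes, e.g. the same file under a new name
edited = original[:-1] + b"!"       # a single byte changed

hash_original = content_hash(io.BytesIO(original))
hash_copy = content_hash(io.BytesIO(copy))
hash_edited = content_hash(io.BytesIO(edited))

print("copy matches original:", hash_copy == hash_original)
print("edited matches original:", hash_edited == hash_original)
```

A fingerprint of this kind only detects byte-for-byte identical files; any change to the content, however small, produces a different value.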
Perceptual hash matching is based on how an image looks, so it can identify images that have been resized (including changes to the image aspect ratio) or recoloured (greyscale images will match colour ones and vice versa). It will not successfully match a cropped image to an uncropped one.
The sensitivity is set with the duplicate-asset-similarity-tolerance setting. This can be a number from 1 to 64, where:
- 1 is the least tolerant (and may not correctly identify images which have been resized or recoloured).
- 5 tolerates a few differences, so a match is still likely to be a very similar image.
- values above 10 are likely to incorrectly match unrelated images.
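To give a feel for how a tolerance value can work, here is a minimal sketch of one common perceptual hashing technique, the 64-bit "average hash", compared by counting differing bits. This is an assumption for illustration only: the source does not state which algorithm Asset Bank uses or that the tolerance is a bit count, although a 1 to 64 range would be consistent with a 64-bit hash. All function names are hypothetical, and images are represented as plain grids of grey values at least 8 pixels wide and high.

```python
def shrink(pixels, size=8):
    """Downscale a 2D grid of grey values to size x size by averaging blocks."""
    height, width = len(pixels), len(pixels[0])
    small = []
    for gy in range(size):
        y0, y1 = gy * height // size, (gy + 1) * height // size
        row = []
        for gx in range(size):
            x0, x1 = gx * width // size, (gx + 1) * width // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(pixels, size=8):
    """Build a size*size bit hash: 1 where a cell is brighter than the mean."""
    flat = [value for row in shrink(pixels, size) for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a, b):
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def looks_like_duplicate(hash_a, hash_b, tolerance=5):
    # Assumption: the tolerance is the maximum number of differing bits allowed.
    return hamming_distance(hash_a, hash_b) <= tolerance


# A 16x16 gradient, the same picture scaled up to 32x32, and its negative.
image = [[x * 16 + y for x in range(16)] for y in range(16)]
resized = [[image[y // 2][x // 2] for x in range(32)] for y in range(32)]
negative = [[255 - value for value in row] for row in image]

h_image, h_resized, h_negative = map(average_hash, (image, resized, negative))
print("resized:", hamming_distance(h_image, h_resized), looks_like_duplicate(h_image, h_resized))
print("negative:", hamming_distance(h_image, h_negative), looks_like_duplicate(h_image, h_negative))
```

In this sketch the resized copy produces the same hash as the original, while the negative differs in every bit and is rejected at any reasonable tolerance. Raising the tolerance allows more differing bits, which is why high values begin to match unrelated images.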
Regenerate hashes for existing assets
If duplicate-file-check-on-upload=true, duplicate-asset-check-type is set to perceptual or data, and the system already contains assets, then hashes need to be regenerated for all existing assets so that new uploads can be compared against them. This functionality can be found at:
Admin > System > Developer > Housekeeping Tools > Regenerate perceptual image hashes