Asset Bank can detect potential duplicates when a user uploads a file.
The application setting to switch this on is:
duplicate-file-check-on-upload=true
The following setting specifies which type of simple matching to use:
duplicate-asset-check-type
It accepts one of the following values:
- filename = uploaded file(s) will match any assets with the same original filename as the uploaded file
- data = on upload, an identifying hash of the file is generated so that any identical files that are uploaded are detected
- perceptual = on upload, a 'perceptual hash' is created for each image. This forms a 'fingerprint' used to identify whether a newly uploaded image is already in Asset Bank (even if it is a resized version). Non-image files are treated as in the 'data' setting, with an identifying hash generated instead
- filesize = uploaded file(s) will match any assets with the same file size in bytes as the uploaded file
- both = uploaded file(s) will match any assets that have the same original filename AND the same file size in bytes as the uploaded file
- threshold = uploaded file(s) will match any assets that have the same file size in bytes AND (are smaller than duplicate-asset-size-threshold in size OR have the same original filename)
- either = uploaded file(s) will match any assets that have the same original filename OR the same file size in bytes as the uploaded file
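The simple matching rules above can be sketched in Python as follows. This is an illustrative sketch, not Asset Bank's implementation: the `FileInfo` record, the `is_duplicate` function and the use of SHA-256 for the 'data' check are all assumptions made for the example. The source only says that "a unique identifying hash" is generated.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical record holding the attributes the checks compare."""
    filename: str          # original filename
    size: int              # file size in bytes
    content: bytes = b""   # raw file data, only needed for the 'data' check


def data_hash(content: bytes) -> str:
    # SHA-256 is an assumption; any hash of the file contents would do here.
    return hashlib.sha256(content).hexdigest()


def is_duplicate(upload: FileInfo, asset: FileInfo, check_type: str,
                 size_threshold: int = 0) -> bool:
    """Return True if `upload` matches `asset` under the given check type."""
    same_name = upload.filename == asset.filename
    same_size = upload.size == asset.size

    if check_type == "filename":
        return same_name
    if check_type == "filesize":
        return same_size
    if check_type == "both":
        return same_name and same_size
    if check_type == "either":
        return same_name or same_size
    if check_type == "threshold":
        # Same size AND (smaller than the size threshold OR same filename).
        return same_size and (asset.size < size_threshold or same_name)
    if check_type == "data":
        return data_hash(upload.content) == data_hash(asset.content)
    raise ValueError(f"unsupported check type: {check_type}")
```

For example, an upload named `logo-copy.png` with the same size as an existing `logo.png` matches under 'filesize' and 'either', but not under 'filename' or 'both'. Under 'threshold' it matches only if the file is smaller than the configured duplicate-asset-size-threshold.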
Perceptual Hash Settings
Asset Bank has the capability to match assets on a perceptual basis by examining the actual image. This involves analysing the file on upload and is controlled with the following setting:
Generating the perceptual hash adds a small overhead on upload. If the file is not an image, a unique digital fingerprint is generated instead.
Perceptual hash matching is based on how an image looks, so it can identify images that have been resized (including changes to the image aspect ratio) or recoloured (greyscale images will match colour ones and vice versa). It will not match a cropped image to an uncropped one.
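To illustrate why a hash based on appearance survives resizing, here is a minimal sketch of one well-known perceptual hashing technique, the "average hash". The source does not say which algorithm Asset Bank uses, so this is only an example of the general idea. The function name `average_hash` and the input format (a 2D list of greyscale values) are assumptions chosen to keep the example dependency-free.

```python
def average_hash(pixels, hash_size=8):
    """Compute a 64-bit average hash from a 2D list of greyscale values.

    `pixels` is a list of equal-length rows and must be at least
    `hash_size` pixels in each dimension. This illustrates the technique
    only; it is not Asset Bank's algorithm.
    """
    height, width = len(pixels), len(pixels[0])

    # Shrink the image to hash_size x hash_size by averaging blocks of
    # pixels, which discards the original dimensions and aspect ratio.
    cells = []
    for row in range(hash_size):
        y0 = row * height // hash_size
        y1 = (row + 1) * height // hash_size
        for col in range(hash_size):
            x0 = col * width // hash_size
            x1 = (col + 1) * width // hash_size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))

    # Each bit records whether a cell is brighter than the overall mean.
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits
```

Because every cell is an average over a region of the image, a scaled copy (including one with a different aspect ratio) produces the same or a very similar hash. A cropped copy shifts the content of every cell, which is consistent with cropped images not matching.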
The sensitivity is set with the following setting:
This can be a number from 1 to 64:
- 1 is the least sensitive (and may not correctly identify images which have been resized or colour changed).
- 5 tolerates a few differences, so a match is likely to be a very similar image.
- values above 10 are likely to incorrectly match unrelated images.
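One plausible reading of the 1 to 64 range is that the setting is the maximum number of differing bits allowed between two 64-bit perceptual hashes (their Hamming distance). The source does not confirm this, so the sketch below is an assumption made for illustration. The names `hamming_distance` and `is_perceptual_match` are hypothetical.

```python
def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bits that differ between two integer hashes."""
    return bin(hash_a ^ hash_b).count("1")


def is_perceptual_match(hash_a: int, hash_b: int, sensitivity: int = 5) -> bool:
    """Treat two hashes as a match if they differ in at most `sensitivity` bits.

    Assumption: the configured value acts as a maximum Hamming distance.
    """
    return hamming_distance(hash_a, hash_b) <= sensitivity
```

Under this reading, a low value only accepts near-identical hashes, while a high value accepts hashes that differ in many bits, which is why unrelated images start to match.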
Regenerate hashes for existing assets
If duplicate-file-check-on-upload=true, duplicate-asset-check-type is set to 'perceptual' or 'data', and there are existing assets in the system, then hashes need to be regenerated for all of those assets. This functionality can be found at:
Admin > System > Developer > Housekeeping Tools > Regenerate perceptual image hashes