- Idée Inc brings image de-duplication to Digg
Idée’s image recognition technology is at the heart of Digg’s dedicated images section, making sure duplicate images are identified.
In preparing to launch a dedicated images section, Digg recognized that they needed to deal with duplicate image submissions. When you have over a million registered users, that’s bound to happen! Digg already had controls in place to keep duplicate stories from being submitted, and the same needed to happen for image submissions.
Now, recognizing an exact duplicate image file is easy. Simply run a good hash function over every previously submitted image, compare the stored hash values to the hash of the newly submitted image and voilà, problem solved!
However, in real life, the problem is that as an image is copied and exchanged, many small edits are made. Everything from resizing, cropping and file format changes to color shifts, rotation adjustments and text overlays commonly occurs. With any of these edits, a hash function approach breaks down completely, because changing even a single byte of the file produces an entirely different hash value.
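To make the limitation concrete, here is a minimal sketch of exact-duplicate detection with a cryptographic hash. It is only an illustration, not Digg's or Idée's code: the `ExactDuplicateIndex` class, the choice of SHA-256 and the stand-in byte strings are all assumptions made for the example.

```python
import hashlib


def file_digest(data: bytes) -> str:
    """Hash the raw file bytes; identical files always give the same digest."""
    return hashlib.sha256(data).hexdigest()


class ExactDuplicateIndex:
    """Hypothetical in-memory index of digests for files submitted so far."""

    def __init__(self):
        self._seen = set()

    def submit(self, data: bytes) -> bool:
        """Record a submission and return True if it is a byte-for-byte duplicate."""
        digest = file_digest(data)
        if digest in self._seen:
            return True
        self._seen.add(digest)
        return False


index = ExactDuplicateIndex()

# Stand-in bytes for an image file (an assumption; real input would be file contents).
original = bytes(range(256)) * 4

# Simulate the smallest possible edit: flip one bit in one byte.
edited = bytearray(original)
edited[10] ^= 1

first = index.submit(original)              # new file, not a duplicate
repeat = index.submit(original)             # same bytes, caught as a duplicate
edited_result = index.submit(bytes(edited)) # one bit changed, slips through
print(first, repeat, edited_result)
```

The exact copy is caught, but the one-bit edit is treated as a brand new file. A resize or a format change alters far more than one bit, so the same thing happens.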
That’s where Idée’s image recognition technology comes into play: it can compare each newly submitted image to potentially hundreds of thousands of previously submitted ones, all in a fraction of a second.
How does Idée’s image recognition work?
Idée’s proprietary algorithms look at the patterns within an image to identify it. Each image submitted to the image recognition server is analyzed to build up an image identifier composed of hundreds of image fingerprints. As an analogy, think of every image as having its own unique digital identifier made up of those fingerprints. This allows our image recognition technology to identify an image even if it has been cropped, rotated, color adjusted or had a border applied. To see great image recognition examples, visit the Idée image gallery.
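Idée’s algorithms are proprietary and are not described here, so the sketch below does not reproduce them. It swaps in a much simpler, publicly known technique, the "average hash", purely to illustrate how a fingerprint can be derived from an image's content rather than from its file bytes. The function names, the 8x8 grid size and the synthetic test images are all assumptions. Note that a single average hash tolerates resizing and grayscale conversion but not cropping or rotation, which the technology described above is said to handle.

```python
def to_gray(pixels):
    """Convert rows of (r, g, b) tuples to rows of integer luminance values."""
    return [[(r * 299 + g * 587 + b * 114) // 1000 for r, g, b in row]
            for row in pixels]


def shrink(gray, size=8):
    """Downsample to size x size by averaging blocks (image must be >= size)."""
    h, w = len(gray), len(gray[0])
    out = []
    for i in range(size):
        y0, y1 = i * h // size, (i + 1) * h // size
        row = []
        for j in range(size):
            x0, x1 = j * w // size, (j + 1) * w // size
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(pixels, size=8):
    """Return a size*size-bit integer: one bit per cell, set if above the mean."""
    flat = [v for row in shrink(to_gray(pixels), size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits; a small distance suggests the same picture."""
    return bin(a ^ b).count("1")


# Synthetic 32x32 test image: a color gradient (a stand-in for a real photo).
original = [[(x * 8, y * 8, (x + y) * 4) for x in range(32)] for y in range(32)]

# Variant 1: converted to grayscale.
grayscale = [[(g, g, g) for g in row] for row in to_gray(original)]

# Variant 2: enlarged to 64x64 by duplicating pixels.
enlarged = [[original[y // 2][x // 2] for x in range(64)] for y in range(64)]

# A different picture: the gradient with every channel inverted.
different = [[(255 - r, 255 - g, 255 - b) for r, g, b in row] for row in original]

base = average_hash(original)
d_gray = hamming(base, average_hash(grayscale))
d_big = hamming(base, average_hash(enlarged))
d_other = hamming(base, average_hash(different))
print(d_gray, d_big, d_other)
```

In a setup like this, a new submission would be flagged as a likely duplicate when its distance to a stored fingerprint falls below some threshold. Choosing that threshold, and searching hundreds of thousands of fingerprints quickly, are separate problems that this sketch does not address.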
We delivered this image recognition technology to Digg as a web service. It has been designed to deal with numerous image variations and the high volume of images expected to be submitted.
Let me take you through a Digg image submission to see how the image recognition works:
Submit a page containing an image
If the image has been submitted previously, thanks to the good folks at Idée, you will see this page:
Happy image digging. We love feedback, so please comment below or drop us an email at info@ideeinc.com
If you are curious about what else we are up to, visit the Idée labs.
- Digging the Same Image Twice
We are going to submit this original image to Digg’s new Image section:

and immediately after that, we’re going to submit this copy which has been cropped and converted to grayscale.

What will happen? Our next post will show you!

