Video Duplicate Finder

Video Duplicate Finder is a command-line program (with an optional Linux-only GUI) that searches for duplicate and near-duplicate video files. It can detect duplicates even when the videos have been:
* Resized (including changes of aspect ratio)
* Watermarked
* Letterboxed

Video Duplicate Finder contains:
* A command-line program for listing unique/duplicate files in a filesystem.
* An optional Linux-only GUI (written in GTK) that lets users examine duplicates and mark them for deletion.

How it works

Video Duplicate Finder extracts several frames from the first minute of each video. It creates a "perceptual hash" from these frames using 'spatial' and 'temporal' information:
* The spatial component describes the parts of each frame that are bright and dark. It is generated using the pHash algorithm.
* The temporal component describes the parts of each frame that are brighter or darker than the previous frame. It is calculated directly from the bits of the spatial hash.
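
The idea of combining spatial and temporal bits can be illustrated with a rough Python sketch. This is not the program's actual implementation: `spatial_bits` uses a simple mean-brightness threshold as a stand-in for pHash (which works on DCT coefficients), and the rule in `temporal_bits` (a bit is set where a region went from dark to bright) is an assumption about one way to derive temporal information from spatial bits. All function names here are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]  # a tiny grayscale frame, pixel values 0-255


def spatial_bits(frame: Frame) -> List[int]:
    """Return 1 for pixels brighter than the frame's mean, else 0.

    Simplified stand-in for pHash, used here only for illustration.
    """
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]


def temporal_bits(prev_bits: List[int], cur_bits: List[int]) -> List[int]:
    """Return 1 where a region was dark in the previous frame and is bright now.

    Assumed rule: it only needs the spatial bits, not the original pixels.
    """
    return [1 if c > p else 0 for p, c in zip(prev_bits, cur_bits)]


def video_hash(frames: List[Frame]) -> Tuple[List[int], List[int]]:
    """Build (spatial, temporal) bit lists from a sequence of frames."""
    per_frame = [spatial_bits(f) for f in frames]
    spatial = [b for bits in per_frame for b in bits]
    temporal = [
        b
        for prev, cur in zip(per_frame, per_frame[1:])
        for b in temporal_bits(prev, cur)
    ]
    return spatial, temporal


# Two 2x2 frames: the top row swaps its bright and dark pixel between frames.
frames = [
    [[0, 255], [0, 255]],
    [[255, 0], [0, 255]],
]
spatial, temporal = video_hash(frames)
```

Because both components are plain bit strings, two videos can be compared bit by bit regardless of their original resolution.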

The resulting hashes can then be compared by their Hamming distance. Smaller distances indicate more similar videos.
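
As a minimal sketch of that comparison, the Hamming distance between two equal-length hashes is the number of bit positions in which they differ. The example below assumes hashes are stored as non-negative integers; the `tolerance` parameter and its default value are hypothetical and do not reflect the program's real threshold.

```python
def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    # XOR leaves a 1 in every position where the two hashes disagree.
    return bin(hash_a ^ hash_b).count("1")


def is_probable_duplicate(hash_a: int, hash_b: int, tolerance: int = 4) -> bool:
    """Treat two hashes as matching when they differ in at most `tolerance` bits.

    The default tolerance is an illustrative assumption.
    """
    return hamming_distance(hash_a, hash_b) <= tolerance


distance = hamming_distance(0b1011, 0b1001)  # the hashes differ in one bit
```

A tolerance of zero would only match identical hashes; raising it trades fewer missed duplicates for more false matches.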

Requirements

FFmpeg must be installed on your system and accessible on the command line.
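
One quick way to confirm that requirement from a script is to look the executable up on `PATH`. The helper below is a small sketch using Python's standard library; the function name `ffmpeg_available` is hypothetical and not part of this project.

```python
import shutil


def ffmpeg_available(executable: str = "ffmpeg") -> bool:
    """Return True if the named executable can be found on PATH."""
    # shutil.which returns the full path of the executable, or None if absent.
    return shutil.which(executable) is not None


if not ffmpeg_available():
    print("ffmpeg was not found on PATH; install it before searching for duplicates.")
```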

Examples

To find all duplicate videos in directory "dogvids":
* viddupfinder --files dogvids

To find all videos which are not duplicates in "dogvids":
* viddupfinder --files dogvids --search-unique

To find videos in "dogvids" that have accidentally been replicated into "catvids":
* viddupfinder --files catvids --with-refs dogvids

To exclude a file or directory from a search, e.g. "dogvids/beagles":
* viddupfinder --files dogvids --exclude dogvids/beagles

To run the GUI to examine duplicates:
* viddupfinder --files dogvids --gui

License

Licensed under either of

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.