Written on the 27th of February, 2005
Posted in Linux, Microsoft, Open Source, Software

dupliFinder

dupliFinder is a graphical tool that searches directories on your computer for duplicate files by calculating and comparing the MD5 sum of each file. This means that the contents of each file are examined, not the filename.
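To illustrate what examining the contents means, here is a minimal sketch of hashing a file with the standard `java.security.MessageDigest` class. It is not dupliFinder’s actual code: the names `Md5Sum` and `md5Hex` are hypothetical, and it assumes Java 7+ (`java.nio.file`, try-with-resources) rather than Java 1.5. The file is read in 8 KB chunks, so even a large movie file is never loaded into memory all at once.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: computes the MD5 sum of a file's contents (the name is ignored). */
public final class Md5Sum {
    private Md5Sum() {}

    /** Streams the file through MD5 in 8 KB chunks and returns the lowercase hex digest. */
    public static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n); // only the bytes actually read are hashed
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

Two files with identical bytes produce the same hex string whatever they are called, so “song.mp3” and “Copy of song.mp3” would get the same sum.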

You then have the option of reviewing the duplicate files and deleting them. It’s great for finding duplicates in your MP3, image or movie collections. Versions 0.09 and earlier were written in Java 1.5, but the next release, which I’m working on now, will be written with the .NET Framework 2.0.
SourceForge.net


dupliFinder 0.09 on Windows XP

dupliFinder is free, open source software © Marcus Wynwood





Running .JAR files on Windows and Linux
Windows XP
I think that Java 1.5 does all this for you, but if it doesn’t, all you need to do is make sure that .jar files are associated with “javaw”. One way to do this is to:

  1. Open up your control panel
  2. Open “Folder Options”
  3. Click the “File Types” tab
  4. Select the JAR extension from the list. If a JAR extension cannot be found, create a new one and then select it.
  5. Click the Advanced button under the details panel.
  6. Select the open action and click edit, or create a new action for open if there isn’t one.
  7. Make sure that the “Application to perform the action” is set to: "c:\jdk1.5.0\jre\bin\javaw.exe" -jar "%1" %*
     This assumes the default path, “c:\jdk1.5.0\jre\bin”, so you’ll have to change it if you installed Java somewhere else.

Linux
I think that Java 1.5 does all this for you, but if it doesn’t, all you need to do is make sure that .jar files are associated with “java -jar”. If your distro uses GNOME, you can just right-click a .jar file and choose “Open With...” to set this up.



30 Responses to “dupliFinder”

  1. Nathan says:

    April 18th, 2005 at 12:12 pm

    Exception in thread "AWT-EventQueue-0" java.lang.OutOfMemoryError: Java heap space

    Perhaps because it found too many duplicates: over 1000 as reported by another duplicate finder. The duplicates are across about 5 checksums, it appears. dupliFinder finds and prints text about matches until it dies.

    After I get the error, the program can’t exit either. I have to kill it. When I try to exit, it just gives another out of memory error.

    My system has 1GB of RAM. I ran dupliFinder with about 280MB free and watched my memory run out.

    It died processing the directory of the program at:
    jtcfrost.sourceforge.net

  2. Mugga says:

    April 18th, 2005 at 5:18 pm

    It looks like I’m not managing memory as well as I could.
    At the moment all the calculations happen in memory; this is fast for small numbers of files, but it looks like it’s no good for lots of files. Maybe I need to use some temp files instead. I might try that in the next version.
    Thanks for your help - and thanks for trying dupliFinder :-)

  3. Desummoner says:

    April 25th, 2005 at 11:07 am

    Nice project.
    But it could be much more efficient if it first checked file sizes,
    i.e. first parse the file listing so that only files with matching sizes are left, and then calculate checksums for them.

  4. Bleeter Yaluser says:

    April 25th, 2005 at 4:31 pm

    Just been looking for a *good* duplicate file finder program. Imagine my amazement when your page turned up! Haven’t tried the program yet, thought I’d send out mad greets from Hobart though.

    Cheers,

    Pete.

  5. Mugga says:

    April 25th, 2005 at 7:51 pm

    Thanks Desummoner :-) Version 0.1 has just been released

  6. Pierce says:

    May 9th, 2005 at 1:31 am

    It’d be nice if it did more than find *identical* copies (it’s called duplicate finder after all).
    FSlint will find identical copies for me; what about visually identical or similar files? There
    are a few console apps listed on freshmeat (and several Windows apps) that’ll do this,
    but nothing cross-platform/Linux that’ll let me double-check the results visually myself.

  7. Mugga says:

    May 9th, 2005 at 1:24 pm

    duplicate: Identically copied from an original (Dictionary.com)

    dupliFinder will let you preview images and text in the preview frame like in the screenshots. This is good if you wanna look at them yourself, but maybe you mean something that finds files that are *nearly* the same?

    Give me an example of what you mean and I’ll see if I can do it for the next version :-)
    Thanks!

  8. Moop says:

    May 29th, 2005 at 9:49 pm

    I merged a bunch of font collections, about 2000 files, and dupliFinder is very promising, except that I need to run “Find Duplicates” several DOZEN times to get it to find all of the duplicates, and it sometimes fails to delete files that don’t even have a read-only attribute. Once the algorithm finds 0 duplicates, I still need to find the rest manually using “FC /a filename1 filename2” (for Windows files; haven’t tested with Linux/other). Good work, but I hope you continue this and fix it. I’ve been looking for something like this for a long time to save time.

  9. Mugga says:

    July 18th, 2005 at 8:15 pm

    Bug fix coming soon…

  10. Walt Sloan says:

    September 10th, 2005 at 3:49 am

    It looks like dupliFinder only works on ONE directory. I want to review many directories over many drives. How can I do this?

    // Walt //

  11. Mugga says:

    September 10th, 2005 at 1:06 pm

    At the moment dupliFinder will scan recursively from one starting point. It can’t look in more than one starting point at a time. Maybe that’s an idea for the next version :-)

    If you are using Linux, the dupliFinder Script CAN look in multiple directories - give that a try :-)

  12. mwynwood.com » September Holidays says:

    September 16th, 2005 at 2:41 pm

    […] Maybe create a .net version of dupliFinder […]

  13. Jojo says:

    October 1st, 2005 at 7:53 pm

    It seems files with spaces in their names are not found. I copied some files to a directory, then copied and pasted them into the same directory using Windows, so the names became Copy of …, Copy of …. The program didn’t notice they were copies until I renamed Copy of … to filenameoriginalA.ext etc.; then it worked.

  14. Mugga says:

    October 3rd, 2005 at 12:27 pm

    Version 0.09 still seems to be the best.
    I’m working on making a .NET version of dupliFinder. Would anyone be interested in that?

  15. francesco says:

    November 5th, 2005 at 8:55 am

    Today I tried your program to sort out my backups. I moved all my data onto a brand new disk, E:, with several subdirs and a lot of files (about 700 GB).

    First I tried to run it from E:\ but it reported 0 duplicate files immediately. It’s as if it doesn’t even try to recurse into directories!
    Then I created a “root” dir and moved all the other dirs into it, tried to scan E:\root, and got the same result.
    If I select E:\root\dirA\dirB, it works, scanning recursively.
    I’m using Windows 2000 Advanced Server with Java 1.5 just installed.
    Francesco

  16. TwinsWanted says:

    November 11th, 2005 at 12:28 pm

    My wife always wanted twins so does this program work in reverse (or can you modify it so it creates twins)?

  17. Hayley says:

    November 19th, 2005 at 1:14 am

    Well done with the program! I wish I had downloaded it sooner! Dad had made a complete mess of the computer with copies of copies of copies of files… some with the same file name, and some different, and always in different folders! I did the first scan… just of “My Documents” and it found over a thousand duplicates! I decided to delete manually first, mainly via drag & drop (I didn’t want to delete anything by mistake). I got it down to just under 200, then used dupliFinder’s lovely “Delete” function in subfolders. Currently at 40 duplicates (still in My Documents) and will be finished very soon… finally!!!

    When I’m finished, I think I’m going to have to educate my father to not make this huge mess again!!

    P.S. One suggestion - could you make a button at the top of the checkbox column to “unselect all”, or leave them blank to begin with, as I often found I had to delete the ones on the right due to their location.

  18. SY says:

    December 1st, 2005 at 7:37 am

    Looking forward to the next release. When will it be available?

  19. Francesco says:

    January 30th, 2006 at 9:46 pm

    Hello to everybody.
    I want to testify that this program HAS PROBLEMS. It failed to recognize duplicate files that I was able to find with one I wrote myself.
    The Java version sometimes fails immediately; other times it starts and runs indefinitely, with no clue as to what it is doing...

  20. Pete says:

    January 31st, 2006 at 11:01 am

    Yeah, it’s got a few bugs - but nothing major. Version 0.08 seems to be the best one. The .NET one looks cool - can’t wait till that’s ready to download.

  21. Administrator says:

    March 11th, 2006 at 11:38 pm

    […] […]

  22. Duese says:

    May 11th, 2006 at 11:16 pm

    Hello!
    First of all:
    -Nice tool!

    Some suggestions:
    -dupliFinder.log should be more CSV-like, so I can import it into Excel
    -Offer a button to open the file
    -Right-click on a duplicate offers a menu:
      -open file
        -left
        -right
        -both
      -delete file
        -left
        -right

    Bug-like occurrences:
    -when I move through the list of duplicates with the keyboard, dupliFinder doesn’t update the preview

    But in total: Nice Tool!

    Bye
    Duese

  23. Duese says:

    May 11th, 2006 at 11:19 pm

    P.S. The button to open the file (left+right) should appear below the preview menu…

  24. DHN says:

    October 16th, 2006 at 9:15 am

    I’d like to have it search multiple directories. Is this possible?

  25. ricky says:

    November 24th, 2006 at 10:56 am

    How about a “copy duplicates to folder” option? Sometimes I don’t want to delete just yet.

  26. Marcus says:

    November 24th, 2006 at 11:45 am

    Thanks for all the suggestions - I’ll try and add them to the next version :-)

  27. Mike Jones says:

    December 12th, 2007 at 5:26 pm

    I stumbled upon your program on SourceForge quite a few months ago.
    I can’t say if you ever read this page anymore, as the last comment was over a year ago.
    However, here are some suggestions you may wish to implement in your Java version: thread the duplicate finding, and allow for background CPU usage (set it and forget it). Additionally, I am not certain whether your code takes this into account, as your in-code comments are slightly confusing to me and there is no Javadoc, but you should probably consider improving the folder-recursing functionality. I have tested your program on a folder that contains one subfolder I know contains duplicates (because I put them there myself) and several others with no duplicates, or with duplicates of the file already in question. Many times the program returns a false negative.

    One suggestion for that, in order to cut down on search time, would be to implement a hash map.
    But as I said, I am uncertain whether you already do that, as the last time I read through your code I was a bit thrown off by the commenting.

    If you no longer work on this program, I would be thrilled to take over development from you. I am currently a computer science student and try to take every opportunity to do real-world programming.

    cheers!
    -Mike

  28. Marcus says:

    May 21st, 2008 at 10:05 pm

    Hi Mike, I’ve sent you an email about this.
    I’d be happy for you to expand on what I have started :-)

  29. ???????? ??????????? MD5 « J3qx says:

    January 8th, 2009 at 8:50 pm

    […] dupliFinder - a graphical tool for finding duplicate files in the directories on your computer by comparing their MD5 checksums. This means that the contents of the files are compared, not their names. You can then review the duplicate files and delete them. Well suited to finding duplicates in MP3, picture or movie collections. Written in Java 1.5 and works on Windows and Linux. http://mwynwood.com/blog/?p=99 […]

  30. joe says:

    July 7th, 2009 at 2:19 am

    This looks quite promising. I wonder if your new version will allow comparing multiple directories instead of just dups in the same directory. And maybe support resizing of large images in the preview pane.
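Several comments above suggest checking file sizes before computing checksums (Desummoner), grouping results in a hash map (Mike), and scanning more than one starting directory (Walt, DHN, joe). Below is a minimal sketch of how those ideas could fit together, assuming Java 7+ NIO; `DuplicateSearch` and `findDuplicates` are hypothetical names, not part of dupliFinder. Only files that share a size with another file are hashed, and the maps hold paths and checksums rather than file contents.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch: size-first duplicate search across several starting directories. */
public final class DuplicateSearch {
    private DuplicateSearch() {}

    /** Returns groups of two or more files whose contents have the same MD5 sum. */
    public static List<List<Path>> findDuplicates(List<Path> roots)
            throws IOException, NoSuchAlgorithmException {
        // Pass 1: walk every root recursively and bucket regular files by size.
        Map<Long, List<Path>> bySize = new HashMap<>();
        Set<Path> seen = new HashSet<>(); // skips files reached twice via overlapping roots
        for (Path root : roots) {
            List<Path> files;
            try (Stream<Path> walk = Files.walk(root)) {
                files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
            }
            for (Path p : files) {
                Path key = p.toAbsolutePath().normalize();
                if (seen.add(key)) {
                    bySize.computeIfAbsent(Files.size(key), k -> new ArrayList<>()).add(key);
                }
            }
        }

        // Pass 2: only hash files that share their size with at least one other file.
        List<List<Path>> duplicates = new ArrayList<>();
        for (List<Path> sameSize : bySize.values()) {
            if (sameSize.size() < 2) {
                continue; // a file with a unique size cannot have a duplicate
            }
            Map<String, List<Path>> byHash = new HashMap<>();
            for (Path p : sameSize) {
                byHash.computeIfAbsent(md5Hex(p), k -> new ArrayList<>()).add(p);
            }
            for (List<Path> group : byHash.values()) {
                if (group.size() > 1) {
                    duplicates.add(group);
                }
            }
        }
        return duplicates;
    }

    /** Streams a file through MD5 in 8 KB chunks and returns the lowercase hex digest. */
    private static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

Tracking already-seen paths means overlapping starting points (say, E:\ and E:\root) do not make a file show up as its own duplicate. Matching MD5 sums are very strong evidence, but a byte-by-byte comparison, like the “FC /a” check Moop mentions, could confirm a match before anything is deleted.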

Leave a Reply




By leaving a comment here, you agree that you are fully responsible for its content, not me. You may not post content that is libelous, defamatory, obscene, abusive, that violates a third party's right to privacy, that otherwise violates any applicable local, state, national or international law, or that is otherwise inappropriate. I obviously cannot be held responsible for any comments posted here and I reserve the right to edit or remove any comment that is inappropriate. So please, don't be silly!
