Written on the 27th of February, 2005
Posted in Linux, Microsoft, Open Source, Software
dupliFinder
dupliFinder is a graphical tool that searches directories on your computer for duplicate files by calculating and comparing the MD5 sum of each file. This means that the contents of each file are examined, not the filename.
You then have the option of reviewing the duplicate files and deleting them. It’s great for finding duplicates in your MP3, image or movie collections. Versions 0.09 and earlier were written in Java 1.5, but the release I’m working on will be written with the .NET Framework 2.0.
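For the curious, here is a minimal sketch of the checksum step in Java. It is not the actual dupliFinder source: the class and method names are made up for illustration, and it assumes a newer Java than 1.5 because it uses java.nio.file. It reads the file in small chunks, so the whole file never has to sit in memory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of dupliFinder itself.
final class Md5Sketch {

    /** Returns the MD5 digest of a file's contents as 32 lowercase hex characters. */
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            // Feed the digest one chunk at a time until end of file.
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

Two files with the same contents give the same string no matter what they are called, which is why the filename does not matter.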
- SourceForge Project Page
- Download dupliFinder
- How do I run a .JAR file on Windows or Linux?
- Project Statistics
dupliFinder 0.09 on Windows XP
dupliFinder is free, open source software © Marcus Wynwood
Running .JAR files on Windows and Linux
Windows XP
I think that Java 1.5 does all this for you, but if it doesn’t, all you need to do is make sure that .jar is associated with “javaw”. One way to do this is to:
- Open up your control panel
- Open “Folder Options”
- Click the “File Types” tab
- Select the JAR extension from the list. If a JAR extension cannot be found, create a new one, and then select it.
- Click the Advanced button under the details panel.
- Select the open action and click edit, or create a new action for open if there isn’t one.
- Make sure that the “Application to perform the action” is:
"c:\jdk1.5.0\jre\bin\javaw.exe" -jar "%1" %*
The default path is “c:\jdk1.5.0\jre\bin”, so you’ll have to change it if you installed Java somewhere else.
Linux
I think that Java 1.5 does all this for you, but if it doesn’t, all you need to do is make sure that .jar files are associated with “java -jar”. If your distro uses Gnome, you can just right-click the file and choose “Open With...” to set this up.
April 18th, 2005 at 12:12 pm
Exception in thread "AWT-EventQueue-0" java.lang.OutOfMemoryError: Java heap space
Perhaps because it found too many duplicates: over 1000 as reported by another duplicate finder. The duplicates are across about 5 checksums, it appears. dupliFinder finds and prints text about matches until it dies.
After I get the error, the program can’t exit either. I have to kill it. When I try to exit, it just gives another out of memory error.
My system has 1GB of RAM. I ran dupliFinder with about 280MB free and watched my memory run out.
It died processing the directory of the program at:
jtcfrost.sourceforge.net
April 18th, 2005 at 5:18 pm
It looks like I’m not managing memory as well as I could.
At the moment all the calculations happen in memory. This is fast for small numbers of files, but it looks like it’s no good for lots of files. Maybe I need to use some temp files instead. I might try that in the next version.
Thanks for your help - and thanks for trying dupliFinder
April 25th, 2005 at 11:07 am
Nice project.
But it could be much more efficient if it first checked the size of each file.
I.e. first filter the file listing so that only files with duplicate sizes are left, and then calculate checksums for those.
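A rough sketch of that size-first idea in Java, for illustration only. It is not how any released dupliFinder version works, the names are hypothetical, and it assumes a Java version with java.nio.file and lambdas. Files with a unique size cannot have a duplicate, so they are dropped before any checksum is calculated.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pre-filter, not taken from the dupliFinder source.
final class SizeFilterSketch {

    /** Keeps only files whose size is shared with at least one other file. */
    static List<Path> candidates(List<Path> files) throws IOException {
        Map<Long, List<Path>> bySize = new HashMap<>();
        for (Path f : files) {
            // Reading the size is cheap compared with hashing the contents.
            bySize.computeIfAbsent(Files.size(f), k -> new ArrayList<>()).add(f);
        }
        List<Path> result = new ArrayList<>();
        for (List<Path> group : bySize.values()) {
            if (group.size() > 1) {
                result.addAll(group);
            }
        }
        return result;
    }
}
```

Only the files returned by `candidates` would then need their checksums calculated.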
April 25th, 2005 at 4:31 pm
Just been looking for a *good* duplicate file finder program. Imagine my amazement when your page turned up! Haven’t tried the program yet, thought I’d send out mad greets from Hobart though.
Cheers,
Pete.
April 25th, 2005 at 7:51 pm
Thanks Desummoner
Version 0.1 has just been released
May 9th, 2005 at 1:31 am
It’d be nice if it did more than find *identical* copies (it’s called duplicate finder after all).
FSlint will find identical copies for me, what about visually identical or similar files? There are a few console apps listed on freshmeat (and several windows apps) that’ll do this, but nothing cross-platform/linux that’ll let me double-check the results visually myself.
May 9th, 2005 at 1:24 pm
duplicate: Identically copied from an original (Dictionary.com)
dupliFinder will let you preview images and text in the preview frame like in the screenshots. This is good if you wanna look at them yourself, but maybe you mean something that finds files that are *nearly* the same?
Give me an example of what you mean and I’ll see if I can do it for the next version
Thanks!
May 29th, 2005 at 9:49 pm
I merged a bunch of font collections, about 2000 files, and dupliFinder is very promising, except that I need to run “Find Duplicates” several DOZEN times to get it to find all of the duplicates, and it sometimes fails to delete files which don’t even have a read-only attribute. Once the algorithm finds 0 duplicates, I still need to find the rest manually using "FC /a filename1 filename2" (for windows files…haven’t tested with linux/other)… Good work, but I hope you continue this and fix it. I’ve been looking for something like this for a long time to save time.
July 18th, 2005 at 8:15 pm
Bug fix coming soon…
September 10th, 2005 at 3:49 am
It looks like dupfinder only works on ONE directory. I want to review many directories over many drives. How can I do this?
// Walt //
September 10th, 2005 at 1:06 pm
At the moment dupliFinder will scan recursively from one starting point. It can’t look in more than one starting point at a time. Maybe that’s an idea for the next version.
If you are using Linux, the dupliFinder Script CAN look in multiple directories - give that a try
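Just to show what scanning from several starting points could look like in Java, here is a small sketch. It is an assumption about a possible future design, not existing dupliFinder code; the names are hypothetical and it relies on `Files.walk`, which is much newer than Java 1.5.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper, not part of dupliFinder itself.
final class MultiRootSketch {

    /** Recursively collects the regular files found under every starting point. */
    static List<Path> listFiles(List<Path> roots) throws IOException {
        List<Path> files = new ArrayList<>();
        for (Path root : roots) {
            // Files.walk visits the root and everything below it.
            try (Stream<Path> walk = Files.walk(root)) {
                walk.filter(Files::isRegularFile).forEach(files::add);
            }
        }
        return files;
    }
}
```

If one starting point sits inside another, its files would be listed twice, so a real version would need to remove those repeats before comparing anything.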
September 16th, 2005 at 2:41 pm
[…] Maybe create a .net version of dupliFinder […]
October 1st, 2005 at 7:53 pm
It seems files with spaces in the names are not found. I copied some files to a directory. Then copied and pasted them in the same directory using windows, so the names became Copy of …, Copy of …, and used the program and it didn’t notice they were copies until I renamed Copy of, to filenameoriginalA.ext etc. then it worked.
October 3rd, 2005 at 12:27 pm
Version 0.09 still seems to be the best.
I’m working on making a .net version of dupliFinder - would anyone be interested in that?
November 5th, 2005 at 8:55 am
Today I tried your program to bring some order to my backups. I moved all my data onto a brand new disk, E:, with several subdirs and a lot of files (about 700 GB of files).
I first tried to run it from E:\ but it immediately reported 0 duplicate files. It is as if it doesn’t even try to recurse into the directories!
Then I created a “root” dir and moved all the other dirs into it; tried to scan e:\root but got the same results.
If I select E:\root\dirA\dirB , it works, scanning recursively.
I’m using a windows 2000 advanced server with java 1.5 just installed.
Francesco
November 11th, 2005 at 12:28 pm
My wife always wanted twins so does this program work in reverse (or can you modify it so it creates twins)?
November 19th, 2005 at 1:14 am
Well done with the program! I wish I had downloaded it sooner! Dad had made a complete mess of the computer with copies of copies of copies of files… some with the same file name, and some different, and always in different folders! I did the first scan… just of “My Documents” and it found over a thousand duplicates! I decided to delete manually first, mainly via drag & drop (didn’t want to delete anything by mistake). I got it down to just under 200, then used dupliFinder’s lovely “Delete” function in subfolders. Currently at 40 duplicates (still in My Documents) and will be finished very soon… finally!!!
When I’m finished, I think I’m going to have to educate my father to not make this huge mess again!!
P.S. One suggestion - could you make a button at the top of the checkbox column to “unselect all”, or leave them blank to begin with, as I often found I had to delete the ones on the right due to their location.
December 1st, 2005 at 7:37 am
Looking forward to the next release. When will it be available?
January 30th, 2006 at 9:46 pm
Hello to everybody.
I want to testify that this program HAS PROBLEMS. It failed to recognize duplicate files that I was able to find with one I wrote myself.
The java version sometimes fails immediately, sometimes starts and works indefinitely and there are no clues on what it is doing…..
January 31st, 2006 at 11:01 am
Yeah, it’s got a few bugs - but nothing major. Version 0.08 seems to be the best one. The .NET one looks cool - can’t wait till that’s ready to download.
May 11th, 2006 at 11:16 pm
Hello!
First of all:
-Nice tool!
Some suggestions:
-dupliFinder.log should be more csv like. so i can import it in excel
-Offer a button to open the file
-Right-click on duplicate offers menu:
--open file
---left
---right
---both
--delete file
---left
---right
Bug-like occurrences:
-when i run through the list of duplicates by keys dupliFinder doesn’t change the preview
But in total: Nice Tool!
Bye
Duese
May 11th, 2006 at 11:19 pm
p.s. the button to open the file (left+right) should appear below the preview menu…
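On the CSV suggestion, a log line that Excel can import could be built roughly like this in Java. This is only a sketch: the column layout (checksum, left file, right file) is an assumption and not the real dupliFinder.log format, and the class is hypothetical. The important part is quoting any field that contains a comma, a quote or a line break, since filenames like "Copy of a, b.txt" would otherwise split into extra columns.

```java
// Hypothetical CSV writer for one duplicate pair; the columns are assumed.
final class CsvLogSketch {

    /** Quotes a field only when it contains a comma, a quote or a line break. */
    static String field(String s) {
        if (s.contains(",") || s.contains("\"") || s.contains("\n") || s.contains("\r")) {
            // Inside a quoted field, a quote is written as two quotes.
            return "\"" + s.replace("\"", "\"\"") + "\"";
        }
        return s;
    }

    /** Builds one comma-separated log line for a pair of duplicate files. */
    static String row(String checksum, String left, String right) {
        return field(checksum) + "," + field(left) + "," + field(right);
    }
}
```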
October 16th, 2006 at 9:15 am
I’d like to have it search multiple directories. Is this possible?
November 24th, 2006 at 10:56 am
how about a copy duplicates to folder option? Sometimes i don’t want to delete just yet
November 24th, 2006 at 11:45 am
Thanks for all the suggestions - I’ll try and add them to the next version
December 12th, 2007 at 5:26 pm
i stumbled upon your program on source forge quite a few months ago.
i can’t say if you ever read this page any more, as the last comment was over a year ago.
however, some suggestions that you may wish to implement in your java version are to thread the duplicate finding and to allow for background cpu usage (set it and forget it). additionally, i am not certain if your code takes this into account, as your in-code comments are slightly confusing to me and there is no javadoc, but you should probably consider improving the folder-recursing functionality. i have tested your implementation of the program on a folder which contains one subfolder that i know to contain duplicates (because i put them there myself) and several others with no duplicates, or duplicates of the file already in question. many times the program returns a false negative.
a suggestion for that, in order to cut down on search time, would be to implement a hashmap.
but as i said, i am uncertain if you do or do not do that, as the last time i read through your code i was a bit thrown off by the commenting.
if you no longer work on this program, i would be thrilled to take over development from you. i am currently a computer science student and try to take every opportunity to do real world programming.
cheers!
-Mike
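To make the hashmap suggestion concrete, here is a minimal sketch in Java. It is not the existing dupliFinder code and the names are hypothetical; it assumes the checksums have already been calculated and a Java version with lambdas. Each file is dropped into a bucket keyed by its checksum in a single pass, so no file has to be compared against every other file.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical index, not taken from the dupliFinder source.
final class ChecksumIndexSketch {

    /** Groups files by checksum and keeps only groups with two or more files. */
    static Map<String, List<Path>> duplicateGroups(Map<Path, String> checksumByFile) {
        Map<String, List<Path>> byChecksum = new HashMap<>();
        for (Map.Entry<Path, String> e : checksumByFile.entrySet()) {
            byChecksum.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        // A checksum seen only once has no duplicate, so drop its group.
        byChecksum.values().removeIf(group -> group.size() < 2);
        return byChecksum;
    }
}
```

Each remaining group is one set of duplicates, however many copies there are and whichever folders they sit in.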
May 21st, 2008 at 10:05 pm
Hi Mike, I’ve sent you an email about this.
I’d be happy for you to expand on what I have started
January 8th, 2009 at 8:50 pm
[…] dupliFinder […] http://mwynwood.com/blog/?p=99 […]
July 7th, 2009 at 2:19 am
This looks quite promising. I wonder if your new version will allow comparing of multiple directories instead of just dups in same directory. And maybe support resizing of large images in the preview pane.