WeeDuplicateDetective 1.5

WeeDuplicateDetective is a tool that helps you get rid of duplicate files. It is simple to use, and it organizes the duplicates it finds efficiently, giving you full control over the clean-up process.


Quick start guide

Using this software is easy: the interface follows the reading order (left to right and top to bottom).

In addition, a basic tutorial is displayed inside the application.
Look for the text beginning with "1." in the list of folders to scan and follow the instructions; the next step appears as soon as the current one is done.

  1. Add one or more folders to the list by
    1. either dragging and dropping them onto the list of folders to scan
    2. or clicking the button [BROWSE], selecting the folder you want to clean and then clicking the button [ADD TO THE LIST]
  2. Select one or more comparison criteria and click on the button [FIND DUPLICATES]
  3. Once all duplicates matching your comparison criteria have been found, mark all the files you want to remove and click the button [CLEAN THE SELECTED DUPLICATES]
  4. Choose the cleaning method that best suits your needs
  5. Done!

What is a duplicate?

Basically, a duplicate is a copy of a file: the two files have the same content.

In WeeDuplicateDetective, you can define what counts as a duplicate by choosing among specific criteria. For instance, you may want to consider as duplicates all files in a specific folder that share the same name, regardless of their content.

In order to compare the content of files, WeeDuplicateDetective uses a hash sum (also called a "checksum" or "control sum"). This method computes a nearly unique identifier from the content of a file. Once computed, this identifier takes only a few bytes of memory (16 bytes for MD5), and the file does not have to be read again. This is not compression, though: the content of the file cannot be retrieved from its hash sum. Think of it as an indexing system in which each file has its content indexed. If two files have the same content, their hash sums are the same. The reverse holds in practice: if two hash sums are identical, the contents of the two files are almost certainly identical too.

When you use the hash method called MD5, the uniqueness of this identifier is not absolutely guaranteed. MD5 is, however, the best compromise in terms of performance.
There are indeed some rare cases where two files with different content produce the same hash sum; this is called a "collision". To reduce the risk of collisions even further, to a negligible level, WeeDuplicateDetective provides alternative, albeit slightly slower, hash methods: SHA1 and SHA-256.

Typically, using MD5 will be more than enough though.
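To make the idea concrete, here is a minimal Python sketch using the standard hashlib module. It is not WeeDuplicateDetective's own code, and the function names are hypothetical. Files are hashed in chunks, and items whose digests are equal end up in the same duplicate group. A hex digest is 32 characters long for MD5, 40 for SHA1 and 64 for SHA-256 (16, 20 and 32 bytes respectively).

```python
import hashlib
from collections import defaultdict


def digest_of_bytes(data: bytes, algorithm: str = "md5") -> str:
    """Hash an in-memory buffer with "md5", "sha1" or "sha256"."""
    h = hashlib.new(algorithm)
    h.update(data)
    return h.hexdigest()


def digest_of_file(path: str, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Hash a file's content in 1 MiB chunks, so large files never sit in memory whole."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def group_duplicates(items, key):
    """Group items by key; only groups with at least two members are duplicates."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return [g for g in groups.values() if len(g) > 1]


# Usage on real files (placeholder paths):
# duplicates = group_duplicates(["a.jpg", "b.jpg"], key=digest_of_file)
```

Passing algorithm="sha256" (for example through functools.partial) trades some speed for an even lower collision risk, mirroring the choice offered in the advanced criteria.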


Detailed look at this software


General note about the user interface

Whether WeeDuplicateDetective looks good or not is a matter of taste, but there are some ground rules that its user interface respects:

  • very few graphical elements - almost all elements are textual. Leaving out abstract icons makes it easier to understand what does what. Besides, I am not (really not) a graphic designer.
  • western reading order - the main tasks of WeeDuplicateDetective (selecting the folders, selecting the comparison criteria, scanning for duplicates, presenting the duplicates, and cleaning them) are presented left to right and top to bottom
  • very descriptive text - everybody can read, but not everybody can guess what a feature does, or what its result will be, from a single word. Besides, this helps a lot if you are using a screen reader.
  • clearly separated areas for the different tasks
  • a clear indication of what the user has to do next

All in all, this may make the UI a bit "full" and a bit devoid of color, but general usability is my main goal here.


Folders to scan

WeeDuplicateDetective needs to know where to search for duplicate files. You provide this information at the top of the application window. You can add several folders,

  • either by dragging and dropping them onto the list (you can drag and drop several folders at once)
  • or by browsing your computer: click [BROWSE] and then [ADD TO THE LIST]

You can remove a folder from the list by selecting it first, then clicking the button [REMOVE FROM LIST] (this operation will not delete the folder on the computer).

The order of the list items matters, since WeeDuplicateDetective scans the folders one after another, starting with the top-most one. You can therefore change this order: select a folder and click on either [MOVE UP] or [MOVE DOWN] to move it in the list.



Comparison Criteria

Directly below the folder selection area, you select the criteria that files must meet to be considered duplicates.

For instance,

  • if you select "Same size", then all files in the given folders that have the EXACT same size will be considered as duplicates of one another
  • if you select "Same size" and "Same name", then all files in the given folders that have the EXACT same size AND EXACTLY the same name will be considered as duplicates of one another

If you select "Same content", the scan will take much longer, because WeeDuplicateDetective has to read the full content of each file (which it does not do otherwise).

Under "Advanced criteria" you can find less commonly used criteria:

  • the date when the file was created
  • the date when the file was last modified
  • the hash algorithm used to compare the content of the scanned files. If you do not know what a hash algorithm is, leave this option as it is. In practice, changing it has very little effect on the result: the accuracy of duplicate detection only increases in very specific cases. Changing this option may also increase the time WeeDuplicateDetective needs to complete its scan.
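Combining criteria works like a logical AND: files are grouped by the tuple of all selected values, and only groups with at least two members are reported. The Python sketch below illustrates this under its own assumptions: the FileInfo record, the criterion labels and the function find_duplicates are hypothetical, and the content hash is assumed to have been computed beforehand.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    name: str
    folder: str
    size: int
    created: str = ""
    modified: str = ""
    content_hash: str = ""  # only filled in when "Same content" is selected


# Each criterion maps a file to the value that must be equal across duplicates.
CRITERIA = {
    "Same size": lambda f: f.size,
    "Same name": lambda f: f.name,
    "Same content": lambda f: f.content_hash,
    "Same creation date": lambda f: f.created,
    "Same modification date": lambda f: f.modified,
}


def find_duplicates(files, selected):
    """Files are duplicates when they agree on ALL selected criteria."""
    if not selected:
        raise ValueError("select at least one comparison criterion")
    groups = defaultdict(list)
    for f in files:
        key = tuple(CRITERIA[c](f) for c in selected)
        groups[key].append(f)
    return [g for g in groups.values() if len(g) > 1]
```

With two 10-byte files a.txt and b.txt plus a 20-byte a.txt in another folder, "Same size" groups the two 10-byte files, "Same name" groups the two a.txt files, and selecting both criteria finds nothing.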

You can restrict the scan to specific file types (all other files are then ignored). The filter is based on the file extensions. Some common file types are already defined. If you need more, click on [EDIT...]


Define a file filter

This dialog appears when you click on [EDIT...] in the comparison criteria area.

The predefined file type filters cannot be modified here, but you can add new ones, or copy a pre-defined one and edit the copy.

  • Select a filter and click on [COPY] to make a copy, or on [DELETE] to delete it from the list
  • Fill in the fields "file type" and "extensions" and click either [ADD] to add a new file type filter or [EDIT] (if available) to modify the selected filter.

Make sure the extensions are entered correctly. To describe an extension, write "*." followed by the file extension. E.g. *.jpg

To describe a list of extensions, separate them with a comma (the comma can be followed by a space, but this is not mandatory). E.g. *.jpg, *.bmp, *.jpeg
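A filter string in this format could be parsed and applied as in the following Python sketch, which uses the standard fnmatch module. The function names are hypothetical, and comparing extensions case-insensitively (as Windows does for file names) is an assumption about how the filter behaves.

```python
import fnmatch


def parse_extensions(text: str) -> list:
    """Split a filter such as "*.jpg, *.bmp,*.jpeg" at commas; the space after a comma is optional."""
    return [part.strip().lower() for part in text.split(",") if part.strip()]


def matches_filter(file_name: str, patterns: list) -> bool:
    """True if the whole file name matches one of the patterns, ignoring case."""
    name = file_name.lower()
    return any(fnmatch.fnmatchcase(name, p) for p in patterns)
```

For instance, with the filter "*.jpg, *.bmp, *.jpeg", the file Holiday.JPG is scanned while notes.txt and photo.jpg.txt are skipped.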

Clicking on [CLOSE] will save your changes and close this dialog.


Finding duplicates

Once you have selected at least one comparison criterion, the button [FIND THE DUPLICATE FILES] is enabled. It is located to the right of the area where you choose the comparison criteria.

Click on it to start finding the duplicate files in the folders you have chosen. A dialog window appears, showing

  • the progress of the scan - depending on how many files the folders contain, and on whether you also want to compare the actual content of the files (comparison criterion "Same content"), this can take from a few seconds to hours.
  • the number of files to analyze - in parallel with the scan, WeeDuplicateDetective determines how many files there are to analyze. This can take a while as well (though not as long as the scan itself), so you may see the text "(computing)" instead of a number.
  • a big, red [STOP] button - this allows you to interrupt the scan. The duplicates found so far will be shown. To scan the remaining files, you need to start a new scan (click on the button [FIND THE DUPLICATE FILES] again). This is because files may have been created or modified in the meantime, and WeeDuplicateDetective would miss those changes.


File tree of the duplicates

Once the scan process is done, the file tree shows several things:

  • The duplicates are grouped together. The first item of each group is considered the original, regardless of file creation time: it is simply the first file of the group that WeeDuplicateDetective found. You can change this later by sorting the files by creation time.
  • Each item has a check-box. This is one of the three ways to select the files you want to clean: you can select files manually by clicking this check-box, with the auto-selection feature, or with the right-click pop-up menu (see below). All three selection methods can be combined to select files quickly and precisely.
Files deemed unsafe for deletion by WeeDuplicateDetective are:
  • Files with the system attribute set
  • Files located in the WINDOWS folder and its sub-folders
All other files are considered safe by WeeDuplicateDetective, but that does not mean you should delete them. For instance, you may want to exclude backup files from the clean-up process; how to do this is explained below.
  • Columns in the file list:
  1. File name - this should be obvious
  2. Files unsafe for deletion are shown in bold, with a big red "No" in the 2nd column, titled "Safe?". If WeeDuplicateDetective considers the file safe for deletion, "Yes" is shown instead.
  3. the number of duplicates (not including the first file found) is shown in the 3rd column, titled "#" (this symbol is commonly used to mean "amount" or "number")
  4. The path where the file is stored. This is the full path, including the drive letter. If you sort the list by path, the drive letter is taken into account by the sorting mechanism.
  5. the date the file was created
  6. the last time the file's content was modified
  • If you right-click on a list item (a "list item" is a line in the file list with all the information mentioned above about the corresponding file), you will get a pop-up menu:
    • "Open this file" will try to open the listed file, as if you double-clicked on it
    • "Locate this file" will open Windows File Explorer at the folder where the file is stored, with the file already selected
    • the menu items in the section "Select for cleanup" give you several ways to select files in the current duplicate group, that is, the group of duplicates to which the item you clicked on belongs
    • the menu items in the section "Unselect" offer similar actions, but with the opposite goal: deselecting files


Search bar

You can search the list for files whose information contains the text you type in the big text box (in the screenshot below, this box contains the text "res").

It's simple to use:

  1. select the type of information you want to search (file name, path, size, date, or all of this information at once)
  2. type the text you want to search for (wildcards like '*' or '?', or regular expressions are accepted)
  3. click on [Find First] to find the first occurrence in the list where the text is matched.
  4. click on [Find Next] to find the next occurrences
The text box remembers your previous searches (this history is lost when you close WeeDuplicateDetective, though); just click on the arrow pointing down. Only actual searches are remembered: for a search pattern to be remembered, you need to have clicked [Find First] at least once.

Using regular expressions
When the first character of the search pattern is \, the rest is evaluated as a regular expression. The first \ is not part of the regular expression; it only tells WeeDuplicateDetective that you are using one.
Search patterns that do not start with \ but contain the wildcards * or ? are transformed internally into the corresponding regular expression, with a few differences:

  • Regular expressions: case sensitive; the pattern is taken as is
  • Wildcards only: case insensitive; ^ and $ are added automatically; . is replaced by \. and $ by \$; * is replaced by .*, and ? by .?
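These rules can be sketched in Python as follows. This is an illustration, not WeeDuplicateDetective's implementation: compile_search_pattern and matches are hypothetical names, and treating plain text without wildcards as a case-insensitive substring search is an assumption, since this help file does not say how such patterns are handled. Following the rules literally, only . and $ are escaped, and ? becomes .?, so it matches zero or one character (unlike the usual single-character wildcard).

```python
import re


def compile_search_pattern(pattern: str):
    """Turn the text typed in the search bar into a compiled regular expression."""
    if pattern.startswith("\\"):
        # Leading backslash: the rest is a regular expression, taken as is, case sensitive.
        return re.compile(pattern[1:])
    if "*" in pattern or "?" in pattern:
        # Wildcards only: escape . and $, then * -> .* and ? -> .?, anchored, case insensitive.
        regex = pattern.replace(".", r"\.").replace("$", r"\$")
        regex = regex.replace("*", ".*").replace("?", ".?")
        return re.compile("^" + regex + "$", re.IGNORECASE)
    # ASSUMPTION: plain text is searched as a case-insensitive substring.
    return re.compile(re.escape(pattern), re.IGNORECASE)


def matches(pattern: str, text: str) -> bool:
    return compile_search_pattern(pattern).search(text) is not None
```

With these rules, *.bak becomes ^.*\.bak$ and matches report.BAK but not report.bak.old, while \^res matches result.txt but not Result.txt.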

Syntax errors in a regular expression are reported to you, but a syntactically valid expression may still not return exactly what you want. In that case, the expression itself is usually at fault; there are several good regular expression checkers on the Internet.

If you do not know what a regular expression is but are curious about it, then google it :)



Statistics

At the bottom of the window you can see some statistics about WeeDuplicateDetective's findings.

The most interesting ones are

  • how much disk space you could free if you deleted all duplicates, keeping only one file in each duplicate group.
  • how many files are currently selected and how much disk space you would save if you clean them.
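The arithmetic behind these two figures is simple. In the hypothetical Python sketch below, each duplicate group is a list of (size, checked) pairs with the original first, and "keeping one file per group" is assumed to mean keeping the original. For example, a 100-byte file with two copies plus a 5-byte file with one copy can free 2 × 100 + 1 × 5 = 205 bytes.

```python
def statistics(groups):
    """groups: duplicate groups, original first; each file is (size_in_bytes, is_checked)."""
    # Keeping the original of each group frees the size of every other file in it.
    spare_bytes = sum(size for group in groups for size, _ in group[1:])
    # Files currently checked for clean-up, wherever they sit in their group.
    selected = [size for group in groups for size, checked in group if checked]
    return {
        "spare_bytes": spare_bytes,
        "selected_files": len(selected),
        "selected_bytes": sum(selected),
    }
```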


Side bar - Select items

This panel offers you a lot of helpful options to quickly mark a bunch of files for deletion.

The panel contains three groups:

  1. the first five elements (radio buttons) are the selection mode. In other words, this is where you choose the main rule used to select files in the list
    1. "all but each original" - for each group, all files will be selected, except the first one
    2. "each original only" - for each group, only the first file will be selected
    3. "all but original and 1st copy" - for each group, all files will be selected, except the first and the second one
    4. "each last copy" - for each group, only the last file will be selected
    5. "all files" - for each group, all files will be selected
  2. the next two elements let you fine-tune the selection mode
    1. "match the search pattern" - this check-box is available only if the text box in the search bar is not empty. If you check it, the files chosen by the selection mode are actually selected only if they match the search pattern. See the example below.
    2. "Exclude unsafe files" - this ensures that files deemed unsafe for deletion will not be selected
  3. The last three elements are the buttons that launch the selection process
    1. [Select files] / [Unselect files] - select or unselect files according to your selection criteria
    2. "Unselect all" - self-explanatory
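The five selection modes and the two refinements can be expressed as index rules over each duplicate group, with index 0 being the original. The Python sketch below is illustrative only: the Entry record, the mode labels used as dictionary keys and indices_to_select are hypothetical, the search pattern is a plain Python regular expression for simplicity, and skipping single-entry groups in "each last copy" is an assumption added so that this mode never picks an original.

```python
import re
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    safe: bool = True


# Each mode returns the candidate positions in a group of n files (0 = original).
MODES = {
    "all but each original": lambda n: range(1, n),
    "each original only": lambda n: range(min(n, 1)),
    "all but original and 1st copy": lambda n: range(2, n),
    "each last copy": lambda n: range(n - 1, n) if n > 1 else range(0),
    "all files": lambda n: range(n),
}


def indices_to_select(group, mode, search_pattern=None, exclude_unsafe=False):
    """Apply the selection mode, then the optional refinements, to one group."""
    chosen = []
    for i in MODES[mode](len(group)):
        entry = group[i]
        if search_pattern is not None and not re.search(search_pattern, entry.name, re.IGNORECASE):
            continue  # "match the search pattern" is checked and this file does not match
        if exclude_unsafe and not entry.safe:
            continue  # "Exclude unsafe files" is checked
        chosen.append(i)
    return chosen
```

Calling indices_to_select(group, "all files", search_pattern=r"\.bak$") on every group corresponds to the *.BAK example that follows.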

Files deemed unsafe for deletion by WeeDuplicateDetective  are:

  • Files with the system attribute set on
  • Files located in the WINDOWS folder and its sub-folders
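Both rules can be checked with a few lines of code. The Python sketch below is an illustration, not WeeDuplicateDetective's code: is_unsafe is a hypothetical name, the Windows folder is assumed to be C:\Windows (a real program would rather read the SystemRoot environment variable), and 0x4 is the Win32 FILE_ATTRIBUTE_SYSTEM flag. The path comparison ignores case and checks for a path separator, so a folder such as C:\WindowsOld is not mistaken for the Windows folder.

```python
import ntpath

FILE_ATTRIBUTE_SYSTEM = 0x4  # Win32 flag, same value as stat.FILE_ATTRIBUTE_SYSTEM


def is_unsafe(path: str, attributes: int, windows_dir: str = r"C:\Windows") -> bool:
    """True for system files and for anything inside the Windows folder."""
    if attributes & FILE_ATTRIBUTE_SYSTEM:
        return True
    # ntpath works on Windows-style paths on any OS; normcase lowercases and uses backslashes.
    norm_path = ntpath.normcase(ntpath.normpath(path))
    norm_win = ntpath.normcase(ntpath.normpath(windows_dir))
    return norm_path == norm_win or norm_path.startswith(norm_win + "\\")
```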

Example

How to select all *.BAK files found by WeeDuplicateDetective:

  1. in the search bar: select "Name" and enter ".bak" in the text area
  2. in the selection area: click on "all files"
  3. click on "match the search pattern"
  4. click the button [Select items]. Done.


Side bar - Organize the list

This panel also has three areas.

1. Expand / Collapse tree nodes

Expand or collapse the branches in the tree of files. In other words this will hide or show the copies for all groups of duplicates.

2. Sort the list items

Change the order in which the files are shown. Initially, files are listed in the order in which they were found on the disk. You can easily change this order:

Sort - What to sort

  • "originals only" - the copies are not sorted, only the originals, with respect to each other
  • "duplicates only" - inside each group, skipping the original file, the copies are sorted with respect to each other
  • "All files" - first, all the files in each group are sorted, including the original; then the originals are sorted with respect to each other
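The three "what to sort" options differ in whether groups, copies or both are reordered. In the hypothetical Python sketch below, each group is a list of files with the original first, key extracts the sort criterion from a single file, and by_count stands for the "amount of duplicates" option.

```python
def sort_groups(groups, what, key=None, by_count=False, reverse=False):
    """groups: list of duplicate groups (lists), original first."""
    if what == "originals only":
        # Groups move as a whole, ordered by their original (or by their number of copies).
        group_key = (lambda g: len(g) - 1) if by_count else (lambda g: key(g[0]))
        return sorted(groups, key=group_key, reverse=reverse)
    if what == "duplicates only":
        # The original stays first; only the copies after it are reordered.
        return [g[:1] + sorted(g[1:], key=key, reverse=reverse) for g in groups]
    if what == "all files":
        # Sort inside each group (the original may change), then order groups by their new first file.
        inner = [sorted(g, key=key, reverse=reverse) for g in groups]
        return sorted(inner, key=lambda g: key(g[0]), reverse=reverse)
    raise ValueError(f"unknown option: {what}")
```

Note that with "All files" a different file can end up first in its group and therefore become the new original, which is how sorting by creation time changes which file counts as the original.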

By - which criterion to sort by

  • You can sort by file name, path, size or date. When 'what to sort' is set to "originals only", you get an additional choice: sort by "amount of duplicates".

Order

  • hopefully self-explanatory

3. Remove items from the list

You can remove items from the list. This can be useful to exclude a specific type of file (making sure it will not be cleaned) or, on the contrary, to focus on a specific file type.

Once an item is removed from the list, it cannot be displayed again. The only way to get all the items back in the list is to launch the scan process again.

This operation does not remove any stored file; it only removes its reference from the file list in WeeDuplicateDetective.

The options are:

  • "have no duplicates anymore" - this will remove from the list all duplicate groups that contain only one item
  • "match the search pattern" - this will remove from the list all items that match the search pattern
  • "do NOT match the search pattern" - opposite of the above
  • "are unsafe for cleaning" - this will remove from the list all items that are deemed unsafe for deletion by WeeDuplicateDetective
  • "are checked for deletion" - this will remove from the list all items that are checked. Again, this removes only the list item and not the real file. You can use this to clean up the list by first selecting all the files you know you want to keep and then using this option.
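Removing items only edits the in-memory list, never the disk. The Python sketch below illustrates two of these options under stated assumptions: the function names are hypothetical, a predicate stands for the chosen option (for instance "match the search pattern"), and when an original is removed, the next entry is assumed to take its place at the head of the group.

```python
def remove_from_list(groups, predicate):
    """Drop list entries for which predicate(entry) is True; files on disk are not touched."""
    result = []
    for group in groups:
        kept = [entry for entry in group if not predicate(entry)]
        if kept:  # a group emptied completely disappears from the list
            result.append(kept)
    return result


def remove_singles(groups):
    """The "have no duplicates anymore" option: drop groups left with a single entry."""
    return [group for group in groups if len(group) > 1]
```

Removing every entry ending in .bak and then the groups left with a single entry keeps only the duplicates that still have at least one copy.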

Example

How to remove .BAK files from the list to make sure they will not be cleaned:

  1. in the search bar: select "Name" and enter ".bak" in the text area
  2. in the "organize" area: select "match the search pattern"
  3. click on the button [Remove]. Done.



Side bar - Manage and preview

This area is available once a file is selected in the list. You do not need to check a file for clean-up to select it: a simple left-click on any line in the list is enough.

"Manage" consists of two buttons:
[Open it] - the selected file will be opened as if you double-clicked on it in a file explorer. Which program is used to open the file depends on the file associations defined on your system.
[Locate it] - The folder containing the selected file will be opened and the file will be selected in the file explorer.

Those two operations are also available in the context menu.


"Preview" is a very primitive media player/viewer.

Once a file is selected, you can check "Preview the selected file" and WeeDuplicateDetective will try to display or play the file if its format is supported.

Pictures are simply displayed, with no additional control or information.
Supported picture formats: JPG, GIF (including animated GIFs), PNG, BMP and TIFF

Videos and music files get the usual controls:

  • 3 buttons: [play], [pause] and [stop]
  • a timeline slider
  • a volume control

Click on [play] to start playing the video or sound file. If you get an error message or if nothing happens, the file format is not supported. Note that such error messages come from the codecs themselves, not from WeeDuplicateDetective. If you do not know what a codec is, ask Google :)
Supported video / sound formats: AVI (DivX/XviD), MPG, WMV, WMA, MID, MP3. Depending on the codecs installed on your system as well as on the version of .NET available, you may be able to playback more file formats (OGG, MP4, etc.).

This viewer is not intended to be a "real" media player (click on [Open it] for full playback); it is meant as a quick view to help you decide whether a file should be cleaned.



Cleaning the duplicates

Once you have chosen which files you want to clean, you can start the actual cleaning process:

  1. Click on the button [Delete selected duplicates], located on the bottom right
  2. Choose the cleaning method.
  3. If you chose to move the files, indicate the folder to move them to
  4. Click on [START] to, well ... start the cleaning process. You will see a progress bar, and the number of files yet to be cleaned will decrease. You can interrupt the clean-up by clicking on [CANCEL]
  5. once the process is finished or interrupted, you can click on [CLOSE] to go back to the main window.

About the cleaning method "Move the files to a folder"
When choosing "Move files", the original folder structure of the files is not kept. Files with the same name will nevertheless not overwrite each other: if a file with the same name already exists in the target folder, the file being moved is renamed using the pattern <file name>_<number>.<file extension>, where number starts at 000001 and can go up to 999999.
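The renaming rule can be sketched in Python as follows. move_target is a hypothetical helper, the exists parameter is only there so the function can be tried without touching the disk, and treating only the last suffix as the extension (as os.path.splitext does, so archive.tar.gz becomes archive.tar_000001.gz) is an assumption.

```python
import os


def move_target(folder: str, file_name: str, exists=os.path.exists) -> str:
    """Pick a destination path in the flat target folder without overwriting anything."""
    candidate = os.path.join(folder, file_name)
    if not exists(candidate):
        return candidate  # no clash: keep the original name
    stem, ext = os.path.splitext(file_name)  # "photo.jpg" -> ("photo", ".jpg")
    for number in range(1, 1_000_000):  # 000001 .. 999999
        candidate = os.path.join(folder, f"{stem}_{number:06d}{ext}")
        if not exists(candidate):
            return candidate
    raise FileExistsError(f"no free name left for {file_name} in {folder}")
```

Checking for a free name and then moving the file are two separate steps, so a real implementation would also have to cope with a file appearing in between.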


Misc: Title bar / About Box / Side bar Grip / Resize Grip

Title Bar

In addition to the usual title bar buttons (Minimize, Maximize and Close), there is a 4th button at the right end. Click on it to display this help file.


About box

Press Ctrl+F1 to see the about box. Nothing spectacular here, though.

Side bar grip

Click on this vertical bar and move the mouse while holding the click.
This will resize the list area and the side bar.

Resize Grip

This area allows you to resize the main window. It's located on the bottom right corner. Click on it and move the mouse while holding the click.



Release notes

1.5.0.0 - 2012/01

  • remember the search patterns (in the combobox)
  • lighter UI
    • fewer borders, fewer color gradients and fewer animations (speeds up the UI)
    • the process of finding duplicates has its own dialog
    • layout made a bit less "full" for some options, mostly in the right side bar
    • removed a couple of unneeded text fields (ironically, I had duplicate information...)
  • files unsafe for deletion
    • show those files (files in the Windows directory and system and hidden files)
    • exclude those files from being selected or even remove them from the list
  • search
    • all fields at once
    • using wildcards (* and ?) or a regular expression
  • allow filtering of scanned files by file extension
    • new dialog to edit the type filters
  • add one criterion to remove files from the list
  • bug fixes
    • display of amount of files cleaned & of files scanned
    • crash when cleaning all files in a duplicate group
    • cursor when finding duplicates
    • remove files from list: stats not updated
  • help file (FINALLY!) - html file, with style reminiscent of 914915.onphp.net :)

1.1.0.0 - 2011/08/14

  • allow cancel during search for duplicates
  • support for animated gifs
  • bug fix: some minor issues

1.0.0.0 - 2011/07

  • drop shadows on borders
  • add remove from list criterion: files that do not match the search pattern
  • "unselect items" button
  • add '?' button in main clean window title
  • change: 'Close' to 'Cancel' during clean up, and 'Close' when it's done
  • parallelize sort
  • add progress for sort
  • bug fixes
    • bug fix: volume slider
    • bug fix: duplicate count
    • bug fix: sparable space count update during clean up
    • bug fix: disable Find duplicates button while cleaning
    • bug fix: done counter in the deletion dialog
    • bug fix: amount of duplicates (converter)

0.9.8.0 - 2011/06/26

  • different sortings of the tree
  • order by: file name / path / size / # of copies / date
  • sort: +button : remove singles from list
  • bug fixes
    • bug fix: use search Pattern: if already checked, do not change
    • bug fix: file stats (total #files, total space, total space sparable, etc.) not updated after clean up
    • bug fix: context menu over file tree list items is now displayed
    • bug fix: selected counter when clicking on find duplicates
    • bug fix: scroll behavior of the tree list view


2011/2012 - Guillaume Ranslant - 914915 Software (content and design (HTML and CSS(3)), all software)