

Cleaning Up Photo Storage

Photography is a great hobby. It can be easy or challenging: it can be as simple as pushing a button, or it can require dozens of pieces of equipment and complex formulae. It can be as cheap or as expensive as you can imagine. It can be high art or clinical documentation. For all of these reasons, I love it. However, this post isn’t about that.

This is about storing your photos digitally and a process I went through recently to clean them up. It has very little to do with photography and much more to do with code, specifically bash scripting.

Browsing through Lightroom, I realized that I was missing some months of photos. This was easy to fix, since I keep everything backed up to an external drive, but I ran into a very annoying problem: some, but not all, of my photos were duplicates. The first problem was duplicate and triplicate files like IMG_0001.CR2, IMG_0001-1.CR2 and IMG_0001-2.CR2.

Here’s how I solved that with one simple command. From the command line on Linux or a Mac, I ran this command from the top directory that stores all my photos:

find "$PWD" -type f -regex '.*-[0-9]\.[jJcCdD].*' -print0 | xargs -0 -I{} mv {} ~/tmp/

The first part of this command lists all files (including files in subdirectories) under your current location whose names end in a hyphen, a single digit, a period and an extension starting with an uppercase or lowercase j, c or d. For example: -1.JPG, -2.cr2 or -1.dng. The second half moves them all to a directory I created, ~/tmp/, so they’re no longer in my photography directory. Alternatively, I could just delete them.
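One thing to watch with a single ~/tmp/ folder: camera file numbers repeat over time (the counter rolls over), so IMG_0001-1.CR2 can exist in more than one month's folder, and `mv` will silently overwrite the earlier copy in the trash. Below is a minimal sketch of a variant that can preview first and recreates each file's subfolders inside the trash so nothing collides. The function name `trash_numbered_copies` and the `DRY_RUN` switch are hypothetical, not part of the original command; the regex is the same one used above.

```shell
#!/bin/bash
# Hypothetical helper: move IMG_0001-1.CR2-style copies out of a photo tree into a
# trash folder, recreating their subdirectories so same-named files don't collide.
# Usage: trash_numbered_copies PHOTO_DIR TRASH_DIR  (no trailing slash on PHOTO_DIR)
# Set DRY_RUN=1 to only print what would be moved.
trash_numbered_copies() {
	local src=$1 trash=$2 file rel
	# -type f skips directories; IFS= read -r keeps spaces in paths intact.
	find "$src" -type f -regex '.*-[0-9]\.[jJcCdD].*' | while IFS= read -r file
	do
		rel=${file#"$src"/}   # path relative to the photo folder
		if [ -n "$DRY_RUN" ]
		then
			echo "Would move: $rel"
		else
			mkdir -p "$trash/$(dirname "$rel")"
			mv "$file" "$trash/$rel"
		fi
	done
}
```

For example, `DRY_RUN=1 trash_numbered_copies "$PWD" ~/tmp` prints the matches, and running it again without `DRY_RUN` moves them.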

The second problem I ran into was that some of my photos were stored in multiple formats. For example, I’d have IMG_0001.JPG, IMG_0001.CR2 and IMG_0001.dng. I also couldn’t just delete all the JPG and DNG files, because only some of my photos had this problem.
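Before moving anything, it can help to see how widespread the problem is. A minimal sketch that lists every base name (full path minus the extension) that exists in more than one of the three formats; the function name `list_multi_format` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print each photo base name that exists in more than one
# of CR2, DNG and JPG under the given directory.
list_multi_format() {
	# -iname matches case-insensitively, so .CR2 and .cr2 are both included.
	find "$1" -type f \( -iname '*.cr2' -o -iname '*.dng' -o -iname '*.jpg' \) \
		| sed 's/\.[^./]*$//' \
		| sort \
		| uniq -d
}
```

The `sed` step strips only the last extension (it never crosses a `/`), `sort` puts identical base names next to each other, and `uniq -d` prints each repeated one once. Running `list_multi_format . | wc -l` gives a rough count of affected photos.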

For this issue, I had to write a full-fledged script:

#!/bin/bash

# Optional argument: the photo directory. cd into it so that $PWD below is an
# absolute path even when a relative path was given.
if [ -n "$1" ]
then
	cd "$1" || exit 1
fi

DIR=$PWD
TRASH="$HOME/tmp" # use $HOME: a tilde inside quotes is never expanded
mkdir -p "$TRASH"

# Preference order CR2 > DNG > JPG; anything else (e.g. .xmp sidecars) ranks 0.
rank() {
	case "$1" in
		CR2|cr2) echo 3 ;;
		DNG|dng) echo 2 ;;
		JPG|jpg) echo 1 ;;
		*) echo 0 ;;
	esac
}

LASTFILE=""
# Sorting puts files that share a base name (IMG_0001.CR2, IMG_0001.JPG, ...)
# next to each other; IFS= read -r keeps spaces in paths intact.
find "$DIR" -type f | LC_ALL=C sort | while IFS= read -r NEWFILE
do
	# ${VAR##*.} is the text after the LAST dot, even if a folder name has dots.
	NEWRANK=$(rank "${NEWFILE##*.}")
	if [ "$NEWRANK" -eq 0 ]
	then
		continue # never touch files that aren't CR2, DNG or JPG
	fi

	if [ -n "$LASTFILE" ] && [ "${LASTFILE%.*}" = "${NEWFILE%.*}" ]
	then
		LASTRANK=$(rank "${LASTFILE##*.}")
		if [ "$LASTRANK" -ge "$NEWRANK" ]
		then
			echo "Keep: ${LASTFILE##*/} / Delete: ${NEWFILE##*/}"
			mv "$NEWFILE" "$TRASH/"
			continue # keep comparing later files against the one we kept
		fi
		echo "Keep: ${NEWFILE##*/} / Delete: ${LASTFILE##*/}"
		mv "$LASTFILE" "$TRASH/"
	fi
	LASTFILE=$NEWFILE
done

This script looks at every file in the directory you give it (or the current directory) and all of its subdirectories, and compares each file with the one before it. If the names (minus the file extension) match, it decides which one to keep, with the preference CR2 > DNG > JPG. It moves the one you’re not keeping to the ‘trash’, which in this case is the same ~/tmp directory I created earlier.
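The keep-or-trash decision can also be written on its own, which makes the CR2 > DNG > JPG rule easy to check in isolation. A minimal sketch, assuming you already have the files that share one base name; `pick_keeper` is a hypothetical name, not part of the script.

```shell
#!/bin/bash
# Hypothetical helper: given files that share a base name, print the one to keep
# using the preference CR2 > DNG > JPG. Extension case is ignored.
pick_keeper() {
	local best="" best_rank=0 f rank
	for f in "$@"
	do
		case "${f##*.}" in
			[Cc][Rr]2) rank=3 ;;
			[Dd][Nn][Gg]) rank=2 ;;
			[Jj][Pp][Gg]) rank=1 ;;
			*) rank=0 ;;   # other formats (e.g. .xmp sidecars) are never chosen
		esac
		if [ "$rank" -gt "$best_rank" ]
		then
			best=$f
			best_rank=$rank
		fi
	done
	[ -n "$best" ] && echo "$best"
}
```

For example, `pick_keeper IMG_0001.JPG IMG_0001.dng IMG_0001.CR2` prints `IMG_0001.CR2`. Choosing the best file from a whole group at once does not depend on the order in which files are listed, whereas comparing neighbours only works if files with the same base name end up next to each other.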

In the end, I removed nearly 3,600 duplicate photos. Way too many to have done by hand!

I hope these commands might help someone else in the future. I’ll answer any questions as best I can.

Posted in Linux, Mac.

