Photography is a great hobby. It can be easy or challenging. It can be as simple as pushing a button, or it can require dozens of pieces of equipment and complex formulae. It’s as cheap or expensive as you could dream. It can be high art or clinical documentation. For all of these reasons, I love it. However, this isn’t about that.
This is about storing your photos digitally and a process I went through recently to clean them up. Most importantly, this has very little to do with photography and more to do with code. Specifically, bash scripting.
Browsing through Lightroom, I realized that I was missing some months of photos. This was easy to fix, since I keep everything backed up to an external drive, but I ran into a very annoying problem: some, but not all, of my photos were duplicates. The first problem I ran into was duplicate and triplicate files like IMG_0001.CR2, IMG_0001-1.CR2 and IMG_0001-2.CR2.
Here’s how I solved that with one simple command. From the command line on Linux or a Mac, I ran this command from the top directory that stores all my photos:
find "$PWD" -type f -regex '.*-[0-9]\.[jJcCdD].*' -print0 | xargs -0 -I{} mv {} ~/tmp/
The first part of this command lists every file under your current location (including files in subdirectories) whose name contains a hyphen, a single digit, a dot and then an extension starting with a capital or lowercase j, c or d. For example: -1.JPG, -2.cr2 or -1.dng. Because the pattern only allows one digit after the hyphen, copies numbered -10 and up are not matched. The second half moves the matches to a directory I created, ~/tmp/, so they’re not in my photography directory anymore. Alternatively, I could just delete them.
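Before moving anything, it can be worth previewing what the pattern catches. Here is a minimal sketch of that idea, assuming the same regex as above. The function name list_numbered_copies is made up for this example and isn't part of any tool.

```shell
# Hypothetical helper: list numbered copies such as IMG_0001-1.CR2
# under a directory (default: the current one) without moving them.
list_numbered_copies() {
    find "${1:-.}" -type f -regex '.*-[0-9]\.[jJcCdD].*'
}

# Example: preview first, and only run the mv command once the list looks right.
#   list_numbered_copies ~/Pictures
```

Since this only prints paths, it is safe to run as often as you like while you adjust the pattern.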
The second problem I ran into was that some of my photos were stored in multiple formats, meaning I’d have IMG_0001.JPG, IMG_0001.CR2 and IMG_0001.dng. I couldn’t just delete all the JPG and dng files, because only some of my photos had this problem.
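To get a feel for how widespread this is before changing anything, you could list the photos that exist in more than one format. This is only a rough sketch: list_multi_format is a name I made up, and it assumes file names don't contain newlines.

```shell
# Hypothetical helper: print each path (minus its extension) that
# appears more than once under a directory, i.e. the same photo
# stored in several formats.
list_multi_format() {
    find "${1:-.}" -type f |
        sed 's/\.[^./]*$//' |   # strip the final extension from each path
        sort |
        uniq -d                 # keep only names that occur more than once
}

# Example:
#   list_multi_format ~/Pictures
```

Each line of output is one photo with duplicates, so piping the result through wc -l gives a quick count.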
For this issue, I had to write a full-fledged script:
#!/bin/bash
if [ -n "$1" ]
then
    cd "$1" || exit 1 # this way if it's a relative path we'll still grab the absolute path
fi
DIR=$PWD
TRASH="$HOME/tmp/" # a quoted ~ is not expanded, so use $HOME
LASTFILE="##"
# sort so files sharing a name sit next to each other;
# read line by line so paths with spaces stay intact
find "$DIR" -type f | sort | while IFS= read -r NEWFILE
do
    # ${VAR%.*} drops the extension, ${VAR##*.} is the extension,
    # ${VAR##*/} is the file name without its directory
    if [ "${LASTFILE%.*}" == "${NEWFILE%.*}" ]
    then
        if [ "${LASTFILE##*.}" == "CR2" -o "${LASTFILE##*.}" == "cr2" ]
        then
            echo "Keep: ${LASTFILE##*/} / Delete: ${NEWFILE##*/}"
            mv "$NEWFILE" "$TRASH"
            continue
        fi
        if [ "${LASTFILE##*.}" == "DNG" -o "${LASTFILE##*.}" == "dng" ]
        then
            if [ "${NEWFILE##*.}" == "CR2" -o "${NEWFILE##*.}" == "cr2" ]
            then
                echo "Keep: ${NEWFILE##*/} / Delete: ${LASTFILE##*/}"
                mv "$LASTFILE" "$TRASH"
                continue
            fi
            if [ "${NEWFILE##*.}" == "JPG" -o "${NEWFILE##*.}" == "jpg" ]
            then
                echo "Keep: ${LASTFILE##*/} / Delete: ${NEWFILE##*/}"
                mv "$NEWFILE" "$TRASH"
            fi
        fi
        if [ "${LASTFILE##*.}" == "JPG" -o "${LASTFILE##*.}" == "jpg" ]
        then
            echo "Keep: ${NEWFILE##*/} / Delete: ${LASTFILE##*/}"
            mv "$LASTFILE" "$TRASH"
        fi
    fi
    LASTFILE=$NEWFILE
done
This script looks at every file under the directory where you run it (or the directory you pass as an argument), including subdirectories, and compares each file to the next one. If the names (minus file extension) match, it figures out which one to keep. The preference here is CR2 > DNG > JPG. It moves the one you’re not keeping to the ‘trash’, which in this case is the same ~/tmp directory I created earlier.
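The CR2 > DNG > JPG preference can also be written as a small ranking function, which some people may find easier to read or extend than nested if blocks. This is just a sketch of the idea and not what my script does: format_rank and pick_keeper are names I invented here, and unlike the script they also accept mixed-case extensions such as .Cr2.

```shell
# Hypothetical helper: print a rank for a file based on its
# extension. Lower numbers are preferred.
format_rank() {
    case "${1##*.}" in
        [cC][rR]2)    echo 1 ;;
        [dD][nN][gG]) echo 2 ;;
        [jJ][pP][gG]) echo 3 ;;
        *)            echo 9 ;;  # anything else ranks last
    esac
}

# Hypothetical helper: given two paths, print the one to keep.
# On a tie, the first argument wins.
pick_keeper() {
    if [ "$(format_rank "$1")" -le "$(format_rank "$2")" ]
    then
        echo "$1"
    else
        echo "$2"
    fi
}

# Example:
#   pick_keeper IMG_0001.JPG IMG_0001.dng
```

Adding another raw format would then only take one more line in the case statement.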
In the end, I removed nearly 3600 duplicate photos. Way too many to have done this by hand!
I hope these commands might help someone else in the future. I’ll answer any questions as best I can.