Cleaning Up Photo Storage

Photography is a great hobby. It can be easy or challenging. It can be as simple as pushing a button, or it can require dozens of pieces of equipment and complex formulae. It’s as cheap or expensive as you could dream. It can be high art or clinical documentation. For all of these reasons, I love it. However, this isn’t about that.

This is about storing your photos digitally and a process I went through recently to clean them up. Most importantly, this has very little to do with photography and more to do with code. Specifically, bash scripting.

Browsing through Lightroom, I realized that I was missing some months of photos. This was easy to fix, since I keep everything backed up to an external drive, but I ran into a very annoying problem – some, but not all, of my photos were duplicates. The first problem was duplicate and triplicate files like IMG_0001.CR2, IMG_0001-1.CR2 and IMG_0001-2.CR2.

Here’s how I solved that with one simple command. From the command line on Linux or a Mac, I ran this command from the top directory that stores all my photos:

find "$PWD" -type f -regex '.*-[0-9]\.[jJcCdD].*' -exec mv {} ~/tmp/ \;

The first part of this command finds all files (including files in subdirectories) under your current location whose names end with a hyphen, a single digit and then an extension starting with a capital or lowercase j, c or d. For example: -1.JPG, -2.cr2 or -1.dng. The second half moves them all to a directory I created, ~/tmp/, so they’re not in my photography directory anymore. Alternatively, I could just delete them.
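Before moving anything, it can be worth previewing what the pattern matches. The sketch below is only an illustration: it builds a throwaway directory with made-up file names (none of them are my real photos) and prints the matches instead of moving them.

```shell
# Hypothetical sandbox so the pattern can be tried without touching real photos
demo=$(mktemp -d)
touch "$demo/IMG_0001.CR2" "$demo/IMG_0001-1.CR2" "$demo/IMG_0001-2.JPG" "$demo/IMG_0002.dng"

# Same pattern as the command above, but only list the matches
matches=$(find "$demo" -type f -regex '.*-[0-9]\.[jJcCdD].*' | sort)
echo "$matches"

# Clean up the sandbox
rm -r "$demo"
```

Only the two files with a hyphen and digit before the extension should be listed; the plain IMG_0001.CR2 and IMG_0002.dng are left alone.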

The second problem I ran into was that some of my photos were stored in multiple formats. Meaning I’d have IMG_0001.JPG, IMG_0001.CR2 and IMG_0001.dng. I also couldn’t just delete all the JPG and dng files, because not all my photos had this problem, only some of them.

For this issue, I had to write a full-fledged script:

#!/bin/bash

if [ -n "$1" ]
then
	cd "$1" || exit 1 # this way if it's a relative path we'll still grab the absolute path
fi

DIR=$PWD
TRASH="$HOME/tmp/" # a quoted ~ is not expanded, so use $HOME
LASTFILE="##"

# sort so files that share a name are adjacent; read line by line so paths with spaces survive
while IFS= read -r NEWFILE
do
	if [ "${LASTFILE%.*}" == "${NEWFILE%.*}" ]
	then
		LASTEXT=${LASTFILE##*.} # everything after the last dot
		NEWEXT=${NEWFILE##*.}
		if [ "$LASTEXT" == "CR2" ] || [ "$LASTEXT" == "cr2" ]
		then
			echo "Keep: ${LASTFILE##*/} / Delete: ${NEWFILE##*/}"
			mv "$NEWFILE" "$TRASH"
			continue
		fi
		if [ "$LASTEXT" == "DNG" ] || [ "$LASTEXT" == "dng" ]
		then
			if [ "$NEWEXT" == "CR2" ] || [ "$NEWEXT" == "cr2" ]
			then
				echo "Keep: ${NEWFILE##*/} / Delete: ${LASTFILE##*/}"
				mv "$LASTFILE" "$TRASH"
			fi
			if [ "$NEWEXT" == "JPG" ] || [ "$NEWEXT" == "jpg" ]
			then
				echo "Keep: ${LASTFILE##*/} / Delete: ${NEWFILE##*/}"
				mv "$NEWFILE" "$TRASH"
				continue # the DNG we kept stays as LASTFILE
			fi
		fi
		if [ "$LASTEXT" == "JPG" ] || [ "$LASTEXT" == "jpg" ]
		then
			echo "Keep: ${NEWFILE##*/} / Delete: ${LASTFILE##*/}"
			mv "$LASTFILE" "$TRASH"
		fi
	fi
	LASTFILE=$NEWFILE
done < <(find "$DIR" -type f | LC_ALL=C sort)

What this script does is look at every file in the directory where you run it, and in all of its subdirectories, and compare each one to the next one. If the names (minus file extension) match, it figures out which one to keep. The preference here is CR2 > DNG > JPG. It moves the one you’re not keeping to the ‘trash’, which in this case is the same ~/tmp directory I created earlier.
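The script leans on bash parameter expansion to split a path into its pieces. Here is a small sketch of those expansions on a made-up path, plus a hypothetical rank helper (not part of the script above) that expresses the CR2 > DNG > JPG preference as a number. Note the difference between `#*.` and `##*.` when a directory name contains a dot.

```shell
f="/photos/2010.06/IMG_0001.CR2"   # hypothetical path with a dot in a directory name

name=${f##*/}     # drop everything through the last slash -> IMG_0001.CR2
base=${f%.*}      # drop the shortest trailing ".ext"      -> /photos/2010.06/IMG_0001
ext=${f##*.}      # drop everything through the LAST dot   -> CR2
first=${f#*.}     # drop everything through the FIRST dot  -> 06/IMG_0001.CR2

# Lower rank wins, mirroring the CR2 > DNG > JPG preference
rank() {
	case "$1" in
		CR2|cr2) echo 1 ;;
		DNG|dng) echo 2 ;;
		JPG|jpg) echo 3 ;;
		*)       echo 9 ;;   # anything else is never preferred
	esac
}

echo "$name $ext $(rank "$ext")"
```

Two files are candidates for de-duplication when their `base` values are equal; the one whose extension has the lower rank is the one to keep.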

In the end, I removed nearly 3600 duplicate photos. Way too many to have done this by hand!

I hope these commands might help someone else in the future. I’ll answer any questions as best I can.

Posted in Linux, Mac.

By Tyler Johnson – June 5, 2010

Hiking Santa Barbara – McMenemy Trail

The McMenemy Trail is nestled in the foothills of Montecito, CA and offers a nice day hike with great views of Santa Barbara. The trail is accessible either directly from Mountain Drive in Montecito, or via the San Ysidro Canyon Trail (which also starts from Mountain Drive, about 1.5 miles east). This is where I started the trail; it seems to be a popular spot for many and accommodates walkers, runners, bikes and horses.

The McMenemy trail begins a half mile into the San Ysidro trail, cutting off left (to the west). After crossing a creek, it climbs steadily upward until you reach the stone bench commemorating the trail. After that it dips down for about a half mile before beginning to climb upwards one last time before the final descent. If you hike straight across, you’ll see about 850 feet in elevation gain over a little under two miles (although the actual trail is longer, the rest of it is downhill).

Most people hike out the same way they came in (and back to their car). Hiking up to the bench and back makes a nice hike of about 3 miles. The bench area also provides great views of the coastline and the Santa Barbara area. Although it’s in the sun, it makes a nice rest spot.

If you’re looking for more, instead of hiking straight across and back there is a nice loop you can take that extends the total to about 6 miles. You will see a sign for the Girard Trail as you’re passing the bench. If you keep going on the McMenemy Trail for a little less than a mile, you’ll find another trail going off to your right for Saddlerock. Following the Saddlerock Trail will have you hiking the ridge to the north until you finally meet up with the Girard Trail and end up back at the bench.

Overall, this is a great hike both for those experienced and those looking to get into hiking. While the elevation gain may prove a challenge to those not in shape, the distance is short enough to not provide too much of a challenge. The views are great and the trail is popular and well maintained.

For more information, the website SantaBarbaraHikes.com offers step-by-step instructions.

Posted in Uncategorized.

Tagged with exercise, fun, Hiking, nature, photos.

By Tyler Johnson – May 6, 2010

If It’s Tuesday, This Must Be Belgium

I like trying things. I love being somewhere new, under circumstances I never anticipated. If I’m trying something new, I’m having a good time. If I’m trying something I didn’t even know existed before, I’m having a great time. That being the way it is and I am, keeping this blog limited to one or several topics will mean a flurry of posts about once a year and not much else the rest of the year.

A good author knows his audience. He knows what his audience is interested in and doesn’t switch from an academic to a personal voice without good cause and reason or vice versa. He doesn’t write about subjects his audience doesn’t like, or care about or understand (unless it’s for introducing a topic). He also writes what he knows. I am not a good author.

This blog is a project of mine to communicate with others. Originally, I hoped to keep a narrow focus and write a steady stream of articles about one or two topics that people would enjoy and respond to. An entire month just passed without any of that happening. So, I am going to abandon this design but not the overall purpose.

I will write what I am doing. I will write what my escapades teach me and where my intense but short attention span takes me. If you read this blog, you will learn about music, sports, technology, the outdoors, travel and occasionally things actually in the news. You will hear my opinion in varying quantities and about varied subjects.

Feedback is always welcome and encouraged. If there’s anything I don’t like, it’s ignorance. So help cure this ignoble ignorance and share what you know.

Posted in Personal.

Tagged with Personal.

By Tyler Johnson – April 5, 2010

Wrongful License Suspension, a summary.

This month has been an educational, enlightening, infuriating and prolonged affair. On January 6th, 2010 I discovered my license had been suspended. Today, January 29th, 2010, I once again have a valid license.

Here is a summary of the fiasco:

Jan. 6th – Insurance company calls me, says their records show my license has been suspended (for months). I know nothing about this.
I go to the DMV where they tell me my license has been suspended and I learn:
- There is a ticket from Riverside, CA for driving without a seatbelt, and my name is on the ticket
- My license has been suspended since September 2009
- If I attempt to drive w/o a license and get pulled over, I will be arrested and have my vehicle impounded.
My driver’s license has a hole punched through it and I’m told the Court of Riverside has to fix the problem.
I call the Court of Riverside where I learn:
- Someone with my first name, last name and date of birth from Virginia got a ticket in Riverside.
- The DMV is incompetent and thought anyone matching an entire three characteristics is the same person.
- The Court believes me and this is entirely the DMV’s fault (see #3)
The Court makes a note on my record and says it should be cleared up in a few days.
Jan. 14th – I go back to the DMV, ready to renew my license.
The DMV keeps my license (before they punched a hole in it).
The DMV tells me the Court has to fix the issue (haven’t they already?) and the DMV will need an ‘abstract’
I call the Court and am told that nothing has been done to resolve my issue. I will need to appear in Court.
Instead of appearing in court physically (I have no transportation) I find out I can do a Trial by Declaration
I write a letter to the Judge, attach umpteen thousand pieces of proof I am who I say I am and mail it (certified).
Jan 22nd – The Court receives the letter.
Jan 26th – I still haven’t received notice from the Court. I call them and find out:
- The Court had already reached a decision and mailed a notice.
- The notice was mailed to the Virginia address (!!!)
The clerk from the Court is sympathetic and promises to call me back with more information.
The clerk’s supervisor actually calls me back! They write down my actual address to send me the notice.
The supervisor tells me to go to the DMV after a few days, the Judge ruled in my favor. (huzzah!)
Jan 29th – I return to the DMV for the last time.
The DMV shows that my license is not suspended, nor ever has been.
No record of the citation is present, so they don’t believe my story.
I pay $25 for a duplicate license after mine was ‘lost’ and receive a temporary printout.

I have a developing belief that our DMV system was based off of an unpublished short story by Franz Kafka. On the plus side, my legs are in great shape from all the walking I’ve had to do…

Posted in News, Personal, Politics.

Tagged with ca, dmv, frustration, license, Personal.

By Tyler Johnson – January 29, 2010

Adding a Twitter Feed to Your Website

You post to Twitter but you’d like to use it to talk to more than the people who actually bother to log into Twitter. Maybe you’ve set up the Facebook app that posts all your Tweets to Facebook. Here I’ll talk about how to add your Twitter feed to your own website.

First off, if you’re just looking for a generic Twitter feed, there are many prebuilt for you, including several from Twitter itself. This article is for those that want the experience of building their own or need the fine-grain control you get only by understanding all the underlying pieces.

The Basics
We will be utilizing PHP and MySQL on a Unix based server. You should be at least vaguely familiar with the first two. There are two parts to the script, one to connect to Twitter and add tweets to the database (caching them) and one to read from the database and display the tweets. We are going to be caching the twitter feed instead of connecting to Twitter every time someone loads your website.

Twitter API
Here are some basic facts about the API we will need before we move forward:

1. Entirely HTTP based
2. Conforms to REST
3. 150 connections / hour limit
4. Limitations are IP Address or Account based

What does this mean? Well, #1 means that we can use an HTTP ‘GET’ request to retrieve all the information we need. Which is good, because it means we can then use cURL, which is a command line based tool for transferring data with URL syntax – just what we need. It also has implementations in many, many programming languages.

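#2 tells us how the API is organized. REST stands for Representational State Transfer: each resource, such as a user’s timeline, has its own URL that we request over plain HTTP. The feed we will be reading comes back as XML, which many parsers can read – also good.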

#3 is very important and why we will be caching. It means that if you connected to your Twitter feed more than 150 times in one hour, Twitter would stop responding to you. If you weren’t caching, it means that you would have no data in your twitter feed. For high volume websites (or even medium volume ones) that number could be reached in minutes or even seconds, making your feed useless.

#4 tells us a little bit more about this limitation, though, and gives us hope. If limitations are based on IP address or accounts, it means that if Neil Patrick Harris decides he’s going to look at my Twitter feed 10 times a minute every minute then it will only lock him out, not me (assuming we’re not sharing a computer). Be careful with this one, though, because if you’re using a shared hosting server (if you can’t answer this question then assume you are) then another website on the same server connecting to twitter a lot could affect your website. So we’re going to use an account to log in – which means you can connect 150 times per hour on the same account before you get locked out. Sounds sufficient to me and it is.

You will need to find the address of your RSS feed. From your profile (http://www.twitter.com/username) find the ‘RSS Feed of Username’s Tweet’s’ on the righthand side. It should look like ‘http://twitter.com/statuses/user_timeline/14287293.rss’

Caching
At this point we’ve covered how we will connect to Twitter (cURL), what kind of request to make (‘GET’), we know to authenticate as our account and we know that our tweets will come to us formatted as XML (an RSS feed) – and that we can use an XML parser like SimpleXML in PHP to read it.

Once we retrieve a tweet and decode it we will want to cache it. In this case, we’ve chosen to use MySQL. I’ve created a database called ‘generic’ that I store all my random tables in. For this script, I’ve created a table called ‘twitter’ with the appropriate fields to store the data from a tweet. In MySQL, make sure you have selected the appropriate database and type:

CREATE TABLE IF NOT EXISTS `twitter` (
  `title` tinytext NOT NULL,
  `description` tinytext NOT NULL,
  `pubDate` datetime NOT NULL,
  `guid` char(60) NOT NULL,
  `link` char(60) NOT NULL,
  PRIMARY KEY  (`guid`)
);

This is the table in which we will be storing everything. Now, every time someone connects to our website we will read from this database instead of going to Twitter. On top of the limitations Twitter has given us, this would be a good idea anyway because it will speed up how fast your website loads by not having to go to Twitter every time.

Beginning
We begin to have an idea of what our script should look like. At this point, we have an outline that should look something like this:

Script 1:
Connect to Twitter using cURL
Retrieve our Twitter feed with a 'GET' request
Read the XML formatted data
Store data in MySQL database (caching)

Script 2:
Connect to MySQL database
Retrieve our Twitter feed
Display on website

That’s great! Let’s look at the first part of the script:

Script 1 – twitterParse.php

<?php
/*
This page will connect to the RSS feed for a single twitter user and pull the most recent updates (up to the # in $cnt)
It will then update the database configured below and upload any tweets not already in the database. It is 
not recommended to use this script on a browser facing site, but rather to call it via cron and have the visual 
Twitter feed pull from the database. The reason for this is that Twitter only allows 150 calls an hour which would
be quickly maxed out on a website with any traffic.
Written by Tyler Johnson (c) 2009
*/
$dbserver =''; // address of your dbserver
$dbuser = ''; // MySQL user 
$dbpass = ''; // MySQL user's password
$conn = mysql_connect($dbserver, $dbuser, $dbpass) or die(mysql_error()); // no quotes: '$dbserver' in single quotes would be passed literally
mysql_select_db('generic', $conn);
$username = ''; // Twitter username
$password = ''; // Twitter password
$userrss = ''; // Twitter RSS location


// Setup cURL connect
$ch = curl_init($userrss);
// Define parameters
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response as a string instead of printing it
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_USERPWD, "$username:$password");
// Execute cURL
$data = curl_exec($ch);
// Close connection
curl_close($ch);


// Parse XML
$xml = new SimpleXMLElement($data, LIBXML_NOCDATA);

$cnt = min(20, count($xml->channel->item)); // at most 20 tweets, fewer if the feed is shorter

for($i=0; $i<$cnt; $i++) // start looking at results
{
	$title	=	mysql_real_escape_string($xml->channel->item[$i]->title);
	$desc	=	mysql_real_escape_string($xml->channel->item[$i]->description);
	$pubDate=	mysql_real_escape_string($xml->channel->item[$i]->pubDate);
	$guid	=	mysql_real_escape_string($xml->channel->item[$i]->guid);
	$link	=	mysql_real_escape_string($xml->channel->item[$i]->link);
	
	// input results into MySQL
	$query = "INSERT INTO `generic`.`twitter` (`title`, `description`, `pubDate`, `guid`, `link`) VALUES ('$title','$desc',STR_TO_DATE('$pubDate','%a, %d %b %Y %H:%i:%s %x'),'$guid','$link')";
	$result = mysql_query($query);
	$error = mysql_errno(); // Record errors

	if($error) // If error...
	{
		echo $error; // then halt MySQL
		break;
	}
}
?>

There’s a lot to understand in this script. I would recommend Googling any PHP functions you don’t understand.

Now, this page will connect to Twitter, grab your 20 most recent tweets and add them to your MySQL database. Because the guid column is the primary key of the twitter table, MySQL refuses to insert a tweet that is already stored and the script stops at that point, so you will not get any duplicates.

While this code does what we want, we do not want to run this directly from our website. If we did, there would be no point in caching – instead, we want to put this script into a cron job, cron being a time-based job scheduler on Unix platforms. I have mine running every minute, which at 60 connections an hour puts me far below the limit of 150 connections an hour and seems more than sufficient to me. To add this, open your crontab by typing ‘crontab -e’ and add the following line, edited with your script’s location:

* * * * * /usr/local/php5/bin/php -q /location/of/script/twitterParse.php > /dev/null 2>&1
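As a quick sanity check on the schedule, the number of requests a cron interval generates per hour can be compared against the limit. This is only a sketch: the variable names are made up for illustration, and 150 is the hourly limit quoted earlier in this article.

```shell
limit_per_hour=150     # hourly limit quoted earlier in the article
interval_minutes=1     # how often cron runs the script (assumed: every minute)

# One run per interval, so this many requests in an hour
calls_per_hour=$((60 / interval_minutes))

if [ "$calls_per_hour" -le "$limit_per_hour" ]; then
	echo "OK: $calls_per_hour calls per hour, limit is $limit_per_hour"
else
	echo "Too often: $calls_per_hour calls per hour, limit is $limit_per_hour"
fi
```

Running less often only needs a different first field in the crontab line, for example `*/5 * * * *` for every five minutes (12 calls an hour).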

Retrieving & Displaying Your Tweets
The hard part is over! The next script is the one we will be putting on your website for viewers to see. It’s also much simpler to write. I won’t spend too much time on it, so here it is:

Script 2 – twitread.php

<?php
$tweeter = ""; // Twitter username
$dbserver =''; // address of your dbserver
$dbuser = ''; // MySQL user 
$dbpass = ''; // MySQL user's password
$conn = mysql_connect($dbserver, $dbuser, $dbpass) or die(mysql_error()); // no quotes: '$dbserver' in single quotes would be passed literally
mysql_select_db('generic', $conn);

// Connect to DB and pull $tweetLimit latest tweets
$tweetLimit = 5;
$query = "SELECT * FROM `generic`.`twitter` WHERE 1 ORDER BY `pubDate` DESC LIMIT $tweetLimit";
$result = mysql_query($query);

echo "<div class='twitter'>\n"; // begin Twitter display
echo "<div class='tweeter'><a href='http://twitter.com/$tweeter'>$tweeter</a></div>\n";

while($row = mysql_fetch_array($result,MYSQL_ASSOC))
{

	$desc = $row['description'];
	$link = $row['link'];
	$date = $row['pubDate'];
	
	$pattern = '/^([A-Za-z0-9_]*):(.*)/'; // let's separate the name and the tweet (usernames may contain underscores)
	$replaceTweet = '\2';
	$noname = preg_replace($pattern, $replaceTweet, $desc);

	$pattern = '/@([A-Za-z0-9_]+)/'; // usernames may contain capitals and underscores
	$replaceName = '@<a href="http://twitter.com/\1" class="twit">\1</a>';
	$tweet = preg_replace($pattern, $replaceName, $noname);


	echo "<div class='tweet'>$tweet</div>\n";
	echo "<div class='tweetDate'><a href='$link'>$date</a></div>\n";

}

echo "</div>\n";

?>

That’s it! You’re done! You can tweak the HTML in the second script to your heart’s desire and add CSS to format it any way you like.
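If you want to see what the @-mention substitution in the second script does before wiring it into a page, the same kind of replacement can be tried from the shell with sed. This is a sketch rather than part of the scripts: the linkify function name and the sample tweet are made up, and the character class assumes usernames consist of letters, digits and underscores.

```shell
# Wrap each @username in a link, like the second preg_replace in twitread.php
linkify() {
	printf '%s\n' "$1" |
		sed -E 's#@([A-Za-z0-9_]+)#@<a href="http://twitter.com/\1" class="twit">\1</a>#g'
}

linkify 'thanks @tyler_j!'
```

Using `#` as the sed delimiter avoids having to escape the slashes in the URL.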

This tutorial was much heavier on the ‘theory’ behind why the script is written this way and less on the techniques used. I will be happy to answer any questions you may have to help you get this running.

Posted in Web.

Tagged with learn, Tech, tutorial, twitter, Web.

By Tyler Johnson – January 25, 2010

Fun Times in Bureaucracy

Fun story.
So there’s this guy, we’ll call him Tyler. Anyways, Tyler gets this email from his insurance company saying that their records show his driver’s license has been suspended and they will not be able to renew his insurance policy. Now, Tyler realizes this is obviously a mistake and so calls his insurance rep and explains that this is impossible. The rep understands and offers to personally run a check against the DMV and see if he can find the error or at least get more information. Tyler happily agrees and proceeds to go about his day, unworried.

An hour goes by and the rep calls back. There’s a problem, he tells Tyler. It seems that the suspension is real and in fact has been in place for almost four months! Astonished, Tyler replies that he has no idea how this could be. He certainly never received anything in the mail from the DMV or the courthouse! Wishing Tyler well, the rep says that if it isn’t fixed soon Tyler will be dropped from the policy.

Rushing to the DMV, Tyler hurriedly explains his dire situation. After punching a hole through his (now invalid) driver’s license, they show him the culprit: a citation for failure to appear in court after receiving a ticket in Riverside, CA. But how, Tyler asks, can this be true if not only have I never been to Riverside but I was out of the country when this ticket was issued? Call the courthouse, the DMV responds, and leave us alone.

Frustrated and confused, unable to legally operate a motor vehicle inside the United States of America, Tyler calls the courthouse and manages to reach an actual human being in only a brief 45 minutes or so of hold music and dropped calls. Ah hah! the Courthouse says, we see the problem. There was another person with your same name and birthdate, the DMV must be confused. But the issue isn’t on our end, because we don’t even have your driver’s license written down here. It’s obviously someone else!

Thank god, Tyler thinks. This will be straightforward and easy to fix. They believe me! I’m not a criminal, didn’t fail to appear in Court and soon this will all be over!

Setting off on his bicycle, Tyler rides the several miles back to the DMV happy that this will all soon be over. He cheerfully takes another ticket and waits in the corner that passes for a line until his number is announced. Walking up to the counter, he recites his situation once again. This is the Court’s problem, the DMV says. Tyler explains that the Court said it was the DMV’s problem. Well it’s not, the DMV replies. Talk to the Court. Exasperated, Tyler is sent away again and rides cheerlessly back to his apartment.

Suffice it to say, by this late hour in the day (nearly 3:30pm) the Court was closed until the following week.

Posted in News, Personal, Politics.

Tagged with ca, dmv, frustration, license, Personal.

By Tyler Johnson – January 6, 2010

Online Secrecy – Right or Privilege?

Many people enjoy the notion that they are anonymous while online. It is, really, one of the common defining preconceived notions people have about the internet. It’s also, generally, a lie. Should it be? When you visit a website, post a comment on a blog or play an online game, should anyone else, whether it’s the police or a disinterested website administrator be able to track you down?

Consider this: every time you visit a website, your computer sends out a request asking for data. Along with this request is information on how to get back to your computer – otherwise it would be a lot like sending a letter asking for a response but failing to include a return address. This return information is available to any website you visit, although whether or not the information is stored is another question entirely. You may be surprised at how often it is.

The accuracy of this information is debatable in the hands of a web administrator. While specific to your computer, unless you are surfing the web from an office location or small school, the information will only narrow down your location to perhaps your city or even neighborhood – generally not your exact location. You will notice, however, I said ‘in the hands of a web administrator’. Your Internet Service Provider has much, much more information they could potentially hand over.

Recently the police were able to track down a wanted fugitive through World of Warcraft. Blizzard readily handed over his IP address, account information and billing address. While this may be no shock to some – obviously a company you do business with has your billing address – remember the anonymity the fugitive thought he enjoyed.

Should there be a reasonable right to privacy online? Do you want anyone to be able to see who you are, or should your online speech be protected through secrecy? One thing is certain: the perception of being anonymous on the web has become an ingrained part of online culture. If or when that changes, the internet will be a very different place.
