Qureet.com : Find Customers On Twitter


The Most Useful Shell Script I Ever Wrote


(I’m Andrew, creator of the bootstrapped Twitter-curation startup ‘TweetQureet’. I’m a programmer and I’m trying to use my experience to launch an online business. I know I’m not alone in this goal. This blog talks about my journey. There’s an RSS feed too.)

TweetQureet learns your top topics on Twitter and sends you a digest email each day with top tweets matching those topics. Each user’s interest profile is unique, as is their timeline. As such, TweetQureet creates and uses various databases for each user.

These take the form of CSV (Comma-Separated-Value) files on disk. Is this a hack? Maybe. Should I have used a nice, scalable, transaction-based database solution instead of file DBs? Possibly. Did I want to get the TweetQureet experiment off the ground as quickly as is humanly possible for one hacker?

You Betcha.

One of the main advantages of using CSV files for databases is that they are both human-readable and -editable. This makes tweaking user configurations, inspecting tweet queues, searching interest profiles, etc., a breeze. It’s just a case of cracking open the correct file on disk.

Working with CSV Databases Means LOTS of Filter Commands.

grep is my friend. One of my best friends, in fact. A huge amount of the work I do in writing, testing and debugging new features in TweetQureet involves commands a bit like this:

find foo/bar/users -name user_config.csv |
xargs grep -Hn 'send_emails, true'

If the above command looks like gobbledygook to you: it finds all files named user_config.csv under directory foo/bar/users and prints out the file name and line number of every line in them which has the send_emails property set to true.
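One caveat with the pipeline above: plain xargs splits its input on whitespace, so a path containing a space would be broken into pieces. Here is a minimal sketch of a variant that avoids this by letting find hand the file names to grep directly. The directory tree, the user names and the digest_hour row are invented purely for the demonstration; only the file name user_config.csv and the send_emails property come from the command above.

```shell
# Build a throwaway directory tree that mimics the layout described above.
# The user names and the "digest_hour" row are made up for this demo.
demo=$(mktemp -d)
mkdir -p "$demo/users/alice" "$demo/users/bob smith"
printf 'digest_hour, 7\nsend_emails, true\n'  > "$demo/users/alice/user_config.csv"
printf 'digest_hour, 9\nsend_emails, false\n' > "$demo/users/bob smith/user_config.csv"

# "-exec ... {} +" passes the file names straight to grep, so a path that
# contains spaces arrives as a single argument.
matches=$(find "$demo/users" -name user_config.csv \
    -exec grep -Hn 'send_emails, true' {} +)
printf '%s\n' "$matches"

# Clean up the demo tree.
rm -rf "$demo"
```

Only the config file with send_emails set to true should be reported, along with the line number of the matching row.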

Useful. But a bit of a mouthful, isn’t it? It’s not that intuitive to read – unless you’re well-versed in Linux command-line lore. And composing the command in the first place is even more of a pain.

And it doesn’t end there. I use many commands following similar patterns to the above. But they’re all subtly different. Often, I need to restrict searches to certain sub-paths, tweak the search string, modify various flags … and so on and so forth.

What emerged from the above is a realization. In my day-to-day dealings with TweetQureet’s CSV-file databases, the bash shell’s history command – whilst fantastic – wasn’t going to be enough. I needed a powerful way to recall specific commands and tweak them at will.

Getting More Blood Out Of “history”

Firstly, I’ve set up a few useful options in my .bashrc file:

shopt -s histverify
HISTSIZE=5000
HISTFILESIZE=5000
alias h='history'

The first line sets up the shell so that if I enter !3141 at the command prompt, command number 3141 from the history list gets reproduced at the cursor instead of running straight away. With histverify set, I get to edit the command before pressing return to execute it. Nice – I use this all the time!

The second and third lines just tell the shell to keep a long history of commands. This is useful because I might well need to recall and tweak a command from a few days ago. Since the default history length is only 500 lines, older commands might have fallen off the end of the list – especially if I’ve been a busy bee lately.

The fourth line just aliases the command “h” to “history”. So I can recall the latest command history by just typing “h” <RETURN>. Nice and fast.

A More Versatile “history” command

The above tweaks help … but they still don’t get me the recall power I require. We need a nested grep chain for that. To recall the aforementioned filter command, we’d need something like this:

h | grep find | grep user_config.csv |
grep 'send_emails, true' | grep foo/bar

Hopefully, you can start to see how useful grep chains like this become. “Find me all recently-run ‘find’ commands which involved the file ‘user_config.csv’ and the ‘send_emails, true’ property, and involved high-level path ‘foo/bar’ …” My day-to-day involves countless incarnations of this kind of grep chain.

Kind of a messy command though, isn’t it? At this point, I’d like to introduce a new shell script – called “hh” for simplicity. It replaces the above command with this shorthand:

hh find user_config.csv 'send_emails, true' foo/bar

Done. Easy as pie. The output of this command will be a filtered list of commands, much like that provided by the vanilla history command itself. You can then just select the required command from this result list, !123 it, edit at the prompt as desired, then hit <RETURN> to execute.
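To make the idea concrete, here is a minimal sketch of how such a command could be put together. It is not the actual hh source, just one possible approach: it is written as a shell function for .bashrc rather than a standalone script, on the assumption that the history builtin is only useful inside the interactive shell whose history you want to search. The helper name hh_filter is made up for this sketch.

```shell
# Filter stdin through one grep per argument, so a line survives only if it
# matches every pattern. With no patterns left, pass the input through as-is.
# (hh_filter is a hypothetical helper name, not part of the original script.)
hh_filter() {
    if [ "$#" -eq 0 ]; then
        cat
        return
    fi
    local pattern=$1
    shift
    # "--" stops grep from treating a pattern that starts with "-" as an option.
    grep -- "$pattern" | hh_filter "$@"
}

# Intended for ~/.bashrc: "history" is a shell builtin, so it has to run
# inside the interactive shell whose history is being searched.
hh() {
    history | hh_filter "$@"
}
```

Each argument becomes one link in the grep chain, so any number of search terms can be given. As in the hand-written chain, the terms are treated as regular expressions; adding -F to the grep call would make them literal strings instead.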

BANG! A whole bunch of repeated, error-prone, tedious typing just got eliminated. The time savings (and error-reduction) over time in my work have been tremendous.

Does this script look like it would be useful to you? View and download the source here.
