Qureet.com: Find Meaning On Twitter

TweetQureet: a full Golang technology stack

Get your top timeline tweets daily with Qureet's targeted social filtering. I'm Andrew, a programmer who created the bootstrapped startup Qureet.com. I'm building an online business, and I know I'm not alone in this goal. This blog talks about my journey. There's an RSS feed too.

Over the years, I’ve written production software in many languages. During my time at Google (in Irvine, CA) we were introduced to the Google Go project, aka golang. The language was brand new at the time, but I instantly became interested. Golang tries to take the best of several other languages and combine them into one:

Go Borrows Good Bits From More Established Languages

Golang takes the power of C++ but simplifies the dependency graph. In C++, tangled header inclusion can make compile times grow explosively for large codebases.

Golang borrows from the Java and Python world: useful built-in APIs for many commonly required services, e.g. string parsing, network I/O, file management, etc.

Golang is compiled, and its binaries are usually statically linked. Once your binary is built, you largely avoid run-time dependency issues (missing shared objects, DLL not found, .jar file missing etc.). I've found that all of these problems can arise in C++, Java and Python systems.

Golang builds unit testing right into its standard toolchain, via the testing package and the go test command. This is fantastic. Table-driven unit tests are a godsend for all but the most trivial algorithms, and they are wonderful documentation for your code. Write your unit test before you write the code it is testing. This is an excellent tool for building an understanding of what the code actually needs to do. No more, no less. Just honour the contract dictated by the unit test.

This might all sound obvious and trite, but 10 years ago this was not the way most code was written. It probably still isn't, even today. For shame!
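To make the table-driven idea concrete, here is a minimal sketch. The hashtags function and its test table are hypothetical, invented purely for illustration. In a real project the loop would sit in a TestHashtags(t *testing.T) function inside a _test.go file and be run with go test (calling t.Errorf on a mismatch); here it is wrapped in main so the snippet runs as a standalone program.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// hashtags is a hypothetical helper used only to illustrate the style:
// it returns every whitespace-separated word that starts with '#'
// and has at least one character after the '#'.
func hashtags(tweet string) []string {
	var tags []string
	for _, word := range strings.Fields(tweet) {
		if len(word) > 1 && strings.HasPrefix(word, "#") {
			tags = append(tags, word)
		}
	}
	return tags
}

// testCase is one row of the table: a name, an input and the expected output.
type testCase struct {
	name string
	in   string
	want []string
}

// cases is the "contract" the code must honour, written before the code.
var cases = []testCase{
	{"empty tweet", "", nil},
	{"no tags", "hello world", nil},
	{"one tag", "learning #golang today", []string{"#golang"}},
	{"bare hash ignored", "# is not a tag", nil},
	{"two tags", "#go #raspberrypi", []string{"#go", "#raspberrypi"}},
}

// failures runs every row and describes each mismatch.
func failures() []string {
	var out []string
	for _, tc := range cases {
		if got := hashtags(tc.in); !reflect.DeepEqual(got, tc.want) {
			out = append(out, fmt.Sprintf("%s: hashtags(%q) = %v, want %v",
				tc.name, tc.in, got, tc.want))
		}
	}
	return out
}

func main() {
	fails := failures()
	for _, f := range fails {
		fmt.Println("FAIL", f)
	}
	if len(fails) == 0 {
		fmt.Println("PASS")
	}
}
```

Adding a new edge case is just one more row in the table, which is why this style doubles as documentation.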

Golang makes concurrency a core part of the language through the go statement, channel types and the select statement, backed by library packages such as sync. Instead of the nightmare that is concurrency via mutexes, semaphores and explicit locking, use goroutines and channels. They are clear, clean and maintainable. If you ever wrote parallelized code in C++, you'll know that none of these adjectives really applied.
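As an illustration of the goroutines-and-channels style, here is a small fan-out/fan-in sketch. The function name and the word-counting task are hypothetical stand-ins for real per-tweet processing. Workers receive job indexes on one channel and send results on another; a sync.WaitGroup is used only to know when to close the results channel, so there is no explicit locking around shared data.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// result pairs a tweet's position with its computed value so the
// collector can store answers in the original order.
type result struct {
	index int
	words int
}

// countWordsConcurrently (hypothetical) fans work out to `workers`
// goroutines and gathers their answers. Only the collecting loop at the
// end writes to counts, so no mutex is needed.
func countWordsConcurrently(tweets []string, workers int) []int {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan int)
	results := make(chan result)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				results <- result{index: i, words: len(strings.Fields(tweets[i]))}
			}
		}()
	}

	// Feed every job, then close jobs so the workers' range loops finish.
	go func() {
		for i := range tweets {
			jobs <- i
		}
		close(jobs)
	}()

	// Close results once all workers are done, ending the loop below.
	go func() {
		wg.Wait()
		close(results)
	}()

	counts := make([]int, len(tweets))
	for r := range results {
		counts[r.index] = r.words
	}
	return counts
}

func main() {
	tweets := []string{"hello world", "goroutines and channels", "", "one"}
	fmt.Println(countWordsConcurrently(tweets, 3)) // [2 3 0 1]
}
```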

Go promotes itself: Go Playground and Go Tour

Golang is primarily intended for server-side systems. Websites fall squarely into this category. Golang highlights this nicely with two of its best features: the Go Playground lets you write, debug and execute Go code from a web-page. The Go Tour is a great online tutorial which showcases the core set of language features. Again, it teaches by encouraging you to write and run code inside a webpage.

The newest golang codebase allows these code snippets to be included in third-party web pages too (look for the “Run” buttons). This is just brilliant, in my opinion. Writing feature-rich and unique websites should be fast and clean! Golang is a great step in that direction.

(I'm not a Ruby aficionado, so I can't speak to the pros and cons of Ruby versus golang. I've heard great things about the rouge-stone language though.)

Go as an all-round Web Platform

Having caught the bug for online products during my time at Google, I decided that golang was a good technology to use for launching a website. So TweetQureet is 100% golang (apart from a bit of CSS and JavaScript on the webpages, which probably goes without saying). Here are the broad strokes of how I tap into golang's power:

1) The webserver front-end uses the net/http package to route incoming HTTP requests to the correct golang handler routine. Under the covers, net/http serves each incoming connection in its own goroutine, so a lightweight webserver can handle multiple requests concurrently. Perfect – just what I wanted.

This is one of golang's great strengths: implementing non-trivial website functionality is fast and clean. WordPress is great for lots of things (including this blog!), but if you are moving beyond basic blog/CMS functionality, the PHP and plugin world can quickly get messy. I wanted a single codebase that did everything.
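Routing with net/http can be sketched roughly like this. The paths and handler names are hypothetical, not TweetQureet's real URL scheme, and the get helper simply exercises the routes in-process via net/http/httptest without opening a port, which is handy for the unit tests mentioned earlier. Running main starts a server on port 8080.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
)

// homeHandler and statusHandler are hypothetical handlers for illustration.
func homeHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintln(w, "Welcome to TweetQureet")
}

func statusHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintln(w, "ok")
}

// newMux maps URL paths to handler functions. "/" acts as a catch-all for
// any path not matched by a more specific pattern. net/http serves each
// connection in its own goroutine, so handlers may run concurrently.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/", homeHandler)
	mux.HandleFunc("/status", statusHandler)
	return mux
}

// get sends a GET request for path through h in-process and returns the body.
func get(h http.Handler, path string) string {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Body.String()
}

func main() {
	log.Fatal(http.ListenAndServe(":8080", newMux()))
}
```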

2) If you've done web development, you'll know that tag substitution is very important for generating web pages with dynamic content. Golang's html/template package is great for this. It uses a simple {{...}} syntax and automatically escapes substituted values according to their HTML context. You can pull in dynamic strings from your database on the fly in your HTTP request handler.

Databases, databases everywhere – but not a MySQL instance in sight …

3) TweetQureet uses a CSV (Comma Separated Values) file database, because CSV files are easy to read and edit by hand. I want that ability as I build and debug. JSON or XML might be more structured, but to my eye they lack the at-a-glance readability that I so covet. Decision made.

The file-based database could prove problematic if I need to massively scale up in the future. But TweetQureet is a Beta, and I wanted to get going fast. I can worry about large-scale databases later!

Once again, building the CSV databases was a breeze with golang's built-in io and ioutil packages, along with the path package. File databases come into play everywhere in TweetQureet.

Interestingly, there’s now an encoding/csv package in golang. This didn’t exist when I started playing around with file-based databases in golang, so I wrote my own csv functionality. It took time and there were lots of bugs, but eventually I got a solid platform.

It would probably make sense to migrate my code onto encoding/csv now that it’s stable. But, as they say, if it ain’t broken …

Java Serialization … Done Right

4) Version 1.1 of the Twitter API includes tight rate-limiting to prevent abuse/excessive usage. Avoiding redundant calls to the API gives you more wiggle-room at the client end. To this end, TweetQureet stores a tweet cache for each user.

Once a Tweet has been downloaded, it can be referred to multiple times without having to be requested from the Twitter API each time. Each tweet in the cache is stored on disk using the encoding/gob package. Gob encodes a Go value (for a struct, its exported fields) into a compact binary stream, which can be written straight to a file on disk.

Recreating that stored variable in memory later is simple. Pass a pointer to a value of the correct type to a gob Decoder's Decode method and you're done. Brilliant – it just works. All the Tweet processing happens directly in my golang code, so cached Tweets don't need to be stored in human-readable files like my .csv databases. Binary files work fine in this case, and they are also more compact on disk.

Blowing a Big Raspberry at Amazon EC2, Microsoft Azure, Google App Engine et al

Because compiled Go code is lightweight and performant, the webserver I've outlined above is very resource-efficient. So much so that I launched the entire Qureet system on a single Raspberry Pi sitting behind my home router.

That’s right. The web-server and the various background cron-jobs which run throughout the day to process Twitter timelines and do the machine-learning – all on a single credit-card sized Linux box! “Now that’s Magic!” as Paul Daniels would say …

[Update 7/2/14: Since this post was written, Qureet's user base has grown substantially. As a result, I've recently had to migrate the service onto a more powerful server. Nevertheless, my heartfelt thanks go to the folks at the Raspberry Pi Foundation - what a great way to launch my startup!]

There are lots more areas where I tap into the juicy golang API, but I'll go into those in more detail later.

Are you using golang to do any of the above? If so, I’d love to hear from you! Just comment below.
