Blog

Jobson: Webify CLI Applications

This is a post about Jobson, an application I developed and recently got permission to open-source along with its UI.

Do any of these problems sound familiar?

I’d like my application to have a UI.

I want to trace my application’s usage.

I want to share my application.

They’re very common problems to have, and are almost always solved by building a UI in a framework (e.g. Qt, WPF), sprinkling on usage metrics, and packaging everything into a fat executable.

That development approach is becoming challenging to do well nowadays because clients are more likely to use a mixture of OSs and 3rd-party resources. Therefore, new projects tend to need to choose between several options:

  • Develop a CLI application. The easiest thing to develop in many situations. However, they can be challenging to use and have the usual client-side distribution problems (installers, etc.).

  • Develop a UI application. Easier to use. However, they’re usually less flexible for developers to use and are more challenging to develop.

  • Develop a web API. Has similar flexibility to client-side CLI applications but doesn’t have the distribution issues. However, web APIs are challenging to develop because you need to deal with networking, sysadmin, setting up a domain, authentication, etc.

  • Develop a full (API + UI) webapp. Easy to use and flexible (because of the API). However, webapps have all the complexities of the above and more: you effectively need to develop a full web API and a working UI.

When scoped in this (biased) way, it’s clear that webapps are ideal vehicles for delivering the full “product”, while CLI applications are ideal for ease of development and flexibility.

It’s clear that developing a method for turning CLI applications into webapps would be valuable. It would enable developers to rapidly develop and roll out applications. That’s what I explored with Jobson: a web server that turns CLI applications into webapps.

Jobson’s Approach

Jobson has been working in a production environment for a few months now and has proven to be an effective vehicle for delivering new platforms. However, it cannot turn just any application into a webapp. That would be tough: the potential “inputs” (applications) and “outputs” (webapps) are far too varied.

Jobson’s primary limitation is that the only applications it handles are batch applications that: a) start, b) take typical inputs, c) write typical outputs, and d) end. Most applications follow this model (e.g. echo, gcc, nmap, and ls).

With that limitation in place, Jobson could then be designed with simple principles in mind:

  • Input applications are explained to Jobson via “specs” (at the moment, YAML files; a sketch of one follows this list). The applications themselves should not need to be changed in order to suit Jobson.

  • Jobson uses the specs to host a standard HTTP REST API which can be used from any general-purpose programming language or utility (e.g. curl jobson-server/v1/jobs)

  • The API is regular and predictable: job specs should always create the API you’d expect.

  • Jobson’s authentication uses standard protocols (e.g. HTTP Basic) that can be integrated into larger systems easily.

  • (Default) persistence uses plaintext files on the filesystem, which allows for transparent debugging and administration.

  • Persistence can be swapped for other implementations (e.g. a MongoDB implementation)

  • Job execution itself uses the underlying operating system’s fork(2) and exec(2) syscalls, which allows Jobson to run any application the host OS can run.

  • By default, Jobson has no external dependencies beyond Java. It should be bootable from a barebones Linux server with minimal configuration.
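
For illustration, a spec for webifying a trivial echo job might look something like the following. This is only a sketch: the field names and templating syntax are illustrative assumptions rather than Jobson’s exact schema, so check the project’s documentation for the real format.

name: Echo a message
description: Webifies a plain call to echo

expectedInputs:
  - id: message
    type: string
    name: Message to echo

execution:
  application: echo
  arguments:
    - ${inputs.message}

With a spec like that in place, jobs could then be submitted through the standard REST API (e.g. the curl call above) or through the UI.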

Overall, this design means that Jobson can webify almost any application very quickly. I could webify a major CLI tool in less time than it took to write this post, resulting in a web API and UI for that tool.

Why It’s Useful

Here are some typical development problems Jobson could help out with:

Problem: You’ve got a cool application idea you want to share

Without Jobson:

  • Develop the idea
  • Develop all the crap necessary to share that idea (download links, websites, databases, etc.)
  • Share link

With Jobson:

  • Develop the idea
  • Write a Jobson spec
  • Share link

Problem: You’ve got a toolchain (e.g. a pentesting toolchain) that has hard-to-remember commands. You want to streamline that toolchain such that you can run and queue multiple commands.

Without Jobson:

  • Develop a UI around the tools, dealing with the usual development headaches
  • Use the tool

With Jobson:

  • Write a few small scripts around the tools (if necessary)
  • Write a job spec
  • Use the Jobson UI

Problem: You want to automate data requests at work, but there’s a lot of flexibility in the requests (FYI: this is why Jobson was made).

Without Jobson:

  • Write an application to handle typical data requests. The application will have to deal with request queues, request structuring, etc.
  • Find that end-users suddenly want to make different requests
  • Redevelop, etc.

With Jobson:

  • Write a short script for handling a data request
  • Host it with Jobson, which automatically deals with all the other stuff
  • Find that end-users suddenly want to make different requests
  • Add a new script + spec into Jobson

Overall, I believe this approach to developing a job system is extremely flexible and much easier to work with. Jobson abstracts all the typical job system faff (auth, APIs, queues, etc.) away from what’s actually important (an application that does something), resulting in a much cleaner system.

The next steps with Jobson are to streamline installation (by implementing installers), add a landing+documentation page, and start making tutorial videos for it. Watch this space =)


Textadventurer Has Actual Games Now

I’ve been distracted by other things going on, but I finally managed to spend an evening or two adding actual games into textadventurer.

The following games were added:

  • Droids (a text adventure game), python3, I found here
  • DUNGEON (Zork 1), native C application, one of the earliest interactive fiction computer games ever made (wiki). I compiled the open-source port of it for textadventurer
  • Descent Into Evil, python2, I found it on Tyler Kershner’s blog
  • Zork-py, python3, a fan’s interpretation of Zork (above) that I found here
  • Python Text Adventure Game, python2, someone’s college project that I found here
  • little-dungeon, python2, posted on reddit with accompanying source

One thing the game search did for me was find similar platforms. I should’ve looked before hacking away at textadventurer because some of those platforms (e.g. http://textadventures.co.uk/) are very well made. Some even use clever tricks like emulating DOSBox in the browser. However, I’m yet to come across a platform that can run any application via a browser (textadventurer’s “USP”, if you will), so hopefully other developers will find the server and UI source helpful if they have an idea along those lines.


Text Adventurer

One of the first things people learn when they start programming is how to write a text prompt. It’s a decades-old exercise that teaches new programmers input-output.

In order to inject a little excitement, learners are normally encouraged to write interactive games using standard IO. This helps them learn programming interactively: they will need to learn conditional logic to handle the “Will you stab the monster with your sword or run away?” prompt.

This creative learning process is great but, unfortunately, it’s hard to show the creations to other people. Any players will have to undergo the nuisance of installing an interpreter, libraries, and the game in order to play. These deployment problems are alleviated on the web: javascript is delivered over the network and executed in the browser with no effort required from the player.

If people wrote their text adventures in javascript, they could easily be shared with a URL. However, javascript is not necessarily a good teaching language. Forcing people to write their text adventures in it for the sake of distribution detracts from the learning experience.

An ideal system would allow text adventures (console applications) to be written in any language but also be distributed on the web. This is what my latest project, textadventurer, tries to achieve (gallery).

textadventurer keeps interaction in the browser while moving execution onto the server. Communication between those two layers is achieved with websockets. This makes it easier for people to play the game. The frontend focuses on presenting games (CLI applications) to the players and provides a basic UI for sending input (STDIN) to the server and receiving output (STDOUT) from the server.

Using process forking server-side affords a lot of flexibility: the server is completely agnostic to the language or framework that the game is written in. This means I can use textadventurer to distribute any standard interactive CLI application. With that in mind, I plan on deploying historic text adventure games to textadventurer when I get the chance so that people can enjoy those games once more without having to faff around with installers, legacy interpreters, etc.
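
To make the idea concrete, here is a minimal sketch of how a server like this could pipe a websocket to a forked game process. It is not textadventurer’s actual implementation; the Node ws package, the port, and the game command are all assumptions made purely for illustration.

const { WebSocketServer } = require("ws");
const { spawn } = require("child_process");

const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", (socket) => {
  // One game process per player; the command here is a placeholder.
  const game = spawn("python3", ["some-text-adventure.py"]);

  // Game output (STDOUT) goes to the browser...
  game.stdout.on("data", (chunk) => socket.send(chunk.toString()));

  // ...and player input from the browser goes to the game (STDIN).
  socket.on("message", (msg) => game.stdin.write(msg + "\n"));

  // Clean up whichever side disconnects first.
  socket.on("close", () => game.kill());
  game.on("exit", () => socket.close());
});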


Keep Side Projects Small

I’ve officially thrown a deployment of PlateyPlatey (PP) up onto plateyplatey.com (gallery) along with its server and frontend source code. I previously explained PP in a blog post.

PP is missing many features I would have liked to include. However, a few weeks ago I took a long look in the mirror and decided to just tidy it up and throw it up on the net. This let me start sharing it on sites such as reddit to gauge whether anyone’s interested. After about a week of PP being up, I concluded that no, no one is interested and yes, I spent way too much time coming to that conclusion.


This is my attempt to articulate some general rules of thumb I have picked up over the last few side projects. It was written just after PP, one of my bigger projects, didn’t live up to my expectations. It is not a definitive guide nor something you should take seriously without reading other developers’ thoughts.


I used to approach software side projects as an opportunity to write full-blown applications that I might someday commercialize or turn into a big hit. This, I’ve learnt, is a silly approach. Side projects should be approached as side projects, not as full-blown projects.

If a side project requires a light scripting engine, command controller, custom UI elements, a fair amount of interactivity, servers, etc. then it’s doomed to failure. This isn’t because these features can’t be implemented—I believe any developer can implement any feature if given enough time—but because big projects don’t suit only getting “an hour here” and “an evening there”.

The problem with PP is that I specced my minimum viable product way too ambitiously. The spec included several headline features:

  • The user should be able to visualize a plate in the browser
  • That plate should be interactive, allowing the user to click wells and selectors on it
  • Interaction with the plate should provide visual cues in the table, so the user can visualize the link between plate and tabular data
  • The table should be interactive, with features resembling Excel tables
  • The UI should have commands that let the user manipulate the table and plate layouts
  • Those commands should have disable logic and carry metadata such as tooltips
  • There should be light support for scripting so the user can write/record macros
  • etc. etc.

In isolation, each of these features is easy to implement. However, implementing all of them in one system requires more care and, crucially, architecture.

If you write software full-time as a job then you can afford to spend a month or two making prototypes, designing hierarchies, and, ultimately, implementing a system that’s quite large. However, side-projects won’t get more than, say, ~20 % of the time an at-work project will get. Because of that, those few weeks of prototyping and planning will stretch out across months.

In principle, many months isn’t a problem for a side-project with no time limits. However, spending many months without making anything requires a lot of discipline: no one finds planning and prototyping as fun as diving into the heart of an idea. The same goes for boring old architecture documents, automated testing, and design specs.

Those things mostly exist to make working on bigger projects more predictable and to make the maintenance of a working system easier for other developers. A small side project is only going to be worked on by one, maybe two, developers over a relatively short period of time; therefore, it isn’t worth worrying about the theoretical overhead of maintaining and sharing it. If a small side project becomes an overnight sensation then—because it’s small—there is scope for dropping the initial codebase and starting again with all the knowledge and expertise the first runthrough gave you.

Dropping those things for a bigger side project, though, is dangerous. Not having plans, specs, architecture, or tests in place for a bigger project will result in development grinding to a halt under its own technical debt. When that happens, you’ll need to do a big refactor. This happened twice with PP: first, to move all the javascript code into separate hierarchies with a makefile build system; second, to translate all the javascript into typescript and use webpack to build everything. The time taken by those reorganization steps really sucked some of the momentum out of PP’s development, especially because the actual functionality of PP didn’t change after either step: they just made working on the codebase easier.

The main take home is that side projects get developed much more slowly than real on-the-job projects because of time constraints. That makes working on bigger side projects especially painful because the boring planning, speccing, and testing steps—which tend to come in handy for bigger projects—last too long to keep the momentum of a side project going. By contrast, smaller projects work well with a “code and fix” development model which gives immediate feedback and a semi-working system very quickly. This helps maintain momentum and interest; however, “code and fix” gets treacherous once projects start growing, so don’t let them grow.

Keep small. Keep agile. Try things out. Get the audience’s opinion. Then grow.


Low Level Programming

I haven’t posted in a while because I have been busy with a few random studies and projects.

Low level programming has always, to a high-level programmer like myself, felt like a mysterious playground where all the real hackers write graphics engines, password crackers, and operating systems. I have always had a fascination with people who can write something hardcore like an emulator. Sure, I can open a window, draw some graphics, and update the UI to reflect state changes. However, an application that does all that and some low-level wizardry is in an entirely different league of complexity.

Many prominent programmers recommend learning something like C because it is a good way of learning how computers really work. Apart from the occasional toe-dip into C, I have almost exclusively coded in higher level languages.

I want to get better at programming. I’m getting bored of the high-level: there are only so many ways classes, functions, currying, and type systems can be repackaged before you’ve seen it all. I like the idea of mastering the low level. However, deciding where to start was challenging.

Last weekend, however, I decided to set myself what I thought would be a simple exercise:

Write a websocket server that capitalizes and echoes any text it receives. You cannot use a websocket library.

I eventually hacked something together that works (github). I took a shortcut and used the nodejs http-parser and a base64 encoder written by Brad Conte. However, the websocket implementation was derived directly from RFC6455.

Even though it’s pathetically basic, is blocking (I don’t fork or do any fancy event queueing on the connections), and uses compile-time allocated buffers for most of the messages (so it can’t stream big websocket frames), this was still a very eye-opening first step into low-level programming. I somewhat regret writing the process forking for textadventurer in Clojure (with the help of websocketd) after seeing how straightforward a lower-level implementation would be.
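
As a taste of what RFC6455 asks for, the server’s half of the opening handshake boils down to hashing the client’s Sec-WebSocket-Key together with a GUID fixed by the RFC and sending the result back base64-encoded. The snippet below sketches just that step in JavaScript for brevity; it’s an illustration of the RFC rather than an excerpt from my implementation.

const crypto = require("crypto");

// This GUID is fixed by RFC6455.
const WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// Given the client's Sec-WebSocket-Key header, compute the
// Sec-WebSocket-Accept value the server must send back.
function acceptKey(clientKey) {
  return crypto
    .createHash("sha1")
    .update(clientKey + WS_GUID)
    .digest("base64");
}

// The worked example from the RFC:
// acceptKey("dGhlIHNhbXBsZSBub25jZQ==") === "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="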


Engineering Flexibility

It’s tempting to believe that having a flexible file format for a system’s configuration is enough to ensure forward-compatibility and flexibility as the system grows. The thought process is that with a suitable data format—for example, XML or JSON—extra information can be added without breaking existing systems. For adding new fields into the data, this is true. However, if you don’t engineer additional flexibility into the fields themselves then you may end up needing to make breaking changes anyway or, worse, end up with a much more complicated data file.

Imagine you are implementing an ecommerce server that exposes products. The spec for the server stipulates that there will only be a very small number of products. It also stipulates that rebooting the server to refresh the system with new products is fine. With that in mind, you add a products field to the server’s configuration:

port: 8080
products:
 - name: Acme Explosive
 - name: Fake Backdrop 

One day—and this day always comes—a change to the specification comes: some products should be hidden on the site. To implement that feature, you add a hidden: field to the product. It is assumed that if a hidden: field is not defined for a product then it should default to false. This is backwards-compatible and easy to implement given the extensible YAML format being used:

- name: Fake Backdrop
  hidden: true

Later on, it emerges that product-hiding should not be binary. It should depend on the group that the user belongs to. To implement that, you add a hidden_to: field. However, to allow for backwards compatibility, it is enforced that hidden_to: entries take priority over the hidden: state. Side-rules are beginning to creep in:

- name: Fake Backdrop
  hidden: false
  hidden_to:
    - roadrunners
    - unregistered-customers

As the system gains popularity, it becomes increasingly necessary to be able to add products to a running system. It is decided that products should be held in a database rather than being loaded from the configuration file. The configuration file, therefore, should support providing database details as an alternative to products. However, backwards compatibility should still be maintained to prevent legacy deployments and tests from breaking:

port: 8080

use_database: true
database:
  url: //localhost/database
  user: productapp
  pw: password123

A heuristic is adopted that if use_database has a value of true then the server should load products via the database defined in database; otherwise, it should load them via products.
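
In code, that kind of sibling-key heuristic ends up looking something like this (pseudocode of my own to illustrate the pattern; loadProductsFromDatabase is a hypothetical helper):

// The products source now depends on an unrelated sibling key.
const products = config.use_database
  ? loadProductsFromDatabase(config.database)
  : config.products;

Each new feature adds another branch like this, and every branch has to know about its sibling keys.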

The sysadmin then mentions that it would be useful to allow the server to listen on multiple ports, some of them being HTTP and some of them being HTTPS. Marketing also wants to integrate email updates into the server. However, one week into developing email updates, they also mention that some systems may need to send tweets instead of emails. Your implementation strategy follows the same logic as previous development efforts:

port: 8080

other_port_configurations:
  - port: 8081
    protocol: http
  - port: 8082
    protocol: https

use_database: true

database:
  url: //localhost/database
  user: productapp
  pw: password123

send_tweets_instead_of_email: false

email:
  host: smtp.app.com
  port: 442
  username: productapp-email
  password: thisisgettingsillynow

This process of introducing additional data that is handled by special-case heuristics is a common pattern for extending configurations without breaking existing fields. However, it can become quite difficult to keep track of all the different rules that arise out of this gradual growth in complexity. Not only that, the resulting configuration file can be difficult to work with.

What would make this process a lot easier would be to extend the actual fields themselves. However, products, for example, is defined to contain an array of products. The only actions that can be carried out on products are to add or remove products.

The solution to this is to ensure that there are extension points at major parts of the configuration hierarchy when it’s initially designed. Relegate primitive values (which can’t be extended later) deeper into the configuration hierarchy and have clear extension points.

An example of adding extension points would be:

products:
  product_data:
    - name:
        value: Acme Explosive

In this design, products, each product entry (within product_data), and name could all be extended later with type switches or other information to allow them to be loaded/handled differently. Applying this nesting concept to the configuration at each step of the above scenario would result in something like this:

server:
  application:
    port: 8080
    protocol: https
  admin:
    port: 8081
    protocol: http
  internal:
    port: 8082
    protocol: http
	
products:
  source: database
  database:
    url: //localhost/database
    user: productapp
    pw: password123

product_notifications:
  type: email
  settings:
    host: smtp.app.com
    port: 442
    username: productapp-email
    password: thisisgettingsillynow

There would still be some heuristics involved in handling missing keys, defaults, type switches, etc. However, the hierarchy between features is maintained and there are now no sibling heuristics. As a result, the config-handling code can be modularized so that each key in the configuration is read in isolation. The products section in this improved example could be handed to .handleProductsConfiguration(ProductsConfiguration conf), a function that no longer needs to receive the sibling use_database and database keys.

A strict tree hierarchy maps cleanly onto polymorphic systems and is surprisingly easy to implement using libraries such as jackson (java):

@JsonTypeInfo(
  use = JsonTypeInfo.Id.NAME,
  include = JsonTypeInfo.As.PROPERTY,
  property = "source")
@JsonSubTypes({
  // "source: database" in the YAML maps onto the database-backed implementation.
  @JsonSubTypes.Type(value = DatabaseProductsConfiguration.class, name = "database")
})
interface ProductsConfiguration {
  <T> T accept(ProductsConfigurationVisitor<T> visitor);
}

class DatabaseProductsConfiguration implements ProductsConfiguration {
  @JsonProperty
  private String url;
  @JsonProperty
  private String user;
  @JsonProperty
  private String pw;

  @Override
  public <T> T accept(ProductsConfigurationVisitor<T> visitor) {
    return visitor.visit(this);
  }
}
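
The visitor type referenced above isn’t defined in the snippet; a minimal version (my own sketch, not taken from the original code) might look like this:

interface ProductsConfigurationVisitor<T> {
  // One visit overload per concrete configuration type; add more overloads
  // as new product sources are introduced.
  T visit(DatabaseProductsConfiguration config);
}

The configuration itself could then be deserialized with Jackson’s YAML support (an ObjectMapper built over a YAMLFactory, for example) and handed straight to whichever visitor knows how to load products.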

The above is compile-time verified, type-safe, and involves no conditional logic. Completely new product sources can be configured by defining a configuration that implements ProductsConfiguration and updating the relevant visitor. Clearly, this implementation requires more up-front engineering than the original design. However, for projects with changing specifications (i.e. most projects), allowing for flexibility pays for itself.


The Cycle of Platform Maturity

In programming, “mature” is a funny word. When I first started programming, it had negative connotations. A “mature” programming language was boring, had complicated tools, over-complicated design patterns, and would likely die off soon. By contrast, non-mature languages were exciting, free-form, and carving out a new frontier in software.

My opinion has evolved since working as a programmer rather than studying as one. Mature now implies that a language will have good debuggers and profilers. It will have established build and testing systems. Its core libraries won’t change midway through a critical project. Most importantly, though, mature languages (tend to) have moved past the same boring old problems that every language ecosystem encounters in its early days.

JavaScript’s history is a good case study for this. I started on JavaScript in its early (IE4) days. Back then, I would open notepad, write a few HTML tags, a <script> tag, and voila I could see stuff happening in my browser that my code put there. Simple.

Unfortunately, it was unlikely that my code would work in any other browser. The JavaScript language platform was horribly fragmented. Internet Explorer’s implementation of browser functionality was non-standard. Even the most trivial DOM manipulation or standard library call would behave differently between IE and Firefox. It was not unusual to have a lot of if(browser === IE) tests throughout a codebase. This made working with JavaScript less simple.

jQuery stabilized JavaScript development somewhat but it wasn’t officially supported as a language standard. Because of that, a large number of JavaScript libraries ended up with a tangled mess of hidden dependencies on different versions of jQuery. This sometimes resulted in dependency hell. Also, adding jQuery to a website project was considered the “done” thing at this stage in JavaScript’s history. Because of that, you couldn’t just open notepad and write a few tags and some JavaScript; you would also have to import jQuery and become familiar with it. The complexity was creeping in.

Later, Firefox, Chrome, and—because of the blockbuster release of the iPhone—Safari began to eat away at IE’s market share. They were standards-compliant; IE wasn’t. Web developers started to make sites that wouldn’t work in IE, which prompted IE users to install IE’s competitors. The overall effect of this was that a growing share of users started using standards-compliant browsers. JavaScript, the language platform, became (mostly) stable and predictable.

Because of this stability, web developers could build bigger, more complex, sites. Animations, realtime updates, and dashboards became more popular. As a consequence, tools for debugging this complexity became more valuable. In an effort to secure their market share, Chrome and Firefox began to include developer tools which enabled developers to set breakpoints and profile their websites. Having these tools made web development a lot nicer; however, the workflow was now more complex.

The next issue developers encountered was managing libraries and dependencies. Traditionally, libraries would be manually downloaded and included in <script src="..."> tags in the page. However, it was tedious to do this once projects became split over many files. It was also inefficient because the HTTP/1.1 protocol doesn’t suit downloading many small files.

Backend frameworks, notably Rails, began to offer asset pipelines for combining, minifying, and including scripts in websites. This enabled developers to build bigger codebases fragmented over many files; however, they would also need to learn how their build system links files and use appropriate strategies to debug minified files.

Once using a build system with JavaScript became the norm, developers continued to up the ante, creating more complex web applications. However, the JavaScript ecosystem was beginning to crack at the edges under this new complexity. Managing and installing the ever-increasing glut of project dependencies became tedious. bower and npm raced to fill the gap, allowing developers to automatically acquire and install dependencies. It became much easier to install JavaScript libraries; however, you’d have to learn bower or npm.

Now that developers could install many libraries easily and painlessly, they installed many libraries easily and painlessly. The JavaScript language didn’t support declaring dependencies on a module-by-module basis. Consequently, dependency hell ensued. Libraries such as requirejs came along to fix the problem, allowing developers to declaratively state a module’s dependencies. Developers no longer had to reorder <script> tags to make the application work; however, they did need to learn requirejs.

At this stage, JavaScript was beginning to become mired in libraries for doing stuff that the language ought to support. ES6 (/ ES2015+) was specced out to deal with some of those issues. It offered standardized higher-level language abstractions for building bigger systems (classes, getters, setters, proxies), language-level protections (const, Symbol), and declarative library loading (import). However, it also made the language a little more complex.

Today, JavaScript is a (mostly) standardized platform with good debugging and profiling tools, batteries-included libraries for standard operations, a popular dependency manager (npm), and module-level dependency management to prevent namespace pollution (import). It finally has all the features every other old-fogie language has.

However, if I was starting out today, I wouldn’t use JavaScript. Clearly, the future is akewley-lang. It’s amazing. You can open a text editor, write plot my data as a bar chart please and see stuff happening right away. It’s so simple. Now, all I need is to generate that bar chart and have a whizzbang animation when it loads! Luckily, there’s a library for that on the akewley-lang community forum I can download. akewley-lang is so much simpler than JavaScript, ruby, python, C#, F#, java, C++, and C. The designers of those complicated languages (and associated ecosystems) didn’t know what they were doing.


Just Make Stuff

Software development can be a bottomless well of languages, build environments, platforms, coding standards, and cat analysis algorithms. For someone who likes to study and learn new techniques, this is great: a bottomless well of stuff to get better at. However, after almost ten years of doing just that, I’m willing to make a bold statement: curiosity can be maddening and knowledge can be a hindrance.

Curiosity is maddening because the more I learn about assembler, Linux, garbage collection, compiler theory, engineering techniques, software principles, functional programming, networking, etc., the more I’m convinced that I know almost nothing about computers.

Every time I turn over a figurative “small pebble” in the software world I end up discovering that there are entire industries dedicated to what’s under that pebble. Even a statement as innocuous as “I’d love to know how my CPU actually sends input requests to this USB hard drive” can yield a very long (but interesting) explanation.

Complexity isn’t unique to software development. However, combined with curiosity, software is one of the few fields where there is practically no inertia to ping-ponging. If I like the look of writing client-side 3D games in C++, I can be doing it in under 20 minutes for free. 30 minutes later I can be writing a plugin for Firefox that makes it purr whenever I visit a website with “cat” in the URL.

You’d think that healthy curiosity is a good thing: over time, it grows knowledge. However, knowledge can become a hindrance if you leverage it a little too much. A simple “Hello World!” application can scream out with potential pitfalls - do we have standard IO? What about cross-platform? What about internationalization? Is there a more flexible design for delivering “Hello” and “World” to the console? What character encoding is being used?

This, I think, is why there are a significant number of very young, inexperienced software developers making interesting—sometimes very valuable—products in the field. They don’t know that their servers will implode once they have a few thousand concurrent connections. They don’t know that it’s going to be very painful to refactor that UI they hacked together with four bleeding-edge UI frameworks. They don’t know that there’s a big competitor in the market with a lot of money. They shoot first and ask questions later, which makes them one of the few demographics doing the shooting.

We’re all aware of this phenomenon. What I’m saying isn’t news. However, here I am, writing heavily designed, SOLID-certified, high unit-test coverage software that few people use. Meanwhile, there are teenagers out there hacking together VR experiences in which I get to fly around space as Thomas the Tank Engine.

What’s missing? Isn’t the reason we get curious, study, and train so that we can make even cooler stuff? Yes. However, as presented, curiosity is maddening and knowledge can be a hindrance. The trick, I’m slowly convincing myself, is to have a cool and interesting problem or project to apply that curiosity and knowledge to and not get hung up on the details. Otherwise, you may end up spending hours analyzing the random noise of a bingo game.

So, with that in mind, over the next year, I’m going to be making a conscious effort to just “make stuff” and care a lot less about how philosophically pure its software engineering is or what amazing 0.5 % speed bonus I’ll get if I change to some slightly newer framework halfway through.

Now, all I need to do is write a design document, get people to approve on it, do a technology review, select a programming language, ah yes, mutability is evil! It’s the reason I haven’t created a decent app yet, I should look into Haskell! Ok, now possible test harnesses, oh I wonder what’s the difference between this test harness and this other very similar test harness, that could really have a big impact on my project and I don’t want to make an error before I even start…


PlateyPlatey

This blog post is about PlateyPlatey, a webapp I am developing to solve the (quite specific) problem I outline in this blog post. PlateyPlatey is in the very early prototype stages of development. I am having fun seeing how far I can develop it. If it saves even one researcher from the complete ballache of reorganizing research data, I will be ecstatic.

Once upon a time, I was a lab researcher. Boy, was it fun. This one time, I spent a month trying to crystallize a particularly annoying-to-crystallize molecule only to find that a decent proportion of the batch I was working with was actually silicone grease which, consequently, made crystallization impossible. Another time, my wrist twitched midway through adding LiAlH powder to a reaction and I managed to fill an entire schlenk line and vacuum pump with a mixture of corpse-scented phosphine and hazardous LiAlH powder. I now can’t smell phosphines without vomiting - fun.

Clearly, I wasn’t a particularly good synthetic researcher. This is probably because I get jittery when faced with anything that lacks a console or a robotic arm (take note, ladies). So, with that in mind, I reoriented my focus toward analytical and automation techniques.

Boy, was it fun. This one time, I spent several long days hunched in a fume cupboard with a 96-well plate (below) in one hand and a multi-tipped pipette in another. In most experiments, I had to prepare enough of these plates to tile the roof of a small mansion.

A 96-well plate - something I’m glad to see the back of.

After preparing the plates, I’d spend maybe another day or two reorganizing my plate’s data and input variables so that they could be imported into three different pieces of analysis equipment as run files. After the analysis was complete, I would then spend another day or two reorganizing the data into a format that worked with the analysis and visualization software I was using at the time.

After all that, I’d end up with—if I’m lucky—one line of a multiline plot. If the plot told an interesting or novel story (< 5 % of all plots) it might have ended up in the thesis, doomed to an eternity of sitting on a dusty hard-drive somewhere. If it was very interesting (< 0.1 % of all plots) it might have ended up in a journal - fun.

I was spending a lot of time preparing samples and rearranging data. Unfortunately, my samples couldn’t be prepared with our automated dispensers. However, when it came to data organization, the ball was in my court; accordingly, I invested a lot of time developing methods for linking sample and analytical data together.

My first approach followed the path of the programmer - “It’s all about the data structure!”. I designed a strictly-formatted excel spreadsheet. The spreadsheet layout was like something you’d get from an SQL query: each factor/variable was a column and each sample was a row (below). In that format, it’s very easy to filter and reorganize the data: most equipment, plotting software, and scripting languages are compatible with a row-oriented data structure. Further iterations of this design included unique sample ID generators, and helper sheets for re-joining data by ID.

Well    Stirring Speed / rpm    Temperature / °C    Probability of working / %
A1      200                     20                  0
A2      200                     25                  0
A2      200                     30                  0

However, this design had problems. It wasn’t compatible with my—quite physical—mental model of a plate’s data. When I was mid-back deep in a fume cupboard, I really didn’t like having to perform the following mental dialogue each time I plated up a sample:

I’ve just pipetted 2 uL of my sample into row 2, column 3, of the plate and, uh, that’s row 27 in the excel sheet because it rearranges the plates row by row and it’s column 3 in the excel sheet because that’s the column I’m putting sample amounts in. Ok! Next sample.

Overall, this design shortened the data management step at the cost of making the (already quite tedious) sample preparation step more tedious.

My next approach followed the path of the designer. I designed a spreadsheet that laid out information in a plate-like layout and, using formulas, translated the data into a standard row-oriented spreadsheet. This was a direct mapping of the physical world (a plate) to the datastructure (a table).

This was much easier to use; however, it was quite hacky—VBA isn’t my personal weapon of choice—and it had several shortfalls:

  • Some control was sacrificed by using VBA/excel rather than a general-purpose programming language. UX was especially impacted. For example, I couldn’t find a clean way to translate selection logic across plates & tables such that it was maintained as I moved between those formats.
  • The plate-to-table translation was one-way. Users were expected to input data into a plate and formulas were used to translate it to the appropriate cell in the table. Users couldn’t directly import the data in a tabular form and then manipulate it as a plate. Making it two-way would require even more VBA hacking.
  • Each “plate” (or, in effect, “column”) occupied an individual sheet in the workbook. Therefore, there was a lot of sheet switching occurring when moving between plates/columns.
  • Because it’s excel, it only really works for tabular arrangements of samples. Circular/carousel sample racks—which are more popular than you’d guess—don’t work well in it.

Even with those shortfalls, the Excel workbook mostly served its purpose and I used various iterations of it throughout my PhD without looking back. Since then, though, I have worked in and around several other academic and industrial labs and found a worrying pattern: a lot of research groups have followed a similar path.

Apart from finding out that, no, I am not a special snowflake for creating a custom excel workbook, I found a potential gap in the software landscape - how exciting.

So, under the (potentially mistaken) belief that other people might find it useful, I decided to try and do a proper job of translating between physical (i.e. plate) and “ideal” (i.e. tabular) layouts in a UI. I even thought up an amazingly imaginative name—PlateToTable—for the product. However, Jane Whittaker thought it sounded like a restaurant, so I eventually settled on PlateyPlatey after it won out over PlateyMcPlateFace.

PlateyPlatey is a webapp that lets you select wells in a plate layout and then assign values to those wells by typing stuff in. It’s currently in the very early stages—it’s essentially a proof-of-concept at the moment—but there are several key design goals I’m trying to stick to:

  • It should be easy to get the tabular data out of the app and into other table-based tools (Excel, Libreoffice Calc, etc.).
  • There should be a clear association between the plate layout and the table layout.
  • It should be easy to select (patterns of) wells/rows
  • It should be easy to configure new physical layouts. Plate configurations should be provided declaratively.
  • Rearranging data should be easy.
  • It shouldn’t assume one particular UI design: users should be able to configure the appearance, layout, keybinds, and behavior of PlateyPlatey to suit their needs.

The last one has proven particularly difficult to aim toward. However, I am quite happy with how far it’s come along in the short amount of time dedicated to it (~2 months of weekends). I look forward to developing it further and, if you’re a frustrated lab worker who is sick to death of populating a plate, check it out. Even if you think it sucks—because in a lot of ways, it does suck at the moment—I’d really appreciate feedback on it.


Going Full Circle

Write code to control your application. Configure the code’s behavior using standard configuration files. After that, go full circle: write code to control the code that runs your application. In this blog post I’ll explain when you might want to do that.

Software ideas tend to start out as hard-coded prototypes. As those prototypes are built upon (don’t do this) or are redeveloped to full-blown applications, the hard-coding is supplanted by configuration files.

Most big applications we take for granted are implemented this way. I’m even willing to proclaim that most big applications got big because they were implemented this way - their configurability gave them a long-term edge over less-configurable competitors. Configurability, therefore, is an important part of any aspiring application’s architecture. Consequently, choosing the right format for your configuration data is important.

For most applications, I personally use very standard configuration file formats (e.g. .json, .xml, .ini). However, on my current project—PlateyPlatey—I’m getting very close to going, what I call, “full circle” with my configuration files by writing them in a scripting language. This might seem extreme, but allow me to give an example of how a typical project would, if it got popular enough, go down that path:

Make an application that prints labels. The application must prompt the user to type the label’s text into a textbox. After the user has finished typing, they will press the “print” button, which will print the label.

The (javascript-style) pseudocode for this app would look something like this:

printButton.onClick(() => {
	const text = textBox.value;
	print({ content: text, fontSize: 14, fontFamily: "Arial" });
});

Easy, but surely most developers know what’s coming next:

That label software is neat; however, some users would really like to be able to control the size and font-family of all the text that appears on the label. This does not need to be integrated into the UI.

Configuration files come to the rescue:

{ "fontSize": "14pt", "fontFamily": "Arial" }
printButton.onClick(() => {
	const text = textBox.value;
	print({ content: text, fontSize: config.fontSize, fontFamily: config.fontFamily});
});

As the software gets more popular, special cases begin to creep in:

The label software is great - a lot of people are using it now. However, a new user, George, mentioned that he’d like an “Are you sure you want to print?” confirmation to appear when he clicks print.

Configuration files also come to the rescue:

{
	"showAreYouSureYouWantToPrintDialog": true,
	"areYouSureYouWantToPrintText": "Are you sure you want to print?",
	"fontSize": "14pt",
	"fontFamily": "Arial"
}
function printLabel(text) {
	// Wraps the underlying print call so the configured font settings are applied in one place.
	print({ content: text, fontSize: config.fontSize, fontFamily: config.fontFamily });
}

printButton.onClick(() => {
	const text = textBox.value;

	if(config.showAreYouSureYouWantToPrintDialog) {
		if(confirm(config.areYouSureYouWantToPrintText)) {
			printLabel(text);
		}
	} else {
		printLabel(text);
	}
});

Notice, though, that I’ve had to change the code behind my print button to facilitate this special request. Special cases are the bane of all software developers. As any software gets more popular, they start to pop up a lot more frequently. Integrating each one into the codebase is asking for trouble.

If reconfiguring parts of the application is going to happen often, then the next step is to provide configuration options for adding extra behaviors. The logical next step would be to implement something like this:

{
	"extraBehaviors": [{
		"when": "print-button-clicked",
		"do": "show-a-confirmation-dialog",
		"associatedData": { "confirmationText": "Are you sure you want to print?" }
	}]
}

However, it’s difficult to capture the essence of how the application should behave in such a configuration format. Do I just want to show a confirmation dialog or do I want the print operation to stop if they clicked no in the dialog? As the software grows in popularity, there will almost certainly be a special case that requires either behavior.

What we really need to capture is new control flow and logic in the configuration. Hm, I wonder if we know of a data format that lets us describe flow and logic? Wait—yes—we do: programming languages! The circle is complete:

function parse(expr) {
	// Parse a configuration expression, using the current
	// application expression context to resolve any function
	// calls.
}

// These are "native" javascript commands that are exposed to the
// expression on evaluation. They act as the glue between
// expression-land and javascript land.

function subscribeTo(eventName, callback) {
	applicationEvents.subscribe(eventName, callback);
}

function showConfirmationDialog(message) {
	return confirm(message);
}

function setConfigValue(key, value) {
	config[key] = value;
}

expressionContext.register("subscribe-to", subscribeTo);
expressionContext.register("show-confirmation-dialog", showConfirmationDialog);
expressionContext.register("set-config-value", setConfigValue);

printButton.onClick(() => {
	const eventResponse = applicationEvents.broadcast("print-button-clicked");

	if(eventResponse) {
	    const text = textBox.value;
        print({ content: text, fontSize: config.fontSize, fontFamily: config.fontFamily});
	}
});
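
; The configuration file is now a script, written in the embedded (Lisp-style)
; language and evaluated against the primitives registered above: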
(set-config-value "fontSize" 14)
(set-config-value "fontFamily" "Arial")

; George's confirmation dialog
(subscribe-to "print-button-clicked" (lambda ()
	(if (show-confirmation-dialog "Are you sure you want to print?")
		true
		false)))

Although I’m greatly simplifying the process, the idea is there: code low-level primitive functions that allow a scripting language to control the application. I call this “full circle” because you end where you started: writing code.

It seems like a lot of work but, unlike the code running your application, you have complete authoritarian control over how, and what, your scripting language can do. This provides very tight encapsulation and, more importantly, prevents your auxiliary functions from trying to do too much with lower-level functions.

This concept isn’t new—EMACS (elisp), Excel (VBA), and the web (javascript) are prime examples of it—but it doesn’t seem as though many modern, big, single-page webapps have started to explore web frontend scripting languages yet. I plan to explore how useful the idea really is with PlateyPlatey over the next few weeks.


Free Online Courses

With the rise of free online courses, it’s becoming much easier to learn programming. Over time, more people will learn programming through those courses. Overall, this is a good thing. It means I’ll be able to buy a singing fridge sooner. However, beware of the dragons.

Big IT companies are suspiciously keen to provide free online software courses. Take bigdatauniversity.com for example. It’s a very slick site with lots of content. However, that content is tainted by the site’s ulterior motive: it doesn’t just want you to learn big data, it wants you to learn how IBM is the company to use if you’re working with big data.

So, instead of explaining abstract big data concepts, some of the material descends into the corporate agenda: how big data makes corporations’ wheels turn 15 % faster, improves customer turnover by a factor of 3, and how IBM could’ve gotten you all of that yesterday - for a small fee.

This agenda ties into the other dragon of free online courses: the fact that most tend to ignore the installation, deployment, and integration of software. With most courses, you’ll never install an interpreter, set environment variables, open a port on your firewall, deal with that incorrectly-versioned dependency a library uses, or integrate your code against a 20-year-old (but working and stable) x-ray diffraction library written in FORTRAN.

Not learning those skills could be a big loss for new developers: the murky edges of cross-language, cross-process, and cross-application integration are where a decent amount of magic tends to happen. Think about how hopeless most web frameworks would be if they could only use databases, web servers, and admin systems written in their core language.

Big IT companies make a lot of money capitalizing on developers not knowing that stuff. With the rise of powerful, reusable, and composable open-source libraries, the big boys have evolved from implementing frameworks and application architectures themselves to integrating open-source software stacks. They’re making a killing selling web frontends for Hadoop, GUIs for rsync, and lipstick for Wordpress.

Companies embedding themselves in education or “borrowing” from the open-source movement isn’t a new thing: Microsoft has been handing out free copies of Visual Studio to Universities and repackaging platform-independent languages since forever. It’s also not a bad thing: it can be fun to write an application in C# with Visual Studio. However, I’d highly recommend that new developers try to invest some time into learning integration and deployment. Who knows, one day you might end up making billions selling a haskell-based data analytics stack.


Projects That Break You

Everyone has had one: a project that’s so hard that it breaks you.

I was recently flicking through my old project folders and found that very project. It was an innocuous form-generating WPF application (source) that I thought I had thoroughly buried in my projects-dropped/ folder. However, while browsing through that folder tonight, the project managed to catch my eye.

You can see more details about the application on the GitHub page. It was an application that was trying to solve every problem that COSHH-form-generating software could conceivably need to solve. It needed composable document models. It needed a variety of IO methods. It needed to be functionally pure. It needed to be architecturally easy to understand. It also needed to be fast, run on any PC, and sell nationwide on day one.

Combine those out-of-control—usually regularly changing—requirements with (at the time) relative inexperience in C# application development and the ending is obvious. For ~6 months, I spent almost all of my available spare time coding and studying coding to try and roll out this perfect, crystalline, piece of software. Eventually, I ran out of steam—I had a PhD to finish—and called it quits.

With that in mind, you might be tempted to think I’m sad when I look at one of these failed, Adam-breaking projects. Not so. Tonight, I enjoyed looking through this particular codebase. Sure, it didn’t deliver its (for my skill level at the time) over-ambitious promises, but I learnt more about writing code in those 6 months than I did in the 5 years previous. I realized how much I had gotten very wrong, but I also realized how much I had nearly gotten very right.

One lesson to take from this is that releasing a product that contains a few bugs and is missing a few features today is better than never releasing the perfect one. See Microsoft Windows/Office version 1 for an example. The primary lesson, however, is that failure is only truly failure if you don’t learn anything from it.

This post is an acknowledgement of my own little slice of failure. I hope there are more failures to come, although I also hope they aren’t all I end up getting.


Don't Spare the Low-Level Details

Abstractions are wonderful things, that is until they leak. At that point, I tend to wish someone didn’t spare me the low-level details.

Recently, I was tasked with developing a system that continually logs temperature readings from 12 hotplates. The plates use an RS232 communication interface, which is very easy to negotiate.

With only those high-level details available, I declared that the logging software would be an “afternoon in n’ out job” and created an appropriate design for the timescale / effort:

  • Hard-code the 12 plates’ COM/ttyS port identifiers and output paths
  • Loop through each port/path pair
  • Send the GET_TEMPERATURE request
  • Write the response to the output file

Job done.

Port identifiers can change.

Wait, who are you?

The god of “You don’t know enough yet”.

OK. Why would port numbers change? They’re physically wired into the RS232 box. It’s not like they will re-wire themselves overnight.

Port identifiers are a convenient abstraction the operating system, or your RS232 box’s drivers, provide to make it easier for your application to request handles to them.

Fine, Mr. Theoretical. That might be the case, but the OS won’t change port identifiers in any real circumstance.

Try plugging your USB-to-RS232 box into a different USB port.

Oh crap, some of the devices have switched around! Now my software is writing temperature readings from a different plate to the same file!

Yeah, about that, you aren’t actually writing temperature readings as regularly as you think.

Why not? The loop iterates through the 12 hotplates and uses a timer to synchronize the next iteration. The temperature reading is practically instantaneous.

Experimentalists regularly turn the hotplates off, especially overnight.

Ah yes, they do that sometimes, but I’ll just add a small timeout that will skip a measurement if a response does not come back in a timely manner. I’ll set the timeout to ~100 ms, which is way smaller than the measurement interval (1.5 sec).

The interval between measurements is now greater than 1.5 seconds, which is greater than specified.

That’s mathematically impossible! Even if 11 hotplates were turned off, the maximum delay between reads would be ~1100 ms, which is far below the interval.

Disk writing takes a non-negligible amount of time. Adding that time to your timeout interval pushes your cycle time to over 2 seconds.

Clearly, that disk is far too slow. I’ll install a new one.

You can’t. The experimentalists are pointing the output to a network drive, which is why writes can occasionally go slowly.

Fine, I’m sure they will live with a slightly longer interval if I explain the situation, at least the application isn’t skipping measurements.

Your application is missing measurements. Whenever the network goes down, your application either crashes or (at least) skips measurements.

OK, I’ll write a memory cache that holds onto any measurements that didn’t write to the output and then, when the network goes back up, I’ll flush the cache to the output folder.

Your application now has a memory leak.

OK, I’ll write it all to the local system disk—that surely won’t go offline—and then, each iteration, try to copy the data file to the network drive.

Your application now has a disk-space leak. Because you are copying the entire data file each iteration, your application now runs very slowly once the output goes beyond a reasonable size.

OK, I’ll keep track of what—specifically—didn’t flush to the network drive. I’ll also keep cache and output limits to prevent memory/drive leaks. Job done. Now that I’ve got a reliable output algorithm and a timeout for whenever the plates are off, this entire system is bombproof.

Just because the plates have sent a response in time does not mean they have sent the response you wanted.

That’s ridiculous! I’ve asked for a temperature reading, they respond with a temperature reading. Why would they respond with anything else?

RS232 is only a serial communication standard that specifies how signals are sent between your computer and a device. It does not provide a TCP-like transport abstraction that automatically deals with parity checks or flow control. Whenever your hotplate runs into a parity error, your application will encounter unexpected behaviour.

I’ll put parity & handshake checks into the code then. Now this application is surely done!

Mike wants a big green “Start Measuring” button.

Oh for fu-


Turn any Long-Running Command-Line Application into a FIFO Server

I’ve been using a web scraper—named scrape-site, for the sake of this blog post—that takes around 5 minutes to scrape a website. During one of my scrape sessions, I’ll continually look for more sites to scrape. Because it would be annoying to wait, I’d like to be able to immediately queue any site I find; however, scrape-site is just a plain-old command-line application. It wasn’t designed to support queueing.

If scrape-site were a UI-driven commercial product, I’d be furiously writing emails of displeasure to its developers: what an oversight to forget a queueing feature! Luckily, though, scrape-site being only a single-purpose console application is its biggest strength: it means that we can implement the feature ourselves.

If I had a list of sites (sites-to-scrape) in advance then I could use xargs to do the following:

$ cat sites-to-scrape
http://siteA.com
http://siteB.com

$ xargs -I {} scrape-site {} < sites-to-scrape

This works fine; however, it requires that I have a complete list of sites-to-scrape in advance. I don’t. My workflow has a dynamically changing queue that I want to add sites to. With that in mind, one change would be to omit the sites-to-scrape input, which will cause xargs to read its input from the console:

$ xargs -I {} scrape-site {}
http://siteA.com/
http://siteB.com

This is better: I can just paste a site into the console and press enter to queue it. However, I’m now restricted to writing everything into the console rather than being able to submit list files. In effect, I’ve gained the ability to add sites dynamically (good) but can now only write, or copy and paste, items into a console window (bad).

What we need is a way of having the xargs -I {} scrape-site {} application listen on something that can dynamically receive messages from any source at any time. One way to do this is to set up a server that listens for queue items on a socket. Applications can then just write messages to that socket.

That would require a fair bit of coding if it were done bespokely. Luckily, however, we live in a world with netcat. I wrote about the fun and games that can be had with netcat previously, and I’ve been falling in love with it ever since. It’s a fantastic union of network protocols (TCP/UDP) and standard input/output, which is exactly what we need.

With netcat, almost any command-line application can be set up as a FIFO server:

$ netcat -lk -p 1234 | xargs -I {} scrape-site {}

This command causes netcat to listen (-l) on port 1234. Whenever it receives a message on that port, it will write it to its standard output. In this case, its standard output has been piped into an xargs instance that, in turn, calls scrape-site with the message. netcat can also be told to keep listening after receiving a message (-k).

With the server setup, we then configure a client command that sends a message to the server. This can also be done using netcat:

$ echo "http://siteA.com" | netcat localhost 1234

This echoes the site into netcat’s standard input. The message is then sent to localhost (assuming you’re running the server on the same computer).
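
This also restores the ability to submit list files: anything piped into the client ends up as one queued site per line. For example, assuming a file of URLs called more-sites-to-scrape (a made-up name, one URL per line):

$ netcat localhost 1234 < more-sites-to-scrape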

I found this approach very useful during my scraping sessions because I could just continually queue up sites at will without having to worry about what was currently running. Because it’s so simple, the entire thing can be parametrized into a bash script quite easily:

scrape-srv.sh

#!/bin/bash

# Usage: scrape-srv

netcat -lk -p 1234 | xargs -I {} scrape-site {}

scrape.sh

#!/bin/bash

# Usage: scrape site_url

echo "$1" | netcat localhost 1234

Another benefit of this is that I can now run a remote queueing server, which I doubt scrape-site was ever designed for. The magic of the Unix philosophy. I imagine this pattern will come in handy for any long-running or state-heavy application that needs to continually listen for messages.
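
For example, if the server script were running on a machine called scrape-box (a made-up hostname), queueing a site from any other machine on the network would just be:

$ echo "http://siteC.com" | netcat scrape-box 1234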


IECapt for Corporate Website Slideshows

Big companies tend to use a variety of webapps to show their news, stats, and announcements. Some locations in the company—usually, the tearoom—might contain displays showing some of that material. A clean way to automate these displays might be to use a relevant API or script for each webapp. However, this assumes two things: the site has an API, and you have enough time to use it.

Another approach is to automatically take screenshots of the webapp. A little dirtier, but much easier to implement and much more amenable to change. Here, I’ve written up that approach.

IECapt

Corporate websites tend to be designed for Internet Explorer (IE). They also tend to have bizarre authentication, redirects, and security certificates. Most website screenshot applications can have difficulty dealing with that. With that in mind, I followed one principle:

The application must behave as if I opened a link in IE, took a screenshot, cropped just the website, and saved the result

IECapt ticked those boxes. It also had the added benefit of being dependency-free (C++, only uses Windows 7+ libraries). Further, it is open-source and entirely contained in one C++ file, so it was easy to tweak and bugfix (my tweaks).

Scripting

Scripting involved giving IECapt its source data and setting up scheduled tasks. This process would likely run without me supervising it, so it was important that non-developers could edit the system. To facilitate that, I created a simple .csv file containing the sites of interest:

$ cat webpages-to-screenshot.csv

URL,Output Filename,Minimum Width (pixels),Wait Before Taking Screenshot (ms)
https://corporate-homepage.com,corporate-homepage.png,1200,5000
https://corporate-share-price.com,share-price.png,1200,5000
https://safety-stats.com,safety-stats.png,1200,30000

.csvs can be edited in common software such as Excel, so non-programmers can add more sites without having to edit code. The next stage was to write a basic Ruby script that iterates through the .csv and runs IECapt:

# $ cat generate_webpage_screenshots.rb

require 'csv'

sources_to_image =
  CSV.
  read('webpages-to-screenshot.csv').
  drop(1) # header

sources_to_image.each do |url, filename, min_width, capture_delay|
  
  arguments = [
    "--url=\"#{url}\"",
    "--out=out\\#{filename}",
    "--min-width=#{min_width}",
    "--silent",
    "--delay=#{capture_delay}",
    "--min-height=1500"
  ].join(' ')

  # Take the screenshots
  `IECapt.exe #{arguments}`
end

The final convenience step is to batchify the script so that users can click it to run a screengrab session:

$ cat generate-website-screenshots.bat

ruby generate_webpage_screenshots.rb

Automation can be configured by setting up a Windows task to run the batch file every so often (say, once an hour).
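
For instance, something along these lines would capture a fresh set of screenshots every hour (the task name and path are placeholders for wherever the scripts actually live):

schtasks /Create /SC HOURLY /TN "Webpage Screenshots" /TR "C:\screenshot-project\generate-website-screenshots.bat"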

The Slideshow

To start the slideshow, users open an image in the output folder (out/) in Windows’ native photo viewer and then turn on slideshow mode. Slideshow mode then iterates through the images in the folder. This exploits the fact that the Windows photo viewer lazily loads the next file in a directory: even if new images are added to the output folder while the slideshow is running, the photo viewer will still pick them up as it cycles through the folder.

The main shortfall of the slideshow is that the Windows photo viewer does not allow you to run multiple instances of slideshow mode. Some of our presentation computers are multi-screen, so it would be good to get around that. However, I’ve yet to come across software that supports multiple instances, multiple monitors, and lazy loading.

Conclusions

Overall, this setup lets us:

  • Write URLs into a simple csv file
  • Run a one-click batch script
  • Start a slideshow using an application most users are familiar with (Windows photo viewer)
  • Automate the process using Windows’ task scheduler

Reflecting on this micro-project, it isn’t a super-sophisticated application that bespokely solves the problem perfectly, but I believe it follows a few principles that are important for any application—big or small—to follow:

  • There’s a clear data source (webpages-to-screenshot.csv)
  • There’s a clear “operation” (generate-website-screenshots.bat)
  • There’s a clear output (out/*)
  • It’s transparent how that output data is used (Windows photo viewer)

I’m trying to follow those principles in my bigger projects. It’s got nothing to do with code quality or complexity: it’s architecture. Architecture lets a non-developer open the project folder and take a pretty good guess at what’s going on, which is exactly what they’ll need when the system needs the occasional kick. Regardless of its size or complexity, well-architected software should try its hardest to present a coherent model of its “world” to the user.


Netcat: The best tool for playful backdooring

Just because it made me giggle so much, I thought I’d write up a classic shell prank: pumping messages into someone else’s shell terminal.

If you can ssh into the computer, then you can write messages to other users’ terminals with wall:

adams-ssh-account@computer $ echo "You suck!" | wall
target-user@computer $

Broadcast Message from adams-ssh-account@computer                                      
        (/dev/pts/1) at 21:48 ...                                              
                                                                               
You suck!

However, that’s making it far too easy for the target. The message itself gives the game away! Also, you’ll need an account on the target computer, and creating one requires sudo access. People might leave their computers unlocked but it’s unlikely they’ll have root (at least, without you knowing their password).

One of my favorite linux utilities is netcat. netcat allows you to listen on a port (-l). When data is received on that port, it will write that data to its standard output. It would normally send responses using its own standard input, but that can be disabled (-d). You can also prevent it from closing once a connection ends (-k). Because of these features, netcat is a core component of some backdoors. We’re also using it as a backdoor, but our purposes are fun, so it’s ok.

If your target has a habit of leaving their computer unlocked but you can’t get sudo access, the approach I take for launching this prank is to add this line to their .bashrc:

targets-unlocked-system@computer ~ $ cat .bashrc

# Sneaky sneaky, open port 38487 (arbitrary) and listen
# for incoming messages. Write STDERR messages to /dev/null
# so that any bootup issues (e.g. port in use)
# get hidden from the target
netcat -lkd -p 38487 -q -1 2> /dev/null &

Then, whenever I see the target hacking away in the shell I connect from my own:

adamkewley@own-computer $ netcat target-pc-addr 38487
Hey
Hey you.
Yeah you!
I'm in your computer.

And this, folks, is why you shouldn’t leave your computer unlocked (although please do, I can get a lot more devious).


Fakeonium

I work with large research data systems. One of those systems—let’s call it Choogle, for the sake of this post—is nearly two decades old, which is practically forever in the IT world. Choogle has been around so long that much of the lab’s analysis equipment is tightly integrated with it. For example, a researcher can enter a Choogle ID into an analysis instrument to automatically link their analysis with the sample’s history. This is neat, provided the researcher incorporates Choogle as a central component of their workflow.

From a top-down viewpoint, making researchers submit their sample’s information to Choogle is a better situation than each researcher having a collection of loosely formatted labnotes. Designing lab equipment to require Choogle is a way of encouraging conversion, which is the intention.

What happens, though, if researchers don’t particularly want to use Choogle? Maybe they’ve already incorporated a similar (non-Choogle) research system, or maybe they just don’t like the UI. When those researchers want NMR plots, the Choogle requirement becomes a barrier.

A barrier-smashing gameplan emerges. Researchers enter the bare-minimum amount of information required to yield a valid Choogle ID and use that ID to perform analysis. Choogle developers respond by adding validation to force researchers to enter more information. The obvious countermove develops: fill in syntactically valid—but garbage—information to bypass the form’s validation.

This cycle continues forever because it’s fundamentally an arms race between researchers, who can “tech up” at will, and Choogle, which can only deploy rigid countermoves. Eventually, Choogle’s developers give up on trying to police the system with code and turn to human engineering: make the researchers’ bosses enforce compliance. However, that just transforms the human-vs-machine arms race into a human-vs-human one.

I’ve seen this pattern emerge many times. It’s especially prevalent when the system is perceived to be a timesink by its users (that’s usually a design and communication challenge). In Choogle’s case, PhD-qualified scientific researchers can be particularly clever in their validation circumvention. Unfortunately, I’m a data scientist tasked with mining data from Choogle. One thing I’ve got to do is filter out all the “placeholder” samples submitted by devious researchers. The arms race has made my job hard.

For example, one thing I analyze is what components are used in mixtures on Choogle. Easy data to mine. However, there’s a validation rule that prevents researchers from creating a zero-component mixture on Choogle. Some lab analyses only allow “mixture” Choogle IDs. So, knowing the ball game, guess what the researchers do? Of course, thousands of mixtures containing one ingredient (usually water, because it is always going to be available on any chemical research platform).

Choogle, and the tightly-integrated lab kit, are extremely expensive to modify at this point in their lifecycle (estimate the cost a freelance developer would charge to add a validation rule to an <input> element, then multiply your estimate by at least 50). Because of that, I’m thinking of inventing a brand-new chemical ingredient in Choogle: fakeonium.

Fakeonium is a farcical chemical that researchers can enter as a mixture ingredient to bypass the one-component validation rule. I can easily filter out fakeonium-containing mixtures—much more easily than filtering out the other 500 farcical ingredients. Other data scientists might be pulling their hair out at the idea of this approach (“the data must be pure!”), but real-world constraints and the limitations of IT systems always lead to unforeseen usage patterns.

Fakeonium might seem like an admission of failure on Choogle’s behalf, but I disagree. I think it’s an admission that we can’t plan for everything. Heavily integrating general-purpose lab equipment with monolithic systems like Choogle will always lead to these kinds of shortfalls eventually.


Pretty Molecules

I have created a few scientific journal covers and renders. Eager to make their own designs, a few colleagues asked me about the process. It’s not a super fancy process; it was refined out of the designs I’ve done over the years. The process attempts to go “from nothing to done” while accounting for changing requirements, rollbacks, and tweaks.

Tools

Design

I used to jump straight into modelling; however, experience shows that spending some time to design is time-effective: a well thought out design is much easier to implement than a vague one. With that in mind, I usually spend at least an hour or two throwing out random ideas until something “clicks”. My ideas are usually pretty simple, conventional designs. If you are artistically talented, here’s the chance to go wild.

I sketch up a basic concept drawing at a high enough fidelity to communicate the idea without overcommitting time to drawing. This might seem like time that could be better spent modelling but it’s worth it: having a blueprint on paper makes it amenable to quick tweaks. You can also send these drawings to stakeholders and get an immediate opinion. If it turns out they hate it then you’re only “down” a sketch, rather than a full render.

An early cover design

Setup

Once I’ve got an acceptable design, I’ll set up a standard project folder. I use the structure below. Keeping the project structure the same between projects makes it easier to write scripts and keeps assets etc. in predictable locations.

$ ls project-folder
assets/
correspondence/
tools/
concept-art/
journal.md
composition.psd
render.blend
render-raw.png
render-post-processed.png
render-post-processed_with-headings.png

I also recommend using a version control system such as git or cvs so that you can snapshot your work at any point. It keeps the project folder clean from the usual composition-before-i-went-to-bed-on-wednesday_final_real-final.psd file names that pop up whenever version control isn’t used. If version control seems scary then maybe establish a convention where you back up your work folder each day and name it by date.
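
If git sounds appealing, the day-to-day usage for this kind of project really is tiny; something like the following (the commit message is just an example):

$ git init
$ git add -A
$ git commit -m "Reworked the lighting on the xenon atoms"

The commit messages also become a free project diary, which pairs nicely with journal.md.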

You might also notice the correspondence/ folder and journal.md file. They’re a long-term investment. Whenever I get a new project I consult those two resources in previous projects to try and get an idea about what went well and what went badly the last time.

Setup (render)

I can’t make too many specific recommendations here. How you model a scene will be based on your previous modelling experience. However, one thing I will recommend is to make as much of your composition as possible programmable and logically labelled. This is because—like any project—a large proportion of rendering time will be spent tweaking and moving items around the scene. If resizing a “xenon atom” (sphere) has to be manually done for each sphere in the scene rather than adjusting a “parent” atom then you’ll have a sad time.

An early scene setup

Initial Prototype

Using the concept drawing as a template, I’ll model the composition. Even for a prototype, I’d recommend modelling carefully. Hacking the perspective and meshes so that they look good at one particular angle with one particular lighting setup is a recipe for disaster in the tweaking stage.

Once the scene is set up, I’ll pick materials that are vaguely right and prepare a first render. The first render only needs a few basic touch-ups. The main idea here is to get something rendered that’s close enough to the final goal that stakeholders know what to expect of the final render—more so than the concept art, at least—but not so close that you’ve over-invested. Because, like any process, changes will be requested.

An early prototype of the cover

Feedback & Tweaks

Once I’ve got a prototype, I’ll send it over to stakeholders. Unlike the concept sketches, a prototype is much closer to the appearance of the final product. Because of that, I tend to receive much more detailed (read: nitpicky) feedback. This can be a good thing; however, I’d recommend keeping the background of whoever gives feedback very clearly in mind. Scientists tend to prefer scientific accuracy over artistic vision. Artists are entirely the opposite.

Ultimately, though, the target for a design is whoever we want to publish it. In the case of journals, this is usually either a designer or an editor: they (probably) just want a nice-looking cover for their next monthly issue. If you’re being paid directly by the stakeholder, regardless of whether the design gets published, your target audience is them, so do whatever they say.

While art might have elements of NP complexity—it’s much easier to appreciate the excellence of a complete art piece than it is to create it—opinions tend to dominate the feedback process. Because of that, I’ve had to make design decisions that maintain artistic clarity but annoy some stakeholders. The trick is to avoid turning it into an us-vs-them scenario and instead treat it as a “composition must win at all costs” one. Perform tweaks based on that instinct.

Final Render

Once I’ve made the final tweaks to the composition, it’s time to perform the final render. At each step of the creation process, changes have become increasingly expensive to make. A high-fidelity (>300 DPI A4) render will likely be computationally expensive, so this is usually the part where the design is “set in stone”. Make sure you (and the stakeholders) are ok with it—you don’t want to have to re-render and re-touch up.

The final render, straight from blender

Getting the Most Out of the Render

Post-processing and manual touch-ups can be the difference between a composition that looks flat and one that “pops” (I hate that word). Once I’ve got a render that I know won’t change much at all, I’ll spend a while touching it up. This requires a little basic experience with a drawing package such as Photoshop or GIMP. But, in most cases, it comes down to changing the levels, hue, and saturation and doing manual touch-ups with burn/dodge tools.

An early prototype of the cover

And that’s it. My process for making covers. No magic, no super-advanced tricks or funny gizmos. Just paper, a free 3d renderer, and some photo manipulation software. I hope you found it insightful. In general, my actual skill at each part of the process hasn’t explosively improved over the years; rather, I now spend more time designing and planning the work. I’ve become a true old-guy ;)


Complicated HTTP APIs

I occasionally have to write HTTP clients to handle third-party APIs. One thing that really bugs me is when useful HTTP APIs have additional custom authentication and encryption. Custom encryption is especially annoying when SSL could’ve been used instead.

Normal APIs

Here is an example of a great API to develop against:

  • Make a HTTP GET request to https://server.com/api/users/1?format=json to get a user’s profile details in a json format.

You can perform this request with any generic HTTP tool (e.g. curl). However, the API doesn’t mention authentication. This is usually where developers split hairs. The standard way of authenticating a connection is to use cookies to store login state:

  • Make a HTTP POST request to https://server.com/api/sessions/create with a content body containing your username and password. If successful, the server will respond with a status code of 200. A successful request will receive a response containing a Set-Cookie header. Subsequent requests to the server can then be authenticated by attaching this cookie data to those requests.

Some developers might be annoyed at how tediously stateful this is, so they might instead opt for using login tokens:

  • Make a POST request to https://server.com/api/sessions/create with a content body containing your username and password. If successful, the server will respond with a status code of 200. A successful request will be responded to with a content body containing a unique login token. Subsequent requests to the server can then be authenticated by attaching this login token to those requests.

Perfectly stateless and REST compliant. APIs like this are also fine and are usually required if you need to, for example, circumvent a browser’s cross-domain protection.
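
As a concrete (hypothetical) sketch of the token flow with curl: the endpoints come from the examples above, while the JSON body shape, the token field, and the Authorization header name are assumptions, because this imaginary API doesn’t specify them:

$ curl -X POST https://server.com/api/sessions/create \
       -H "Content-Type: application/json" \
       -d '{"username": "adam", "password": "hunter2"}'
{"token": "abc123"}

$ curl -H "Authorization: Bearer abc123" "https://server.com/api/users/1?format=json"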

An Example Custom API

Below are (abstract) instructions for the latest API I’m currently coding against. This API will likely be consumed by a wide variety of programming languages and frameworks. Spot why designing clients for this API might be time-consuming:

  • Email the API’s developers to receive a public api key (UTF-8) and secret key (base64).
  • All requests must be SSL encrypted. However, the institute does not use a certificate that is verified by a popular certificate authority, so SSL certificate verification must be disabled for all requests to the API unless you’re willing to manually install the necessary certificates.
  • Apart from requests to https://server.com/api/public-key, all requests require these preparation steps:
    • You must attach your public api-key to each request as an api-key HTTP header
    • The request type (GET, PUT, POST), the current timestamp (custom format, see below), the request path, and the request parameters must be concatenated with Unix newlines (\n) and UTF-8 encoded. This concatenated string is then signed using the HMAC256 algorithm and the secret key. The resulting digest is base64 encoded and attached to requests as an api-signature HTTP header (see the sketch after this list)
  • Make a HTTP GET request to https://server.com/api/public-key to receive a base64-encoded exponent and modulus
  • Use the exponent and modulus to encrypt your password using the RSA encryption algorithm. Pre-encryption, the password must be UTF-8 encoded. Post-encryption, the output must be base64 encoded.
  • Make a HTTP POST request to https://server.com/api/sessions/create supplying a fully (Windows domain) qualified username, encrypted password, and api key in the content. Remember to sign your request as described above (HMAC256, etc.).
  • Successful requests will receive a response containing an api-usertoken and an expiry in the content. The expiry timestamp is in a special non-standard format (see below). Subsequent requests to the API are authenticated by attaching the api-usertoken to the header (in addition to the signing steps described above).
  • Expired api-usertokens, invalid HMAC signed api-signatures, and invalid login credentials will be met with a blanket 401 (Unauthorized) status code and no additional feedback
  • Timestamps for the api-usertoken’s expiry and HMAC signing have the format YYYY-MM-DD hh:mm:ss.fffZ. Note, though, that timestamps within the server’s response use a standard (UTC) format
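
To give a feel for how much ceremony every request involves, here is a rough shell sketch of the signing steps. The header names and timestamp format come from the list above; the example keys and path, the way the secret key is fed to the HMAC, and the exact concatenation order are assumptions that the real documentation would pin down:

# hypothetical keys handed out by the API's developers
api_key="my-public-api-key"
secret_key="dGhlLXNlY3JldC1rZXk="

# the API's custom timestamp format (milliseconds hard-coded for brevity)
timestamp="$(date -u +'%Y-%m-%d %H:%M:%S.000Z')"

# concatenate request type, timestamp, path, and parameters with \n
string_to_sign="$(printf 'GET\n%s\n/api/users/1\nformat=json' "$timestamp")"

# HMAC the string with the secret key, then base64 the digest
signature="$(printf '%s' "$string_to_sign" | openssl dgst -sha256 -hmac "$secret_key" -binary | base64)"

# -k disables certificate verification, as the instructions require
curl -k \
  -H "api-key: $api_key" \
  -H "api-signature: $signature" \
  "https://server.com/api/users/1?format=json"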

Sounds simple enough, but it’s actually quite difficult to do all steps perfectly when the server is a black box—you will only receive a “fail” until your client’s implementation is perfect.

Client development to one side, most of these steps exist because the API doesn’t use a standard SSL certificate and, as a result, SSL’s in-built protection against man-in-the-middle (MITM) exploits is nullified.

If this API had a verified certificate, most of these bespoke anti-MITM design features would be unnecessary. Unfortunately, because the API is missing that crucial component of SSL, we have this design instead. The issue, though, is that it’s actually quite difficult to prevent MITM exploits if you don’t use a pre-shared key, and that’s where this particular API begins to unravel.

Hacking the API

Here is how you could MITM this API to get users’ domain passwords. Users’ domain accounts are used to access emails, HR details, calendars, payroll, etc. This hack assumes that certificate verification has been disabled, which will likely be the case for implementations that can’t afford to require that users know how to install CAs (e.g. widely distributed line-of-business software):

  • Create a standard HTTP proxy that intercepts HTTP traffic between a client and the api server
  • Create a forged SSL key pair
  • SSL intercept traffic between clients and the server. In effect, give your forged public key to a client as-per the SSL handshake, decrypt any traffic they send to you with your forged private key (to intercept it), and re-encrypt the request using the API server’s public key to complete the proxy chain
  • For most traffic, proxy as normal. You can read any requests. Also, because the HMAC protection only prevents altering the path & parameters, you can alter POST/PUT request bodies at will. That is, you can completely hijack legitimate POST/PUT requests.
  • For HTTP GET requests to https://server.com/api/public-key substitute the API’s public RSA exponent/modulus pair for your own forged pair. Store the server’s legitimate public key.
  • Intercept HTTP POST requests to https://server.com/api/sessions/create. These requests will contain an unencrypted username and a password that was encrypted with your forged public exponent/modulus
  • Decrypt the password using your private key
  • You now have a user’s full domain username + password
  • Forward the HTTP POST request to the server, RSA encrypting the password using the server’s key. This will complete the proxy chain, which will make the API access seamless - it will be very annoying to figure out why people’s accounts are being hacked

The only thing you’ll have trouble hacking is anything covered by the HMAC signature (the path + URL parameters). This is because the api-key/secret-key is essentially a pre-shared key (PSK) arrangement that can’t be easily MITMd. This raises the question of why you also need user authentication in addition to this key pair, because the secret key could be leveraged for authentication.

This is just a simple example of why most web services use standard protocols. They’ve been developed over decades and are—for the most part—secure against most attacks.


Card bingo

This writeup illustrates how almost anything can become a project if you read into it too much.

A few weeks ago, I was in the pub playing a very simple bingo game. The game works as follows:

  • Each player receives two random and unique playing cards from each suit to form a hand of eight cards
  • A host sequentially draws cards from a randomly shuffled deck, announcing each card as it is drawn
  • The first player to have all eight of their cards announced by the host wins

I was terrible at the pub quiz, so I decided to focus my mental efforts on two seemingly simple questions about the game, which eventually led to me getting ahead of myself:

  • What’s the probability of having a “perfect” game. That is, a game where you win after the 8th card is announced?

  • In a game containing n players, how many cards are called out by the announcer before someone wins?

I thought I’d get my answer in under ten minutes but it took a little longer than that.

What’s the probability of having a “perfect” game?

My initial instinct was to try and simulate the “perfect game” scenario. I’m glad I resisted - it would’ve taken quite a bit of computation time to verify the probability. Luckily, there aren’t too many variables in this particular question. The chance of winning by the 8th card call is unaffected by other players’ hands. Because of that it’s straightforward to construct an answer purely with combinatorics.

There are two cards per suit in a hand of eight cards. Using binomial coefficients, I calculated that there are 13 choose 2, or 78, combinations of cards when drawing two cards from a suit of 13 cards. Raise that to the power of the number of suits (i.e. 78^4) to get the total possible combinations of a standard eight-card hand: 37,015,056 combinations. Only one of those combinations could achieve an 8-round win, which means that the probability of having a valid combination is 1 / 37,015,056 which, as a percentage, is 0.000002702 %.

That probability is an upper bound, though, because it assumes that the host’s randomly-shuffled deck conveniently contains two of each suit in its first eight cards. That isn’t always the case. Similar to hands, there are 37,015,056 valid 8-card starters in a deck, but there are 52 choose 8, or 752,538,150, total starting combinations. Therefore, the probability that the host’s deck even has a valid starting combination is around 4.9 %.

Multiply the probability that the host has a valid starter by the probability of someone actually having the winning combination to find that the overall probability of having a “perfect” game is around 1.32×10^-7 %.
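
As a sanity check, the two factors collapse into the single closed form you’d expect for “the first eight cards of the deck are exactly your eight cards”:

\[
P(\text{perfect game})
  = \frac{1}{78^{4}} \times \frac{78^{4}}{\binom{52}{8}}
  = \frac{1}{\binom{52}{8}}
  = \frac{1}{752{,}538{,}150}
  \approx 1.33\times10^{-9}
\]

(The tiny difference from the percentage above is just rounding in the intermediate 4.9 % step.)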

How many card calls are there in an average game containing n players?

This is a little more complicated than the ‘ideal game’ scenario because there’s now a variable number of players to consider. Because of that, I developed a small console program with the following outer interface (link to source):

$ play-scottish-bingo number_of_players number_of_games [-s t]
Output:
winning_round_number, game 1
winning_round_number, game 2
.
.
.
winning_round_number, game [number_of_games]

Example:
$ play-scottish-bingo 5 3
35
37
33

In essence, it’s a very simple program that led me down the rabbit hole of shuffling methods, random seed generation, and proving shuffle correctness, with techniques such as comparing shuffles to derangement probabilities.

Once I got out of the rabbit hole, I wrapped play-scottish-bingo in a small shell script that ran it for 0 to 100 players.
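
A guess at that driver loop (the 10,000 games per player count and the output file names are my assumptions; the original script isn’t shown in the post):

#!/bin/bash
for n in $(seq 0 100); do
  # record the winning round number of each simulated game
  play-scottish-bingo "$n" 10000 > "rounds-${n}-players.txt"
done

Running that for each player count produced the following results: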

A plot showing how the number rounds a bingo game goes on for varies with the number of people playing it

A line plot that illustrates how the standard deviation in the number of rounds a bingo game goes on for varies with the number of people playing the game

On the night I was playing, the pub contained around twenty patrons. For that scenario, the distribution looked as shown below. Based on those results, I’d expect the average game to be around 36 rounds long. A “rare” (<5 %) short game would last 26 rounds while a super rare (<0.5 %) short game would be 21 rounds long. A “rare” nail-biting long game would be around 41 rounds long with a super-rare long game being 44 rounds.

Psychologically speaking, shorter games tend to evoke an “is that it?” response from the crowd; it’s the longer games, where anyone could win, that are the most exciting. Most incarnations of this game ask players to call out when seven of their eight cards have been announced - an “I’m going to win” message. In longer games, many of the players will have made—and heard—that call and be waiting for their final card to be drawn - how intense!

A histogram that shows the frequency of the number of rounds for 10000 games

Neverending Analysis

As I was doing this, I recognized that the analysis shown here is only the tip of the iceberg. What about different shuffling techniques? I assumed one-deck-per-player in this analysis. How does the distribution’s shape change with players and shuffles? I only took the averages of most datasets.

Then I looked at myself in the mirror and ended up at a scary conclusion:

I’m spending tens of hours analyzing, and coding simulators for, a simple card game. What am I? An undergraduate maths student?

After reaching that revelation, I stopped analyzing the bingo game. No more bingo games. For me, this is a hard lesson in “spending your time wisely”. I used to spend hundreds of hours trying to understand Blender’s texture nodes, many nights trying to redesign my University campus in Valve’s Hammer editor, and tens of hours creating vector art of common chemistry equipment. But this project, the bingo game, has finally opened my eyes. I’ll now only ever focus on important profitable and useful ideas.

My next project is a compiler that converts a high-level representation of an RS232 device into (C#, Python, etc.) libraries.

That one’s important.

Maybe.

No, not maybe. That idea’s definitely a winner.