Blog

Performance in OpenSim

Yay, I have a new job! I’m now an Open-Source Software Developer at TU Delft, where I’m going to be working on biomechanical simulation software.

Why do I mention this? Because I’m initially tasked with trying to make OpenSim faster, which is something that beautifully ties together a few of my loves (research software, systems development, and low-level perf optimizations) with a few of my hates (software written by researchers, C++, and diagnosing cache misses) and I’ve been wanting to learn+write about performance for a while.


TextAdventurer: Rust Edition

Just as a little fun, I decided to rewrite textadventurer in Rust, source).

The server was initially written in ~600 LOC of Java with a basic websocket library. The Java implementation worked fine–it was essentially just a tiny tech demo to demo a jobson feature–but I decided to rewrite it in Rust so I could understand where the pain-points are in an application like this.


So Damn Close

So my latest interest has been trying to squeeze performance out of simple algorithms - mostly so I can understand the impact of branch misses, lookup strategies, etc.

I spent Sunday writing an optimized solution to the language benchmark game’s reverse-complement challenge. I ended up doing all kinds of hacky things I’d never recommend doing in prod, like writing a custom vector and writing tricksy algorithms. Repo here, submission here.

Well, for all my hard work, I managed to come… Second! To, of course, a much tidier Rust implementation (❤️). Why? Not because the Rust solution is a more efficient (it’s not: it takes at least 2x more cycles and memory than my single-threaded C++ implementation), but because the the Rust implementation throws threads at the problem, which is the true power of Rust (in addition to the fact that the Rust version can be just as efficient as the C++ one by adding some SIMD and unsafe code).


Implementing Rust Async and Futures from Scratch

As is tradition for many developers stuck at the family home over xmas. I decided to go hack something.

Asynchronous programming is becoming more popular in all major languages. C++20 is going to get co_await and friends, python 3.7 now has async, and Rust has async / .await. Rust’s implementation of Future<T> is quite unique. It uses a “polling”-based interface, where the listener “polls” for updates but–and this is why I am making judicious use of quotation marks–polling only occurs when the asynchronous event source “wakes” the poller, so polling only actually happens when a state change occurs, rather than continuously.


Demoing PetaSuite Protect at ASHG 2019

I went to Houston for ASHG 2019 with PetaGene to demo PetaSuite Protect, one of the products I’m helping to develop.

Giving tech demos is always a daunting task, especially because we gave our tech demos completely freeform - typing shell commands in front of clients is always fun ;). The demos were delivered without a hitch, though, so there’s something to be said about the effectiveness of writing bash scripts during a long-haul airplane journey.


Side Project: libdeflater: Rust bindings to libdeflate

I’m a huge fan of Rust (❤️).

In a [previous post]({% post_url 2018-06-25-fo2dat-sideproject %}) I demoed fo2dat, which can be used to unpack Fallout 2 DAT2 files. I used the venerable flate2 crate for that project, but I’ve since learnt about libdeflate, which reported to be a much faster block-based DEFLATE (de)compression library.

libdeflate didn’t have Rust bindings, so I wrote some as a learning exercise. The result is libdeflater, which exposes a safe Rust API to the library. Benchmarks indicate that the library is around 2-3x faster than flate2, which is based on zlib and minizip. That’s a pretty insane speedup for such a popular compression format.


igv.js: porting a large C/C++ codebase into browsers

One of the more interesting projects I’ve worked on recently is using emscripten to port PetaGene’s high-performance decompression suite to wasm so that it can run in a browser with no installation.

It required figuring out how where to draw the line between having a fully async API (ideal for javascript) and using Emscripten’s asyncify to emulate synchronous IO (ideal for standard C/C++ applications). It also required an ill-thought-out optimization to igv.js, which prompted a much better fix by the maintainer. This is why I like the OSS model: even bad ideas can prompt a discussion about better ones.


PetaGene wins Bio-IT World 2019

PetaGene won best of show for their latest product, PetaSuite Protect (link, archive). I had a great time at the event: people were super interested to learn what compression and encrpytion can do for them. I am looking forward to helping develop the PetaSuite Protect product :)


Jobson 1.0.0

After many weekends and evenings of fixing little bugs, cleaning up the codebase, and polishing the build, I’ve finally managed to publish v1.0.0 of jobson.

I open-sourced jobson late November 2017. The version I demoed here was already close to release-grade in terms of implementation (the server had >200 tests, was used in prod, etc.). However, the deployment, installation, documentation, and maintenance needed work.


Side Project: Rust: fo2dat

tl;dr: I used Rust to make this CLI utility for extracting Fallout 1+2 DAT files.

I love the occasional playthrough of Fallout 1 and 2. They were some of the the first “serious” games I played. Sure, DOOM/Quake/Command & Conquer were also “mature”—I played them around the age of ~10, which marked me as doomed (heh) by the kind of adults that would also think eating sweets is a surefire path to heroin addition or something—but F1+2 included prostitutes, drugs, slavery, and infanticide: irresistibly entertaining topics for a teenager.


Integrating Software

tl;dr: If you find you’re spending a lot of time integrating various pieces of software across multiple computers and are currently using a mixture of scripts, build systems, and manual methods to do that, look into configuration managers. They’re easy to pick up and automate the most common tasks. I’m using ansible, because it’s standard, simple, and written in python.

Research software typically requires integrating clusters, high-performance numerical libraries, 30-year-old Fortran applications by geniuses, and 30-minute-old python scripts written by PhD students.


(Not so) Fancy-Pants new Website

So, I just spent an evening + morning refactoring the site into a, uh, “new” design.

I only ocassionally work on this site these days—I now see it as the sparse journal of a madman that also likes to distribute mobile-friendly versions of his CV—but I thought it would be a nice and easy blog post to reflect on how the site has changed over the last 3 years.

The original version of adamkewley.com was launched in May 2015 and was the fanciest design:


State Machines in ReactJS

I’m currently implementing job resubmission in Jobson UI and found that state machines greatly simplify the code needed to render a user workflow.


Cheeky Hackers

I’ve been running several webservers behind custom domains for a while now (plateyplatey.com, textadventurer.tk, and this site) and it never ceases to amaze me how cheeky bots are getting.

For example, certbot recently started complaining that textadventurer.tk’s TLS certificate is about to expire. That shouldn’t happen because there’s a nightly cronjob for certbot renew.

On SSHing into the server I found an immediate problem: the disk was full. Why? Because some bot, listed as from contabotserver.net decided to spam the server with >10 million requests one afternoon and fill the HTTP logs. Great. Looks like I’m finally going to implement some log compression+rotation.

Then there’s the almost hourly attempts to find a PHPMyAdmin panel on my sites. That one always surprised me: surely only a small percentage of PHP sites are misconfigured that badly? Lets look at the stats:

Percentage of websites using PHP

Even if 1 % of them are misconfigured, we’re doomed.


Jobson: Now in 2D

I recently made screencasts that explain Jobson in more detail. The first explains what Jobson is and how to install it. Overall, Jobson seems well-recieved. The first video seems to be leveling off at around 2700 views and Jobson’s github repo has seen a spike in attention.

Will other teams start adopting it or not? Only time will tell.


Jobson: Webify CLI Applications

This is a post about Jobson, an application I developed and recently got permission to open-source along with its UI.

Do any of these problems sound familiar to you?:

I’d like my application to have a UI.

I want to trace my application’s usage.

I want to share my application.

They’re very common problems to have, and are almost always solved by building a UI in a framework (e.g. Qt, WPF), sprinkling on usage metrics, and packaging everything into a fat executable. In this post, I’d like to present an alternative.


Text Adventurer

One of the first things people learn when they start programming is how to write a text prompt. It’s a decades-old exercise that teaches new programmers input-output.

In order to inject a little excitement, learners are normally encouraged to write interactive games using standard IO. This helps them learn programming by interactively - they will need to learn conditional logic to handle the “Will you stab the monster with your sword or run away?” prompt.

This creative learning process is great but, unfortunately, it’s hard to show the creations to other people. Any players will have to undergo the nuisance of installing an interpreter, libraries, and the game in order to play. These deployment problems are alleviated on the web: javascript can be distributed and executed remotely in a browser with no effort required from the client.


Keep Side Projects Small

I’ve officially thrown a deployment of PlateyPlatey (PP) up onto plateyplatey.com (galley) along with its server and frontend source code. I previously explained PP in a blog post.

PP is missing many features I would of liked to have. However, a few weeks ago I took a long look in the mirror and decided to just tidy it then throw it up on the net. This let me start sharing it on sites such as reddit to gauge if anyone’s interested. After about a week of PP being up, I concluded that no, no one is interested and yes, I spent way too much time coming to that conclusion.


Low Level Programming

I haven’t posted in a while because I have been busy with a few random studies and projects.

Low level programming has always, to a high-level programmer like myself, felt like a mysterious playground where all the real hackers write graphics engines, password crackers, and operating systems. I have always had a fascination with people who can write something hardcore like an emulator. Sure, I can open a window, draw some graphics, and update the UI to reflect state changes. However, an application that does all that and some low-level wizardry is in an entirely different league of complexity.


Engineering Flexibility

It’s tempting to believe that having a flexible file format for a system’s configuration is enough to ensure forward-compatibility and flexibility as the system grows. The thought process is that with a suitable data format—for example, XML or JSON—extra information can be added without breaking existing systems. For adding new fields into the data, this is true. However, if you don’t engineer additional flexibility into the fields themselves then you may end up needing to make breaking changes anyway or, worse, end up with a much more complicated data file.

Imagine you are implementing an ecommerce server that exposes products. The spec for the server stipulates that there will only be a very small number of products. It also stipulates that rebooting the server to refresh the system with new products is fine. With that in mind, you add a products field to the server’s configuration:

port: 8080
products:
 - name: Acme Explosive
 - name: Fake Backdrop 

The Cycle of Platform Maturity

In programming, “mature” is a funny word. When I first started programming, it had negative connotations. A “mature” programming language was boring, had complicated tools, over-complicated design patterns, and would likely die off soon. By contrast, non-mature languages were exciting, free-form, and carving out a new frontier in software.

My opinion has evolved since working as a programmer rather than studying as one. Mature now implies that a language will have have good debuggers and profilers. It will have an established build and testing systems for the language. Its core libraries won’t change midway through a critical project. Most importantly, though, mature languages (tend to) have moved past the same boring old problems that every language ecosystem encounters in its early days.


Just Make Stuff

Software development can be a bottomless well of languages, build environments, platforms, coding standards, and cat analysis algorithms. For someone who likes to study and learn new techniques, this is great. A bottomless well of stuff to better at. However, after almost ten years of doing just that, I’m willing to make a bold statement: curiosity can be maddening and knowledge can be hindrance.

Curiosity is maddening because the more I learn about assembler, linux, garbage collection, compiler theory, engineering techniques, software principles, functional programming, networking, etc. The more I’m convinced that I know almost nothing about computers.


PlateyPlatey

This blog post is about PlateyPlatey, a webapp I am developing to solve the (quite specific) problem I outline in this blog post. PlateyPlatey is in the very early prototype stages of development. I am having fun seeing how far I can develop it. If it saves even one researcher from the complete ballache of reorganizing research data, I will be ecstatic.

Once upon a time, I was a lab researcher. Boy, was it fun. This one time, I spent a month trying to crystallize a particularly annoying-to-crystallize molecule only to find that a decent proportion of the batch I was working with was actually silicone grease which, consequently, made crystallization impossible. Another time, my wrist twitched midway through adding LiAlH powder to a reaction and I managed to fill an entire schlenk line and vacuum pump with a mixture of corpse-scented phosphine and hazardous LiAlH powder. I now can’t smell phosphines without vomiting - fun.


Going Full Circle

Write code to control your application. Configure the code’s behavior using standard configuration files. After that, go full circle: write code to control the code that runs your application. In this blog post I’ll explain when you might want do that.


Free Online Courses

With the rise of free online courses, it’s becoming much easier to learn programming. Over time, more people will learn programming through those courses. Overall, this is a good thing. It means I’ll be able to buy a singing fridge sooner. However, beware of the dragons.

Big IT companies are suspiciously keen to provide free online software courses. Take bigdatauniversity.com for example. It’s a very slick site with lots of content. However, that content is tainted by the site’s ulterior motive: it doesn’t just want you to learn big data, it wants you to learn how IBM is the company to use if you’re working with big data.

So, instead of explaining abstract big data concepts, some of the material descends into the corporate agenda: how big data makes corporations’ wheels turn 15 % faster, improves customer turnover by a factor of 3, and how IBM could’ve gotten you all of that yesterday - for a small fee.


Projects That Break You

Everyone has had one: a project that’s so hard that it breaks you.

I was recently flicking through my old project folders and found that very project. It was an innocuous form-generating WPF application (source) that I thought I had thoroughly buried in my projects-dropped/ folder. However, while browsing through that folder tonight, the project managed to catch my eye.

You can see more details about the application on the Github page. It was an application that was trying to solve every problem that COSHH-form-generating software just needed to solve. It needed composable document models. It needed a variety of IO methods. It needed to be functionally pure. It needed to be architecturally easy to understand. It also needed to be fast, run on any PC, and sell nationwide on day one.


Don't Spare the Low-Level Details

Abstractions are wonderful things, that is until they leak. At that point, I tend to wish someone didn’t spare me the low-level details.

Recently, I was tasked with developing a system that continually logs temperature readings from 12 hotplates. The plates’ use an RS232 communication interface, which is very easy to negotiate.

With only those high-level details available, I declared that the logging software would be an “afternoon in n’ out job” and created an appropriate design for the timescale / effort:

  • Hard-code the 12 plates’ COM/ttyS port identifiers and output paths
  • Loop through each port/path pair
  • Send the GET_TEMPERATURE request
  • Write the response to the output file

Job done.

Port identifiers can change.

Wait, who are you?


Turn any Long-Running Command-Line Application into a FIFO Server

I’ve been using a web scraper—named scrape-site, for the sake of this blog post—that takes around 5 minutes to recursively scrape a website. During one of my scrape sessions, I’ll continually look for more sites to scrape. Because it would be annoying to wait, I’d like to be able to immediately queue any site I find; however, scrape-site is just a plain-old command-line application. It wasn’t designed to support queueing.

If scrape-site was a UI-driven commercial product, I’d be furiously writing emails of displeasure to its developers: what an oversight to forget a queueing feature! Luckily, though, scrape-site only being a single-purpose console application is its biggest strength: it means that we can implement the feature ourselves.


IECapt for Corporate Website Slideshows

Big companies tend to use a variety of webapps to show their news, stats, and announcements. Some locations in the company—usually, the tearoom—might contain displays showing some of that material. A clean way to automate these displays might be to use a relevant API or script for each webapp. However, this assumes two things: the site has an API, and you have enough time to use it.

Another approach is to automatically take screenshots of the webapp. A little dirtier, but much easier to implement and much more amenable to change. Here, I’ve written up that approach.


Netcat: The best tool for playful backdooring

Just because it made me giggle so much, I thought I’d write up a classic shell prank: pumping messages into someone else’s shell terminal.

If you can ssh into the computer, then you can write messages to other user’s terminal with wall:

adams-ssh-account@computer $ echo "You suck!" | wall
target-user@computer $

Broadcast Message from adams-ssh-account@computer                                      
        (/dev/pts/1) at 21:48 ...                                              
                                                                               
You suck!

However, that’s making it far too easy for the target. The message itself gives the game away! Also, you’ll need an account on the target computer, which means you’ll have to get sudo access. People might leave their computers unlocked but it’s unlikely they’ll have root (at least, without you knowing their password).


Fakeonium

I work with large research data systems. One of those systems—lets call it Choogle, for the sake of this post—is nearly two decades old, which is practically forever in the IT world, which is impressive. Choogle has been around so long that much of the lab’s analysis equipment is tightly integrated with it. For example, a researcher can enter a Choogle ID into an analysis instrument to automatically link their analysis with the sample’s history. This is neat, provided the researcher incorporates Choogle as a central component of their workflow.

From a top-down viewpoint, making researchers submit their sample’s information to Choogle is a better situation than each researcher having a collection of loosely formatted labnotes. Designing lab equipment to require Choogle is a way of encoraging conversion, which is the intention.

What happens, though, if researchers don’t particularly want to use Choogle? Maybe they’re already incorporated a similar (non-Choogle) research system, or maybe they just don’t like the UI. When those researchers want NMR plots, the Choogle requirement becomes a barrier.


Pretty Molecules

I have created a few scientific journal covers and renders. Eager to make their own designs, a few colleagues asked me about the process. It’s not a super fancy process and was refined out of a designs I’ve done over the years. The process attempts to go “from nothing to done” while accounting for changing requirements, rollbacks, and tweaks.


Complicated HTTP APIs

I occasionally have to write HTTP clients to handle third-party APIs. One thing that really bugs me is when useful HTTP APIs have additional custom authentication and encryption. Custom encryption is especially annoying when SSL could’ve been used instead.


Card bingo

This writeup illustrates how almost anything can become a project if you read into it too much.

A few weeks ago, I was in the pub playing a very simple bingo game. The game works as follows:

  • Each player receives two random and unique playing cards from each suit to form a hand of eight cards
  • A host sequentially draws cards from a randomly shuffled deck, announcing each card as it is drawn
  • The first player to have all eight of their cards announced by the host wins

I was terrible at the pub quiz, so I decided to focus my mental efforts on two seemingly simple questions about the game, which eventually led to me getting ahead of myself:

  • What’s the probability of having a “perfect” game. That is, a game where you win after the 8th card is announced?

  • In a game containing n players, how many cards are called out by the announcer before somone wins?

I thought I’d get my answer in under ten minutes but it took a little longer than that.


My future self appreciates a simple codebase

At the moment, I regularly have to develop Treegrid UI components. It’s been quite a lesson in API design and made me realize how design patterns exist to create great architecture, not a great API.

My recent work focuses on making treegrid components work with Crown’s backend. The backend contains a variety of data structures which, while being quite diverse, need to be manipulated through a common programmatic interface. To that end, I designed and implemented an adaptor, DDSTreeGrid, around EasyUI’s Treegrid component and a declarative data represenation of our backend views (view).


Language Agnosticism

I initially learnt javascript because I was desperate to have marquee effects on my Microsoft FrontPage website, actionscript to build menus in a basic flash game I tried to make, C++ for a half-life mod, and so on.

Jobs seem a little more focused than my approach. When I was jobhunting, most programming job postings were language- or framework-centric. They weren’t looking for someone generally experienced in full-stack web development. They wanted someone who specifically has at least 2 years of angularjs experience or specifically has Rails4 JSON API coding experience. I’m guessing this is a consequence of reality: commercially established applications are architected on—and have built technical debt in—a particular language or framework.