Blog

Performance in OpenSim

Yay, I have a new job! I’m now an Open-Source Software Developer at TU Delft, where I’m going to be working on biomechanical simulation software.

Why do I mention this? Because I’m initially tasked with making OpenSim faster, which beautifully ties together a few of my loves (research software, systems development, and low-level perf optimizations) with a few of my hates (software written by researchers, C++, and diagnosing cache misses), and I’ve been wanting to learn and write about performance for a while.


TextAdventurer: Rust Edition

Just for a bit of fun, I decided to rewrite textadventurer in Rust (source).

The server was initially written in ~600 LOC of Java with a basic websocket library. The Java implementation worked fine (it was essentially just a tiny tech demo for a jobson feature), but I decided to rewrite it in Rust so I could understand where the pain points are in an application like this.


So Damn Close

My latest interest has been trying to squeeze performance out of simple algorithms, mostly so I can understand the impact of branch misses, lookup strategies, etc.

I spent Sunday writing an optimized solution to the language benchmark game’s reverse-complement challenge. I ended up doing all kinds of hacky things I’d never recommend doing in prod, like writing a custom vector and writing tricksy algorithms. Repo here, submission here.

Well, for all my hard work, I managed to come… Second! To, of course, a much tidier Rust implementation (❤️). Why? Not because the Rust solution is more efficient (it’s not: it takes at least 2x more cycles and memory than my single-threaded C++ implementation), but because the Rust implementation throws threads at the problem, which is the true power of Rust (in addition to the fact that the Rust version could be made just as efficient as the C++ one by adding some SIMD and unsafe code).
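The heart of reverse-complement is mapping every nucleotide code to its complement and reversing the sequence. This is not the code from my submission or from the Rust one; it’s a minimal single-threaded sketch, assuming a 256-entry lookup table (one of the lookup strategies mentioned above) in place of a branchy match, and it skips the benchmark’s FASTA parsing and line wrapping. complement_table and reverse_complement are hypothetical names, and the table covers the standard IUPAC nucleotide codes.

```rust
/// Builds a 256-entry table mapping each byte to its complement.
/// Bytes that aren't IUPAC nucleotide codes map to themselves.
fn complement_table() -> [u8; 256] {
    let mut table = [0u8; 256];
    for (i, slot) in table.iter_mut().enumerate() {
        *slot = i as u8; // identity by default (e.g. newlines pass through)
    }
    let from = b"ACGTUMRWSYKVHDBN";
    let to = b"TGCAAKYWSRMBDHVN";
    for (&f, &t) in from.iter().zip(to.iter()) {
        table[f as usize] = t;
        table[f.to_ascii_lowercase() as usize] = t; // lowercase input -> uppercase complement
    }
    table
}

/// Reverse-complements `seq` in place with two indices walking inwards,
/// so each byte is read and written once and no extra buffer is needed.
fn reverse_complement(seq: &mut [u8], table: &[u8; 256]) {
    let (mut i, mut j) = (0usize, seq.len());
    while i + 1 < j {
        j -= 1;
        let (a, b) = (seq[i], seq[j]);
        seq[i] = table[b as usize];
        seq[j] = table[a as usize];
        i += 1;
    }
    if i + 1 == j {
        // Odd length: the middle byte only needs complementing.
        seq[i] = table[seq[i] as usize];
    }
}

fn main() {
    let table = complement_table();
    let mut seq = b"ACGTNacgt".to_vec();
    reverse_complement(&mut seq, &table);
    println!("{}", String::from_utf8(seq).unwrap()); // prints "ACGTNACGT"
}
```

The table turns each complement into a single indexed load with no data-dependent branches, and the two-index loop does the reversal and the complement in one pass over the buffer.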


Implementing Rust Async and Futures from Scratch

As is tradition for many developers stuck at the family home over Xmas, I decided to go hack on something.

Asynchronous programming is becoming more popular in all major languages. C++20 is going to get co_await and friends, Python 3.7 now has async, and Rust has async / .await. Rust’s Future trait is quite unusual. It uses a “polling”-based interface, where the listener “polls” for updates but (and this is why I am making judicious use of quotation marks) polling only occurs when the asynchronous event source “wakes” the poller, so polling only actually happens when a state change occurs, rather than continuously.
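To make the wake-then-poll cycle concrete, here’s a minimal sketch using only the standard library’s Future, Waker, and Wake APIs. DelayedValue and block_on are hypothetical names for illustration, not code from my actual hack: a background thread plays the asynchronous event source, and the executor parks its thread instead of spinning, so the future is polled once up front and then again only after it has been woken.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};
use std::time::Duration;

/// State shared between the future and its event source.
struct Shared {
    value: Option<u32>,
    waker: Option<Waker>,
}

/// Hypothetical future whose value is delivered by another thread after a delay.
struct DelayedValue {
    shared: Arc<Mutex<Shared>>,
}

impl DelayedValue {
    fn new(value: u32, delay: Duration) -> Self {
        let shared = Arc::new(Mutex::new(Shared { value: None, waker: None }));
        let producer = Arc::clone(&shared);
        thread::spawn(move || {
            thread::sleep(delay);
            let mut s = producer.lock().unwrap();
            s.value = Some(value);
            // The event source "wakes" the poller only when the state changes.
            if let Some(w) = s.waker.take() {
                w.wake();
            }
        });
        DelayedValue { shared }
    }
}

impl Future for DelayedValue {
    type Output = u32;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        let mut s = self.shared.lock().unwrap();
        match s.value.take() {
            Some(v) => Poll::Ready(v),
            None => {
                // Not ready yet: leave our waker so the producer can wake us later.
                s.waker = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
}

/// Waker that unparks the thread blocked inside `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal single-future executor: poll, then sleep until woken, then poll again.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(v) => return v,
            // park() can return spuriously; the loop just polls again.
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    let v = block_on(DelayedValue::new(42, Duration::from_millis(50)));
    println!("{v}"); // prints "42"
}
```

If the producer wakes the thread between poll returning Pending and the call to park, park’s token semantics make it return immediately, so no wake-up is lost. This block_on only drives a single future; a real executor keeps a queue of tasks and re-polls whichever one was woken.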


Demoing PetaSuite Protect at ASHG 2019

I went to Houston for ASHG 2019 with PetaGene to demo PetaSuite Protect, one of the products I’m helping to develop.

Giving tech demos is always a daunting task, especially because we gave ours completely freeform: typing shell commands in front of clients is always fun ;). The demos went off without a hitch, though, so there’s something to be said for the effectiveness of writing bash scripts during a long-haul airplane journey.


Side Project: libdeflater: Rust bindings to libdeflate

I’m a huge fan of Rust (❤️).

In a previous post I demoed fo2dat, which can be used to unpack Fallout 2 DAT2 files. I used the venerable flate2 crate for that project, but I’ve since learnt about libdeflate, which is reported to be a much faster block-based DEFLATE (de)compression library.

libdeflate didn’t have Rust bindings, so I wrote some as a learning exercise. The result is libdeflater, which exposes a safe Rust API to the library. Benchmarks indicate that the library is around 2-3x faster than flate2, which is based on zlib and miniz. That’s a pretty insane speedup for such a popular compression format.


igv.js: porting a large C/C++ codebase into browsers

One of the more interesting projects I’ve worked on recently is using emscripten to port PetaGene’s high-performance decompression suite to wasm so that it can run in a browser with no installation.

It required figuring out where to draw the line between having a fully async API (ideal for JavaScript) and using Emscripten’s Asyncify to emulate synchronous IO (ideal for standard C/C++ applications). It also required an ill-thought-out optimization to igv.js, which prompted a much better fix by the maintainer. This is why I like the OSS model: even bad ideas can prompt a discussion about better ones.


PetaGene wins Bio-IT World 2019

PetaGene won Best of Show for their latest product, PetaSuite Protect (link, archive). I had a great time at the event: people were super interested to learn what compression and encryption can do for them. I am looking forward to helping develop the PetaSuite Protect product :)