Adam Kewley

Integrating Software

tl;dr: If you find you’re spending a lot of time integrating various pieces of software across multiple computers and are currently using a mixture of scripts, build systems, and manual methods to do that, look into configuration managers. They’re easy to pick up and automate the most common tasks. I’m using ansible, because it’s standard, simple, and written in python.

Research software typically requires integrating clusters, high-performance numerical libraries, 30-year-old Fortran applications by geniuses, and 30-minute-old python scripts written by PhD students.

A consistent thorn in my side is downloading, building, installing, and deploying all of that stuff. For example, on a recent project, I needed to:

Each step is simple enough, but designing a clean architecture around doing slightly different permutations of those steps is a struggle between doing something the easy way (e.g. a directory containing scripts, hard-coded arguments in the Luigi task) and doing something the correct way.

The correct way (or so I thought) to handle these kinds of problems is to use a build system. However, there is no agreed-upon “one way” to download, build, and install software, which is why build systems are either extremely powerful/flexible (e.g. make, where anything is possible) and rigid/declarative (e.g. maven).

Because there’s so much choice out there, I concluded that researching each would obviously (ahem) be a poor use of my valuable time. So, over the years, I’ve been writing a set of scripts which have been gradually mutating:

After many months of this, I decided “this sucks, I’ll develop a new, better, way of doing this”. So I spent an entire evening going through the weird, wonderful, and standard build systems out there, justifying why my solution would be better for this problem.

Well, it turns out this problem isn’t suitable for a build system, despite it having similar requirements (check inputs, run something, check outputs, transform files, etc.). Although my searches yielded a menagerie of weird software, what I actually needed was a configuration manager. Ansible being a particularly straightforward one.

This rollercoaster of “there probably isn’t a good solution already available to this problem”, “I’ll hack my own solution!”, “My hacks are a mess, I should build an actual system”, “oh, the system already exists” must be common among software developers. Maybe it’s because the problem isn’t actually about developing a solution: it’s about understanding the problem well enough. If the problem’s truly understood, it will be easier to identify which libraries/algorithms to use to solve it, which will make developing the solution a lot easier. Otherwise, you’ll end up like me: Keeper of the Mutant Scripts.

(Not so) Fancy-Pants new Website

So, I just spent an evening + morning refactoring the site into a, uh, “new” design.

I only ocassionally work on this site these days—I now see it as the sparse journal of a madman that also likes to distribute mobile-friendly versions of his CV—but I thought it would be a nice and easy blog post to reflect on how the site has changed over the last 3 years.

The original version of adamkewley.com was launched in May 2015 and was the fanciest design:

It makes me feel a bit ill. The first version was modelled off of the kind of erudite landing pages you see 3-man startups use to try and sell IoT soap bars or something. By May 2016, I clearly had gotten sick enough of my own bullshit to remove a bunch of that cute stuff:

By May 2017, there were a few regressions:

And now, in March 2018, I’ve finally decided to just throw almost all fancy tricks out the window and use the simplest HTML + CSS solution I could create:

The previous versions of this site required a bunch of javascript libraries and jekyll gems to build. There were also subtle bugs that would pop up if you used it on a phone. Development/maintenance time was dedicated to fixing that - I also couldn’t help but tweak with the code.

This new site is HTML + a ~50 line long CSS file. It felt strangely liberating to make. Maybe because after working on ReactJS (e.g.) and angular sites I came across this absolute gem that satirically makes the point that barebones sites are: a) fast, b) easy to make, and c) responsive. I couldn’t argue with the logic and immediately wanted to just rip out all the complexity in my site, so here we are.

I wonder how quickly regressions will set in ;)

State Machines in ReactJS

I’m currently implementing job resubmission in Jobson UI and found that state machines greatly simplify the code needed to render a user workflow.

Background

A large amount of Jobson UI’s codebase is dedicated to dynamically generating input forms at runtime.

Generating the relevant <input>, <select>, <textarea>s, etc. from a Jobson job spec is fairly easy (see createUiInput here) but became increasingly complex after adding job copying because extra checks needed to be made:

Each of these conditions are simple to check in isolation but, when combined, result in delicate state checks:

render() {
  if (this.state.isLoadingSpecs)
    return this.renderLoadingSpecsMessage();
  if (this.state.errorLoadingSpecs)
    return this.renderSpecsLoadingError();
  else if (this.state.isLoadingExistingJob)
    return this.renderLoadingExistingJob();
  else if (this.state.errorLoadingExistingJob)
    return this.renderErrorLoadingExistingJob();
  else if (this.state.isCoercingAnExistingJob)	
    // etc. etc.
}	

These checks were cleaned up slightly by breaking things into smaller components. However, that didn’t remove the top-level rendering decisions altogether.

For example, the isLoadingSpecs and errorLoadingSpecs checks can put into a standalone <SpecsSelector /> component that emits selectedSpecs. However, the top level component (e.g. <JobSubmissionComponent />) still needs to decide what to render based on emissions from multiple child components (e.g. it would need to decide whether to even render <SpecsSelector /> at all).

State Machines to the Rescue

What ultimately gets rendered in these kind of workflows depends on a complex combination of flags because only state, rather than state and transitions are being modelled. The example above compensates for a lack of transition information by ordering the if statements: isLoadingSpecs is checked before isLoadingExistingJob because one “happens” before the other.

This problem—a lack of transition information—is quite common. Whenever you see code that contains a big block of if..else statements, or an ordered lookup table, or a switch on a step-like enum, that’s usually a sign that the code might be trying to model a set of transitions between states. Direct examples can be found in many network data parsers (e.g. websocket frame and HTTP parsers) because the entire payload (e.g. a frame) isn’t available in one read() call, so the parser has to handle intermediate parsing states (example from Java jetty).

State Machines (SMs) represent states and transitions. For example, here’s the Jobson UI job submission workflow represented by an SM:

From a simplistic point of view, SMs follow simple rules:

I initially played with the idea of using SMs in ReactJs UIs after exploring SM implementations of network parsers. I later found the idea isn’t new. A similar (ish) post by Jeb Beich has been posted on cogninet here and contains some good ideas, but his approach is purer (it’s data-driven) and is implemented in ClojureScript (which I can’t use for JobsonUI). By comparison, this approach I used focuses on using callbacks to transition so that individual states can be implemented as standard ReactJS components. In the approach:

This slight implementation change means that each component only has to focus on doing its specific job (e.g. loading job specs) and transitioning to the next immediate step. There is no “top-level” component containing a big block of if..else statements.

Code Examples

A straightforward implementation involves a top-level renderer with no decision logic. Its only job is to render the latest component emitted via a callback:

export class StateMachineRenderer extends React.Component {

  constructor() {
    const initialComponent =
      React.createElement(InitialStateComponent, {transitionTo: this.handleTransition.bind(this)});

    this.state = {
      component: initialComponent,
    };
  }

  handleStateTransition(nextComponent) {
    this.setState({component: nextComponent});
  }

  render() {
    return this.state.component;
  }
}

A state is just a standard component that calls transitionTo when it wants to transition. Sometimes, that transition might occur immediately:

export class InitialState extends React.Component {
  componentWillMount() {
    const props = {transitionTo: this.props.transitionTo};

    let nextComponent;
    if (jobBasedOnExistingJob) {
      nextComponent = React.createElement(LoadExistingJobState, props, null);
    } else {
      nextComponent = React.createElement(StartFreshJobState, props, null);
    }
    this.props.transitionTo(nextComponent);
  }
}

Otherwise, it could be after a set of steps:

export class EditingJobState extends React.Component {
  // init etc.
  onUserClickedSubmit() {
    this.props.api.submitJob(this.state.jobRequest)
      .then(this.transitionToJobSubmittedState.bind(this))
      .catch(this.showErrors.bind(this))
  }
	  
  transitionToJobSubmittedState(jobIdFromApi) {
    const component = React.createElement(JobSubmittedState, {jobId: jobIdFromApi}, null);
    this.props.transitionTo(component);
  }
}

Either way, this simple implementation seems to work fine for quite complex workflows, and means that each components only contains a limited amount of “transition” logic, resulting in a cleaner codebase.

Conclusions

This pattern could be useful to webdevs that find themselves tangled in state- and sequence-related complexity. I’ve found SMs can sometimes greatly reduce overall complexity (big blocks of if..else, many state flags) at the cost of a little local complexity (components need to handle transitions).

However, I don’t reccomend using this pattern everywhere: it’s usually easier to use the standard approaches up to the point of standard approaches being too complex. If your UI involves a large, spiralling, interconnected set of steps that pretty much require a mess of comparison logic though, give this approach a try.

Cheeky Hackers

I’ve been running several webservers behind custom domains for a while now (plateyplatey.com, textadventurer.tk, and this site) and it never ceases to amaze me how cheeky bots are getting.

For example, certbot recently started complaining that textadventurer.tk’s TLS certificate is about to expire. That shouldn’t happen because there’s a nightly cronjob for certbot renew.

On SSHing into the server I found an immediate problem: the disk was full. Why? Because some bot, listed as from contabotserver.net decided to spam the server with >10 million requests one afternoon and fill the HTTP logs. Great. Looks like I’m finally going to implement some log compression+rotation.

Then there’s the almost hourly attempts to find a PHPMyAdmin panel on my sites. That one always surprised me: surely only a small percentage of PHP sites are misconfigured that badly? Lets look at the stats:

Percentage of websites using PHP

Even if 1 % of them are misconfigured, we’re doomed.

Jobson: Now in 2D

I recently made screencasts that explain Jobson in more detail. The first explains what Jobson is and how to install it. Overall, Jobson seems well-recieved. The first video seems to be leveling off at around 2700 views and Jobson’s github repo has seen a spike in attention.

Will other teams start adopting it or not? Only time will tell.