Multi-stage installs and a better npm

(Originally posted to the npm blog; the source is on GitHub.)

Hi everyone! I’m the new programmer at npm working on the CLI. I’m really excited that my first major project is going to be a substantial refactor to how npm handles dependency trees. We’ve been calling this thing multi-stage install but really it covers more than just installs.

Multi-stage installs will touch and improve all of the actions npm takes relating to dependencies and mutating your node_modules directory. This affects install, uninstall, dedupe, shrinkwrap and, obviously, dependencies (including optionalDependencies, peerDependencies, bundledDependencies and devDependencies).

The idea is simple enough: Build an in-memory model of how we want the node_modules directories to look. Compare that model to what’s on disk, producing a list of steps to change the disk version into the memory model. Finally, we execute the steps in the list.
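The diff-then-apply idea can be sketched in miniature. This is not npm's code, just an illustration, modeling a tree as a plain map of paths to versions:

```javascript
// Compare an on-disk tree to an ideal in-memory tree and produce the
// list of steps needed to turn the former into the latter.
function diffTrees(onDisk, ideal) {
  const actions = [];
  for (const [path, version] of Object.entries(ideal)) {
    if (!(path in onDisk)) {
      actions.push({ action: 'add', path: path, version: version });
    } else if (onDisk[path] !== version) {
      actions.push({ action: 'update', path: path, version: version });
    }
  }
  for (const path of Object.keys(onDisk)) {
    if (!(path in ideal)) actions.push({ action: 'remove', path: path });
  }
  return actions;
}

// One add, one update, one remove:
const steps = diffTrees(
  { 'node_modules/a': '1.0.0', 'node_modules/b': '1.0.0' },
  { 'node_modules/a': '1.1.0', 'node_modules/c': '2.0.0' }
);
console.log(steps);
```

Because the full list of steps exists before anything is executed, errors can be raised before the first byte of node_modules is touched.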

The refactor gives several needed improvements: It gives us knowledge of the dependency tree, and of what we need to do, before we touch your node_modules directory. This means we can give simpler errors, earlier, greatly improving the experience of this failure case. Further, deduping and recursive dependency resolution become easy to include. And by breaking the actual act of installing new modules into functional pieces, we eliminate the opportunity for many of the race conditions that have plagued us recently.

Breaking changes: The refactor will likely result in a new major version as we will almost certainly be tweaking lifecycle script behavior. At the very least, we’ll be running each lifecycle step as its own stage in the multi-stage install.

But wait, there’s more! The refactor will make a number of oft-requested features a lot easier to implement. Some of the issues we intend to address are:

  • Progress bars! #1257, #5340
  • Automatic/intrinsic dedupe, across all module source types #4761, #5827
  • Errors if we can’t find compatible versions MUCH earlier, before any changes to your node_modules directory have happened #5107
  • Better diagnostics when peerDependencies produce impossible to resolve scenarios.
  • Better use of bundledDependencies
  • Recursively resolving missing dependencies #1341
  • Better shrinkwrap #2649
  • Fixes for some icky edge cases #3124, #5698, #5655, #5400
  • Better shrinkwrap support, including updating of the shrinkwrap file when you use --save on your installs and uninstalls #5448, #5779
  • Closer to transactional installs #5984

So when will you get to see this? I don’t have a timeline yet; I’m still in the part of the project where everything I look at fractally expands into yet more work. You can follow along with progress on what will be its pull request.

If you’re interested in that level of detail, you may also be interested in reading @izs’s and @othiym23’s thoughts.

Abraxas– A node.js Gearman client/worker/admin library

Abraxas is an end-to-end streaming Gearman client and worker library for Node.js. (Server implementation coming soon.)

Standout features:

  • Support for workers handling multiple jobs at the same time over a single connection. This is super useful if your jobs tend to be bound by external resources (e.g. databases).
  • Built streaming end-to-end from the start, due to being built on gearman-packet.
  • Almost all APIs support natural callback, stream, and promise style usage.
  • Support for the gearman admin commands to query server status.
  • Built-in delayed background job execution, when used with recent versions of the C++ gearmand.

Things I learned on this project:

  • Nothing in the protocol stops clients and workers from sharing the same connection. That separation was imposed by arbitrary library restrictions.
  • In fact, the plain text admin protocol can be included cleanly on the same connection as the binary protocol.
  • Nothing stops workers from handling multiple jobs at the same time, except, again, arbitrary library restrictions.
  • The protocol documentation is out of date compared to the C++ gearmand implementation; notably, SUBMIT_JOB_EPOCH has been implemented. I’ve begun updating the protocol documentation.

Because everything is a stream, you can pipe data straight through a job. The same calls can equally be consumed promise-style or callback-style, and you can mix and match all three in one program.
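A sketch of these styles, assuming a gearmand on localhost and a worker already registered for a hypothetical "toUpper" function (API names follow the Abraxas README of the era; treat this as illustrative, not canonical):

```javascript
var Gearman = require('abraxas');
var client = Gearman.Client.connect({ host: '127.0.0.1', port: 4730 });

// Stream style: pipe stdin through the job and back out.
process.stdin.pipe(client.submitJob('toUpper')).pipe(process.stdout);

// Promise style:
client.submitJob('toUpper', 'test string').then(function (result) {
    console.log(result);
});

// Callback style:
client.submitJob('toUpper', 'test string', function (err, result) {
    if (err) throw err;
    console.log(result);
});
```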

MySQL ROUND considered harmful

MySQL’s ROUND has different behavior for DECIMALs than it does for FLOATs and DOUBLEs.

This *is* documented. The reason for it is not discussed, but it’s important: ROUND operates by altering the type of the expression to have the number of decimal places it was passed. And this matters because the type information associated with a DOUBLE bleeds into, and taints, the rest of the expression.

We’re going to start with some simple SQL:
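For instance, mixing the two types in one expression (2.5 × 6.05):

```sql
SELECT 2.5 * 605e-2;
-- 15.125
```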

Here 2.5 is a DECIMAL(2,1) and 605e-2 a DOUBLE, and the result is a DOUBLE. That’s all well and good…

But let’s try rounding 605e-2.
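Wrapping the DOUBLE in ROUND changes the displayed result of the whole product:

```sql
SELECT 2.5 * ROUND(605e-2, 2);
-- displays 15.13, not 15.125
```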

So… what’s going on here? The ROUND part of the expression shouldn’t have changed its value, and in fact it hasn’t: calling ROUND(605e-2,2) returns 6.05 as expected. The problem is that the type of ROUND(605e-2,2) is DOUBLE(19,2), and when that’s multiplied by 2.5 the resulting expression is still DOUBLE(19,2). But the number of decimals on a float is for display purposes only; internally MySQL keeps full precision. We can prove that this way:
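One way to expose the full precision is to widen the display with another ROUND:

```sql
SELECT ROUND(2.5 * ROUND(605e-2, 2), 3);
-- 15.125: the underlying value was never rounded to 15.13
```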

So yeah… MySQL lets you increase precision with ROUND. Postgres is looking mighty fine right now.

Survey of node.js Gearman modules

Here’s a brief survey of node.js Gearman modules. I’ll have some analysis based on this later.

| Module | Github | Author | Last release | Tests | Notes |
| --- | --- | --- | --- | --- | --- |
| gearman | gofullstack/gearman-node | smith, gearmanhq | 2011-05-02 | 4 | |
| gearman-stream | Clever/gearman-stream | azylman, templaedhel | 2014-03-21 | 0 | Previously named gearman_stream; uses gearman-coffee |
| gearnode | andris9/gearnode | andris | 2013-02-25 | 1 | |
| gearmanode | veny/GearmaNode | veny | 2014-03-20 | 4 | |
| nodegears | enmand/nodegears | enmand | 2013-12-07 | 1 | |
| que | vdemedes/que | vdemedes | 2012-07-02 | 0 | Uses node-gearman |
| gearman-js | mreinstein/gearman-js | mreinstein | 2013-11-03 | 4 | |
| gearman2 | sazze/gearman-node | ksmithson | 2013-09-17 | 0 | Fork of gearman with no changes except the name |
| node-gearman | andris9/node-gearman | andris | 2013-08-13 | 2 | |
| node-gearman-ms | nachooya/node-gearman-ms | nachooya | 2013-11-18 | 0 | Fork of node-gearman |
| gearman-coffee | Clever/gearman-coffee | rgarcia, azylman, jonahkagan | 2013-03-19 | 2 | |
| node-gearman | magictoolbox/node-gearman | oleksiyk | 2012-12-03 | 0 | |

Using cron with “every”

“every” is a command I wrote, inspired by the Unix “at” command. It adds commands to your crontab for you, using an easier-to-remember syntax. You can find it on GitHub.

I was reminded of it by an article on cron for Perl programmers who are Unix novices.

Their examples translate directly into “every”’s friendlier syntax.

What’s more, there’s no need to specify the path to Perl, because unlike using crontab straight up, “every” will maintain your path. Even better, you can use relative paths to refer to your script.

This works because “every” ensures that your command executes from the directory where you set it up. Just like “at”, it uses all of the same context as your normal shell.

Education clearly doesn’t help reporters

“Education clearly pays. Despite recent questioning of the value of university degrees, more than two thirds of the top one per cent had a university degree, compared to 20.9 per cent of the total population.”

No, that’s not what that says at all. It says that the wealthy value degrees, not that degrees make one wealthy. Degrees are something wealthy people tend to have; if you just get a degree thinking it will make you wealthy, you’re as confused as the cargo cults that built bamboo airports in hopes of attracting a supply plane.

Easy ad-hoc publishing on a machine with Apache


This is just a little hack of mine that makes it trivial to expose any directory on my server as a website, under either a name I specify or a hash. It’s handy for all sorts of things; I initially created it to give myself an easy way to view remote coverage reports generated as HTML. It’s also a nice way to view HTML docs bundled with a package, or any other random HTML you come across.

How it works

As part of setup, we create a file-based Apache rewrite map that rewrites slugs off of our domain based on rules from a text file. These text files are super simple: just the slug, followed by a space, and then what to rewrite it to.
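A sketch of the Apache side, with hypothetical paths and URL layout (the rules in the repo differ, since they refer to my host):

```apache
RewriteEngine On
# txt: maps are plain "key value" lines, reread whenever the file changes
RewriteMap pub txt:/var/www/pub.map
RewriteRule ^/p/([^/]+)(/.*)?$ ${pub:$1}$2

# /var/www/pub.map might contain, e.g.:
#   coverage /home/me/project/coverage
#   0beec7b5ea3f0fdb /home/me/docs/html
```

Note that RewriteMap has to live in the server or virtual-host configuration; it isn’t allowed in .htaccess files.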

With the setup out of the way, a very simple shell script uses Perl to compute the absolute path from your relative one, and openssl to generate a hash from that. The hash becomes the slug if you don’t specify one. Once the script has appended these to the rewrite map file, it tells you what your new URL is.
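A minimal sketch of such a script; the map file location and the hostname are hypothetical placeholders:

```shell
#!/bin/sh
# publish [DIR] [SLUG] -- append a rewrite-map entry and print the URL
MAPFILE=${MAPFILE:-/tmp/pub.map}      # hypothetical map location

# Perl resolves the absolute path; openssl hashes it for the default slug.
DIR=$(perl -MCwd=abs_path -e 'print abs_path(shift)' "${1:-.}")
SLUG=${2:-$(printf '%s' "$DIR" | openssl dgst -sha1 | awk '{print $NF}')}

echo "$SLUG $DIR" >> "$MAPFILE"
echo "http://files.example.com/p/$SLUG/"
```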

The example in the repo obviously isn’t generic, since it refers to a host I control, but that’s easily edited. This is less a software package and more a stupid sysadmin hack.

Android Ports for Corporate Firewalls

Beyond the standard 80 and 443 to handle web traffic, Android also needs 5222 (Jabber) and 5228 (allegedly Google Marketplace, but needed for a phone to fully connect to the network and have functioning Google Talk).

Mail is likely needed too, of course: SMTP on 25 and 465, POP on 110 and 995, and IMAP on 143 and 993. For some setups you may also need LDAP on 389 and 636. Exchange needs 135 and, in some esoteric configurations, NNTP needs 119 and 563.

AnyEvent::Capture – Synchronous calls of async APIS

I’ve been a busy little bee lately and have published a handful of new CPAN modules. I’ll be posting about all of them, but to start things off, I bring you: AnyEvent::Capture

It provides a little command that makes calling async APIs in a synchronous, but non-blocking, manner easy. Let’s start with an example of how you might do this without my shiny new module:
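A typical shape for that, sketched with a hypothetical slow_request() async call:

```perl
use AnyEvent;

sub fetch_thing {
    my $cv = AE::cv;
    slow_request( cb => sub { $cv->send(@_) } );   # hypothetical async API
    my ($result) = $cv->recv;  # blocks this code, but the event loop keeps running
    return $result;
}
```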

The above is not an uncommon pattern when using AnyEvent, especially in libraries, where your code should block, but you don’t want to block other event listeners. AnyEvent::Capture makes this pattern a lot cleaner:
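With capture, the condvar dance disappears: per the module’s synopsis, capture creates the condvar, hands its send callback to your block, and calls recv for you (slow_request() is still a hypothetical stand-in):

```perl
use AnyEvent::Capture;

sub fetch_thing {
    my ($result) = capture { slow_request( cb => shift ) };
    return $result;
}
```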

The AnyEvent::DBus documentation provides another excellent example of just how awkward this can be:

With AnyEvent::Capture this would be:

We can also find similar examples in the Coro documentation, where rouse_cb/rouse_wait replace condvars:

Still, for the common case, AnyEvent::Capture provides a much cleaner interface, especially as it will manage the guard object for you.

Golfing for Gotchas

I’ve been building a little standalone command line tool lately, which led me to look at using App::FatPacker to produce a standalone, single-script download. This was going well until I tried to load Digest::Perl::MD5, which caused fatpacker to mysteriously crash on an undefined value. The reason for this is interesting…

When fatpacker goes to analyze a module list, it at one stage runs require on all of them, like so:
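The loop is roughly this one-liner (paraphrased from App::FatPacker; @packages holds the module file names to load):

```perl
require $_ for @packages;   # $_ here is an *alias* into @packages itself
```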

Then later on it uses @packages and discovers that one of the elements is now undef. How did this happen?

Well, if the module you require fiddles with $_ without localizing it first, that will ultimately result in modifying @packages. How did Digest::Perl::MD5 do this?

It reads its data block at load time, and of course the last value in $_ there is undef. This all would have been avoided had the require loop been written out less golfily:
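Spelled out with a lexical loop variable, nothing the required module does to $_ can touch the array:

```perl
for my $package (@packages) {
    require $package;
}
```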

Of course, this is the same function that had:

And anyone who would publish code with that in it is probably beyond hope. ;)

(Bugs with patches have been filed against both App::FatPacker and Digest::Perl::MD5. The latter patch just localizes $_ before the while loop.)

Whatever fills my mind…