Dev diary…

I kinda fell off the wagon there when it comes to dev diaries, but let’s see if we can’t get this going again. Since last time, I gave up trying to extend npmlog indirectly– its heavy use of bind and the way that ansi integrates led to this. I spent more time fiddling with it as I’d had an incorrect model of bind. Somewhere I got the idea that you could rebind, but no, once bound a function will only ever be called with that object, no matter what you do to it. It turns out that these two really are semantically equivalent (though the bind version is MUCH slower to define AND to call):

Given:

Then:

Is semantically equivalent to:
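
As a minimal sketch of the kind of equivalence meant here (the names `whoAmI`, `alice` and `bob` are made up for illustration and are not from npmlog), a bound function behaves like a closure that always applies the same `this`, and trying to rebind it changes nothing:

```javascript
'use strict'

// Hypothetical example function and objects, for illustration only.
function whoAmI () { return this.name }
const alice = { name: 'alice' }
const bob = { name: 'bob' }

// The bind version.
const bound = whoAmI.bind(alice)

// The closure version: always applies `alice` as `this`.
const wrapped = function () { return whoAmI.apply(alice, arguments) }

// Attempts to "rebind" an already bound function are ignored.
const rebound = bound.bind(bob)

const results = {
  bound: bound(),
  wrapped: wrapped(),
  rebound: rebound(),
  called: bound.call(bob)
}
console.log(results)
```

Every entry in `results` is `'alice'`. The two forms do differ in edge cases this sketch ignores, such as being invoked with `new`.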

So yeah, I ended up integrating my progress bar changes directly into npmlog. It’s currently in the version in the multi-stage branch. I do plan to pull that out into its own module to handle rendering, at which point I’ll look again and see if it can use any of the existing progress bar modules. What I did do was factor completion tracking out entirely. Your application interacts with the completion tracker which feeds completion data to your progress bar. This is currently living in [an unreleased module], which’ll see release before the npmlog updates see release. I’m quite happy about having factored completion tracking separately from the progress bar, as unusual as this might be. We actually have a lot of different kinds of things we need to count toward the overall install completion, from discrete tasks to downloads.

I also learned that node-gyp executes during the install lifecycle step, which does make sense as it replaced explicitly putting things in that step. As a result of this, and also just generally thinking about things that could reasonably happen in this step, I’ve moved it and postinstall till after finalize, so they now run in the final destination, not in staging.

I also learned that output from calls to node-gyp (and probably other lifecycle calls) goes straight to stdout and bypasses the log system entirely. This should really be fed into the logging system so that it can be suppressed (or not) and so that it doesn’t muck up the progress bar.
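
One way this could look, as a rough sketch only: run the child with piped stdio and hand each line to a logging callback instead of letting it write to the terminal. `runCaptured` and the `log` callback are hypothetical names, not part of npm or npmlog, and a real version would want to stream asynchronously rather than use `spawnSync`.

```javascript
'use strict'
const { spawnSync } = require('child_process')

// Hypothetical helper: run a command with piped stdio (the spawnSync default)
// and pass every non-empty output line to `log` rather than the terminal.
function runCaptured (cmd, args, log) {
  const result = spawnSync(cmd, args, { encoding: 'utf8' })
  for (const line of (result.stdout || '').split(/\r?\n/)) {
    if (line !== '') log('stdout', line)
  }
  for (const line of (result.stderr || '').split(/\r?\n/)) {
    if (line !== '') log('stderr', line)
  }
  return result.status
}

// Stand-in for a lifecycle script: a child node process that prints one line.
const lines = []
const status = runCaptured(
  process.execPath,
  ['-e', 'console.log("building...")'],
  (stream, line) => lines.push([stream, line])
)
console.log(status, lines)
```

Because the logger receives the lines, it gets to decide whether to show them, buffer them, or drop them while a progress bar is on screen.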

I got distracted yak shaving and wrote [documentation for my bespoke dotfiles].

Wednesday/Thursday Oct-29/30 Dev Diary

These days kind of ran together due to a weird schedule, involving in part sleeping off the programming binge from earlier. Like most days it started off with support, emails and finding orphaned slack results…

  • I rewrote my commit history to be more sane, since people are starting to make noises about building on my branch. Having people build on my branch is … awkward because rewriting history is pretty core to how I work– I often make commits of experiments, then throw them out and replace them, then later flatten all that down. But I can make it work. The main thing I had to change was moving install to msinstall and restoring the old install. This makes my branch usable as real npm while we’re developing.
  • I started looking at failing tests now… which given that the replacement install command is temporarily named something different should be none of them. =D
  • Decided to spend some time with the progress bar…
    • I tried a few approaches to integrating with the logger and after a few false starts, settled on wrapping it, which thus far seems to be working ok. I’m not totally happy with how it factors into existing code, but I don’t yet see another option that’d integrate cleanly. =/
    • Still, I do actually have a percentage meter now, so plugging it into an actual progress bar should be pretty easy. There are still some integration issues with logging (due to its “I’m a giant magic global object” thing =D). This all took longer than I’d’ve liked.

Tuesday Oct-28 Dev Diary

In which we’ll work on “gyp and the lifecycle scripts”… which sounds like a band name. =p

  • Started with tickets and user support at first. For me this both feels like work and doesn’t feel actually productive. Which is unfortunate, because it sucks up time and then I feel guilty, when I know I shouldn’t. =D
  • Reviewed the api changes that @othiym23 is planning for npm-registry-client.
  • On a non-work related front, I got a pair of SQL files to try to reproduce a bug in an older project of mine. Unfortunately it doesn’t reproduce for me, which probably means the entire schema is required to reproduce it. =/
  • I took a moment to sketch out a module to compute how complete a larger process is when it’s made up of discrete pieces of indeterminate length. This’d be plugged into an actual progress bar module. My requirements were:
    • What I wanted was to be able to split the overall completion up into a set of equally sized chunks and then allow each chunk to complete at its own rate. Further, I’d like each chunk to be able to be split up similarly, ad infinitum. My problem now is figuring out what the heck to call this thing? (progress-tracker? completion-tracker? how-complete-are-we-now? are-we-there-yet?) (Actually I like that last one! Maybe it will be are-we-there-yet!)
    • So as a for-instance, imagine you have 3 chunks, one is at 50%, one is at 70% and one is at 40%. I want the overall completion number to be about 53%.
    • Or you have two chunks, one is at 25%, the other is split into 3 more chunks, of which, one is at 70% and the rest at 0%. The overall completion should be about 24%.
  • Moar npm install actions:
    • extract – now extracts to TOP/node_modules/.staging/temp-unique-name
    • preinstall (lifecycle)
    • build – which means specifically linkStuff and writeBuiltinConf
    • install (lifecycle)
    • postinstall (lifecycle)
    • test (lifecycle)
    • remove
    • finalize – moves from the .staging directory to the final destination
  • Plus all the steps for the current module, if you didn’t pass an argument
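
The completion arithmetic described in the list above can be sketched in a few lines. This is only an illustration of the equal-weight idea, not the module itself: `completion` is a hypothetical function, and a chunk is modelled as either a number between 0 and 1 or an array of equally weighted sub-chunks.

```javascript
'use strict'

// Hypothetical model: a chunk is a number in [0, 1], or an array of
// equally weighted sub-chunks, nested to any depth.
function completion (chunk) {
  if (typeof chunk === 'number') return chunk
  if (chunk.length === 0) return 0
  const total = chunk.reduce((sum, child) => sum + completion(child), 0)
  return total / chunk.length
}

// Three chunks at 50%, 70% and 40%.
const flat = completion([0.5, 0.7, 0.4])

// Two chunks: one at 25%, the other split into three (70%, 0%, 0%).
const nested = completion([0.25, [0.7, 0, 0]])

console.log(flat.toFixed(3), nested.toFixed(3))
```

With equal weighting the two examples come out at roughly 0.533 and 0.242.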

So as you can see from the above, HUGE progress on multi-stage install. It’s really taking form now. I’m super excited!! There’s still a lot of work, but I feel like it’s really passed a milestone here.

Monday 10/27 dev diary

Wheee… today I…

  • Implemented the run actions stage of the npm multi-stage installer. Began implementing specific actions:
    • [x] Fetch
    • [x] Extract
    • [x] And then I had to rewrite how all of the actions are structured and implemented… because, of course I did.
    • On the plus side of this one, I can actually run npm install and get a complete tree of modules. Granted, it’s only useful if none of your modules require building, but still, it’s nice to see it actually doing something. Somehow it’s not quite real when it’s all still in memory you know? Disk use concretizes it somehow.
  • Integrating the update to npm-package-arg into the rest of the code base led me to npm bugs, npm repo, etc, which led me to needing more URLs for the various hosted git services. I released a new hosted-git-info with this. It also led me to update them to use fetch-package-metadata to load their package data, instead of rolling their own.
  • And the rabbit hole was yet deeper, which I’m determined to climb out of… I realized that all the normalization going on in npm bugs, npm repo, npm docs was now for naught as fetch-package-metadata then passed control to normalize-package-data which was doing its own normalization. And lo, the duplicated code, and it was legion… which ultimately led to me adding gist.github.com support to hosted-git-info, and then preparing a pull request for normalize-package-data. All in all this rabbit hole sucked up a lot of time. =/
  • Also started and then stopped writing my blog post on commit history style. I’ll find more time this week I’m sure.

Friday’s dev diary…

The best thing I ever did to wordpress was install a markdown module. =D

Anyway, today I…

  • Began reviewing @othiym’s explicit registry auth pull request. It’s still a work-in-progress, but some of the commits were ready for review.
  • Dropped by IRC and created an issue on behalf of someone having problems there. Which ultimately involved some nerding out on the awfulness of Windows Filesystem APIs and the awful EBUSY error.
  • Did a weekly chat with @othiym about the day-to-day direction of the project. We also cleaned up the next-patch queue a bit, reducing the number of tickets flagged with it.
  • My friday ended up bleeding a bit on to the weekend, but only because me and my fiancée snuck a 5 hour drive up to Maine Friday evening and I ended up going to bed earlier than usual.
  • I kind of accidentally committed to writing a blog post on my feelings on git commit histories (spoiler: I optimize them for ease of code review). I hope to get that into a presentable state next week.
  • Updated the multi-stage request pull request to have a much more accurate and comprehensive todo list. It looks long, but that’s mostly because the “to complete” list is a lot more detailed than the “completed” list.
  • Added documentation to the multi-stage request pull request on how install semantics will change. I don’t think we’ve summarized it in these terms previously.
  • Writing the documentation implied some changes I needed to make to how we choose which trees to compare
  • Completed initial action decomposition and started digging in on implementation of actions

Development diary…

I’ve decided to start keeping a development diary here. Mostly this was motivated by a desire to have a forum for rubber duck debugging, as working remote it’s otherwise hard to get this (my long suffering non-programmer fiancée can only take so much!) But it’d also be nice to have a better sense of my own velocity and where things have taken me. That this also gives the rest of the community a bit more insight into what’s getting attention is an extra bonus.

Today I…

  • Got @othiym23 to look at my npm-package-arg changes that integrate hosted-git-info, which in turn will allow the multi-stage install to both fetch package data from hosted repos via a shortcut (direct fetch of the package.json via HTTP, rather than having to clone via git), and also to not treat github as special, adding support for gitlab and bitbucket.
  • … and that led to a request for new npm-package-arg documentation generally, and, well, perhaps the curse of knowledge has made it hard for me to see, as it feels well documented to me already. =D That said, I did manage to make some improvements, digging into my own realize-package-specifier for inspiration.
  • Made an issue to track adding website support for multiple hosted git repos.
  • Read @izs’s excellent talk on corporate oss
  • Got the first issue on require-inject. Prepared a fix for it. This ended up being a little more involved than I was expecting– it turns out that the require object is unique per module, but that require.cache is shared. This means assigning something to require.cache (eg require.cache = {}) does nothing outside the current module. Whereas mutating it DOES. Anyway, we’ve got new tests now and a new release, so all’s good.
  • TIL my somewhat eccentric habit of writing /[.]/ or /[/]/ instead of /\./ or /\// dates back to Damian Conway’s Perl Best Practices.
  • Finally started writing code to execute the multi-stage install’s action list, which at this point means decomposing things like “install” into their component parts, eg, “fetch”, “extract”, “gyp-build”, “lifecycle scripts”, “move to final resting place”. (I say finally, because LAST week I was keen on getting to this and I didn’t and then THIS week I was keen on getting to it too. Alas, unforeseen prerequisites…)
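
The require.cache behaviour mentioned in the list above can be checked with a short sketch. It peeks at `Module._cache`, which is an internal, undocumented property of Node’s `module` module, so treat this as a demonstration under that assumption rather than something to rely on. The fake path is made up.

```javascript
'use strict'
const Module = require('module')

// ASSUMPTION: Module._cache is internal and undocumented; it is used here
// only to observe the cache that every module's `require.cache` points at.
const shared = require.cache
const sameObjectBefore = shared === Module._cache

// Assignment only rebinds the `cache` property on THIS module's `require`.
require.cache = {}
const sharedUntouched = Module._cache === shared && require.cache !== shared

// Mutation goes through the shared object, so it is visible everywhere.
const fakePath = '/hypothetical/fake-module.js' // made-up key for the demo
shared[fakePath] = { exports: 'stub' }
const mutationVisible = fakePath in Module._cache
delete shared[fakePath]

// Restore the original reference so the rest of the file behaves normally.
require.cache = shared

console.log({ sameObjectBefore, sharedUntouched, mutationVisible })
```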

Multi-stage installs and a better npm

(Originally posted to the npm blog, the source is on Github.)

Hi everyone! I’m the new programmer at npm working on the CLI. I’m really excited that my first major project is going to be a substantial refactor to how npm handles dependency trees. We’ve been calling this thing multi-stage install but really it covers more than just installs.

Multi-stage installs will touch and improve all of the actions npm takes relating to dependencies and mutating your node_modules directory. This affects install, uninstall, dedupe, shrinkwrap and, obviously, dependencies (including optionalDependencies, peerDependencies, bundledDependencies and devDependencies).

The idea is simple enough: Build an in-memory model of how we want the node_modules directories to look. Compare that model to what’s on disk, producing a list of steps to change the disk version into the memory model. Finally, we execute the steps in the list.
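
To make the shape of that concrete, here is a deliberately tiny sketch. It is not npm’s implementation: the flat name-to-version maps, the `diffTrees` function and the step names are all hypothetical simplifications of what is really a nested tree.

```javascript
'use strict'

// Hypothetical, flattened stand-ins for "what is on disk" and "what we want":
// plain objects mapping package name -> version.
function diffTrees (actual, ideal) {
  const steps = []
  for (const name of Object.keys(ideal)) {
    if (!(name in actual)) {
      steps.push(['add', name, ideal[name]])
    } else if (actual[name] !== ideal[name]) {
      steps.push(['update', name, ideal[name]])
    }
  }
  for (const name of Object.keys(actual)) {
    if (!(name in ideal)) steps.push(['remove', name, actual[name]])
  }
  return steps
}

const onDisk = { a: '1.0.0', b: '1.0.0', d: '0.1.0' }
const wanted = { a: '1.0.0', b: '2.0.0', c: '1.0.0' }

// Nothing on disk is touched until the whole list has been computed.
const steps = diffTrees(onDisk, wanted)
console.log(steps)
```

The point of the design is visible even at this size: any problem found while building `wanted` or computing `steps` surfaces before a single file changes.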

The refactor gives several needed improvements: It gives us knowledge of the dependency tree and what we need to do prior to touching your node_modules directory. This means we can give simple errors, earlier, much improving the experience of this failure case. Further, deduping and recursive dependency resolution are then easy to include. And by breaking down the actual act of installing new modules into functional pieces, we eliminate the opportunity for many of the race conditions that have plagued us recently.

Breaking changes: The refactor will likely result in a new major version as we will almost certainly be tweaking lifecycle script behavior. At the very least, we’ll be running each lifecycle step as its own stage in the multi-stage install.

But wait, there’s more! The refactor will make implementing a number of oft-requested features a lot easier– some of the issues we intend to address are:

  • Progress bars! #1257, #5340
  • Automatic/intrinsic dedupe, across all module source types #4761, #5827
  • Errors if we can’t find compatible versions MUCH earlier, before any changes to your node_modules directory have happened #5107
  • Better diagnostics when peerDependencies produce impossible to resolve scenarios.
  • Better use of bundledDependencies
  • Recursively resolving missing dependencies #1341
  • Better shrinkwrap #2649
  • Fixes some icky edge cases [#3124], #5698, #5655, #5400
  • Better shrinkwrap support, including updating of shrinkwrap file when you use --save on your installs and uninstalls #5448, #5779
  • Closer to transactional installs #5984

So when will you get to see this? I don’t have a timeline yet– I’m still in the part of the project where everything I look at fractally expands into yet more work. You can follow along with progress on what will be its pull request.

If you’re interested in that level of detail, you may also be interested in reading @izs’s and @othiym23’s thoughts.

Abraxas– A node.js Gearman client/worker/admin library

Abraxas is an end-to-end streaming Gearman client and worker library for Node.js. (Server implementation coming soon.)

https://www.npmjs.org/package/abraxas

Standout features:

  • Support for workers handling multiple jobs at the same time over a single connection. This is super useful if your jobs tend to be bound by external resources (eg databases).
  • Built streaming end-to-end from the start, due to being built on gearman-packet.
  • Most all APIs support natural callback, stream and promise style usage.
  • Support for the gearman admin commands to query server status.
  • Delayed background job execution built in, with recent versions of the C++ gearmand.

Things I learned on this project:

  • Nothing in the protocol stops clients and workers from sharing the same connection. This was imposed by arbitrary library restrictions.
  • In fact, the plain text admin protocol can be included cleanly on the same connection as the binary protocol.
  • Nothing stops workers from handling multiple jobs at the same time, except, again, arbitrary library restrictions.
  • The protocol documentation on gearman.org is out of date when compared to the C++ gearmand implementation– notably, SUBMIT_JOB_EPOCH has been implemented. I’ve begun updating the protocol documentation here:
    https://github.com/iarna/gearman-packet/blob/master/PROTOCOL.md

Because everything is a stream, you can do things like this:

Or as a promise:

Or as a callback:

Or mix and match:

MySQL ROUND considered harmful

MySQL’s ROUND has different behavior for DECIMALs than it does for FLOATs and DOUBLEs.

This *is* documented. The reason for this is not discussed but it’s important. ROUND operates by altering the type of the expression to have the number of decimal places that it was passed. And this matters because the type information associated with a DOUBLE will bleed… it taints the rest of the expression:

We’re going to start with some simple SQL:

Here 2.5 is a DECIMAL(2,1) and 605e-2 a DOUBLE, and the result is a DOUBLE. That’s all well and good…

But let’s try rounding 605e-2.

So… what’s going on here? The round part of the expression shouldn’t have changed its value. And in fact, it hasn’t: calling ROUND(605e-2,2) returns 6.05 as expected. The problem here is that the type of ROUND(605e-2,2) is DOUBLE(19,2) and when that’s multiplied by 2.5 the resulting expression is still DOUBLE(19,2). But the number of decimals on a float is for display purposes only– internally MySQL keeps full precision… we can prove that this way:

So yeah… MySQL lets you increase precision with ROUND– Postgres is looking mighty fine right now.

Survey of node.js Gearman modules

Here’s a brief survey of node.js Gearman modules. I’ll have some analysis based on this later.

| Module | Github | Author | Last Commit | Open Issues | Notes |
|---|---|---|---|---|---|
| gearman | gofullstack/gearman-node | smith, gearmanhq | 2011-05-02 | 4 | |
| gearman-stream | Clever/gearman-stream | azylman, templaedhel | 2014-03-21 | 0 | Previously named gearman_stream, uses gearman-coffee |
| gearnode | andris9/gearnode | andris | 2013-02-25 | 1 | |
| gearmanode | veny/GearmaNode | veny | 2014-03-20 | 4 | |
| nodegears | enmand/nodegears | enmand | 2013-12-07 | 1 | |
| que | vdemedes/que | vdemedes | 2012-07-02 | 0 | Uses node-gearman |
| gearman-js | mreinstein/gearman-js | mreinstein | 2013-11-03 | 4 | |
| gearman2 | sazze/gearman-node | ksmithson | 2013-09-17 | 0 | Fork of gearman with no changes except name |
| node-gearman | andris9/node-gearman | andris | 2013-08-13 | 2 | |
| node-gearman-ms | nachooya/node-gearman-ms | nachooya | 2013-11-18 | 0 | Fork of node-gearman |
| gearman-coffee | Clever/gearman-coffee | rgarcia, azylman, jonahkagan | 2013-03-19 | 2 | |
| | magictoolbox/node-gearman | oleksiyk | 2012-12-03 | 0 | |

Remaining columns: Tests, Docs, Client, Worker, Multi Server, Streams, Errors, Timeouts.
