Dave on SCM

Wednesday, September 26

Agile: Branches vs. Streams

Today I watched a Subversion webinar where the presenter from a popular hosting/services company described three general branching strategies: unstable trunk, stable trunk, and agile. After seeing their "solution" for managing agile development with branches, I thought "I agree with the goal but what a complete merge mess using branches! Streams are significantly more natural and just plain easier." I'll explain by comparing to AccuRev's implementation of streams...

"Agile" Branching

Quick review... We've all used the unstable trunk pattern. You know... everyone commits to trunk and its stability is not guaranteed; it only works well for a small team. The more pragmatic stable trunk pattern attempts to isolate projects/fixes on branches and control merging of only completed, stable changes to the trunk. The advantage here is that new projects or release lines are guaranteed to start from a stable trunk configuration. Though, without locking trunk to fully test the merges, there is still a window of potential instability, especially for those nasty runtime bugs.

Now to the point... Their agile branch pattern uses a branch-per-task strategy where tasks are eventually merged into target release branches. At various milestones, the release branches are merged into both trunk and ongoing task branches to keep them up-to-date. I've added the picture that was used during the presentation (though, I added the red 'merge' wording). See where I'm going with this? Notice how many merges are present for a trivial 4-task, single major/minor release scenario. I've used this exact type of pattern in a 250+ enterprise web development group with 40-50 parallel tasks contributed by local and remote teams on a 2-week release cycle -- it becomes completely and utterly unmanageable, especially when you have to consider security concerns of who has visibility and control of branches and merge targets. It's a nightmare at best even with branch naming conventions and is exactly how most of the file/branch based SCM systems will work, fancy graphics or not.

Agile Streaming

Quick review... a stream represents a single configuration of source code. So you might have an "integration" stream, a "Tuesday night" build stream, or "3.0" official release stream. Any project will have a 'tree' of streams describing mainline development, previous releases, and maintenance work. The trick with streams is that they have a unique property where they automatically inherit changes from their parent. But it doesn't stop there. Any newer versions of files along the entire parent path of streams are inherited. If you're familiar with the OO programming model, it works very similarly -- in the same way that adding a new method in a super class is automatically visible to all sub-classes, newer versions of files and directories in parent streams are visible to all child streams.
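To make the inheritance idea concrete, here is a minimal Python sketch. It is a toy model of my own, not how AccuRev is implemented: the `Stream` class, its method names, and the file names are all made up for illustration. It assumes that a version held directly in a stream shadows the parent's version, much like a method override.

```python
class Stream:
    """Toy model of a stream: a named configuration that inherits from its parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.versions = {}  # path -> version held directly in this stream

    def set_version(self, path, version):
        """Record a version of a file directly in this stream."""
        self.versions[path] = version

    def lookup(self, path):
        """Return the version visible here, walking up the entire parent path."""
        stream = self
        while stream is not None:
            if path in stream.versions:
                return stream.versions[path]
            stream = stream.parent
        return None  # the file is not visible anywhere along the path


# Hypothetical three-level tree: root -> integration -> task_1
root = Stream("root")
integration = Stream("integration", parent=root)
task = Stream("task_1", parent=integration)

root.set_version("README", 1)
integration.set_version("src/main.c", 3)

print(task.lookup("README"))      # inherited from root
print(task.lookup("src/main.c"))  # inherited from integration
```

The task stream never stored either file itself; both lookups are answered by walking the parent path, which is the same mechanism as method resolution in a class hierarchy.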

Now to the point... unlike using branches, streams don't require massive merging all over the place. Why? Built-in inheritance. Let's say you have 4 tasks as streams all working off of a mainline Integration stream (see pic of AccuRev stream browser client). If you promote a single task (i.e. bunch-o-files) to Integration, the other 3 task streams -automatically- have visibility to the newer versions! This allows you to merge-early, merge-often not by manual error-prone practice but accurately and predictably by stream technology. Translated to branch-speak, only a single merge is required to give complete visibility to newer versions to every other task. Furthermore, this example shows only 4 tasks. Let's say you have 40 or 400 concurrent tasks -- you still only need a single promote of a given task to have it automatically delivered to every other task in-progress.
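The 4-task scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions: streams are modelled as plain dictionaries, the file names are invented, and the branch-side count assumes every other task branch has to pull the change with its own merge.

```python
def visible(task_changes, integration):
    """What a task stream sees: inherited versions overlaid with its own changes."""
    return {**integration, **task_changes}


def promote(task_changes, integration):
    """Move a task's changes into the parent stream: one operation."""
    integration.update(task_changes)
    task_changes.clear()


def branch_merges_needed(num_tasks):
    """Assumed branch model: 1 merge into the shared branch, plus 1 per other task branch."""
    return 1 + (num_tasks - 1)


integration = {"app.py": 1, "util.py": 1}
tasks = {f"task_{i}": {} for i in range(1, 5)}  # 4 task streams, no changes yet

tasks["task_1"]["util.py"] = 2          # task_1 creates a newer version
promote(tasks["task_1"], integration)   # the single promote

others_see = [visible(tasks[name], integration)["util.py"]
              for name in ("task_2", "task_3", "task_4")]
print(others_see)                  # [2, 2, 2]
print(branch_merges_needed(4))     # 4
print(branch_merges_needed(400))   # 400
```

The stream side stays at one promote no matter how many tasks exist, while the assumed branch-side count grows with every task added.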

In summary.... Comparing the two pictures, you're probably saying, "How can it be that simple?" Well... this is what a contemporary stream-based architecture gives you. Gone are the days of merge here, merge there... oops, we forgot to merge way over there. In addition, we-the-workhorse-developer don't have to be SCM tool merge experts struggling to determine which of our 9 branches need to be merged into the release candidate branch on Friday night. If your task is on the mainline stream path, you are absolutely guaranteed to have been up-to-date with anything you need. No more guesswork! Finally, the best part about inheritance is if you have a long-lived task, simply stay put and automatic inheritance will implicitly keep you up-to-date with the rest of the world!

Thursday, June 7

Air Gap Development

Lately I've been working with companies doing classified software development. Individual access to source code depends on each person's clearance level (e.g. Q clearance). To separate access between "unsensitive" and "sensitive" source code, development occurs on unclassified (low) and classified (high) networks, respectively. Physically, this means there is an impenetrable barrier between the computers/networks. Thus, gaining access to source code means gaining access to the appropriate network.

Transfer of file changes can sometimes occur from the low network to the high network but never vice versa. Traditionally, this has been implemented by a "sneaker net" where files are transferred between two computers by disk or tape, which requires "walking" to each computer. I recently came across a clever data transmission technology that physically guarantees unidirectional transfer of data over a TCP/IP network including handshake protocols! Take a look at Owl Technologies' dual-diode technology.

Saturday, April 21

Agile Programmable Completion - AccuRev + GNU Bash

When at the command line (CLI), productivity means keeping your hands on the keyboard. But once your fingers have memorized all the commands, flags, static arguments, and common usage patterns -- can you still get faster? Yes.

Programmable completion is a shell facility that allows for customizing the command line in real-time as it is typed. Also referred to as "TAB Completion", many shells in both Linux and Windows have a default implementation that supports completion on filenames and directories. If you're lucky, you'll even get environment variables and functions.

Let's move to SCM. Various branch-based SCM systems like CVS, SVN, and P4 have basic tab completion of commands and flags. That's a good start. But an agile user needs a context-sensitive, custom-data completion facility. What you ~really~ want is completion on your own data -- branch names, labels, usernames, etc. Users of stream-based AccuRev are in luck.

Do you use AccuRev on Linux? If so, download the latest GNU Bash (2.05+) completion for AccuRev 4.5.x. Here is the README. You'll never have to memorize flags or type stream names again.
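For readers who haven't seen what "context-sensitive" completion means in practice, here is a rough sketch of the decision logic in Python. It is not the downloadable script: the command list, stream names, and the `-s` flag are hypothetical placeholders. In Bash itself, this kind of logic lives in a shell function registered with the `complete -F` builtin, which returns its candidates through the `COMPREPLY` array.

```python
# Hypothetical data: a real completion script would ask the SCM tool for these.
COMMANDS = ["keep", "promote", "stat", "update"]
STREAMS = ["integration", "qa", "release_3.0", "task_login", "task_search"]

# Hypothetical: flags whose argument is a stream name.
STREAM_FLAGS = {"-s"}


def complete(words, current):
    """Return completion candidates for the word being typed.

    words   -- words already typed after the tool name
    current -- the partial word under the cursor
    """
    if not words:
        pool = COMMANDS   # first word: complete on command names
    elif words[-1] in STREAM_FLAGS:
        pool = STREAMS    # after a stream flag: complete on your own data
    else:
        pool = []         # no suggestion in this context
    return [candidate for candidate in pool if candidate.startswith(current)]


print(complete([], "pr"))                    # ['promote']
print(complete(["promote", "-s"], "task_"))  # ['task_login', 'task_search']
```

The same partial word can yield different candidates depending on what precedes it, and that is the difference between plain command completion and completion on your own data.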

Coming in Part 2 -- Support for Windows users.

Friday, February 23

SCM Video Humor

I recently did a YouTube search for the popular SCM tools on the market. Results are presented in alphabetical order.

AccuRev
- much easier to install than ClearCase.
- with a single user interface.
- and flexible stream-based software development models.
- this one is just neurotic.
- no wrappers or scripts here.
- and especially developer friendly.

Bitkeeper
- no match. Apparently all related videos are in the attic.

ClearCase
- better than CVS but requires a super hero to use.
- unrelated, ironic video on not being able to escape.

CVS
- no match. Unrelated video about extracting an alien from a human womb.

Perforce
- no match. Unrelated "talent" show with a singer wearing shades to hide his identity.

Subversion
- no match. Unrelated video game. Great for up to 4 players.

In SCM We Trust

Consider the following SCM tool-agnostic sequence of events:

codefreeze.

fix, fix. label RC1 --> build. package. deploy. test.

fix, fix. label RC2 --> build. package. deploy. test.

fix, fix. label RC3 --> build. package. deploy. test. Verified!

Now what? Do you....

[A] Declare RC3 the official release label and "copy" the verified package to production? The actual verified code is in production but how does someone really know from SCM that RC3 is in production? Maybe RC3 was rolled back and RC2 is in production. How does someone know that RC4 isn't being worked on?

[B] Subsequently create a new release label with an official "event describing" name like "Release_X.Y" then rebuild, repackage and deploy to production? The new label accurately describes the official release and demarcates the end of release candidates but the rebuild and repackage were not verified. Do you trust your build system? (topic for a separate post)

[C] Deploy the verified code but subsequently create a new "event describing" label like "Release_X.Y" and trust that the configuration for the new label is identical to that of deployed RC3 configuration? Developers may fix bugs using the new release label but technically it wasn't used to generate the actual production code.

I advise doing [C]. Though, it helps to have a transaction-based SCM system (like AccuRev) where the label operation simply marks the transaction # rather than each-and-every file in the configuration -- after all, tainted files are untested files. This way, as long as two labels refer to the same transaction, they will refer to the same configuration.
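Here is a small Python sketch of why that works. It is a toy model with made-up file names and transaction contents, not AccuRev's actual storage format: the repository is an append-only list of transactions, and a label is nothing more than a transaction number.

```python
# Toy repository: an append-only list of transactions (each maps path -> version).
transactions = [
    {"app.py": "v1"},   # transaction 1
    {"app.py": "v2"},   # transaction 2
    {"util.py": "v1"},  # transaction 3
]
labels = {}  # label name -> transaction number


def label(name, txn):
    """Labeling marks a transaction number; no files are touched."""
    labels[name] = txn


def configuration(name):
    """Rebuild the configuration a label refers to by replaying transactions."""
    config = {}
    for changes in transactions[:labels[name]]:
        config.update(changes)
    return config


label("RC3", 3)          # the verified release candidate
label("Release_X.Y", 3)  # the "event describing" label, same transaction

transactions.append({"app.py": "v3"})  # later work, transaction 4

print(configuration("RC3") == configuration("Release_X.Y"))  # True
print(configuration("Release_X.Y"))  # {'app.py': 'v2', 'util.py': 'v1'}
```

Because both labels resolve through the same transaction number, work committed afterwards cannot make them drift apart.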

Thursday, February 22

Vendor Code Management.... with Streams

Let's say you've just obtained someone else's source code.

Maybe you've obtained this code from an open source project, a mutual business partner, or a department in your own company. The source code in hand is nifty indeed and you have mentally crafted some grand new features you'd like to add -- after all, you have the source! But there's one small problem... you don't "own" the code. The original authors continue to develop their new features and bugfixes and have a future roadmap that plans to offer subsequent major, minor, and patch releases.

We can call that "someone else" the "vendor" and their source code the "vendor code." It's that simple.

So vendor code management deals with tracking your changes to someone else's code, tracking their new releases, and tracking the merges between your customizations with theirs. In more technical terms, we'll use the following definition.

Vendor Code Management -- The process of tracking and propagating custom changes to external, evolving 3rd party codelines.

Managing all these moving parts can be tricky indeed unless you have the right tools.

Old School. The solution used by traditional branch-based SCM systems is to use yet-another-branch called a "vendor branch" [clearcase, perforce, subversion, cvs]. The vendor branch was nothing more than a branch off of mainline (typically) where you committed raw vendor code and had the wildly laborious and error-prone task of committing new vendor code releases and merging their changes, with your changes, all on... other branches. Without private workspaces you absolutely need a new branch in order to test fragile merges in isolation and commit them as savepoints in the event a partial rollback is required. And then at some point you merge all your feature branches to mainline, cut a release candidate branch (branch late, right!) and procure a stable release. Oh, and then merge the release candidate branch to mainline to share last minute bugfixes. Did you save the whiteboard diagram of all your merges? What diagram? Exactly.

Enter Streams-based SCM. We all read books and articles and continue to preach about software development best practices such as continuous integration, private workspaces, reproducible configurations, promotion levels, named stable baselines, etc. A stream-based SCM system (like AccuRev) introduces a fresh new paradigm for managing the development and maturation of software changes. With a promotion-based workflow model, best practices can be implemented simply using the single, basic building block -- streams. Let's say we wanted to solve a problem like... oh, say managing vendor code...

The adjacent picture is a labelled screenshot of the AccuRev StreamBrowser and shows 3 primary stream motifs that can be applied to achieve an intuitive development model. First, with vendor code rooted in the repository, snapshots that represent successive, raw vendor code imports serve as roots for custom, named stable base development. Second, each codeline can adopt its own promotion-based workflow -- in this case, both visible codelines employ Integration -> QA -> Release. Third, each custom release (represented as snapshots) can further serve as new roots of development for release-specific patch or feature development. Creating a tree of streams helps tell a story -- some stories are told from the top to bottom, while others start at snapshots representing new roots of development.

Thankfully, a whitepaper has been written all about the subject, thus keeping this blog post to a few paragraphs. It talks in gory detail about the pitfalls of the branch-based solution and the benefits of the Stream-based solution. Do you have further questions about merging custom changes across codelines using change packages? It's in the whitepaper ;) I'd be interested to hear from folks who currently use a traditional branch-based solution and want to know more about Streams. I've used the old branch-based technique myself in a past life tracking custom, business-rule changes to Bugzilla -- oh the dreaded horror. If you have questions about the blog post or the whitepaper, feel free to drop me a line!

Wednesday, January 24

Danger0us Pr0xim!ty

I recently returned from a consulting engagement in Seattle and traded banter with release engineers about "nightmare builds." I've had these same discussions with other companies and it's interesting how keyboard proximity is notorious for wreaking havoc during various software release situations. At a previous job, I personally recall a failed production deployment caused by mispeling "prod" as "pr0d" (zero) in a config file. That little midnight gem resulted in a 4-hour, 20+ person conf call that lasted until 4am. And, yes, I happened to be on vacation when I received that critical page. Ugh.

Here are some pairings of keystrokes with close proximity that can easily become problematic:

0 / o -- Setting variable to 'pr0d' instead of 'prod'
1 / ! -- Meant the bang but shift-key didn't depress
E / 3 -- dyslexia at 2am
= / == -- Testing a runtime condition only to "set" it
- / = -- dyslexia at 2am
l / | -- meant pipe, got 'el' on return to homerow
: / ; -- shift didn't depress
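
One cheap defense is to validate config values against a known list before a deployment ever starts. Below is a minimal Python sketch of that idea. The allowed environment names and the look-alike table are assumptions for illustration, not taken from any real deployment tool.

```python
# Hypothetical list of valid environment names.
ALLOWED_ENVS = {"dev", "qa", "prod"}

# Characters that are easily typed (or read) in place of a letter.
LOOKALIKES = str.maketrans({"0": "o", "1": "l", "3": "e", "|": "l"})


def check_env(value):
    """Classify a config value as ok, a likely typo, or unknown."""
    if value in ALLOWED_ENVS:
        return "ok"
    normalized = value.lower().translate(LOOKALIKES)
    if normalized in ALLOWED_ENVS:
        return f"did you mean '{normalized}'?"
    return "unknown"


print(check_env("prod"))     # ok
print(check_env("pr0d"))     # did you mean 'prod'?
print(check_env("staging"))  # unknown
```

A check like this only covers values with a closed set of legal choices, but that is exactly the kind of value that caused the 4am conference call.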

Saturday, January 20

Welcome!

This seems like a good place to talk about topics in software configuration management (SCM) and software development best practices.

I've been a programmer for 10 years, having spent the last 5 years at Orbitz.com. At Orbitz, I was part of an agile 50-person engineering team that grew to 500+ and was eventually bought for $1.25B. It was an invaluable in-the-trenches, firsthand experience evolving from "shoot-from-the-hip" to "process-driven" development. I now have a cool job working for AccuRev, creator of the most contemporary (and fun!) configuration management system to-date.

I hope you'll enjoy the thoughts and discussions.

Disclaimer: This is my personal weblog. The opinions expressed here represent my own and not those of my employer AccuRev.