Thursday, February 22

Vendor Code Management.... with Streams

Lets say you've just obtained someone else's source code.

Maybe you've obtained this code from an opensource project, a mutual business partner, or a department in your own company. The source code in-hand is nifty indeed and you have mentally crafted some grand new features you'd like to add -- after all, you have the source! But there's one small problem....you don't "own" the code. The original authors continue to develop their new features and bugfixes and have a future roadmap that plans to offer subsequent major, minor, and patch releases.

We can call that "someone else" the "vendor" and their source code the "vendor code." It's that simple.

So vendor code management deals with tracking your changes to someone else's code, tracking their new releases, and tracking the merges between your customizations with theirs. In more technical terms, we'll use the following definition.

Vendor Code Management -- The process of tracking and propagating custom changes to external, evolving 3rd party codelines.

Managing all these moving parts can be tricky indeed unless you have the right tools.

Old School. The solution used by traditional branch-based SCM systems is to use yet-another-branch called a "vendor branch" [clearcase, perforce, subversion, cvs]. The vendor branch was nothing more than a branch off of mainline (typically) where you committed raw vendor code and had the wildly laborious and error prone task of committing new vendor code releases and merging their changes, with your changes, all on... other branches. Without private workspaces you absolutely need a new branch in order to test fragile merges in isolation and commit them for savepoints in event a partial rollback is required. And then at some point merging all your feature branches to mainline, cutting a release candidate branch (branch late, right!) and procuring a stable release. Oh, and then merge the release candidate branch to mainline to share last minute bugfixes. Did you save the whiteboard diagram of all your merges? What diagram? exactly.

Enter Streams-based SCM.
We all read books and articles and continue to preach about software development best practices such as continuous integration, private workspaces, reproducible configurations, promotion levels, named stable baselines, etc. A stream-based SCM system (like AccuRev) introduces a fresh new paradigm for managing the development and maturation of software changes. With a promotion-based workflow model, best practices can be implemented simply using the single, basic building block -- streams. Lets say we wanted to solve a problem like... oh, say managing vendor code...
The adjacent picture is a labelled screenshot of the AccuRev StreamBrowser and shows 3 primary stream motifs that can be applied to achieve an intuitive development model. First, with vendor code rooted in the repository, snapshots that represent successive, raw vendor code imports serve as roots for custom, named stable base development. Second, Each codeline can adopt its own promotion-based workflow -- in this case, both visible codelines employ Integration -> QA -> Release. Third, each custom release (represented as snapshots) can further serve as new roots of development for release-specific patch or feature development. Creating a tree of streams helps tell a story -- some stories are told from the top to bottom, while others start at snapshots representing new roots of development.

Thankfully, a whitepaper has been written all about the subject thus keeping this blog post to a few paragraphs. It talks in gory detail about the pitfalls of the branch-based solution and the benefits the Stream-based solution. Do you have further questions about merging custom changes across codelines using change packages? It's in the whitepaper ;) I'd be interested to hear from folks who currently use a traditional branch-based solution and want to know more about Streams. I've used the old branch-based technique myself in a past life tracking custom, business-rule changes to Bugzilla -- oh the dreaded horror. If you have questions about the blog post or the whitepaper, feel free to drop me a line!

3 comments:

Anonymous said...

Excellent paper! I've been reading a lot about stream-based vs. branch and merge systems of late and this is a very well articulated article, and clear paper about the power streams provide. Thank you very much!

Anonymous said...

I read the paper wanting to understand why streams are better than branches, but the only advantage I can see is that streams provide automated merging. This is _an_ advantage, yes, but the many other advantages you cite in the paper come from the stream hierarchy you have set up. I don't see any reason why you can't set up the same hierarchy using branches.

fepus said...

Thanks for the feedback bill.

As the paper points out, you *can* use branches to manage vendor code, but it becomes a manual nightmare with any level of sophistication - tracking branches and merging is painful no matter how you put it.

The benefits of my stream structure are two-fold. First, streams provide a separation of concerns. Raw vendor code is managed to the left of vendor snapshots while customizations are managed to the right. Snapshot streams are immutable so they create a physical barrier between the two types of code being managed. Second, as you point out, the 'inheritance' nature of streams means that I can promote changes (or changesets!) to a single stream, and all children streams automatically have visibility. While you *can* mimic a similar tree with branches, propagating a single fix to a tree of branches means manual merge-merge-merge to each branch.