The main reason why most so-called SCM tools are not really SCM tools is that they don’t support managing software configurations. Making software is more than writing source code and converting it into executable code and data models for databases. A real SCM tool would be able to capture everything that is important for deploying and maintaining the software. This includes requirements, designs, models, sources, tools, infrastructure, knowledge, skills, test scripts, test data, manuals, scripts and other information.
Most SCM tools are able to capture files in a structure and control changes to the files and the structure. New files and new versions of existing files are all merely new files. Typically, the structure is 3-dimensional:
- Directory (or folder) structure
- Version (or revision) structure
- Branching structure
The directory structure may seem to be 2-dimensional (i.e. directories nested inside each other and directories next to each other at the same nesting level), but if we consider path+filename as the single identifier of a file, the directory structure can be considered 1-dimensional. The version structure is also 1-dimensional: successive versions supersede their predecessors. How about parallel versions? Parallel versions are partial contributions to a single successor version; the actual successor is a merge of these partial contributions. Branches look similar to parallel versions, but the essential difference is that parallel versions are partial contributions to a single successor, while branches are full contributions (complete version structures) to alternative successors.
If we look at how these dimensions are implemented, the simplest implementation is 1-dimensional: all 3 dimensions are projected onto the same implementation, e.g. the directory structure (path+filename). The “version control tool” could then be an ordinary file system. For example:
The problem with this version control system is that the path+filename changes for every new version. So users and the build process have to do more work to figure out which version they should use.
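As a minimal sketch of the 1-dimensional case, assume a hypothetical naming convention in which branch, directory and version are all folded into one path. The names `flat_path` and `latest` and the `v<N>` path segment are illustrative assumptions, not the convention of any particular tool. The sketch shows the extra work: finding the current version means scanning and parsing paths.

```python
def flat_path(branch: str, directory: str, filename: str, version: int) -> str:
    """Fold branch, directory and version into one path (hypothetical convention)."""
    return f"{branch}/{directory}/v{version}/{filename}"


def latest(paths: list[str], branch: str, directory: str, filename: str) -> str:
    """Scan all known paths to find the highest version of one file."""
    prefix = f"{branch}/{directory}/v"
    candidates = [
        p for p in paths if p.startswith(prefix) and p.endswith("/" + filename)
    ]
    # The version number is the path segment that directly follows the prefix.
    return max(candidates, key=lambda p: int(p[len(prefix):].split("/")[0]))


# Every new version of foo.c gets a different path.
paths = [flat_path("main", "webgui/unix", "foo.c", v) for v in (1, 2, 3)]
print(paths)
print(latest(paths, "main", "webgui/unix", "foo.c"))
```

Every consumer of the file (users, build scripts, documentation) would need logic like `latest`, which is exactly the burden a real version dimension removes.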
The next, better implementation is 2-dimensional: directory and branching are combined into a single dimension (path+filename), and versioning is the other dimension. Simple version control tools like Subversion work this way. For example:
main/gui/generic/foo.c (versions: 1)
main/webgui/unix/foo.c (versions: 1, 2 and 3)
main/webgui/winxp/foo.c (versions: 1 and 2)
R18.104.22.168/gui/generic/foo.c (versions: 1)
R22.214.171.124/webgui/unix/foo.c (versions: 2)
R126.96.36.199/webgui/winxp/foo.c (versions: 2)
The advantage is that the path+filename remains the same for all versions within a branch. For different branches, however, the path+filename is different. And since directories and branches are resolved in the same dimension (the path), the tool cannot distinguish between a directory and a branch. For example, are main and R188.8.131.52 different branches? Are unix and winxp different branches? Are gui and webgui different branches? Or are they different directories within the same branch? So users have to agree on naming conventions to distinguish branches from directories. The SCM tool only takes care of deciding automatically which version is used.
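The 2-dimensional case can be sketched as a mapping from path to a list of versions. The structure below is an assumption for illustration, not how Subversion stores data. The tool can resolve the version automatically, but which part of the path is "the branch" is only a team agreement, modeled here by the hypothetical `branch_depth` parameter.

```python
# Path -> versions, mirroring the main/... entries of the example above.
repo = {
    "main/gui/generic/foo.c": [1],
    "main/webgui/unix/foo.c": [1, 2, 3],
    "main/webgui/winxp/foo.c": [1, 2],
}


def latest_version(path: str) -> int:
    """The tool resolves the version dimension on its own."""
    return max(repo[path])


def branch_of(path: str, branch_depth: int = 1) -> str:
    """Branch identity is a convention: the first `branch_depth` path segments."""
    return "/".join(path.split("/")[:branch_depth])


print(latest_version("main/webgui/unix/foo.c"))
print(branch_of("main/webgui/unix/foo.c"))                  # convention A
print(branch_of("main/webgui/unix/foo.c", branch_depth=3))  # convention B
```

Both conventions are equally valid as far as the tool is concerned; nothing in the data says whether `unix` is a branch or a directory.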
One step further is a 3-dimensional solution, where directory (path+filename), version and branch are independent of each other. This requires more advanced version control tools like ClearCase or Synergy. For example:
gui/foo.c (versions: 1 on branch: main)
webgui/foo.c (versions: 1 on branch: main; versions: 2 and 3 on branch: unix; versions: 2 on branch: winxp)
The advantage now is that the path+filename remains the same for all versions and all branches. This simplifies the implementation of an automated build process and the descriptions in design documents and models. The downside is that the SCM tool needs information to decide which branch a user is working on in order to select the correct version for that user. And the user may be unaware of the branch he is working on, which introduces the risk of working on the wrong branch.
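A rough sketch of the 3-dimensional case, assuming a simple selection rule: use the latest version on the user's branch, otherwise fall back to main. The `select` function and its fallback rule are assumptions for illustration; ClearCase and Synergy each have their own, richer selection mechanisms.

```python
# path -> branch -> versions, following the example above.
repo = {
    "gui/foo.c": {"main": [1]},
    "webgui/foo.c": {"main": [1], "unix": [2, 3], "winxp": [2]},
}


def select(path: str, branch: str, fallback: str = "main") -> tuple[str, int]:
    """Return (branch, version) for a user working on `branch`.

    Assumed rule: latest version on the user's branch, else latest on `fallback`.
    """
    branches = repo[path]
    chosen = branch if branch in branches else fallback
    return chosen, max(branches[chosen])


# The path is identical for every user; only the branch context differs.
print(select("webgui/foo.c", "unix"))
print(select("webgui/foo.c", "winxp"))
print(select("gui/foo.c", "winxp"))
```

The same path yields different content depending on the `branch` argument, which is why a user who is unaware of the current branch can end up editing the wrong version.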
So on the one hand SCM makes life easier for the user and the organization (e.g. through automation), but on the other hand it introduces extra work to reduce the risk of mistakes or to repair them.
As you can see, I have left the baseline (R184.108.40.206) out of the last example. In the first and second examples (1- and 2-dimensional), the baseline was combined with the directory dimension. In the last example, the baseline could be combined with the branch dimension, but it could also be implemented as a 4th dimension: labeling or tagging.
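Labeling as a 4th dimension can be sketched as a named, frozen selection of (path, branch, version) triples. The label name `BASELINE_A` and the rule that an existing label may not be moved are assumptions for illustration; the selected versions follow the baseline shown in the 2-dimensional example.

```python
Triple = tuple[str, str, int]  # (path, branch, version)

# A label is independent of directory, branch and version structure.
labels: dict[str, frozenset[Triple]] = {}


def apply_label(name: str, selection: set[Triple]) -> None:
    """Record a baseline; refuse to move an existing label (assumed policy)."""
    if name in labels:
        raise ValueError(f"label {name!r} already exists")
    labels[name] = frozenset(selection)


apply_label(
    "BASELINE_A",  # hypothetical label name
    {
        ("gui/foo.c", "main", 1),
        ("webgui/foo.c", "unix", 2),
        ("webgui/foo.c", "winxp", 2),
    },
)
print(sorted(labels["BASELINE_A"]))
```

Because the label is kept apart from paths and branches, later versions on any branch do not disturb what the baseline refers to.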
And this brings me to the point where version control enters the domain of configuration management. An essential feature of configuration management is to identify dependencies. A dependency defines which objects belong together. There are many different dependencies that can be (or need to be) identified, for example:
- Directory dependency: all files within the same directory tree
- Branch dependency: all files on the same branch
- Version dependency: all latest versions
- Status dependency: all files with the same status (e.g. release R220.127.116.11)
- Content dependency: all files with compatible content (e.g. requirements-design-code consistency)
The status dependency is typically modeled in a so-called promotion model. All files go through a predefined series of statuses, and files (versions) with similar status and context (e.g. branches, directories) belong together as a configuration. Examples of such statuses are: working, integration testing, system testing, released. Tools support promotion by branching (e.g. ClearCase/UCM with deliver and rebase, or Subversion with “smart” copying of directories, which it calls branching) or by selection rules (e.g. Synergy with reconfigure property templates).
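A promotion model can be sketched as an ordered list of statuses plus a rule that items move one step at a time. The function names and the one-step rule are assumptions for illustration; real tools implement promotion through branches or selection rules, as described above.

```python
# The predefined series of statuses named in the text.
STATUSES = ["working", "integration testing", "system testing", "released"]


def promote(status: str) -> str:
    """Move one step along the series; 'released' is the final status."""
    i = STATUSES.index(status)
    if i == len(STATUSES) - 1:
        raise ValueError("already at the final status")
    return STATUSES[i + 1]


def configuration(items: dict[str, str], status: str) -> set[str]:
    """All items currently at `status` belong together as one configuration."""
    return {name for name, s in items.items() if s == status}


# Hypothetical item names and statuses.
items = {"gui/foo.c@1": "released", "webgui/foo.c@2": "system testing"}
items["webgui/foo.c@2"] = promote(items["webgui/foo.c@2"])
print(configuration(items, "released"))
```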
But one of the biggest shortcomings of the SCM tools that I know is the absence of support for content dependencies. How do you identify the impact of a design change on code, requirements and tests? How do you identify the impact of code changes on other code, interfaces and design models? How do you maintain the content information efficiently? How do you know that a content dependency is compromised? How do you know that release 3.2 of product X works with release 1.2 of the framework, but not with release 1.1? How do you know that release 6.1 of product Y cannot work with product X because it does not work with framework 1.2?
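The framework questions above can be sketched as a compatibility table that is queried for a shared framework release. The table layout and the function `shared_frameworks` are assumptions for illustration. The X 3.2 entry follows the text; that Y 6.1 works with framework 1.1 is an added assumption, since the text only says it does not work with 1.2.

```python
Release = tuple[str, str]  # (product, release)

# Framework releases each product release is known to work with.
works_with_framework: dict[Release, set[str]] = {
    ("X", "3.2"): {"1.2"},  # from the text: works with 1.2, not with 1.1
    ("Y", "6.1"): {"1.1"},  # assumed: text only states it fails with 1.2
}


def shared_frameworks(a: Release, b: Release) -> set[str]:
    """Framework releases on which both product releases can run together."""
    return works_with_framework[a] & works_with_framework[b]


common = shared_frameworks(("X", "3.2"), ("Y", "6.1"))
print(common or "no common framework release: X 3.2 and Y 6.1 cannot be combined")
```

The hard part is not the query but keeping such a table correct and up to date, which is the maintenance problem raised above.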
Another big shortcoming of SCM tools is that they only support control at the file level. They don’t control requirements, components in the design model, test cases in a test specification, tool versions (e.g. compilers, IDEs, web services) or hardware versions (e.g. 32-bit architecture). Consequently, many organizations try to capture those items in files, e.g. by creating a requirements specification document that “baselines” a set of requirements. But then those individual requirements, although versioned in a requirements management tool, cannot be identified as separate objects in the SCM tool. Even less can dependencies at the requirement level be identified, or individual requirements be assigned to a baseline.
The only solution I am aware of that comes close to an “SCM tool” is the Jazz platform, starting with Rational Team Concert but integrated with the requirements, test and project management applications. Since all information is stored in a composite repository, where information objects are actually identified as objects (not as files), it becomes possible to identify relationships (such as dependencies) between objects (not only files). Yet I doubt whether it will be capable of identifying dependencies between configurations, e.g. content dependencies between software packages.
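To illustrate what identifying relationships between objects rather than files makes possible, here is a minimal impact-analysis sketch over a dependency graph. It is not the Jazz API; the object names and the `dependents` structure are hypothetical.

```python
from collections import deque

# Edges point from an object to the objects that depend on it (hypothetical names).
dependents: dict[str, list[str]] = {
    "requirement:login": ["design:auth-component"],
    "design:auth-component": ["code:auth.c", "test:auth-cases"],
    "code:auth.c": ["test:auth-cases"],
    "test:auth-cases": [],
}


def impact(changed: str) -> set[str]:
    """Breadth-first walk collecting every object reachable from `changed`."""
    seen: set[str] = set()
    queue = deque([changed])
    while queue:
        for dep in dependents.get(queue.popleft(), []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


print(sorted(impact("design:auth-component")))
```

With individual requirements, design components and test cases as first-class nodes, a change to one of them can be traced to the others, which is not possible when the tool only sees a requirements document as a single file.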