Home Directory Sprawl (Using CVS)

A description of CVS, how it is useful, and how to set up a CVS pserver repository.

This was written by David L. Cantrell and given on Wed Oct 02 2002.

CVS stands for the Concurrent Versions System (or Concurrent Versioning System or Concurrent Versioning Software, and so forth) and is pronounced Cee Vee Ess. It was designed as a replacement for RCS, or the Revision Control System.

1. Background

The idea of tracking changes made to a development project has been around for a while. It's not a particularly interesting job, as evidenced by the tools we have today. Some of the more popular source code control systems are CVS, RCS, and SCCS.

CVS Concurrent Versions System. This is the de facto standard in the open source community.
RCS Revision Control System. The old system used before we had CVS.
SCCS Source Code Control System. Used by companies like Sun. You won't find SCCS tools in the open source world. If you're itching to see SCCS, look in /usr/ccs/bin on a Solaris machine.

There are some new projects underway to replace CVS. While CVS does work, it has some well-known limitations that are now becoming an issue as CVS-controlled projects become larger and larger. A popular one in the Linux community is BitKeeper (or simply bk). BitKeeper is made by BitMover, Inc. They designed bk to be a good enough tool that Linus would be willing to use it for kernel development. A difficult task, at best. BitKeeper is in use in a lot of places, but one of the major differences that you'll find between bk and CVS is that you can't get the bk source code, only binaries. BitKeeper is not an open source project; it's commercial software.

And there is yet another hopeful CVS replacement, Subversion. The Subversion project was started by longtime CVS developers (!) because they wanted to correct CVS's deficiencies and make the upgrade somewhat painless. We'll see what becomes of that project, but so far it's looking pretty good.

... So, with all the above, why CVS? Several reasons, and most of them are related to the current status of the other projects:

  1. CVS is well established and pretty much error-free.
  2. It's open source, which means you can get it working on basically any platform.
  3. I don't like trusting ALL of my data, or even just my development projects to a new, experimental, and developmental source control system.

In a few months, when I decide to look at Subversion again, I may choose to move over to that. But for now I'm going with CVS. If you decide to never use CVS for your own purposes, it's still a good idea to understand how it works because it is used in so many projects.

2. Terminology
Repository
The central store on the CVS server. This is where all of the data is kept, it's where your changes go, and it's where you pull updates from.
Tag
This is the CVS name for a version or release.
Module
The project you work on from the repository. Typically a repository will only have a few modules, one for each major aspect of development. For example, FreeBSD has the 'src', 'doc', and 'ports' modules.
Branch
Concurrent development on the same module. CVS does a really neat trick which allows you to branch development at any time and it will start tracking changes specific to that. This is useful for software development because you can make a branch after a release and use it for security updates.
Pserver
CVS's internal server mechanism. Don't use this.
Attic
CVS never deletes a file from the repository. Files marked as deleted get put in the attic.
3. How CVS Works

The terminology section may have given you some hints as to how CVS works and what it actually does besides "source code control." The easiest way to think about it is as a development mediator. When multiple developers work on the same project, CVS handles merging their work, sending out changes to each developer, and so forth.

Life in CVS begins by creating a project, which consists of at least one module. In this module, you import the files that belong to it. In the world of software development, this consists of C source code files, header files, Makefiles, support files like X pixmaps, and documentation.

What doesn't go in CVS? Anything that can be automatically generated. Object files that the compiler creates, dependency files, the program or library executables, and ... configure scripts. Autoconf can regenerate the 'configure' script, so you don't want this in CVS. Before distribution in gzipped tar format, most developers run 'autoconf' to create that script for you.
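One way to keep generated files out of the way is a .cvsignore file, which CVS consults for file name patterns to skip. The sketch below writes one into a scratch directory standing in for a project. The patterns and the program name myprog are illustrative assumptions, not taken from any particular project.

```shell
# Scratch directory standing in for the top of a project tree.
projdir=$(mktemp -d)

# One shell-style pattern per line: object files, dependency files,
# the generated configure script, and a hypothetical program binary.
cat > "$projdir/.cvsignore" <<'EOF'
*.o
*.d
configure
myprog
EOF

cat "$projdir/.cvsignore"
```

The .cvsignore file itself is plain text, so it can be added to the module like any other file.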

Your source is now under CVS control. To work on it, you check out a copy, make your changes, and commit them to the repository. Periodically you will run the update command to pull down any changes from other developers. And, if it's like any other CVS project, you'll have merge conflicts that will need resolving by hand.
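The cycle just described can be sketched as two small shell functions. This is only an illustration: the function names are made up, the module is assumed to exist already, and the repository location is assumed to be set in $CVSROOT.

```shell
# start_work MODULE
# Check out a private working copy into ./MODULE.
start_work() {
    cvs checkout "$1"
}

# sync_work MESSAGE
# Run from inside the working copy: pull in other developers' changes
# first, then publish your own with MESSAGE as the log entry.
sync_work() {
    cvs update -d -P &&
    cvs commit -m "$1"
}
```

A session would look like start_work myproject, then editing files under myproject, then sync_work "describe the change" from inside that directory. If the update reports conflicts, resolve them by hand before committing.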

That's the big picture. The important parts are understanding that you are working on a copy of what's under CVS control. You use CVS to manage the changes to your copy and other copies.

4. Problems With CVS
4.1. Permissions and ownerships

CVS does not store permissions and ownerships on files. You cannot flag a file under CVS control as world readable, for instance. When you check something out from CVS, the files come to your workstation and CVS chowns and chmods them according to your umask. When you commit changes, CVS pulls in your changes, but doesn't modify the permissions in the repository. Most projects use different group permissions on the repository so they can restrict access to project developers.

People using CVS for more than software development make use of the post operation scripts to overcome this limitation. You can tell CVS to run a script or program after a CVS update or commit. In this script you can set permissions and other such things that CVS doesn't handle.
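As a rough sketch of such a script, the function below reapplies permissions from a small manifest file. The manifest format (an octal mode, a space, then a path) and the function name are conventions invented for this example; CVS knows nothing about them.

```shell
# apply_perms MANIFEST
# Each line of MANIFEST is "<octal mode> <path>", with paths relative to
# the current directory. Blank lines and lines starting with # are skipped.
apply_perms() {
    while read -r mode path; do
        case "$mode" in
            ''|'#'*) continue ;;
        esac
        chmod "$mode" "$path" || return 1
    done < "$1"
}
```

The manifest is plain text, so it can live in the module alongside the files it describes. After an update you would run something like apply_perms perms.txt from the top of the working copy.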

4.2. Symbolic links

You cannot store symbolic links in CVS. Forget about it, ain't gonna happen. Use the script hack above or just say good bye to symbolic links forever.

4.3. Binary files

CVS provides source code control. You can check out old versions, generate patches between releases, and many other tasks specific to software development. To perform these tasks, CVS must deal with plain text files. This is how it can track changes between the files. This presents a major issue for binary files. So much of an issue that CVS just doesn't handle binary files. Now we're starting to have some problems. To get around this problem, we can flag a file as 'binary' and CVS won't track changes to it. It will just make sure one copy is in the repository. Generally speaking, this works fine, but it means you can't use CVS to track changes between, say, JPEG image files.

4.4. You cannot delete directories

Once a directory is added to a CVS project, you can't delete it. Remember I said that CVS never deletes a file, it just moves it into the attic? Well, if you check out an old release that had a now deleted file in a now deleted directory, CVS needs to know where to put it, and the empty directory is that place. Because of this, the convention is to treat empty directories as deleted, which is what the -P option to update does in your working copy.

4.5. Spaces in filenames

Generally speaking, CVS does not like to deal with spaces in filenames. I have seen tricks to make CVS deal with this, but I prefer to not have spaces in filenames anyway. For some people, this may present an issue.

5. Commands

To perform a CVS operation, you run cvs and specify one of the commands below. Each command has a help screen, which you can get with this syntax:

cvs command --help

Below are the major CVS commands; run cvs --help-commands for a complete listing. All of these commands assume you have a working CVS repository (this is covered in the second part of this presentation).

5.1. import

The import command is for creating new CVS-controlled projects. You run this command from the directory you want in CVS. For software development projects, this is usually your source directory. To import the current directory, use this command:

cvs import -d project-name vendor-tag release-tag

The -d flag tells CVS to use each file's mtime as the import time, so you can import projects that have existed for years outside CVS control and keep their timestamps. The project name is what you want to call the CVS module, the vendor-tag is mostly unused, and the release-tag is just a symbolic name to represent the import. For vendor tag I use BURDELL. For the release tag on import operations, I use 'start'.

5.2. checkout

The checkout command is what most people are probably familiar with. The simplest syntax is:

cvs checkout module

This checks out the latest revision of the specified module. Adding the -r rev switch checks out a specific revision or tag.

5.3. export

This command works like the checkout command, but it does not "check out" a copy to work on. That is, the CVS server does not know you are working on that copy. This command mainly exists for creating source archives for distributions. Once you tag the release, you export it to another directory and it's free of the CVS repository and does not have those 'CVS' subdirectories all throughout the tree.
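A minimal sketch of that release step follows. The function name and the MODULE-TAG naming scheme are assumptions made for the example, and the tag is assumed to exist already. Note that CVS tag names cannot contain periods, so a release might be tagged rel-1-0 rather than 1.0.

```shell
# make_tarball MODULE TAG
# Export the tagged release into ./MODULE-TAG (no CVS subdirectories),
# then pack that directory as MODULE-TAG.tar.gz.
make_tarball() {
    dir="$1-$2"
    cvs export -r "$2" -d "$dir" "$1" &&
    tar -czf "$dir.tar.gz" "$dir"
}
```

For example, make_tarball myproject rel-1-0 would leave myproject-rel-1-0.tar.gz in the current directory.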

5.4. add

The add command is used to add files to a module you have checked out. You must specify each file to add:

cvs add files...

If you want to add a directory, use the add command on the directory, but then change in to the directory and add each file. The add command has a special flag for adding binary files, the -kb switch. Use this switch when you are adding any file that is not plain text.
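Since forgetting -kb is an easy mistake, a small wrapper can pick the flag for you. The sketch below decides by file extension, which is a simplification chosen for this example; the extension list is an assumption and would need adjusting to your own files.

```shell
# cvs_add FILE
# Add one file, passing -kb when the name looks like a binary format.
cvs_add() {
    case "$1" in
        *.jpg|*.jpeg|*.png|*.gif|*.gz|*.pdf)
            cvs add -kb "$1" ;;
        *)
            cvs add "$1" ;;
    esac
}
```

To add a new directory named doc, you could run cvs add doc, change into doc, and call cvs_add on each file there. Nothing reaches the repository until you run cvs commit.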

5.5. remove

To remove a file from a module (or project), you must use the cvs remove command. The biggest problem people have with this is that you must specify each file separately. It's really not that big of an issue once an entire project is under CVS control, you'll find that you rarely remove large sets of files. If you need to remove an entire directory tree that's under CVS control, use the -R switch to recursively remove the directory. The syntax:

cvs remove [-R] [-f] files...

This CVS command has one major annoyance, you cannot run 'cvs remove' on a file until you actually rm it from the filesystem. To get around this default behavior, use the -f switch on this command to tell CVS to rm the file before removing it from the project. Very useful.

5.6. update

You will use this command as much as the commit command. The CVS update command brings your local copy of the project up to date with all the changes available in the repository since you last updated. This command merges differences and also lets you know of merge conflicts. The syntax:

cvs update

Run this command from the main project directory and CVS will check the repository and pull down and merge all the changes. The output from this command can be a bit cryptic. The program displays a letter indicating the operation, followed by the file involved with that operation. Here are the letter codes you will most likely see:

U file updated
A new file added
P file patched (like U, but not the entire file)
R file removed
M changes merged
C MERGE CONFLICT
? cvs hasn't got a clue

The two options I use with cvs update are -d and -P. These two options bring down all directories in the module and then "prune" empty ones, which we assume are deleted directories.

If you get merge conflicts (you will), cvs does this really nice thing by default where it stomps all over your copy of the file in conflict. It is up to you to then move the file out of the way, get the copy from the repository, diff the files, and merge the changes by hand. If you are predicting merge conflicts, it's a good idea to use the common CVS option -n, which reports what would be done to your copy without actually doing it. So running this command:

cvs -n update 2>&1 | grep "^C "

Will report what files have merge conflicts without stomping all over them.
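The same filtering idea works for any of the letter codes. Here is a small sketch; the function name status_files is made up for this example. It reads update output on standard input and prints only the paths carrying the requested code.

```shell
# status_files CODE
# Read "cvs update" output on stdin and print the paths whose status
# letter equals CODE (for example C for conflicts, ? for unknown files).
status_files() {
    awk -v s="$1" '$1 == s { sub(/^[^ ]+ /, ""); print }'
}
```

Used as cvs -n update 2>/dev/null | status_files C, it lists conflicting files without touching them. With '?' as the code, it lists files CVS does not know about.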

5.7. tag

This command is used for software development projects under CVS control. If you're tracking your home directory with CVS, you probably won't make releases of it at various points.

The tag command marks the state of the repository with a symbolic name. Once you tag the module, you can later check out a specific tag by name.

The tag command applies the symbolic tag to the repository revisions that match your checked out copy of the project, while the rtag command tags a module directly in the repository without needing a checked out copy.

Branches also rely on the tag command: cvs tag -b creates a branch. Merging a branch back is done with the -j option of the update command.
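To make the release-then-branch pattern concrete, here is a hedged sketch. The function names and the "-fixes" suffix for the branch name are assumptions for illustration, and both functions are meant to be run from inside a checked out copy.

```shell
# cut_release TAG
# Tag the revisions in the working copy as a release, then start a
# maintenance branch named TAG-fixes from the same revisions.
cut_release() {
    cvs tag "$1" &&
    cvs tag -b "$1-fixes"
}

# merge_branch BRANCH
# Run from a working copy of the main line: fold in the changes made on
# BRANCH. Conflicts, if any, still have to be resolved by hand.
merge_branch() {
    cvs update -j "$1"
}
```

After cut_release rel-1-0, a separate working copy for security updates can be obtained with cvs checkout -r rel-1-0-fixes followed by the module name. The merge only changes your working copy; a cvs commit is still needed afterwards.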

5.8. log

When you make commits, CVS will prompt you for a log entry. Using this wisely will produce a log for the project that can be referred to later. Many people skip this step. It really doesn't matter, but I like logs, it makes tracking down changes easy. With a CVS log, you get the file listing, the revision numbers, and the annotation so you can quickly get back to working copies of files.

5.9. diff

The diff and rdiff commands can display the changes between revisions of files in either unified or context format. If you need to manually resolve a merge conflict, generate a patch to a tagged release based on the current developmental copy, or see what you changed when you broke something, the diff command is what you want to use. The syntax:

cvs diff -r rev -r rev files...

The diff command has a lot of options, most of which are passed through to the underlying diff program. You can diff specific revisions, files from specific dates, and you can process directories recursively to generate large patches.
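As one example, the sketch below writes a unified patch from a tagged release to the current working copy. The function name and the TAG.patch output name are choices made for this example. It is meant to be run from the top of a checked out copy.

```shell
# patch_since TAG
# Write a unified diff between TAG and the working copy to TAG.patch.
# -N includes added and removed files in the patch.
patch_since() {
    # cvs diff reports a failure status when it finds differences, so the
    # exit code is deliberately ignored here.
    cvs diff -u -N -r "$1" > "$1.patch" || true
    # Succeed only if something was actually written.
    [ -s "$1.patch" ]
}
```

Running patch_since rel-1-0 would leave rel-1-0.patch in the current directory if anything has changed since that tag.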

5.10. Other Commands

The CVS commands that begin with an 'r' are like the similar command above, but they operate on the repository instead of the checked out copy.

You may also be familiar with the 'login' and 'logout' commands. These are generally used with the CVS pserver mechanism and not when a repository serves via ssh.

5.11. Options

There are some common CVS options, such as -z (for compression) that apply to all CVS commands. You can use the --help-options switch to see a list of those.

On the subject of command options, the location on the command line where you put options does matter. There are generic CVS options and command specific options. You need to follow this order when using options:

cvs [common opts] [cvs command] [command opts] files or something

You can alias the CVS commands to the command plus the common options you prefer. This is done in the ~/.cvsrc file.
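A sample of what such a file might contain is sketched below. Each line starts with a command name followed by the options to apply whenever that command runs, and a line starting with cvs holds the common options. The particular choices (compression level 3, -d and -P on update, unified diffs) are only one possible setup, and the snippet writes to a temporary file rather than over an existing ~/.cvsrc.

```shell
# Write the sample to a temporary file so nothing existing is overwritten.
sample=$(mktemp)

cat > "$sample" <<'EOF'
cvs -z3
update -d -P
diff -u
EOF

cat "$sample"
```

Once you are happy with the contents, copy the file to ~/.cvsrc. With these lines in place, a plain cvs update behaves like cvs -z3 update -d -P.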

5.12. RCS variables

CVS uses the RCS file format to track changes. Because of this, you can make use of special RCS variables within your plain text files. A common one is $Id$, which expands to a description containing the RCS file name, a timestamp, and the revision number. Another RCS variable is $Log$ which expands to the commit log for the file. This can get really long for files that change often (.c files for development projects), but for files that rarely change, it can provide a quick way to look at the log. A list of the common RCS variables:

$Id$        Identification string
$Log$       Commit log
$Revision$  Revision number

This also brings up a good point about CVS. The RCS man pages apply to CVS, mostly. Specifically the co(1), ci(1), rcsintro(1), and rcs(1) man pages.
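Returning to the keywords listed above: once expanded, the identification string has the form "$Id: file,v revision date time author state $", which makes it easy to pull fields out with standard tools. The sketch below extracts the revision number. The function name and the sample values in the usage note are made up for illustration.

```shell
# id_revision
# Read text containing an expanded Id keyword on stdin and print the
# revision number, which is the second field after the "$Id:" marker.
id_revision() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i == "$Id:") { print $(i + 2); exit }
    }'
}
```

For instance, feeding it a source file whose header comment reads /* $Id: main.c,v 1.5 2002/10/02 18:30:00 david Exp $ */ prints 1.5. File names containing spaces would shift the fields and are not handled by this sketch.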

6. Using CVS to Synchronize Your Home Directory

In this example, I'll explain how I set up CVS on my own machines to synchronize my home directory. This is a problem that I'm sure everyone has encountered at least once.

6.1. The Problem

When you get a new user account on a system, you are given a place for your files, your home directory. Over time you get more and more shell accounts, sometimes even on your own machines and you begin to lose track of where files are and you begin to have trouble maintaining the environment profiles between them all. This is a problem I've fought with for a long time, until I read the article by Joey Hess in the September 2002 Linux Journal. Joey explained how you can put CVS to work synchronizing your home directory. It never occurred to me to try this, but I decided to give it a shot. It's been working great between my machines. Below is a short description of what I did to move my life in to CVS.

6.2. Lay out your new home directory

Projects under CVS require some thought. Not being able to freely remove directories and having files always exist means you can't just throw things anywhere. Well, I guess you could do that, but it would make for a CVS managed mess. So, I created a new directory as the working tree for what would become my new home directory. Inside this directory I have these subdirectories:

GNUstep  WindowMaker profile and other GNUstep stuff
Mail     All of my email (now 118MB)
bin      Scripts and programs I've written for myself
doc      Like 'My Documents' on Windows
etc      Location of configuration/support files for my 'bin' stuff
gt       All of my Georgia Tech class stuff (now 425MB)
media    Movies, pictures, and random audio files
src      Programs I'm working on
tmp      Scratch space

That's it. I keep classwork under the gt subdirectory, general 'work' goes under doc in an appropriate subdirectory, and so forth. Keeping to this structure will ensure your cvs attic doesn't grow enormous.

6.3. Import

I start by importing an empty directory structure. I will be holding plain text as well as binary files in my home directory, so I will need the -kb switch when I add the binary ones later.

cvs import -d david-homedir BURDELL start

Once the import is complete, I remove the directory I just imported and check it out from CVS.

6.4. Add files

With my empty directory checked out, I started adding files, working through several subdirectories at a time and committing the changes after each batch. This process took a while, but I only have to do it once.

Remember to use the -kb flag for binary files. Use cvs commit to place the new files in the repository.

6.5. Update systems

Since my home directory is huge, I can quickly lose track of what I've added to CVS and what I haven't. I use the cvs update command and look at the lines beginning with "?" and then go and handle those files. I pretty much used the update command as my checklist for what I still needed to merge in.

6.6. Take your home directory to CVS

With everything in CVS, you can now take your home directory to CVS. It's somewhat tricky, but here's what I did.

$ cvs commit                    # final commit
$ cd                            # change to my home directory
$ cd ..                         # go up one level (/usr/home on my system)
$ rm -rf ~/*                    # remove everything in my home directory (the * glob skips dotfiles)
$ cvs co -d david david-homedir # check out my home directory

Now, I went to another terminal and tried logging in. Once I verified everything was working, I logged out of the shell I did the checkout from and started using normal shells.

6.7. Make CVS a habit

Living in CVS isn't hard, it just requires a few extra commands on top of the normal commands you type.

At the end of each day, you should do a cvs commit to commit your work for the day to CVS. Each time you start working on something new, start in a logical place and do cvs add on those files and directories. After about a week, the CVS commands become second nature.

7. Conclusion

Is CVS the best tool for home directory synchronization? Probably not, but for me it works fine. The advantages I get are:

  • Distributed backups
  • Home directory synchronization
  • History

Based on those advantages alone, I think CVS is the right tool for me. It has some shortcomings, but I think the advantages above are worth it. I used to use NIS and NFS for account and home directory management for my systems, but that requires access to the NIS and NFS server all the time. This doesn't work well for laptops. After that, I tried hacking something together with rsync and ssh, but there was no easy way to keep track of which machine was the "master" copy of my home directory...rsync doesn't merge differences. And now I'm using CVS.
