Monday, 29 November 2010
Sunday, 28 November 2010
There is increasing interest in Distributed Version Control Systems (DVCS), both in the commercial world and in open source software development. Dan suggested that a more interesting question to ask is 'Is the enterprise ready for DVCS?' His view in the talk (my interpretation) is that DVCS is ready for the enterprise; the open question is whether the enterprise has work patterns that can make use of DVCS.
The initial problem he outlined is that more and more code is being version controlled, by much bigger teams spread over a wider geographical area and across different time zones. Developers may be on the same floor or in the same building, or on different continents. Put simply, the centralised model of source code control (SCC) just isn't working any more.
He provided a brief history of SCC and showed the development of open source tools:
- 1972: SCCS – the first SCC tool (file oriented)
- 1982: RCS – introduced the concept of multiple file access
- 1986: CVS – built on RCS by providing concurrent access for distributed users
- 2001: Subversion – repository oriented
There are a number of commercial systems that are better able to tackle these problems, including ClearCase, StarTeam, Perforce and VSS (Visual SourceSafe). There were some strong views from the floor on the ability of these systems to tackle the problems above; Dan agreed, but pointed out that his focus was on open source SCC systems. However, all the systems mentioned so far have a centralised file base.
Dan mentioned some alternative systems which are distributed or peer-to-peer (p2p) based:
- 1997 – Code Co-op
- 2001 – GNU Arch (now defunct)
- 2003 – Darcs (Haskell based – for reasoning about ChangeSets)
More interestingly, open source development is now being driven by two new websites:
- GitHub [https://github.com/] – based on 'git'
- Bitbucket [http://bitbucket.org/] – based on 'Mercurial'
To illustrate the significance of these developments, Dan contrasted the centralised model of SCC with the p2p model (DVCS).
The centralised model is a hub and spoke: the master copy of the SCC data is held on the hub, while the spokes represent the local copies held by developers on their computers. The model is very much 'pull' rather than 'push'. Working copies are shared via the hub (the master). Nothing gets past the master! In the centralised model it is easier to control:
- User access: through the http protocol
- Build and Release: single point of access
In terms of data, repositories in the p2p model share 'ChangeSets', rather than changes to individual files and directories as in the centralised model (CVS has a tree, and changes between trees are what drive that SCC system). What is significant here is that publishing is decoupled from committing. This raises the question: isn't this a recipe for chaos?
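To make the decoupling of committing from publishing concrete, here is a minimal Python sketch of two peer repositories. The `ChangeSet` and `Repository` classes and their `commit`/`push` methods are hypothetical illustrations, not the API of git or Mercurial: a commit only touches the local history, and nothing is shared until a separate push.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ChangeSet:
    # Hypothetical minimal changeset: an identifier, an author and a message.
    cid: str
    author: str
    message: str


@dataclass
class Repository:
    # Hypothetical peer repository holding an ordered history of changesets.
    name: str
    history: List[ChangeSet] = field(default_factory=list)

    def commit(self, cs: ChangeSet) -> None:
        # Committing records the changeset locally only; no other peer sees it yet.
        self.history.append(cs)

    def push(self, other: "Repository") -> List[ChangeSet]:
        # Publishing is a separate step: send only the changesets the peer lacks.
        known = {cs.cid for cs in other.history}
        missing = [cs for cs in self.history if cs.cid not in known]
        other.history.extend(missing)
        return missing


shared = Repository("shared")
alice = Repository("alice")
alice.commit(ChangeSet("c1", "alice", "add parser"))
alice.commit(ChangeSet("c2", "alice", "fix parser bug"))
unpublished = len(shared.history)  # 0: both changesets are committed but not published
sent = alice.push(shared)          # publishes c1 and c2
resent = alice.push(shared)        # nothing new to publish
```

In this toy model the `shared` repository is a hub only by convention; any peer could push to any other, which is why the control questions below arise.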
There are a number of issues to consider here:
- Build must be deterministic and repeatable
- Configuration management audit and traceability: who did what, when?
- Organisational structure of the team: the central model allows access to be granted per tree, so different parts of a team can access their own part of the tree, but you can't do this in the p2p model. How do you cope with team structure in DVCS?
Dan talked about the fundamental difference between the centralised and p2p models in terms of the data they process. Centralised SCC systems (such as CVS) are file oriented, whereas p2p DVCS systems (such as git) are ChangeSet oriented.
In the ChangeSet model, there is no concept of a file. Renames and deletes of files and directories are no longer special cases; only changes are recorded, and this decoupling is important. The advantage of ChangeSets is that ChangeSet algebra can be applied to them to find the differences between repositories and act on them, e.g. to restore parts of the code that were previously deleted. Because only the changes are stored, this is much more efficient than recording changes to individual files and directories (as in the CVS tree, for example): it saves a substantial amount of storage space and is easier to reason about. This, then, is the big advantage of ChangeSets, and hence of the p2p model.
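The ChangeSet algebra idea can be illustrated with a small Python sketch, assuming a hypothetical hunk format of `(kind, path, index, text)`. It shows two operations: the set difference between the changesets two repositories hold, and inverting a changeset so that code it deleted can be restored. This illustrates the concept only; it does not reflect how git or Mercurial actually store or manipulate history.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical hunk format: (kind, path, index, text), where kind is "add" or "delete".
Hunk = Tuple[str, str, int, str]
ChangeSet = Tuple[Hunk, ...]
Tree = Dict[str, List[str]]


def apply_changeset(tree: Tree, cs: ChangeSet) -> Tree:
    """Apply each hunk in order to a copy of the tree and return the new tree."""
    result = {path: list(lines) for path, lines in tree.items()}
    for kind, path, index, text in cs:
        lines = result.setdefault(path, [])
        if kind == "add":
            lines.insert(index, text)
        elif kind == "delete":
            # Refuse to delete a line that does not match what the hunk expects.
            if lines[index] != text:
                raise ValueError(f"conflict at {path}:{index}")
            del lines[index]
        else:
            raise ValueError(f"unknown hunk kind {kind!r}")
    return result


def invert(cs: ChangeSet) -> ChangeSet:
    """Return the changeset that undoes cs: reverse the hunks and swap add/delete."""
    swap = {"add": "delete", "delete": "add"}
    return tuple((swap[kind], path, index, text) for kind, path, index, text in reversed(cs))


def missing(mine: Set[str], theirs: Set[str]) -> Set[str]:
    """Changeset ids I have that the other repository lacks (a plain set difference)."""
    return mine - theirs


base: Tree = {"util.py": ["def helper():", "    return 42"]}
remove_helper: ChangeSet = (
    ("delete", "util.py", 1, "    return 42"),
    ("delete", "util.py", 0, "def helper():"),
)
after = apply_changeset(base, remove_helper)              # util.py is now empty
restored = apply_changeset(after, invert(remove_helper))  # the deleted lines come back
only_mine = missing({"c1", "c2", "c3"}, {"c1"})           # {"c2", "c3"}
```

Because a changeset is a value in its own right, "restore what that change deleted" becomes applying its inverse, rather than digging an old revision of a particular file out of the history.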
To illustrate the advantage of the p2p model and its storage model, Dan talked about the incremental merge problem. In centralised systems, incremental merges can cause the same delete commands (for example) to be initiated multiple times, leading to inconsistencies in the repository. Various members of the audience asserted that this has not been a problem for proprietary systems, but Dan countered that CVS/Subversion did not deal with the problem, and that git/Mercurial are significant advances over them.
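One common reading of the incremental merge problem is that, without merge tracking, each merge from a branch replays the branch's whole history, so a delete that has already been merged is attempted again. The Python sketch below contrasts that with recording which changeset ids have been merged. The `Change` class and both merge functions are hypothetical; they illustrate the idea and are not taken from CVS, Subversion, git or Mercurial.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Change:
    cid: str   # hypothetical changeset identifier
    op: str    # "add" or "delete"
    item: str  # name of the thing added or deleted


def apply_change(state: Set[str], ch: Change) -> None:
    """Apply one change to the trunk, failing if it no longer makes sense."""
    if ch.op == "add":
        if ch.item in state:
            raise ValueError(f"{ch.cid}: {ch.item} already exists")
        state.add(ch.item)
    elif ch.op == "delete":
        if ch.item not in state:
            raise ValueError(f"{ch.cid}: cannot delete missing {ch.item}")
        state.remove(ch.item)
    else:
        raise ValueError(f"{ch.cid}: unknown op {ch.op!r}")


def naive_merge(trunk: Set[str], branch_log: List[Change]) -> None:
    """Replay the whole branch history every time (no merge tracking)."""
    for ch in branch_log:
        apply_change(trunk, ch)


def tracked_merge(trunk: Set[str], merged: Set[str], branch_log: List[Change]) -> List[str]:
    """Apply only changesets whose ids are not yet recorded as merged."""
    applied = []
    for ch in branch_log:
        if ch.cid not in merged:
            apply_change(trunk, ch)
            merged.add(ch.cid)
            applied.append(ch.cid)
    return applied


branch_log = [Change("c1", "delete", "old_api")]
trunk, merged = {"old_api", "core"}, set()
first = tracked_merge(trunk, merged, branch_log)   # ["c1"]
branch_log.append(Change("c2", "add", "new_api"))
second = tracked_merge(trunk, merged, branch_log)  # ["c2"]: c1 is not re-applied

naive_trunk = {"old_api", "core"}
naive_merge(naive_trunk, branch_log[:1])
try:
    naive_merge(naive_trunk, branch_log)           # replays c1: deletes old_api again
    naive_failed = False
except ValueError:
    naive_failed = True
```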
Dan also talked about the difference in release practice between centralised and p2p systems: the former branch per release, whereas the latter branch per feature. He talked only very briefly about migrating to DVCS.
Dan outlined some barriers to adoption:
- git has a significant learning curve
- it's easy to overlook the synchronisation issue
- it is much harder to enforce central control.
Update: Audio of the talk is available on the BCS streaming server.
Thursday, 25 November 2010
It works by finding an exact copy of an image you specify, either via upload or via a link. It does not find similar images: the result is binary, either the image matches or it doesn't. Some small differences are allowed; for example, searching for the Mona Lisa in curlers will bring up the original version of the Mona Lisa. One interesting feature of the service is that if you provide a web page, it will load the images on it and allow you to click on them to conduct a search. The service is useful for tasks which require known-item search, e.g. 'I've got this image: has someone used it before on the web?' or 'I need the original image.'
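The post does not say how the service decides that two images match. One simple technique that gives the same kind of binary but tolerant answer is an average hash (a basic perceptual hash): shrink the image to an 8x8 grid, set one bit for each cell brighter than the mean, and call two images a match if their hashes differ in only a few bits. This Python sketch is an assumption for illustration, not the service's method. Images are plain lists of grayscale values (at least 8x8 pixels) so no imaging library is needed, and the `max_distance` threshold of 5 bits is an arbitrary choice.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels 0-255; rows of equal length, at least 8x8


def average_hash(img: Image, size: int = 8) -> int:
    """Block-average down to size x size cells, then set one bit per cell above the mean."""
    h, w = len(img), len(img[0])
    cells = []
    for r in range(size):
        for c in range(size):
            r0, r1 = r * h // size, (r + 1) * h // size
            c0, c1 = c * w // size, (c + 1) * w // size
            block = [img[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_match(a: Image, b: Image, max_distance: int = 5) -> bool:
    """Binary decision: True when the two hashes differ in at most max_distance bits."""
    return hamming(average_hash(a), average_hash(b)) <= max_distance


img = [[x * 16 for x in range(16)] for _ in range(16)]      # left-to-right gradient
brighter = [[min(255, v + 3) for v in row] for row in img]  # a slightly edited copy
inverted = [[255 - v for v in row] for row in img]          # a genuinely different image
```

Here `is_match(img, brighter)` is True and `is_match(img, inverted)` is False. How far an edit can go before a copy stops matching depends on the hash and the threshold chosen.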
Monday, 22 November 2010
Monday, 15 November 2010
- Browse by genre, using a tag cloud: examples include electronic and downtempo.
- Search and browse for music similar to well-known artists: examples are David Bowie and Metallica
You can also create your own widgets of the music you are interested in, but I couldn't see how it worked, so here's one of the playlists they supply:
Monday, 8 November 2010
The type of music is called Bluegrass, which I've posted about before. I've just got to repost this: