29 September 2009

CIP deployment

As some of you may have noticed, the new CASTOR information provider (version 2.0.3) went live at around 13:00 today.

This one is smarter than the previous one: it automatically picks up certain relevant changes to CASTOR. It has nearline (tape) accounting as requested by CMS. It is more resilient against internal errors. It is easier to configure. It also has an experimental bugfix for the ILC bug (it works for me on dteam). It has improved compliance with WLCG Installed Capacity (up to a point; it is still not fully compliant).

Apart from a few initial wobbles and adjustments which were fixed fairly quickly (but still needed to filter through the system), real VOs should be working.

ops was trickier, because they have access to everything in hairy ways, so we were coming up red on the SAM tests for a while. This appears to be sorted out for the SE tests, but still causes the CE tests to fail, which is odd, because the failing CE tests consist of jobs that run the same data tests as the SE tests, which work. I talked to Stephen Burke, who suggested a workaround, which is now filtering through the information system.

We're leaving it at-risk till tomorrow, although the services are working. On the whole, apart from the ops tests with lcg-utils, I think it went rather well: the CIP is up against two extremely complex software infrastructures, CASTOR on one side and the grid data management on the other, and the CIP itself has a complex task trying to manage all this information.

Any Qs, let me know.
