28 September 2007

DPM Dies

For those who don't read the ScotGRID blog:


Filesystems turn readonly

Some of the RAIDed filesystems at Edinburgh recently decided to become read-only. This caused the dCache pool processes that depended on them to fall over ("Repository got lost"). Some of the affected filesystems were completely full, but not all of them. An unmount/mount cycle seems to have fixed things. Anyone seen this sort of thing before?
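One plausible culprit: ext3 mounted with errors=remount-ro will silently flip to read-only when the kernel hits an I/O error on the device. A quick way to spot affected pool filesystems is to scan the mount table. Here is a minimal sketch that parses /proc/mounts-style text; the device and mount point names are made up for illustration.

```python
# Sketch: find filesystems that have flipped to read-only by checking
# the mount options column of /proc/mounts. Mount points are invented.

def readonly_mounts(mounts_text):
    """Return mount points whose option list contains 'ro'."""
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        mountpoint, options = fields[1], fields[3].split(",")
        if "ro" in options:
            hits.append(mountpoint)
    return hits

if __name__ == "__main__":
    # On a real pool node you would read the live table instead:
    #   with open("/proc/mounts") as f: mounts_text = f.read()
    sample = (
        "/dev/sda1 / ext3 rw,errors=remount-ro 0 0\n"
        "/dev/md0 /pool1 ext3 ro,noatime 0 0\n"
        "/dev/md1 /pool2 ext3 rw,noatime 0 0\n"
    )
    print(readonly_mounts(sample))
```

Run from cron, something like this could at least raise an alarm before the pool processes notice.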

22 September 2007

SRM2.2 spaces at Tier-2s

One thing that really worries me is the current deployment plan for SRM2.2. The idea is to get this stuff rolled out into production at all Tier-2s by the end of January. This is a difficult task when you consider the number of Tier-2s, all with different configurations and different experiments to support. Oh, and it would also be good if we actually knew what the experiments want from SRM2.2 at Tier-2s. There needs to be a good bit more dialogue between them and the GSSD group to find out what spaces should be set up and how the disk should be partitioned. Or maybe they just don't care and want all the disk in one large block with a single space reservation made against it. One way or the other, it would be good to know.

dCache SRM2.2

There was a dedicated dCache session during CHEP for discussion between site admins and the developers, to hear about the latest developments and to get help with the server configuration, which is *difficult*. Link groups and space reservations were flying about all over the place. More documentation is required, but this seems to be difficult when things are changing so fast (new options magically appear, or don't appear, in the dCacheSetup file). A training workshop would also be useful...
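For anyone hunting for a starting point, the link-group setup boils down to a few PoolManager.conf commands plus an authorization file. The fragment below is reconstructed from memory of the dCache Book, and the group, link and FQAN names are purely illustrative; check the current documentation before copying anything.

```
# PoolManager.conf: attach an existing link to a link group
psu create linkGroup atlas-linkGroup
psu addto linkGroup atlas-linkGroup atlas-link
psu set linkGroup replicaAllowed atlas-linkGroup true
psu set linkGroup custodialAllowed atlas-linkGroup false

# dCacheSetup: enable the SRM space manager
srmSpaceManagerEnabled=yes

# LinkGroupAuthorization.conf: who may make reservations in the group
LinkGroup atlas-linkGroup
/atlas/Role=production
```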

Shameless self promotion

OK, so obviously I've got nothing to do tonight other than talk about my contributions to CHEP. The presentation was on the possibility of using DPM in the distributed Tier-2 environment that we have within GridPP. We (Graeme Stewart and myself) used a custom RFIO client running on multiple nodes of the Glasgow CPU farm to read data from a DPM that was sitting in Edinburgh. It was surprisingly easy to do actually. You can find the slides in Indico.
Future investigations will use a dedicated, low-latency lightpath rather than the production network.

We also had a poster at CHEP which looked at the scalability of DPM when using RFIO access across the LAN. It used an identical method to the WAN paper, but in this case we were interested in really stressing the system and seeing how DPM scales as you add more hardware. Summary: it performs very well and can easily support at least 100TB of disk. Check out Indico for details.
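The shape of the test harness is simple enough to sketch: many concurrent readers pulling data at once while you time the aggregate. The toy below uses threads reading plain local files as a stand-in for the farm nodes and the remote DPM filesystems; the real tests used a custom RFIO client, and all names here are invented.

```python
# Toy stress-test harness: one reader per file, timed in aggregate.
# Local temp files stand in for RFIO access to a DPM.
import os
import tempfile
import threading
import time

def reader(path, results, i, blocksize=1 << 16):
    """Read a file to the end in fixed-size blocks, recording bytes moved."""
    nbytes = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(blocksize)
            if not block:
                break
            nbytes += len(block)
    results[i] = nbytes

def stress(paths):
    """Run one reader thread per path; return (total_bytes, elapsed_seconds)."""
    results = [0] * len(paths)
    threads = [threading.Thread(target=reader, args=(p, results, i))
               for i, p in enumerate(paths)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results), time.time() - start

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        paths = []
        for i in range(4):
            p = os.path.join(d, "file%d" % i)
            with open(p, "wb") as f:
                f.write(b"x" * (1 << 20))
            paths.append(p)
        total, elapsed = stress(paths)
        print("moved %d bytes in %.2fs" % (total, elapsed))
```

Scaling the node count up while watching total_bytes/elapsed is essentially what the poster's plots show, just with RFIO in the middle.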

DPM browser

Something I picked up from a DPM poster was the imminent release of an http/https browser for the DPM namespace. There weren't many details and the poster isn't online, but I think it claims to be built on Apache.

Full chain testing

The experiments want to test out the complete data management chain, from experiment pit->Tier-0->Tier-1->Tier-2. This exercise has been given the snappy title of Common Computing Readiness Challenge (CCRC) and is expected to run in a couple of phases in 2008: February, then May. This will be quite a big deal for everyone, and we need to make sure the storage is ready to cope with the demands. SRM2.2 is coming "real soon now" and should be fully deployed by the time of these tests (well it *had* better be...), which will make it the first large scale test of the new interface.

Storage is BIG!

It quickly became clear during CHEP that sites are rapidly scaling up their disk and tape systems in order to be ready for LHC turn on next year (first data in October...maybe). For instance, CNAF will soon have 3PB of disk, 1.5PB of tape along with 7.5MSpecInt. That is basically an order of magnitude larger than what they have currently. It was the same story elsewhere.

Kors was right, storage is hot. Fingers crossed that the storage middleware scales up to these levels.

dCache update

I was speaking to Tigran from dCache during CHEP and got some new information about dCache and Chimera.

First off, ACLs are coming, but these are not tied to Chimera. They are implementing NFS4 ACLs, which are then mapped to POSIX, which (according to Tigran) makes them more like NT ACLs. Need to look into this further.
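To get a feel for what such a mapping involves, here is a toy reduction of an NFSv4 ALLOW ACE access mask down to POSIX rwx bits. The mask bit values are the ones defined in RFC 3530; the mapping policy itself is my own simplification, not the actual dCache implementation.

```python
# NFSv4 ACE access-mask bits (values from RFC 3530).
ACE4_READ_DATA   = 0x0001
ACE4_WRITE_DATA  = 0x0002
ACE4_APPEND_DATA = 0x0004
ACE4_EXECUTE     = 0x0020

def posix_bits(mask):
    """Collapse an NFSv4 access mask to a POSIX-style rwx string (simplified)."""
    return "".join([
        "r" if mask & ACE4_READ_DATA else "-",
        "w" if mask & (ACE4_WRITE_DATA | ACE4_APPEND_DATA) else "-",
        "x" if mask & ACE4_EXECUTE else "-",
    ])

print(posix_bits(ACE4_READ_DATA | ACE4_EXECUTE))  # -> r-x
```

The NFSv4 model is much richer than this (deny ACEs, inheritance flags, many more mask bits), which is presumably why the result ends up looking "more like NT ACLs" than plain POSIX.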

Secondly, the dCache guys are really pushing the NFS v4.1 definition as they see it as the answer to their local data access problems. 4.1 clients are being implemented for both Linux and Solaris (no more need for dcap libraries!). According to Tigran, NFS4.1 uses transactional operations. The spec doesn't detail the methods and return codes exactly; rather, it defines a set of operations that can be combined into a larger operation. This sounds quite powerful, but how will the extra complexity affect client-server interoperability?
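The "combine small operations into a larger one" idea is essentially NFSv4's COMPOUND procedure: the operations run in order and processing stops at the first failure. A toy model of that control flow, with invented operation names (this is not a real NFS client):

```python
# Toy model of an NFSv4-style COMPOUND: run a sequence of small
# operations in order, stopping at the first one that fails.
def compound(ops):
    """ops is a list of (name, fn) pairs; returns (name, status, value) tuples."""
    results = []
    for name, fn in ops:
        try:
            results.append((name, "OK", fn()))
        except Exception as err:
            results.append((name, "ERR", str(err)))
            break  # remaining operations are not attempted
    return results

# A pretend server-side namespace and a three-step compound request.
fs = {"/data/f1": b"hello"}
ops = [
    ("PUTFH", lambda: "/data/f1"),
    ("OPEN",  lambda: fs["/data/f1"]),
    ("READ",  lambda: fs["/data/f1"][:4]),
]
for name, status, value in compound(ops):
    print(name, status, value)
```

One round trip can thus do the work of several RPCs, which is where the power (and the interoperability worry) comes from.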

Finally, one thing I realised about Chimera is that it allows you to modify the filesystem without actually mounting it: there is an API which can be used instead.
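The real Chimera API is Java and talks to the namespace database directly, but the idea of namespace surgery with no mount involved can be illustrated with a toy. Everything below is invented for illustration.

```python
# Toy "namespace as a service": create and remove entries through an API
# object, with no mounted filesystem anywhere in sight.
class ToyNamespace:
    def __init__(self):
        self.entries = {"/": "dir"}

    def mkdir(self, path):
        self.entries[path] = "dir"

    def create(self, path):
        """Create a file entry; the parent directory must already exist."""
        parent = path.rsplit("/", 1)[0] or "/"
        if self.entries.get(parent) != "dir":
            raise FileNotFoundError(parent)
        self.entries[path] = "file"

    def remove(self, path):
        del self.entries[path]

ns = ToyNamespace()
ns.mkdir("/pnfs")
ns.create("/pnfs/testfile")
print(sorted(ns.entries))
```

Handy for bulk operations and repairs, since nothing has to go through a kernel mount.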

17 September 2007

"Storage is HOT!"*

* Kors Bos, CHEP'07 summary talk, slide 22.

Now that CHEP is finished (and I'm back from my holiday!) I thought it would be good to use the blog to reflect on what was said, particularly when it comes to storage. In fact, between the WLCG workshop and CHEP itself, it is clear that storage is the hot topic on the Grid. This was made quite explicit by Kors during his summary talk. Over the next few days I'll post some tidbits that I picked up (once I've finished my conference papers, of course ;).

14 September 2007

Improved DPM GIP Plugin Now In Production

The improved DPM plugin, with an SQL query that performs far better for large DPMs and a number of minor bug fixes, is now in production (it should make the next gLite release).

See https://savannah.cern.ch/patch/?1254 for details.
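The performance win on large DPMs comes from the usual trick: one aggregate query over the filesystem table instead of a query per filesystem. The sketch below shows the idea against an in-memory SQLite mock; the dpm_fs schema here is simplified from memory and the pool/server names are invented, so treat it as a sketch rather than the plugin's actual query.

```python
# Sketch: aggregate free/capacity per pool in one SQL round trip,
# using a simplified, mocked-up dpm_fs table in SQLite.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dpm_fs "
    "(poolname TEXT, server TEXT, fs TEXT, capacity INTEGER, free INTEGER)"
)
rows = [
    ("Permanent", "disk1", "/gridstore1", 1000, 400),
    ("Permanent", "disk1", "/gridstore2", 1000, 100),
    ("Permanent", "disk2", "/gridstore1", 2000, 1500),
]
conn.executemany("INSERT INTO dpm_fs VALUES (?,?,?,?,?)", rows)

# One GROUP BY replaces a per-filesystem query loop.
for pool, cap, free in conn.execute(
        "SELECT poolname, SUM(capacity), SUM(free) "
        "FROM dpm_fs GROUP BY poolname"):
    print(pool, "capacity:", cap, "free:", free)
```

The numbers then feed straight into the GlueSA used/available attributes the GIP publishes.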

Possibly this will be my last hurrah with this fine script, before passing it off to Greig to do the Glue 1.3 version, now with added space tokens...