22 December 2009

T2 storage ready for ATLAS data taking... or are we?

It's been a busy couple of months, what with helping the Tier2 sites to prepare their storage for data taking. The good news is that the sites have done really well.
Of the three largest LHC VOs, most work has been done with ATLAS, since they have the hungriest need for space and the most complex requirements for site administration of Tier2 space.

All sites now have the space tokens for ATLAS that they require.

The ATLAS people have also been ready to look at what space is available to them and adjust their usage accordingly.

Almost all sites had their SE/SRMs in the process of upgrade or decommissioning, ready for data taking in '09, and all should be ready for '10.
Sites were very good at making the changes required by ATLAS's changing space token distribution needs.
Sites have also been really good at working with ATLAS via the ATLAS "hammercloud" tests to improve their storage.
Some issues still remain (draining on DPM, limiting gridFTP connections, lost disk server processes, data management by the VOs, etc.) but these challenges/opportunities will make our lives "interesting" over the coming months.

So that covers some of the known knowns.

The known unknowns (how user analysis of real data affects T2 storage) are also going to come about over the next few months, but I feel that the GridPP storage team, the ATLAS UK support team and the site admins are all ready to face what the LHC community throws at us.

Unknown unknowns: we will deal with them when they come at us...

09 December 2009

When it's a pain to drain

Some experiences rejigging filesystems at ECDF today. Not sure I am recommending this approach, but some of it may be useful as a dpm-drain alternative in certain circumstances.

Problem was that some data had been copied in with a limited lifetime but was in fact not OK to delete. Using dpm-drain would delete those so instead I marked the filesystem RDONLY and then did:

dpm-list-disk --server=pool1.glite.ecdf.ed.ac.uk --fs=/gridstorage010 > Stor10Files

I edited this file to replace Replica: with dpm-replicate (and delete the number at the end), then ran the resulting commands. (Warning: if these files are in a spacetoken you should also specify the spacetoken in this command.)
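For anyone who would rather script that edit than do it by hand, here is a minimal Python sketch. It assumes each relevant line of the dpm-list-disk output looks like "Replica: <file name> <number>", which is inferred from the description above rather than checked against the tool. The function name and the extra_args parameter are made up for illustration; any spacetoken option has to be passed in through extra_args, since the exact flag is not given here.

```python
import shlex

def to_replicate_commands(listing_lines, extra_args=""):
    """Turn 'Replica: <name> <number>' lines into dpm-replicate command lines.

    The input layout is an assumption; lines that do not match are skipped.
    extra_args is inserted verbatim before the file name (hypothetical hook
    for e.g. a spacetoken option).
    """
    commands = []
    for line in listing_lines:
        parts = line.split()
        if len(parts) != 3 or parts[0] != "Replica:":
            continue  # not a replica line in the assumed layout
        name = parts[1]  # parts[2] is the trailing number, which is dropped
        pieces = ["dpm-replicate"]
        if extra_args:
            pieces.append(extra_args)
        pieces.append(shlex.quote(name))
        commands.append(" ".join(pieces))
    return commands

# Small demonstration on made-up input in the assumed layout.
sample = ["Replica: pool1.example:/gridstorage010/some/file 12345\n",
          "a header line that is ignored\n"]
for command in to_replicate_commands(sample):
    print(command)
```

Writing the commands to a file first, rather than running them directly, makes it easier to see how far you got if you have to abort.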

Unfortunately I had to abort this part way through which left me in a bit of a pickle not knowing what files had been duplicated and could be deleted.
While you could probably figure out a way of doing this using dpm-disk-to-dpns and dpm-dpns-to-disk, I instead opted for the database query:

select GROUP_CONCAT(cns_db.Cns_file_replica.sfn), cns_db.Cns_file_replica.setname, count(*) from cns_db.Cns_file_replica where cns_db.Cns_file_replica.sfn LIKE '%gridstorage%' group by cns_db.Cns_file_replica.fileid INTO outfile '/tmp/Stor10Query2.txt';

This gave me a list of physical file names and the number of copies (and the spacetoken), which I could grep for a list of those with more than one copy:
grep "," /tmp/Stor10Query2.txt | cut -d ',' -f 1 > filestodelete

I could then edit this filestodelete to add dpm-delreplica to each line and source it to delete the files. I also made a new list of files to replicate in the same way as above. Finally I repeated the query to check all the files had 2 replicas before deleting all the originals.
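The grep-and-cut step, and the final "does everything have 2 replicas" check, can be sketched in Python as below. This assumes the outfile uses MySQL's default layout for INTO OUTFILE (tab-separated fields, one row per line) with the three columns from the query: the comma-joined sfn list, the setname and the count. The function name is hypothetical.

```python
def split_by_replica_count(outfile_lines):
    """Return (multi, single): first-listed sfn of files with >1 copy, and with 1 copy.

    Expects lines of the form '<sfn>[,<sfn>...]\t<setname>\t<count>'.
    Malformed lines are skipped.
    """
    multi, single = [], []
    for line in outfile_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue
        sfns, _setname, count = fields
        first_sfn = sfns.split(",")[0]  # same choice as cut -d ',' -f 1
        if int(count) > 1:
            multi.append(first_sfn)
        else:
            single.append(first_sfn)
    return multi, single

# Made-up rows in the assumed layout.
rows = ["srv1:/gridstorage010/a,srv2:/gridstorage011/a\tATLASDATADISK\t2\n",
        "srv1:/gridstorage010/b\t\\N\t1\n"]
multi, single = split_by_replica_count(rows)
print(len(multi), "files with more than one copy;", len(single), "still single")
```

Using the count column rather than looking for a comma is a little more robust: GROUP_CONCAT silently truncates its result at group_concat_max_len (1024 bytes by default), so very long sfn lists may be cut short. Note also that the order of names inside GROUP_CONCAT is not guaranteed, so check which replica is first before deleting anything.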

Obviously this is a bit of a palaver and not the ideal approach for many reasons, including that there is no check that the replicas are identical, and that the replicas made are still volatile, so I'll probably just encounter the same problem again down the line. But if you really can't use dpm-drain for some reason, there is at least an alternative.

24 November 2009

Storage workshop discussion

If you have followed the weeklies, you will have noticed we're discussing having another storage workshop. The previous one was thought extremely useful, and we want to create a forum for storage admins to come together and share their experiences with Real Data(tm).
Interestingly, we now have (or are close to getting!) more experience with tech previously not used by us. For example, does it improve performance having your DPM db on SSD? Is Hadoop a good option for making use of storage space on WNs?
We already have a rough agenda. There should be lots of sysadmin-friendly coffee-aided pow-wows. Maybe also some projectplanny stuff, like the implications for us of the end of EGEE, the NGI, GridPP4, and suchlike.
Tentatively, think Edinburgh in February.

23 November 2009

100% uptime for DPM

(and anything else with a MySQL backend).

This weekend, with the ramp up of jobs through the Grid as a result of some minor events happening in Geneva, we were informed of a narrow period during which jobs failed accessing Glasgow's DPM.

There were no problems with the DPM, and it was working according to spec. However, the period was correlated with the 15 minutes or so that the MySQL backend takes to dump a copy of itself as backup, every night.

So, in the interests of improving uptime for DPMs to >99%, we enabled binary logging on the MySQL backend (and advise that other DPM sites do so as well, disk space permitting).

Binary logging (which is enabled by adding the string "log-bin" on its own line to /etc/my.cnf, and restarting the service) enables (amongst other things, including "proper" up-to-the-second backups) a MySQL-hosted InnoDB database to be dumped without interrupting service at all, thus removing any short period of dropped communication.
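The post does not say which dump command is used at Glasgow, so what follows is only a sketch of one common way to take a non-blocking dump once binary logging is on: mysqldump with --single-transaction (a consistent InnoDB snapshot without locking the tables) and --master-data=2 (writes the binary log position into the dump as a comment, and needs log-bin enabled). The helper name, the user, and the database list (cns_db and dpm_db, the usual DPM database names) are assumptions. The sketch only builds and prints the command; it does not run it.

```python
def build_dump_command(databases, user="root"):
    """Build a mysqldump argument list for a non-blocking InnoDB dump.

    Credentials are deliberately left out; they would normally come
    from an option file readable only by the backup user.
    """
    return [
        "mysqldump",
        "--user=" + user,
        "--single-transaction",  # consistent snapshot, no table locks (InnoDB only)
        "--master-data=2",       # record binlog file/position as a comment
        "--databases",
    ] + list(databases)

command = build_dump_command(["cns_db", "dpm_db"])
print(" ".join(command))
```

The snapshot trick only helps for InnoDB tables; any MyISAM tables in the same dump can still change while it runs.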

(Now any downtime is purely your fault, not MySQL's.)

12 November 2009

Nearly there

The new CASTOR information provider is nearly back: the host is finally back up, but given that it's somewhat late in the day we'd better not switch the information system back till tomorrow. (We are currently running CIP 1.X, without nearline accounting.)

Meanwhile we will of course work on a resilienter infrastructure. We also did that before, it's just that the machine died before we could complete the resilientification.

We do apologise for the inconvenience caused by this incredibly exploding information provider host. I don't know exactly what happened to it, but given that it took a skilled admin nearly three days to get it back, it must have toasted itself fairly thoroughly.

While we're on the subject, a new release is under way for the other CASTOR sites - the current one has a few RAL-isms inside, to get it out before the deadline.

When this is done, work can start on GLUE 2.0. Hey ho.

10 November 2009


Well it seems we lost the new CASTOR information provider (CIP) this morning and the BDII was reset to the old one - the physical host it lived on (the new one) decided to kick the bucket. One of the consequences is that nearline accounting is lost, all nearline numbers are now zero (obviously not 44444 or 99999, that would be silly...:-)).
Before you ask, the new CIP doesn't run on the old host because it was compiled for 64 bit on SLC5, and the old host is 32 bit SL4.
We're still working on getting it back, but are currently short of machines that can run it, even virtual ones. If you have any particular problems, do get in touch with the helpdesk and we'll see what we can do.

30 September 2009

CIP update update

We are OK: problems in deployment that had not been caught in testing appear to be due to different versions of lcg-utils (used for all the tests) behaving subtly differently. So I could run tests as dteam prior to release and they'd work, but the very same tests would fail on the NGS CE after release, even if they'd also run as dteam. Those were finally fixed this morning.

29 September 2009

CIP deployment

As some of you may have noticed, the new CASTOR information provider (version 2.0.3) went live as of 13.00 or thereabouts today.

This one is smarter than the previous one: it automatically picks up certain relevant changes to CASTOR. It has nearline (tape) accounting as requested by CMS. It is more resilient against internal errors. It is easier to configure. It also has an experimental bugfix for the ILC bug (it works for me on dteam). It has improved compliance with WLCG Installed Capacity (up to a point: it is still not fully compliant).

Apart from a few initial wobbles and adjustments which were fixed fairly quickly (but still needed to filter through the system), real VOs should be working.

ops was trickier, because they have access to everything in hairy ways, so we were coming up red on the SAM tests for a while. This appears to be sorted out for the SE tests, but still causes the CE tests to fail. Which is odd, because the failing CE tests consist of jobs that run the same data tests as the SE tests, which work. I talked to Stephen Burke who suggested a workaround which is now filtering through the information system.

We're leaving it at-risk till tomorrow - and the services are working. On the whole, apart from the ops tests with lcg-utils, I think it went rather well: the CIP is up against two extremely complex software infrastructures, CASTOR on one side, and the grid data management on the other, and the CIP itself has a complex task trying to manage all this information.

Any Qs, let me know.

28 September 2009

Replicating like HOT cakes

As mentioned on the storage list, the newest versions of the GridPP DPM Tools (documented at http://www.gridpp.ac.uk/wiki/DPM-admin-tools) contain a tool to replicate files within a spacetoken (such as the ATLASHOTDISK).

At Edinburgh this is running in cron:

0 1 * * * root /opt/lcg/bin/dpm-sql-spacetoken-replicate-hotfiles --st ATLASHOTDISK >> /var/log/dpmrephotfiles.log 2>&1

Some issues observed are:
* Takes quite a long time to run the first time. Because of all the dpm-replicate calls on the ~1000 files that ATLAS stuck in there, it took around 4 hours just for 1 extra copy. Since then, though, only the odd file has come in, so it doesn't have much to do.
* The replicas are always on different filesystems, but not always on different disk servers. This obviously depends on how many servers you have for that pool (compared to the nreps you want), as well as how many filesystems there are on each server. The replica creation could be more directed, but perhaps it should be the default behaviour of the built-in command to use a different server if it can.
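To see how often the second issue bites, something like the following Python sketch could be run over a list of replicas. It assumes each replica name has the form "server:/filesystem/path", so that the disk server is whatever comes before the first colon; the function names and the input mapping are made up for illustration and are not part of the toolkit.

```python
def servers_for(sfns):
    """Set of disk servers hosting the given replicas (text before the first ':')."""
    return {sfn.split(":", 1)[0] for sfn in sfns}

def same_server_files(replicas_by_file):
    """Files that have several replicas, all of them on one disk server."""
    return sorted(
        name
        for name, sfns in replicas_by_file.items()
        if len(sfns) > 1 and len(servers_for(sfns)) == 1
    )

# Made-up example: file "a" is spread over two servers, "b" is not.
example = {
    "a": ["srv1:/fs1/a", "srv2:/fs1/a"],
    "b": ["srv1:/fs1/b", "srv1:/fs2/b"],
    "c": ["srv2:/fs3/c"],
}
print(same_server_files(example))
```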

Intended future enhancements of this tool include:
* List in a clear way the physical duplicates in the ST.
* Remove excess duplicates.
* Automatic replication of a list of "hotfiles".

Other suggestions welcome.

20 August 2009

GridPP DPM toolkit v2.5.2 released

Hello all,

Another month, another toolkit release.
This one, relative to the last announced release (2.5.0), has slightly improved functionality for dpm-sql-list-hotfiles and adds a -o (or --ordered) option to dpm-list-disk.
The -o option returns a sorted list of the files in the space selected, descending by filesize. As this uses the dpm API, the process currently needs to pull the entire filelist before sorting it, so, unlike the normal mode, you get all the files output in one go (after a pause of some minutes while all the records are acquired + sorted).
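Why the pause? A sort cannot emit its first line until it has seen the last record. Here is a toy Python illustration (not the toolkit's code; the function name and the (name, size) record shape are assumptions):

```python
def ordered_listing(records):
    """Return (name, size_bytes) records sorted by size, largest first.

    The whole input is materialised before anything can be returned,
    which is why ordered output arrives in one go after a delay.
    """
    everything = list(records)  # pulls the entire file list into memory
    everything.sort(key=lambda record: record[1], reverse=True)
    return everything

listing = [("small", 10), ("huge", 5000), ("medium", 300)]
for name, size in ordered_listing(listing):
    print(size, name)
```

If only the biggest few files are of interest, heapq.nlargest from the standard library keeps memory use small, although it still has to read every record once.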

There's also a new release of the Gridpp-DPM-monitor package, which includes some bug fixes and the new user-level accounting plot functionality. This should work fine, but if anyone has any problems, contact me as normal.

All rpms at the usual place:

31 July 2009

GridPP DPM toolkit v2.5.0 released

Hello everyone,

I've just released version 2.5.0 of the GridPP DPM toolkit.
The main feature of this release is the addition of the tool dpm-sql-list-hotfiles,
which should be called as
dpm-sql-list-hotfiles --days N --num M
to return the top M "most popular" files over the past N days.
Caveat: the query involved in calculating the file temperature is a little more intensive than the average queries implemented in the tool kit. You may see a small load spike in your DPM when this executes, so don't run it a lot in a short space of time if your DPM is doing something important.
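How the toolkit actually computes "file temperature" is not described here, so the following Python sketch is only an illustration of the general idea of "top M over the past N days". It assumes, purely for the example, that popularity means the number of recorded accesses within the window and that accesses arrive as (path, timestamp) pairs; the function name is hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

def hottest(access_log, days, num, now):
    """Top `num` paths by access count over the last `days` days.

    access_log: iterable of (path, datetime) pairs (assumed shape).
    """
    cutoff = now - timedelta(days=days)
    counts = Counter(path for path, when in access_log if when >= cutoff)
    return counts.most_common(num)

now = datetime(2009, 7, 31)
log = [
    ("/dpm/a", datetime(2009, 7, 30)),
    ("/dpm/a", datetime(2009, 7, 29)),
    ("/dpm/b", datetime(2009, 7, 30)),
    ("/dpm/c", datetime(2009, 6, 1)),  # outside a 7-day window
]
print(hottest(log, days=7, num=2, now=now))
```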

As always, downloads are possible from:

15 July 2009

Storage workshop and STEP report

The storage workshop writeup is available! Apart from notes from the workshop, it also contains Brian's storage STEP report. Follow the link and read all about it. Comments are welcome.

03 July 2009

GridPP Storage workshop

The GridPP storage workshop was a success. Techies from the Tier 2s got together to show and tell, to discuss issues, and to hear about new stuff. We also had speakers from Tier 1 talking about IPMI and verifying disk arrays, and from the National Grid Service talking about procurement and SANs.
We talked about STEP (storage) experiences, and had a dense programme with a mix of content for both newbies and oldbies, and we gazed into the crystal ball to see what's coming next.
All presentations will be available on the web on the hepsysman website shortly (if they aren't already). There will also be a writeup of the workshop.

24 June 2009

GridPP DPM toolkit v2.3.9 released

Ladies and gentlemen,

I am proud to announce the release of the gridpp-dpm-tools package,
version 2.3.9.
It is available at the usual place:

This version is an extra-special "GridPP Milestone" release, in that
it includes a mechanism for writing user-level storage consumption
data to a database, so that you can make nice graphs of them later.
(The graphmaking functionality exists as scripts now, and will be
released as a modification to the dpm monitoring rpm as soon as I make
the necessary changes + package it.)
You can enable user-level accounting on your DPM by following the
instructions just newly added to the GridPP wiki for the toolkit. (
http://www.gridpp.ac.uk/wiki/DPM-admin-tools#Installation )

Comments, bug reports, &c all (happily?) accepted. In particular, if
the user-level accounting doesn't work for you, I'd like to know about
it, since it's been happy here at Glasgow for the last couple of days.

23 June 2009

Workshop agenda now available

As the title indicates, the agenda for the GridPP storage workshop is now uploaded to the web site, along with the agenda for hepsysman.

15 June 2009

Storage workshop planning

With now less than 0.05 years to go before the storage workshop, the agenda and other planning are continuing apace.
We would like to ask sites to give site reports (currently 20 mins each, incl. Qs) about their (own) storage infrastructure: we'd like to hear about their storage setup (as opposed to computing and other irrelevant stuff :-P) as well as their STEP experiences. This is partly so we can discuss the implications, but also for the benefit of folks who will be attending the storage workshop only. We will get feedback from ATLAS on STEP, ie the users' perspective.
Brian will present our experiences with BeStMan and Hadoop; there will be an introduction to the storage schema, to the DPM toolkit, to SRM testing, to user level accounting, and from the Tier 1 a talk on disk arrays scheduling, and hopefully room for discussion. So lots of things to look forward to!

08 June 2009

GridPP DPM toolkit v2.3.6 released

This is a bug-fix release for 2.3.5, which had some annoying whitespace inconsistencies introduced into the dpm-sql* functions. Thanks to Stephen Childs for noticing them.

(The direct link for download is:
and the documentation is still at
and has been slightly updated for this release.)

02 June 2009

SRM protocol status

The Grid Storage Management working group (GSM-WG) in OGF exists to standardise the SRM protocol. Why standardise? We need this to ensure the process stays open, and can benefit other communities than WLCG.
The SRM document is GFD.129, and we now have an experiences document available for public comments. You are invited to read this document and submit your comments - thanks!
You can even do so anonymously!

01 June 2009

Summary of DESY workshop

Getting the SRM implementers back together was very useful, and long overdue. We agreed of course to not change anything :-)
  • We needed to review how WLCG clients make use of the protocol; there are cases where they do not make the most efficient use of the protocol, thus causing a high load on the server. Is the estimated wait time used properly?
  • Differences between implementations may need documenting, e.g. whether an implementation supports "hard" pinning.
  • We reviewed the implementations' support for areas of the protocol, whether it was fully or partially supported (or not at all), to find a "core" which MUST be universally supported, and whether the implementers thought the feature desirable, given their specialist knowledge of the underlying storage system.
  • Security and the use of proxies were discussed.
There was one person who was involved with SNIA, and users from WLCG.
This is the summary; for the full report, attend the next GridPP storage meeting.

28 April 2009

GridPP DPM toolkit v2.3.5 released

The inaugural release of the DPM toolkit under my aegis has just happened. This release contains some bug fixes (I've attempted to improve the intelligence of the SQL-based tools when trying to acquire the right username/password from configuration tools), and is deliberately missing dpm-listspaces as this is now provided by DPM itself.

This is also my first try building RPMs in OSX, so can people tell me if this is horribly broken for them? :)

(The direct link for download is:
and the documentation is still at
and has been slightly updated for this release.)

27 April 2009

Storage calendar

It's that time of year and I'm writing reports again. It shows that Greig has left: the number of blog entries has dropped dramatically since the end of March... yes, I am still trying to persuade the rest of the group to blog about the storage work they're doing. Just because it's quiet doesn't mean they're not beavering away.

At the last storage meeting we had a look at the coming storage meetings - not our own but the ones outside GridPP. There were storage talks at ISGC and CHEP, and we looked at some of those. The next pre-GDB or GDB is supposed to be about storage, although the agenda was a bit bare last I looked. There will be a workshop at DESY focusing on WLCG's usage of SRM, with the developers from both sides, so to speak. Preparations are ongoing for the next OGF - mainly documents that need writing; we still need an "experiences" document describing interoperation issues at the API level. There's a HEPiX coming up in Sweden - usually we have an interest in the filesystem part as well as the site management. Then there is a storage meeting 2-3 July at RAL, following hepsysman on 0-1 July.

26 March 2009

More on CHEP

Right, I meant to write more about stuff that's going on here but the network is somewhat unreliable (authentication times out and reauthentication is not always possible). Anyway, I am making copious notes and will be making a full report at the next storage meeting - Wednesday 8 April.

If I were to summarise the workshop, from a high level data storage/mgmt perspective, I'd say it's about stability, scaling/performance, data access (specifically xrootd and http), long term support, catalogue synchronisation, interoperation, information systems, authorisation and ACLs, testing, configuration, complexity vs capabilities.

More details in the next meeting(s).

22 March 2009

WLCG workshop part I

Lots of presentations and talks at the WLCG workshop. As usual much of the work is done in the coffee breaks.
From the storage perspective, there was talk about "SRM changes" which was news to me (experiments require (a) stability, and (b) change, you see). Upon closer investigation, it turns out to be about implementing the rest of the SRM MoU. One outstanding question is how these changes are implemented without impacting users (in a bad way).
Fair bit of talk about xrootd support. xrootd is considered a Good Thing(tm), but the DPM implementation is rather old (2 years). It is possible it can benefit from the new CASTOR implementation for 2.1.8.
Some talk about SRM performance. The dCache folks as usual have good suggestions; Gerd from NDGF suggests using SSL instead of GSI. I believe srmPrepareToGet should be synchronous when files are on disk, which should lead to a large performance increase. Talking to other data management people, we believe the clients should do the Right Thing(tm), so no changes are required. Of course the server should be free to treat any request asynchronously if it feels it needs to do this, e.g. to manage load.
Talked to Brian Bockelman from U Nebraska; they have good experiences with (recent versions of) Hadoop, using BeStMan as the SRM interface.
More later...

08 March 2009

GridPP DPM toolkit v2.3.0 released

I've added a new command line tool to the DPM toolkit: dpm-delreplica. This is just a wrapper round the dpm_delreplica call in the python API and does exactly what it says on the tin. It arose after the guys at Oxford noted that there wasn't an easy way using existing tools to delete a replica of a file - it was either all or nothing.

One thing to note is that the tool will let you delete the last replica of a file, which then leaves a dangling entry in the DPM namespace that you can successfully do e.g. dpns-ls on, but cannot actually retrieve. As with all of these tools, I try to make each one as simple and self contained as possible (the Unix way) so I've not added any special checking to make sure that a replica isn't the last one. You have been warned.
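If you want that safety net, it is easy enough to wrap around the tool yourself. Here is a minimal Python sketch of such a check, assuming you have already obtained the list of replicas for the file by some other means; the function name is hypothetical and nothing here calls the DPM API.

```python
def safe_to_delete(replica_sfns, target):
    """True if deleting `target` would still leave at least one replica.

    Raises ValueError if `target` is not among the known replicas,
    since that usually means the list is stale or the name is mistyped.
    """
    if target not in replica_sfns:
        raise ValueError("unknown replica: " + target)
    return len(set(replica_sfns)) > 1

replicas = ["srv1:/fs1/file", "srv2:/fs4/file"]
print(safe_to_delete(replicas, "srv1:/fs1/file"))
print(safe_to_delete(["srv1:/fs1/file"], "srv1:/fs1/file"))
```

Only call dpm-delreplica when the check passes, and remember the replica list can change between the check and the delete.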

The tool has been tested in a couple of places and seems to work fine. As always, feedback is welcome.


13 February 2009

Gie's a job!

[Or for those who aren't Scottish: "Give me a job!" ;) ]

I know a few people read this blog so it seems like a good place for some advertising...

A new post-doctoral research associate position has opened up within the particle physics group at Edinburgh University to work on distributed storage management for the GridPP project. This has come about as I am leaving GridPP to move onto other things (physics analysis for LHCb, if you must know) so the project needs a replacement. It will be an exciting time for whoever gets the job since this year we will actually start to see data from the LHC experiments (fingers crossed)! Plus, Edinburgh is a great place to live and work.

All of the details about the position and the online application form can be found here. If you would like any more details please do get in touch.

In addition, the particle physics group in Edinburgh is advertising another job titled "Scientific Programmer". This system administrator position has two main responsibilities: first, the organisation and support of the group's computing needs, and second, assisting in the day-to-day operations of the Edinburgh Tier-2 grid services. All details can be found here. Again, get in touch if you have questions.


Update: You can get a full listing of the jobs available within the particle physics group at Edinburgh here:


There's even an advanced fellowship position available if anyone is interested in doing some physics!

29 January 2009

GridPP DPM toolkit v2.2.0 released

Hot on the heels of v2.1.0 comes v2.2.0. This one contains a couple of new tools that have been created to allow sites to have a greater understanding of what is happening with their space tokens. These are:

* dpm-sql-spacetoken-usage

This displays information like:

* dpm-sql-spacetoken-list-files

Unfortunately, I have had to use some SQL to directly query the DB as the API doesn't support this functionality. I'm hoping that the small DB schema change in v1.7.0 of DPM doesn't break these tools too much... These tools were born out of some discussion which has taken place over the past couple of days on our gridpp-storage mailing list (anyone can join!). Thanks to those who tested out the initial releases of the tools.

I have also made another change to dpns-du. I have added another new switch (-s, --summary) which will present a summary of the total size under a target directory rather than the default behaviour, which displays the summary for every sub-dir under the target.
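The difference between the two modes can be shown with a toy Python sketch. This is not how dpns-du is implemented; it just assumes a flat list of (path, size) pairs and uses a made-up function name to show per-directory totals versus a single summary line.

```python
from collections import defaultdict
import posixpath

def du(files, target, summary=False):
    """Total sizes per directory under `target`, or just the grand total.

    files: iterable of (absolute_path, size_bytes) pairs (assumed shape).
    """
    target = target.rstrip("/") or "/"
    prefix = target if target.endswith("/") else target + "/"
    totals = defaultdict(int)
    for path, size in files:
        if not path.startswith(prefix):
            continue  # file lives outside the target directory
        directory = posixpath.dirname(path)
        # Charge the file to its own directory and every ancestor up to target.
        while True:
            totals[directory] += size
            if directory == target:
                break
            directory = posixpath.dirname(directory)
    if summary:
        return {target: totals[target]}
    return dict(totals)

files = [("/dpm/vo/x/f1", 10), ("/dpm/vo/x/f2", 5), ("/dpm/vo/y/f3", 7), ("/dpm/other/f4", 99)]
print(du(files, "/dpm/vo"))
print(du(files, "/dpm/vo", summary=True))
```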

You can get it from the usual place (although it will take a day for the yum repodata to update). Again, the release notes on the wiki will be updated at some stage...

23 January 2009

GridPP DPM toolkit v2.1.0 released

I've built a new release of the DPM admin toolkit. This one contains a couple of new tools that have been created by Sam Skipsey. They both present the user with a breakdown of the storage used in the DPM per user/group. One tool uses the DPM python API to do this (and is correspondingly slow) while the other directly talks to the DPM database using the python SQL module. Fingers crossed that this should present the same numbers as are calculated by the GridppDpmMonitor.

There is also a new switch for dpns-du which stops directories of zero size being printed to stdout. Yes Winnie, this one's for you.

You can get it from the usual place. Some of the release notes have to be updated for the new tools as I haven't got round to that yet...

05 January 2009

The evolution of storage in 2008

I've been running my WLCG storage version monitoring system for 1 year so I thought now would be a prudent time to have a quick review of the changes in the storage infrastructure over the past year. The above image shows the count of each different version of the SRM2.2 storage middleware that is deployed on the Grid each day. Over the course of the year the number of deployed SRM2.2 endpoints increased steadily from ~100 to >250.

The pie charts below show the breakdown (as of today) for the different versions of DPM, dCache and CASTOR that are running on the Grid. There are also 20 instances of StoRM out there, but StoRM does not appear to return versioning information from an srmPing operation so it's not possible to tell what version is deployed.

DPM clearly dominates in terms of number of running instances. Hopefully CERN doesn't do something crazy like drop support for it! It's interesting to see that there are still many old versions of the software running at sites. Perhaps this is an indication of the success of SRM in that all of these different implementations are still talking to each other.