23 May 2012

Some notes from CHEP - day 2

I'm sure that when Wahid writes his blog entries from CHEP you'll hear about lots of other interesting talks, so, as before, I'm just going to cover the pieces I found interesting.

The plenaries today started with a review of the analysis techniques employed by the various experiments by Markus Klute, emphasising the large data volumes required for good statistical analyses. Perhaps more interesting for its comparisons was Ian Fisk's talk covering the new computing models in development, in the context of LHCONE. Ian's counter to the "why can Netflix do data streaming when we can't?" question was essentially that the major difference between us and everyone with a CDN is that we have 3 orders of magnitude more data in a single replica: it's much more expensive to replicate 20PB across 100 locations than 12TB!

The very best talk of the day was Oxana Smirnova's talk on the future role of the grid. Oxana expressed the most important (and most ignored within WLCG) lesson: if you make a system that is designed for a clique, then only that clique will care. In the context of the EGI/WLCG grid, this is particularly important due to the historical tendency of developers to invent incompatible "standards" [the countless different transfer protocols for storage, the various job submission languages etc.] rather than all working together to support a common one (which may already exist). This is why the now-solid HTTP(S)/WebDAV support in the EMI Data Management tools is so important (and why the developing NFS4.1/pNFS functionality is equally so): no-one outside of HEP cares about xrootd, but everyone can use HTTP. I really do suggest that everyone enable HTTP as a transport mechanism on their DPM or dCache instance if they're up to date (DPM will be moving, in the next few revisions, to using HTTP as the default internal transport in any case).
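
If you do enable HTTP(S)/WebDAV on your SE, it's worth a quick check that the door actually answers. Here's a minimal sketch of such a probe in Python; the endpoint URL, VO path and certificate locations are invented for illustration (substitute your own site's values), and it assumes the `requests` library is available.

```python
# Minimal sketch: probe a (hypothetical) DPM/dCache WebDAV endpoint over HTTPS.
# The endpoint, path and credential locations below are illustrative only.
import requests

ENDPOINT = "https://se.example.ac.uk/dpm/example.ac.uk/home/myvo/"
GRID_CERT = ("/tmp/usercert.pem", "/tmp/userkey.pem")   # X.509 cert/key pair
CA_PATH = "/etc/grid-security/certificates"             # CA directory

# PROPFIND with "Depth: 1" asks the WebDAV server to list the collection's contents.
response = requests.request(
    "PROPFIND",
    ENDPOINT,
    headers={"Depth": "1"},
    cert=GRID_CERT,
    verify=CA_PATH,
)

print(response.status_code)   # 207 Multi-Status means WebDAV is answering
print(response.text[:500])    # the XML body lists the entries found
```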

A lot of the remaining plenary time was spent talking about how the future will be different to the past (in the context of the changing pressures on technology), but little was new to anyone who follows the industry. One interesting tidbit from the Cloudera talk was the news that HDFS can now support High Availability via multiple metadata servers, which potentially gives higher performance for metadata operations as well.

Away from the plenaries, the most interesting talk in the session I was in was Jakob Blomer's update on CVMFS. We're now deep in the 2.1.x releases, which have much better locking behaviour on the clients; the big improvements on the CVMFS server side are coming in the next minor version, and include the transition from redirfs (unsupported, with no kernels above SL5) to aufs (supported in all modern kernels) for the overlay filesystem. This also gives a small performance boost to the publishing process when you push a new release into the repository.

Of the posters, there were several of interest - the UK was, of course, well represented, and Mark's IPv6 poster, Alessandra's network improvements poster, Chris's Lustre poster and my CVMFS-for-local-VOs poster all got some attention. In the wider-ranging set of posters, the Data Management group at CERN were well represented - the poster on HTTP/WebDAV for federations got some attention (it does what xrootd can do, but with an actual protocol that the universe cares about, and the implementation worked on for the poster even supports geographical selection of the closest replica by IP), as did Ricardo's DPM status presentation (which, amongst other things, showcased the new HDFS backend for DMLite). With several hundred posters and only an hour to look at them, it was hard to pick out the other interesting examples quickly, but some titles of interest included "Data transfer test with a 100Gb network" (spoiler: it works!), and a flotilla of "experiment infrastructure" posters, of which the best *title* goes to "No file left behind: monitoring transfer latencies in PhEDEx".
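
The nice thing about doing federation over plain HTTP is that replica selection is just a redirect, so any client that follows redirects gets the "closest copy" behaviour for free. As a rough illustration (the federation URL below is invented, and the real endpoints behind the poster may well behave differently), this is what inspecting that redirect chain looks like from Python:

```python
# Sketch of how an HTTP-based federation hands you a nearby replica:
# the federator answers with a redirect and an ordinary HTTP client follows it.
# The URL below is purely illustrative, not a real federation endpoint.
import requests

FEDERATION_URL = "https://federation.example.org/fed/myvo/data/file.root"

response = requests.get(FEDERATION_URL, allow_redirects=True, stream=True)

# response.history holds the redirect chain; the final URL is the replica
# the federator chose for us (e.g. based on where our client IP appears to be).
for hop in response.history:
    print(hop.status_code, hop.headers.get("Location"))
print("served from:", response.url)
response.close()
```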
