30 June 2014

Thank you for making a simple compliance test very happy

Rob and I had a look at the gstat tests for RAL's CASTOR. For a good while now we have had a number of errors/warnings raised. They did not affect production, so what are they?

Each error message has a bit of text associated with it, typically saying "something is incompatible with something else": for example, that an "access control base rule" (ACBR) is incorrect, or that the published tape information is not consistent with the type of Storage Element (SE). The ACBR error arises from legacy attributes being published alongside the modern ones, and the tape error complains about CASTOR presenting itself as a tape store (via a particular SE).

So what is going on? Well, the (only) way to find out is to locate the test script and see exactly what it is querying. In this case, it is a Python script running LDAP queries, and luckily it can be found in CERN's source code repositories. (How did we find it in this repository? Why, by using a search engine, of course.)

Ah, splendid, so by checking the Documentation™ (also known as "source code" to some), we discover that it needs all ACBRs to be "correct" (not just one for each area) and the legacy ones need an extra slash on the VO value, and an SE with no tape pools should call itself "disk" even if it sits on a tape store.
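To make those two rules concrete, here is a minimal sketch in Python. It is not the gstat script itself: the real test runs LDAP queries, whereas this takes values that have already been fetched. The function names, the record fields, and in particular the set of prefixes that count as a "correct" ACBR are assumptions made purely for illustration.

```python
# Sketch only: the prefixes below are an assumed notion of a "correct" ACBR,
# not the rule set used by the real gstat script.
ASSUMED_PREFIXES = ("VO:", "VOMS:/")


def acbr_is_valid(rule):
    """Return True if the rule has an assumed prefix followed by a value."""
    return any(
        rule.startswith(prefix) and len(rule) > len(prefix)
        for prefix in ASSUMED_PREFIXES
    )


def check_se(acbrs, architecture, tape_pools):
    """Return a list of problems for one SE record (hypothetical fields)."""
    problems = []
    # Every published rule must pass, not just one per area.
    for rule in acbrs:
        if not acbr_is_valid(rule):
            problems.append("bad ACBR: " + rule)
    # An SE with no tape pools should not publish itself as tape.
    if tape_pools == 0 and architecture == "tape":
        problems.append("architecture is 'tape' but SE has no tape pools")
    return problems


if __name__ == "__main__":
    for problem in check_se(["VO:atlas", "atlas"], "tape", 0):
        print(problem)
```

The point of the sketch is the shape of the check: one bad legacy value is enough to raise a warning even when the modern value next to it is fine, which is why the warnings persisted without affecting production.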

So it's essentially test-driven development: to make the final warnings go away, we need to read the code that validates the published information, and then engineer the LDIF so that the validation errors go away.

09 June 2014

How much of a small file problem do we have...An update

As an update to my previous post, "How much of a small file problem do we have...", I decided to look at a single part of the namespace within the storage element at the Tier 1, rather than at a single disk server. (The WLCG VOs know this as a scope or family etc.)
When analysing for ATLAS (if you remember, this was the VO I was personally most worried about due to its large number of small files), I obtained the following numbers:

Total number of files        3670322
Total number of log files    109025
Volume of log files          4.254TB
Volume of all files          590.731TB
The log files represent ~29.7% of the files within the scope, so perhaps the disk server I picked was enriched with log files compared to the average.
What is worrying is that this 30% of files is only responsible for 0.7% of the disk space used (4.254TB out of a total 590.731TB).
The mean filesize of the log files is 3.9MB and the median filesize is 2.3MB. The log file sizes vary from 6kB to 10GB, so some processes within the VO do seem to be able to create large log files. If one were to remove the log files from the space, the mean filesize would increase from 161MB to 227MB, and the median filesize would increase from 22.87MB to 45.63MB.
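The quoted figures can be cross-checked with a few lines of Python. This is only a sketch: it assumes decimal units (1TB = 10^12 bytes, 1MB = 10^6 bytes), and the function name is made up for illustration. Because the ~29.7% share, the 3.9MB mean and the 227MB mean all depend on the number of log files, the sketch derives that count from the log volume and the quoted mean rather than taking it from the table. The result is about 1.09 million log files, whereas the count of 109025 listed above would give a share of only about 3%, so it looks as if a digit may have been dropped there (an inference from the arithmetic, not something the measurements state).

```python
TB = 1e12  # assumed decimal terabyte
MB = 1e6   # assumed decimal megabyte


def summarise(total_files, total_tb, log_tb, log_mean_mb):
    """Derive the headline percentages and means from the quoted totals."""
    total_bytes = total_tb * TB
    log_bytes = log_tb * TB
    # Log-file count implied by the log volume and the quoted mean log size.
    implied_log_files = log_bytes / (log_mean_mb * MB)
    return {
        "mean_all_mb": total_bytes / total_files / MB,
        "log_volume_pct": 100 * log_bytes / total_bytes,
        "implied_log_files": implied_log_files,
        "implied_log_file_pct": 100 * implied_log_files / total_files,
        "mean_without_logs_mb": (total_bytes - log_bytes)
        / (total_files - implied_log_files)
        / MB,
    }


if __name__ == "__main__":
    stats = summarise(
        total_files=3670322, total_tb=590.731, log_tb=4.254, log_mean_mb=3.9
    )
    for name, value in stats.items():
        print(f"{name}: {value:.2f}")
```

With these inputs the overall mean comes out at about 161MB, the log files account for about 0.7% of the volume and about 29.7% of the files, and the mean without log files is about 227MB, all in line with the text. The medians cannot be checked this way, since they need the per-file sizes.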