18 August 2010

Where Dave Lives and who shares his home....

The title of this blog post is all about where I am situated within the RALLCG2 site, and where my children are as well. I apparently also want to discuss the profile of "files" across a "disk server", as my avatar likes to put it. I prefer to think of this "Storage Element" that he talks of as my home, and of these "disk servers" as rooms inside my home.

I am made of 1779 files (~3TB, if you recall). I am spread across 8 of the 795 tapes in the RAL DATATAPE store (although the pool of tapes for real data is actually only 229 tapes; in total there are currently … tapes being used by ATLAS). So I am 1 of 1572 datasets, yet I take up ~1/130 of the volume (~3TB of the ~380TB) stored on DATATAPE at RAL and ~1/130 of the files (1779 out of ~230,000). In this tape world I am deliberately kept to as small a subset of tapes as possible, to allow for quick recall.
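The "1/130" shares above are just the whole divided by my part. A minimal Python sketch of that arithmetic, using only the approximate figures quoted in this post (the helper name `one_in` is made up for illustration):

```python
def one_in(part: float, whole: float) -> float:
    """Express part/whole as 'one in N', i.e. return N = whole / part."""
    return whole / part

# Approximate figures quoted above for DATATAPE at RAL.
volume_n = one_in(3, 380)          # ~3TB of ~380TB
files_n = one_in(1779, 230_000)    # 1779 of ~230,000 files

print(f"volume: 1 in {volume_n:.0f}")
print(f"files:  1 in {files_n:.0f}")
```

Both come out a little under 130, which is where the rounded "~1/130" comes from.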

However, when it comes to being on disk, I want to be spread out as much as possible so as not to cause "hot disking". On the other hand, spreading me across many rooms means that if any single room is down, there is a greater chance that I cannot be fully examined. In this disk world, my 3TB is part of the 700TB in ATLASDATADISK at RAL; I am 1 of 25k datasets, and my 1779 files are among ~1.5 million. Here my average file size of ~1.7GB is much larger than the 450MB average of all the other DATADISK files. (The file size distribution is not linear, but that is a discussion for another day.) I am spread across 38 of the 71 rooms that existed in my space token when I was created. (There are now an additional 10 rooms, and this number will continue to increase in the near term.)
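The trade-off between avoiding hot disks and staying readable can be illustrated with a toy model. This Python sketch assumes, hypothetically, that each room is down independently with the same probability; the value 0.01 is an illustrative assumption, not a RAL measurement, and `p_fully_available` is a made-up name.

```python
def p_fully_available(rooms: int, p_room_down: float) -> float:
    """Probability that every room holding part of a dataset is up.

    Assumes (hypothetically) that each room is down independently,
    with the same probability p_room_down.
    """
    return (1.0 - p_room_down) ** rooms

# p_room_down = 0.01 is an assumed, illustrative value.
for rooms in (1, 10, 38):
    print(f"{rooms:2d} rooms: {p_fully_available(rooms, 0.01):.3f}")
```

Under these assumptions, a dataset spread over 38 rooms is fully readable only about 68% of the time, versus 99% for one room. Real outages are unlikely to be independent, so treat this only as an illustration of the direction of the effect.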

Looking at every file on a randomly chosen DATADISK server:

1 in 20 datasets represented on this server are log datasets, 1 in 11 files are log files, and these account for 1 in 10200 of the space used in the room.
1 in 2.7 datasets represented on this server are AOD datasets, 1 in 5.1 files are AOD files, and these account for 1 in 8.25 of the space used in the room.
1 in 4.5 datasets represented on this server are ESD datasets, 1 in 3.9 files are ESD files, and these account for 1 in 2.32 of the space used in the room.
1 in 8.3 datasets represented on this server are TAG datasets, 1 in 8.5 files are TAG files, and these account for 1 in 3430 of the space used in the room.
1 in 47 datasets represented on this server are RAW datasets, 1 in 17 files are RAW files, and these account for 1 in 10.8 of the space used in the room.
1 in 5.4 datasets represented on this server are DESD datasets, 1 in 5.1 files are DESD files, and these account for 1 in 3.67 of the space used in the room.
1 in 200 datasets represented on this server are HIST datasets, 1 in 46 files are HIST files, and these account for 1 in 735 of the space used in the room.
1 in 50 datasets represented on this server are NTUP datasets, 1 in 16 files are NTUP files, and these account for 1 in 130 of the space used in the room.
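Reading each space figure as "1 in N of the space used" (a ratio, not a quantity in GB), the list can be checked and compared in a few lines of Python. This is a hypothetical sketch: the `DATADISK` table simply restates the figures above, and `size_vs_average` relies on the fact that a type's share of space divided by its share of files gives its mean file size relative to the server's mean file size.

```python
# "1 in N" figures for the sampled DATADISK server: (datasets, files, space).
DATADISK = {
    "log":  (20, 11, 10200),
    "AOD":  (2.7, 5.1, 8.25),
    "ESD":  (4.5, 3.9, 2.32),
    "TAG":  (8.3, 8.5, 3430),
    "RAW":  (47, 17, 10.8),
    "DESD": (5.4, 5.1, 3.67),
    "HIST": (200, 46, 735),
    "NTUP": (50, 16, 130),
}

def shares(table):
    """Convert each '1 in N' into a fraction and total each column."""
    totals = [0.0, 0.0, 0.0]
    for row in table.values():
        for i, n in enumerate(row):
            totals[i] += 1.0 / n
    return totals

def size_vs_average(table, kind):
    """Mean file size of one type relative to the server's mean file size.

    space share / file share = (S_k/S) / (F_k/F) = (S_k/F_k) / (S/F),
    which for '1 in N' figures reduces to files_n / space_n.
    """
    _, files_n, space_n = table[kind]
    return files_n / space_n

ds_total, file_total, space_total = shares(DATADISK)
print(f"datasets {ds_total:.0%}, files {file_total:.0%}, space {space_total:.0%}")
print(f"ESD files: {size_vs_average(DATADISK, 'ESD'):.1f}x the average size")
```

With the figures above, the dataset and file columns each total about 100%, while the listed types cover roughly 93% of the space. ESD files come out at about 1.7 times the server's average file size, whereas log files are roughly a thousandth of it.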


A similar study has been done for an MCDISK server:
1 in 4.8 datasets represented in this room are log datasets, 1 in 2.5 files are log files, and these account for 1 in 18 of the space used in the room.
1 in 3.1 datasets represented in this room are AOD datasets, 1 in 5.7 files are AOD files, and these account for 1 in 2.1 of the space used in the room.
1 in 28 datasets represented in this room are ESD datasets, 1 in 13.6 files are ESD files, and these account for 1 in 3.2 of the space used in the room.
1 in 4.3 datasets represented in this room are TAG datasets, 1 in 14.6 files are TAG files, and these account for 1 in 2000 of the space used in the room.
1 in 560 datasets represented in this room are DAOD datasets, 1 in 49 files are DAOD files, and these account for 1 in 2200 of the space used in the room.
1 in 950 datasets represented in this room are DESD datasets, 1 in 11000 files are DESD files, and these account for 1 in 600 of the space used in the room.
1 in 18 datasets represented in this room are HITS datasets, 1 in 6.3 files are HITS files, and these account for 1 in 25 of the space used in the room.
1 in 114 datasets represented in this room are NTUP datasets, 1 in 71 files are NTUP files, and these account for 1 in 46 of the space used in the room.
1 in 114 datasets represented in this room are RDO datasets, 1 in 63 files are RDO files, and these account for 1 in 11 of the space used in the room.
1 in 8 datasets represented in this room are EVNT datasets, 1 in 13 files are EVNT files, and these account for 1 in 100 of the space used in the room.

As a sample, this MCDISK server holds 1/47 of the space used in MCDISK at RAL and ~1/60 of all MCDISK files. This room was added recently, so any disparity might be due to this server being filled with newer rather than older files (which would be a good sign, as it would show ATLAS are increasing file size). The average file size on this server is 211MB. Discounting log files, this increases to 330MB per file (since the average log file size is 29MB).
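The jump from 211MB to ~330MB follows from removing the log files from the average. A small Python sketch of that step, using the figures above (log files are 1 in 2.5 of the files, i.e. 40%, at ~29MB each); `mean_excluding` is a made-up helper name:

```python
def mean_excluding(mean_all_mb: float, frac_files: float, mean_subset_mb: float) -> float:
    """Mean file size after removing a subset of files.

    mean_all_mb:    mean size over all files (MB)
    frac_files:     fraction of files that belong to the removed subset
    mean_subset_mb: mean size of the removed subset (MB)
    """
    remaining_per_file = mean_all_mb - frac_files * mean_subset_mb
    return remaining_per_file / (1.0 - frac_files)

non_log_mean = mean_excluding(211, 1 / 2.5, 29)
log_space_share = (1 / 2.5) * 29 / 211   # log files' share of the room's space

print(f"non-log mean: {non_log_mean:.0f}MB")
print(f"log space: 1 in {1 / log_space_share:.0f}")
```

This gives about 332MB, consistent with the ~330MB quoted, and the implied log share of space, about 1 in 18, matches the log figure in the MCDISK list.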


One thing my avatar is interested in is this: if one of these rooms were lost, how many of the files stored in that room could be found in another house, and how many would be permanently lost?
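One way to frame that question is per file: a file is lost with the room only if no other house holds a replica of it. A minimal Python sketch under that assumption, with a toy, made-up catalogue (the dataset names, file names and `SITE_B` are hypothetical; real replica information would come from the ATLAS data management catalogue, which is not modelled here):

```python
from collections import defaultdict

def lost_if_room_fails(room_files, other_sites):
    """Group the files that exist only in this room by dataset.

    room_files:  list of (dataset, filename) pairs stored in the room
    other_sites: dict filename -> set of other sites holding a replica
    Returns {dataset: [filenames with no replica anywhere else]}.
    """
    lost = defaultdict(list)
    for dataset, filename in room_files:
        # A missing entry or an empty set both mean "no copy elsewhere".
        if not other_sites.get(filename):
            lost[dataset].append(filename)
    return dict(lost)

# Toy, made-up catalogue purely for illustration.
room = [("mc.AOD", "a1"), ("mc.AOD", "a2"), ("mc.log", "l1")]
others = {"a1": {"SITE_B"}, "a2": set()}
print(lost_if_room_fails(room, others))
```

Grouping by dataset makes it easy to report both counts the way they appear below: how many datasets are incomplete elsewhere, and how many files would go with the room.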


For the "room" in the DATADISK Space Token, every file is also located in another house. (This will not always be the case, but it is a good sign that the ATLAS replication model is working.)

For the "room" in the MCDISK Space Token, the situation is as follows:
886 out of the 2800 datasets present are not complete elsewhere. Of these 886, 583 are log datasets (consisting of 21632 files).
Including log datasets, there would potentially be 36283 files across the 886 datasets, with a capacity of 640GB, of lost data (average file size 18MB).
Ignoring log datasets, this drops to 14651 files in 303 datasets, with a capacity of 2.86TB of lost data.
The files on this disk server which are also held elsewhere come from 1914 datasets, consist of 18309 files, and fill 8104GB.
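As a quick consistency check, the MCDISK counts can be added up in Python. The only inputs are the numbers quoted above; the variable names are illustrative.

```python
# MCDISK room figures quoted above.
at_risk_datasets, at_risk_files = 886, 36283   # not complete elsewhere
safe_datasets, safe_files = 1914, 18309        # also held elsewhere

# The log / non-log split should add back up to the at-risk totals.
assert 583 + 303 == at_risk_datasets
assert 21632 + 14651 == at_risk_files

total_datasets = at_risk_datasets + safe_datasets
total_files = at_risk_files + safe_files

print(total_datasets, total_files)
print(f"at risk: {at_risk_datasets / total_datasets:.0%} of datasets, "
      f"{at_risk_files / total_files:.0%} of files")
```

The two groups sum to the 2800 datasets present in the room, and roughly a third of the datasets but about two thirds of the files have no complete copy elsewhere, largely because so many of them are small log files.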
