23 February 2018

Lisa, the new sister to "Dave the dataset", makes her appearance.

Hello, I'm Lisa, similar to "Dave the dataset" but born in 2017 in the ATLAS experiment. My DNA number is 2.16.2251. The initial size of my 23 subsections is 60.8TB in 33651 files. My main physics subsection is 8.73TB (4726 files). I was born 9 months ago, and in that time I have produced 1281 unique children corresponding to 129.4TB of data in 60904 files. It is not surprising that I have a large number of children, as I am still relatively new and my children have not yet been culled.

It is interesting to see, for a relatively new dataset, how many copies of myself and my children there are.
There are 46273 files / 60.248TB with 1 copy, 35807 files / 62.06TB with 2 copies, 2959 files / 4.94TB with 3 copies, 9110 files / 2.16TB with 4 copies, 51 files / 0.017GB with 5 copies and 80 files / 0.44GB with 6 copies. Only four real scientists have data without a second copy.
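To make the copy bookkeeping concrete, here is a minimal sketch in Python of how such a summary can be built. The replica listing below is invented purely for illustration; the real numbers above come from the experiment's data catalogue.

from collections import defaultdict

# Hypothetical replica listing: (file name, room holding a copy, size in TB).
# In reality this would come from the experiment's data management catalogue.
replicas = [
    ("child_0001.root", "room_A", 0.002),
    ("child_0001.root", "room_B", 0.002),
    ("child_0002.root", "room_A", 0.005),
    ("child_0003.root", "room_C", 0.001),
    ("child_0003.root", "room_D", 0.001),
    ("child_0003.root", "room_E", 0.001),
]

# Count how many rooms hold each file, remembering its size once.
copies = defaultdict(set)
sizes = {}
for name, room, size_tb in replicas:
    copies[name].add(room)
    sizes[name] = size_tb

# Summarise: number of files and total size per copy count.
summary = defaultdict(lambda: [0, 0.0])
for name, rooms in copies.items():
    n = len(rooms)
    summary[n][0] += 1
    summary[n][1] += sizes[name]

for n in sorted(summary):
    files, tb = summary[n]
    print(f"{files} files / {tb:.3f}TB with {n} cop{'y' if n == 1 else 'ies'}")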


Analyzing how this data is distributed around the world shows that it sits in 100 rooms in total across 67 houses.

Of course, more datasets are just about to be created with the imminent restart of the LHC, so we will see how my distribution and those of the new datasets develop.

21 February 2018

Dave's Locations in preparation for 2018 data taking.

My powers that be are just about to create more brethren of mine in their big circular tunnel, so I thought I would give an update on my locations.

There are currently 479 rooms across 145 houses used by ATLAS. Eight years on, my data is still in 46 rooms in 24 houses. There are 269 individuals, of which 212 are unique, 56 have a twin in another room and one is a triplet. In total this means 13GB of data has double redundancy, 5.48TB has single redundancy, and 2.45TB has no redundancy. Of note is that 5.28TB of the 7.93TB of data with a twin is from the originally produced data.

My main concern is not with the "Dirks" or "Gavins" who are sole children, as they can easily be reproduced in the children's "production" factories. Of concern are the 53 "Ursulas" with no redundancy. This equates to 159GB of data / 6671 files, whose loss would affect 17 real scientists.

06 February 2018

ZFS 0.7.6 release

ZFS on Linux 0.7.6 has now landed.

https://github.com/zfsonlinux/zfs/releases/tag/zfs-0.7.6

For everyone running the 0.7.0-0.7.5 builds, I would encourage you to look into updating, as there are a few performance fixes in this release.
Large storage servers tend to have ample hardware; however, if you're running ZFS on systems with a small amount of RAM, the fixes may bring a dramatic performance improvement.
Anecdotally, I've also seen some improvements on a system which hosts a large number of smaller files, which could be due to some fixes around the ZFS cache.
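If you are not sure which build a machine is running, ZFS on Linux reports the loaded module version under /sys/module/zfs/version. The short Python sketch below simply reads that file and compares it against a hard-coded 0.7.6 target to flag whether an update is worth scheduling:

from pathlib import Path

# ZFS on Linux exposes the loaded kernel module version here.
VERSION_FILE = Path("/sys/module/zfs/version")
TARGET = (0, 7, 6)  # the release discussed above

def parse(version: str):
    # Versions look like "0.7.5-1"; keep only the numeric x.y.z part.
    return tuple(int(p) for p in version.split("-")[0].split(".")[:3])

if not VERSION_FILE.exists():
    print("ZFS module not loaded (or not ZFS on Linux).")
else:
    current = parse(VERSION_FILE.read_text().strip())
    if current < TARGET:
        print(f"Running {'.'.join(map(str, current))} - consider updating to 0.7.6.")
    else:
        print(f"Running {'.'.join(map(str, current))} - already at or past 0.7.6.")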


What if an update goes wrong?

I'm linking a draft of a flowchart I'm still working on to help debug what to do if a ZFS filesystem has disappeared after rebooting a machine:
https://drive.google.com/file/d/1hqY_qTfdpo-g_qApcP9nSknIm8X3wMwo/view?usp=sharing
(Download and view offline for best results; there are a few things to check for!)
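In the meantime, here is a minimal Python sketch of the kind of first checks such a flowchart walks through when a pool has not come back after a reboot, wrapping the standard zpool/zfs command-line tools. It is an illustration of the usual steps rather than a copy of the flowchart itself, and the commands generally need root:

import subprocess

def run(cmd):
    """Run a command and return its return code and combined output."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, (result.stdout + result.stderr).strip()

# 1. Is the pool already imported, just unhealthy?
rc, out = run(["zpool", "status"])
print("== zpool status ==\n" + out)

# 2. If not imported, does the system see it as importable?
rc, out = run(["zpool", "import"])  # with no pool name this only lists candidates
print("== pools available for import ==\n" + out)

# 3. If the pool is imported but the filesystem is missing, try mounting everything.
rc, out = run(["zfs", "mount", "-a"])
print("== zfs mount -a ==\n" + (out or "no output (mounts attempted)"))

# 4. Finally, check which ZFS filesystems the system currently knows about.
rc, out = run(["zfs", "list"])
print("== zfs list ==\n" + out)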