Data Preservation

From Istituto di Radioastronomia


Sharing & Preservation

Datasets produced with the Pleiadi computing systems can be shared or preserved long term on the INAF-IA2 storage systems.

INAF-IA2 provides storage for "Data Sharing" and "Long Preservation": the first is essentially based on Owncloud for sharing files, while the second refers to saving data on disks and on the Tape Library.

Data Sharing:

Each INAF employee and associate can access Owncloud at https://ia2-owncloud.oats.inaf.it/ with their IDEM username and password (the same ones used for mail, card, eduroam, ...). By default, each user has 10 GB of space available, which can be increased upon request. In the same way, each INAF employee can access the VOSpace through the dedicated web interface (http://vospace.ia2.inaf.it/ui), where they can upload or download proprietary files and access the archives through a dedicated web link.

Long Preservation:

For the time being, access to the "Long Preservation" section of the IA2 storage, as well as ingestion and retrieval of data, takes place via ssh through a dedicated personal account. The procedure described below will be updated as soon as the interface is ready.

How to request an account:

  • The account is personal and not open to a group.
  • If a group's data must be saved, this must be done through an account linked to a contact person.
  • The account must be requested directly from ia2@inaf.it and will use the IDEM credentials.
  • The request must indicate the size of the data to be saved and how often you plan to access them.

How to ingest large amounts of data:

  • Once the account has been created, the user will be allocated a scratch area to which they can transfer their data.
  • The transfer can be done via scp, rsync, or gridftp. Any additional software should be agreed directly with IA2 staff.
    • We kindly ask you to prepare and structure the data in directories with no more than 2000 files each. It is therefore suggested to create a tar archive before transferring the data.
    • For each directory, we ask you to produce a file listing the computed checksums of the files it contains, so that file integrity can be checked. Please leave each checksum file in its directory.
  • Once the user has completed the transfer of a block of data (a directory with its sub-directories) and has notified IA2, the block will be taken over and frozen.

Once the data have been saved on tape, the user can access them in "read-only" mode.
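The preparation steps above could be sketched as follows. This is a minimal sketch, not an IA2-provided tool: the function name prepare_block, the list name checksums.sha256, and the choice of SHA-256 (GNU sha256sum) are assumptions, since this page does not specify the checksum algorithm or the file naming; please confirm both with IA2. The host and scratch path in the final comment are placeholders.

```shell
# Hypothetical helper (not an IA2 tool). Assumptions: SHA-256 via GNU
# sha256sum, and "checksums.sha256" as the per-directory list name.
prepare_block() {
    find "$1" -type d | while IFS= read -r dir; do
        # Count regular files directly inside this directory,
        # ignoring a checksum list left by a previous run.
        count=$(find "$dir" -maxdepth 1 -type f ! -name checksums.sha256 | wc -l)
        if [ "$count" -gt 2000 ]; then
            echo "WARNING: $dir holds $count files; consider a tar archive" >&2
        fi
        if [ "$count" -gt 0 ]; then
            # Work inside the directory so the list stores relative names.
            ( cd "$dir" && find . -maxdepth 1 -type f ! -name checksums.sha256 \
                -exec sha256sum {} + > checksums.sha256 )
        fi
    done
}

# Example usage (host and path are placeholders, not real IA2 values):
#   prepare_block my_data
#   rsync -av my_data/ <user>@<ia2-host>:<scratch-area>/my_data/
```

Writing one list per directory, with names relative to that directory, keeps each list valid even if the directory is later moved or restored under a different parent path.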

Data Retrieval:

  • Retrieval of a single, relatively small file from tape is feasible via the VOSpace; for large files or directories, we suggest asking our admins.
  • Retrieval of a directory is inefficient from the command line and must be requested from the system administrator.
  • The requested data will be placed in the user's directory with the original path.

Note: The user does not have direct write access to the Tape Library.
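Once retrieved data have been placed in the user's directory, their integrity could be re-checked against the checksum lists left in each directory at ingestion time. The sketch below is only an illustration: the function name verify_block is hypothetical, and it assumes that each list is named checksums.sha256 and was produced with GNU sha256sum from inside its own directory, neither of which is specified on this page.

```shell
# Hypothetical helper (not an IA2 tool). Assumes per-directory lists
# named "checksums.sha256" in GNU sha256sum format with relative names.
verify_block() {
    # find passes the lists to sh in batches; with the "+" form, find
    # returns non-zero if any batch reports a failed verification.
    find "$1" -type f -name checksums.sha256 -exec sh -c '
        rc=0
        for list do
            # Check the files next to each list; print only failures.
            ( cd "$(dirname "$list")" && sha256sum --quiet -c checksums.sha256 ) || rc=1
        done
        exit "$rc"
    ' sh {} +
}

# Example usage:
#   verify_block /home/<yourname.surname>/<my_data> && echo "all files intact"
```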

FAQ:

  • Q. How can I calculate the checksums?

Here is a checksum script that calculates the files' checksums in the expected format. Suppose your data are in /home/<yourname.surname>/<my_data>. Put the script into /home/<yourname.surname>, give it execute permission, and run it. Before starting this process, please check that file and directory names do not contain special characters such as: [ ] < > ? \ / " : | ' ` *

~> cd /home/<yourname.surname>
~> chmod +x checksum
~> bash checksum my_data &

The script runs in the background and creates a log file. To check whether it is still running, use: pgrep -f checksum
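The check for special characters mentioned above can be automated before running the checksum step. The following is a minimal sketch under stated assumptions: find_special_names is a hypothetical name, not part of the IA2 script, and it only inspects the last component of each path (a "/" cannot occur inside a file name, so it is not tested).

```shell
# Hypothetical helper (not part of the IA2 checksum script).
# Prints every path under $1 whose own name contains one of:
#   [ ] < > ? \ " : | ' ` *
# Exit status is 0 if at least one such name is found, 1 otherwise.
find_special_names() {
    sq="'"
    # Bracket expression: "]" first so it is literal; the single quote
    # is spliced in through $sq. "[^/]*$" pins the match to the last
    # path component.
    pattern='[][<>?\":|`*'"$sq"'][^/]*$'
    find "$1" | grep -- "$pattern"
}

# Example usage:
#   find_special_names my_data && echo "please rename the entries above"
```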

  • Q. How long will the files be available online?

It depends on the availability of fast disk space and on the amount of storage requested; in general, it is about one month.

  • Q. Is there a fee to pay?

Yes, if the amount of data to import to tape exceeds 250 TB; the cost is roughly the cartridge cost per terabyte.

More Info at https://www.ia2.inaf.it/index.php/ia2-services/data-sharing-preservation.