Data Backups – What’s your strategy?

I was recently reminded of the 3-2-1 Backup rule – in theory, it sounds very straightforward!

  • 3 copies of your data
  • on at least 2 DIFFERENT media types
  • 1 of those being stored in a different physical location to the other two

If you have ever sat down and tried to work out WHAT you need to back up and how often, I am sure you have found that trying to DO regular data backups can get a little complicated. There are strategies and tools to help you, but getting started is always the hardest part.

Step One – Your Unique Circumstances

You need to get an understanding of your project’s digital data situation and computing resources, and make some best guesses about how you are going to be working with the data in your project.

  • Will you be only working on ONE computer?
  • Will you be MAINLY working on one computer?
  • Are the datasets for your project so big that you need to work on them from an external device no matter what computer(s) you are using?
  • Does the software you will be using require that data be placed in specific locations so that it can see it?
    (Yes, I am side-eyeing certain scientific software very hard here!)
  • Are you using specialist computers remotely (Hullo, COVID-19.) or time-sharing on an HPC?
  • Where are you going to store the copies of any source data provided to you by others that will inform/enable your work?
  • Will this source data be updated or added to over the life of your project?
  • Are you creating or collecting new digital data as part of your project?
  • Have you thought about how to tell the difference between a “valuable dataset” and a “step in an analysis chain dataset” if you are doing analysis work?

At the beginning of a project, you might think you cannot possibly know the answer to these questions – but you CAN make enough of a start to put yourself in a good position to not get overwhelmed as the data starts coming in. Each project is different but a good starting place is to assume that you will have:

  1. Source Data: This is the data that is provided to your project. You may have had to purchase it, or it may have been provided under a licence (e.g. Creative Commons CC-BY). You should always cite this data in any maps, reports or publications if you use it in your project, and include it as a “parent dataset” in the metadata of any data that it helped you to create.
  2. Working Data: This is the collection of files that are generated as you do your project. I like to work in a folder called “Working” for each stage of a project as it means I can then archive the folder (internal mess and all) if necessary and restart a stage if I need to try a different direction but want to keep my options open by not deleting the work I have done already. Using this technique, I usually avoid ending up with filenames like “final_final_reallyfinal_thisisit_imeanit_2.shp” <- an honest to goodness filename I saw in an archived project once!
  3. Created Data: Also called “Output Data”, this is the data created by your project: the new or derived data that has come about through your research work. It could have been collected in the field or created from analysis. Either way, it is up to you, as the researcher, to identify what needs to be moved from the WORKING folder to the CREATED folder, and to ensure that you write up the metadata to support it. Without metadata, your new data has much less value, and the metadata (if you write it up at the time the dataset is created) has the added bonus of being a great reminder of how it was created when you are writing your thesis and any future papers.
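
The three-folder starting structure above can be set up in a few lines. What follows is only a minimal sketch in Python: the folder names "Source", "Working" and "Created" and both function names are assumptions made for illustration, not a SAL requirement, so adjust them to suit your project.

```python
from pathlib import Path

# Folder names follow the three categories described above; the exact
# spelling is an assumption made for this sketch.
FOLDERS = ("Source", "Working", "Created")


def create_project_skeleton(project_root):
    """Create the Source/Working/Created folders and return their paths."""
    root = Path(project_root)
    created = []
    for name in FOLDERS:
        folder = root / name
        # exist_ok=True makes this safe to re-run on an existing project
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


def start_working_stage(project_root, stage_name):
    """Create a fresh sub-folder inside Working for one stage of the project."""
    stage = Path(project_root) / "Working" / stage_name
    stage.mkdir(parents=True, exist_ok=True)
    return stage
```

Calling start_working_stage once per stage gives each direction you try its own folder, which is what makes it possible to archive a stage (internal mess and all) and start again without deleting anything.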

Having identified WHERE (computers) and HOW (starting file structure and likely software + implications), we are ready to start talking about how you can develop your backup strategy.

Step Two – Evaluate Your Resources

Everyone with a Registered Project can get support on managing their geospatial data – if you are not confident to go it alone on developing your first Data Backup Strategy, what follows are two possible scenarios of many. Follow along and then have a go so that you can at least bring what you know to a meeting with your SAL Tech Staff contact.

Example One

  • working from personal laptop
  • Using UOWmail OneDrive Account
  • Project has an S: drive share

What we need           What we have                      Can we do it?
3 copies               1 x laptop                        YES
                       1 x OneDrive
                       1 x S: drive
2 types of media       1 x hard drive (laptop)           YES
                       1 x cloud service (OneDrive)
                       1 x enterprise (S: drive)
1 different location   1 x mobile                        YES
                       1 x cloud
                       1 x enterprise

3-2-1 SUCCESS!

Example Two

  • working from multiple computers
  • Data stored on an external Hard Drive

What we need           What we have                      Can we do it?
3 copies               1 x external hard drive           yes…
                       1 x personal laptop
                       1 x drive of lab machine
2 types of media       3 x hard drive                    NO
1 different location   2 x mobile                        sort of…
                       1 x lab

3-2-1 FAILURE…

Example Two is a scenario that is going to get complicated, if it doesn’t feel that way from the start! It often arises due to extremely large datasets being integral to the project. A project share is still something you should consider – we have access to enterprise solutions that can handle this and allow you to have the resources for a solid 3-2-1 Backup Strategy for your project. Whether you choose to go through SAL or not, we strongly recommend using a 3-2-1 approach to managing your data backups.
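
If you would like to run your own scenario through the same three questions, the check can be sketched in a few lines of Python. This is a minimal sketch and not a SAL tool: the Copy class, the check_321 function and the media and location labels are all assumptions made for illustration, taken from the two example tables above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str       # where the copy lives, e.g. "laptop"
    media: str      # e.g. "hard drive", "cloud", "enterprise"
    location: str   # e.g. "mobile", "cloud", "lab"


def check_321(copies):
    """Return pass/fail for each part of the 3-2-1 rule, plus the overall result."""
    media_types = {c.media for c in copies}
    locations = {c.location for c in copies}
    result = {
        "3 copies": len(copies) >= 3,
        "2 media types": len(media_types) >= 2,
        # Simplification: any second location label counts as "different"
        "1 other location": len(locations) >= 2,
    }
    result["3-2-1"] = all(result.values())
    return result


# The two scenarios from the tables above
example_one = [
    Copy("laptop", "hard drive", "mobile"),
    Copy("OneDrive", "cloud", "cloud"),
    Copy("S: drive", "enterprise", "enterprise"),
]
example_two = [
    Copy("external hard drive", "hard drive", "mobile"),
    Copy("personal laptop", "hard drive", "mobile"),
    Copy("lab machine drive", "hard drive", "lab"),
]
```

Calling check_321(example_one) passes all three parts, while check_321(example_two) fails on media types. Note that the location check here is deliberately simple: it will pass Example Two because the lab machine is in a different place, but it cannot tell you that a laptop and an external drive carried in the same bag are not really separate locations. That judgement (the "sort of…" in the table) is still yours to make.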

Step Three – Developing Your Plan

What you choose to do here is going to be influenced by:

  • the number of people actively working on data in the project
  • the size and scope of the project
  • the frequency of changes to the data in the project

In most cases, you will be the primary researcher and the person doing all the work on and with the data. SAL registers “Projects” rather than people, specifically to assist with data management for early career researchers. For that reason, we will use a 12-month student project as our example for this step, adding the TOL computers to the resources from Example One above.

Backup plan for a project working mostly from a personal laptop with a small amount of TOL use:

               Laptop        OneDrive      S: Drive      TOL
3 copies       Source        Source        Source
               Working       Working                     Working
               Created       Created       Created
2 media        hard drive    cloud         enterprise    hard drive
1+ locations   mobile        cloud         enterprise    campus

Frequency of backup

  • Laptop: Install the OneDrive app and any files in the OneDrive folder will be backed up every time you are online.
  • OneDrive: Using your UOWmail account for storage ensures data is managed in accordance with UOW and Australian Government policies.
  • S: Drive: Snapshots, with some file recovery available for a limited period of time.
  • TOL: The local hard drives of these machines are considered “scratch space”. There are no backups made of any of the drives in this lab.

What goes where?

  • Laptop: With the OneDrive app, you can choose which folders from your account are synced to your local drive. Choose what you need, when you need it – make changes as often as necessary.
  • OneDrive: You have the space – for most student projects, back up everything here. If you are done with a part of the project, rename the working folder appropriately to archive it and let it sit in OneDrive till the end of the project – there if you need it.
  • S: Drive: If possible, all the data from your SOURCE and CREATED folders should be backed up here. If you end up with a large volume of SOURCE data, an alternative third storage solution for that can be discussed with SAL.
  • TOL: Due to limited hard drive space and the multi-user nature of the TOL computers, the OneDrive app is not installed in TOL. As each section of work is being done in its own folder in the WORKING folder on OneDrive, the appropriate folder can be downloaded onto the machine using a web browser. Any new files can be uploaded to the CREATED folder, and the working folder can be zipped and uploaded to OneDrive for archiving.
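
Zipping a finished working folder before uploading it to OneDrive can be done by hand, but it is also easy to script. The sketch below uses Python's standard shutil.make_archive; the function name archive_working_folder and the date-stamped file naming are assumptions made for illustration, not a SAL convention.

```python
import shutil
from datetime import date
from pathlib import Path


def archive_working_folder(stage_folder, archive_dir):
    """Zip one working stage folder and return the path of the zip file."""
    stage = Path(stage_folder)
    if not stage.is_dir():
        raise NotADirectoryError(f"{stage} is not a folder")
    out_dir = Path(archive_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Date-stamped archive name (the naming pattern is an assumption)
    base_name = out_dir / f"{stage.name}_{date.today().isoformat()}"
    # root_dir/base_dir keep the stage folder itself as the top level of the zip
    zip_path = shutil.make_archive(
        str(base_name), "zip", root_dir=str(stage.parent), base_dir=stage.name
    )
    return Path(zip_path)
```

The function only creates the zip file locally. Uploading it to OneDrive is still a manual step through the web browser when you are on a TOL machine, and the archive folder should sit outside the folder being zipped.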

Step Four – Manage your Data while Doing your Project

This is where the “rubber hits the road”.

You are now in a position to better manage the digital data that is part of your project. You will still need to do some regular housekeeping, but hopefully much less, and you won’t lose anything critical if technical tragedy (hard drives FAIL!) befalls your project.

Each time you place a geospatial dataset into the CREATED folder, remember to write up some metadata for it. The SAL Metadata form does not take long and adds a lot of contextual value to your data – it may even help you with writing future papers and reports – and is essential for getting your hard work recognised and eligible for storage in a Repository.
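
As part of your regular housekeeping, a small script can point out which datasets in the CREATED folder are still waiting for metadata. The sketch below assumes a purely hypothetical convention in which you save a note named "<dataset>.metadata.txt" next to each dataset. It is not the SAL Metadata form and does not replace it; it only lists what you have not written up yet.

```python
from pathlib import Path

# Hypothetical naming convention for this sketch: roads.shp -> roads.metadata.txt
METADATA_SUFFIX = ".metadata.txt"


def datasets_missing_metadata(created_folder):
    """Return the sorted names of datasets in CREATED with no metadata file."""
    folder = Path(created_folder)
    files = [p for p in folder.iterdir() if p.is_file()]
    # Dataset names that already have a metadata note beside them
    described = {
        p.name[: -len(METADATA_SUFFIX)]
        for p in files
        if p.name.endswith(METADATA_SUFFIX)
    }
    # Group multi-file formats (e.g. .shp/.dbf/.shx) under one dataset name
    datasets = {
        p.stem for p in files if not p.name.endswith(METADATA_SUFFIX)
    }
    return sorted(datasets - described)
```

Grouping by file name stem is a simplification: it treats roads.shp, roads.dbf and roads.shx as one dataset, but it only looks at the top level of the folder and will not cope with every file format.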

If you are not sure whether a geospatial dataset meets the requirements of being “derived” under the terms of a licence for one of the parent datasets, or whether it is ‘important enough’ to need metadata – make an appointment with SAL. There are some techniques we can use to help work this out.

Step Five – At Project End

There is one last thing that your CREATED folder does for you, in addition to providing you with an appendix of your metadata for your thesis!

If you have been diligent over the course of the project, this folder contains everything needed to support the reports and publications that have been (and can be) generated by the project. This folder can, and should, be archived in a research data repository. If you would like to place your geospatial data into the SAL Repository, ensure your project is registered and speak to the SAL Technical Staff.
