1 Introduction

Most people work with computers on a daily basis, but almost nobody has been taught how to organize files and folders, how to manage data, or how to avoid data loss. This tutorial addresses this issue and offers general tips on how to keep your computer clean and running smoothly. In addition, we will have a look at how to organize files and folders. Finally, the tutorial gives some pointers on how to handle and organize data so that they are stored safely and in an orderly manner.

Thus, this tutorial will address what to do if…

  • your computer is very slow and gets very hot

  • your filing system is a mess and you want to tidy it up

  • you want to secure your files from someone else being able to access them

  • you want to work more efficiently with tabulated data

This tutorial focuses on Windows machines and is intended for anybody - especially for those of us who are less computer savvy. For this reason, we will not go over manually optimizing Windows and system options (for those who are more computer savvy, a video on how to optimize your Windows and system options for maximum performance can be found here).

Some of the contents of this tutorial build on the Digital Essentials module that is offered by the UQ library, the Reproducible Research resources (created by Griffith University’s Library and eResearch Services), and on Amanda Miotto’s Reproducible Research Things. You can find additional information on all things relating to computers, the digital world, and computer safety in the Digital Essentials course that is part of UQ’s library resources.

2 General tips about using computers

In the following, we will go over some tips that prevent computers from slowing down or getting hot. These tips are also useful if you want to increase the speed of your computer. If cared for, even older machines can handle most of our everyday tasks and fulfill most of our computational needs.

Restart your computer regularly

  • While sleep mode is very handy, you should shut down and restart your computer regularly (at least once per week). A restart is necessary to install updates that have already been downloaded, and it also allows your computer to dump a lot of data that accumulates while it keeps running. If there’s a high security risk, you may be forced to restart your computer immediately because updates often close gaps in the security of either specific programs or your operating system.

Keep your computer up-to-date

  • One thing you can do to keep your computer running smoothly is to keep it up-to-date by checking for updates. Outdated software lacks the latest security fixes and puts your computer at risk.

  • Updates can be annoying but they also help to close security gaps and improve the functionality of the software that you are using. If you do not know how to check if updates are available, you can find a step-by-step tutorial here.

Look out for leeches and malware

  • When you download software, it is quite common that additional software is downloaded and installed by default alongside the program you are looking for. To avoid this, pay attention during the installation, read the options that you can check or uncheck, and uncheck any offers of additional software.

Use anti-virus software

  • Antivirus software checks if any software on your computer has been reported as malware or if your software differs from what it should look like if it were not infected.

  • Symantec Endpoint Protection (SEP) is an anti-virus software that is installed on all UQ computers. This software protects your computer from malware but also checks if your computer is already “infected”. Such checks are performed regularly but to run such a check manually, you can simply click on the antivirus software icon in the lower right corner of your PC and follow the instructions. While Symantec Endpoint Protection is not free and you have to pay a fee if you want to install it on a private PC, UQ has a deal with the manufacturer that gives UQ members a discount.

  • There are also free alternatives available such as the free version of Avira in case you do not want to pay for anti-virus software. Both the free and the commercial versions of Avira have the advantage that they also allow you to check if the performance of your PC can be improved (in addition to merely protecting your computer) and - depending on the version - they can also implement these improvements.

  • Another option that helps to detect malware on your computer is Malwarebytes, which also has a free version. Its malware database is updated very frequently, which means that it is able to detect even very “fresh” malware.

No data on your Desktop or C-drive

  • When you start your computer, different parts of the computer are started at different times with different priorities. The Desktop is always started with the highest priority which means that if you have a lot of stuff on your desktop, then the computer will load all that stuff once it kicks into action (which will cause it to work quite heavily at each start and also slow down quite dramatically).

  • This means that you should avoid storing data on any part of your system that is activated routinely. Rather, try to separate things that need to be loaded from things that only need to be loaded if they are actually used. For this reason, you should also avoid storing data on your C-drive. In fact, the C-drive should only contain programs as it is activated automatically at each start.

  • You can, for example, store all your projects on your D-drive or, even better, on OneDrive, Google’s MyDrive, or in Dropbox, where they are only loaded once you actively click and open a folder. If you use cloud-based storage options (OneDrive, Google’s MyDrive, or Dropbox), the files are also backed up automatically. However, you should not use any of these for sensitive data (sensitive data should be stored on your PC, an external hard drive, and UQ’s RDM).

  • If you want to have data accessible via your desktop, you can still do so by using links (also called short-cuts): place a link to your data (stored on another drive) on your desktop and you can load your data easily without it being activated at every start.

Tidy your room!

  • Just like in real life, you should clean your computer. A full recycle bin, for instance, will slow down your computer, so you should empty it regularly. In addition, your computer will store and collect files when it is running. These files (temp files, cookies, etc.) also slow your computer down. As such files are redundant, they should be deleted regularly. You can remove such files manually using the msconfig prompt (you find a video tutorial on how to do this here). If you want to optimize your computer manually via the msconfig prompt, simply enter msconfig in the Windows search box in the lower left corner of your PC (or search for it in the search box that opens when you click on the Windows symbol). However, an easier way is to use software to help you with cleaning your computer.

Software to clean your computer

  • While UQ provides various software applications that keep your computer secure, it does not have any specific recommendations for software to keep your computer digitally clean.

  • Luckily, there are numerous software applications that can help you with keeping your computer clean and up-to-date (you will find a list of software options for PCs here). We will only look at two options here, CCleaner and Avira, but a quick Google search will provide you with many different alternatives.

  • The most widely used program to clean your computer (if you have a PC rather than a Mac) is CCleaner. There are different versions of CCleaner but the free version suffices to delete any superfluous files and junk from your computer. When using this program, you should, however, be careful not to remove information that is useful. For instance, I like to keep all tabs of my current session in my browser and I therefore have to change the default options in CCleaner to avoid having to reopen all my tabs when I next open my browser. Here is a short video tutorial on how to use CCleaner.

  • In addition, the free version of Avira also has a function that you can use to clean your computer. In fact, Avira will also inform you about any software that is out-of-date and other issues. Here is a short video tutorial on how to use Avira for cleaning your computer and performing an anti-virus scan.

3 Basics about folder structure

The UQ Library offers the Digital Essentials module Working with Files. This module contains information on storage options, naming conventions, back up options, metadata, and file formats. Some of these issues are dealt with below but the materials provided by the library offer a more extensive introduction into these topics.

There are various ways in which you can organize your folders. All of them have different advantages and problems, but they all have in common that they rely on a tree structure - more general folders contain more specialized ones. For example, if you want to find any file with as few clicks as possible, an alphabetical folder structure would be a good solution. Organized in this way, everything is stored under its initial letter (e.g. everything starting with a “t” such as “travel” under “T” or everything related to your “courses” under “C”). However, organizing your data alphabetically is not intuitive and completely unrelated topics will be located in the same folder.

A more common and intuitive way to organize your data is to separate your data into meaningful aspects of your life such as Work (containing, e.g., teaching and research), Living (including rent, finances, and insurances), and Media (including Movies, Music, and Audiobooks).

3.1 Naming conventions

A File Naming Convention (FNC) is a framework (or protocol, if you like) for naming your files in a way that describes what the files contain and, importantly, how they relate to other files. It is essential to establish an agreed FNC before you start collecting data.

Folders (as well as files) should be labelled in a meaningful way. This means that you avoid names like Stuff or Document for folders and doc2 or homework for files.

Naming files consistently, logically and in a predictable manner will prevent unorganized files and misplaced or lost data. It could also prevent possible backlogs or project delays. A file naming convention will ensure files are:

  • Easier to process - team members won’t have to overthink the file naming process

  • Easier to access, retrieve and store

  • Easier to browse through, saving time and effort

  • Harder to lose!

  • Easier to check for obsolete or duplicate records

The University of Edinburgh has a comprehensive and easy-to-follow list (with examples and explanations) of 13 Rules for file naming conventions. You can also use the recommendations of the Australian National Data Service (ANDS) guide to file wrangling. Some of the suggestions are summarized below.

Like the different conventions for folders, there are different conventions for files. As a general rule, any special characters, such as +, !, ", -, ., ö, ü, ä, %, &, (, ), [, ], $, =, ?, ’, #, or /, as well as white spaces, should be avoided (while _ is also a special character, it is still common practice to include it in file names for readability). One reason for this is that you may encounter problems when sharing files if their names contain white spaces or special characters. Also, some software applications automatically replace or collapse white spaces. A common practice to avoid this issue is to capitalize initial letters in file names, which allows you to avoid white spaces. An example would be TutorialIntroComputerSkills or Tutorial_IntroComputerSkills.

When you want to include time-stamps in file names, the best way to do this is to use the YYYYMMDD format (rather than DDMMYYYY or even D.M.YY). The reason is that if dates are added this way, sorting the files by name also sorts them by date, in ascending or descending order. To elaborate on the examples shown before, we may use TutorialIntroComputerSkills20200413 or Tutorial_IntroComputerSkills_20200413.
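The naming advice above can be sketched in a few lines of Python. This is only a minimal illustration, not a required tool: the function name make_file_name and the example titles are made up for this sketch, and the function simply drops anything that is not a plain letter or digit (so characters such as ö are lost rather than transliterated).

```python
import re
from datetime import date


def make_file_name(title: str, day: date, extension: str = "txt") -> str:
    """Build a name such as IntroComputerSkills_20200413.txt."""
    # Split on anything that is not an ASCII letter or digit, which drops
    # white spaces and special characters in one step.
    words = [w for w in re.split(r"[^A-Za-z0-9]+", title) if w]
    # Capitalize the initial letter of every word instead of using spaces.
    stem = "".join(w[0].upper() + w[1:] for w in words)
    # YYYYMMDD makes alphabetical order identical to chronological order.
    return f"{stem}_{day.strftime('%Y%m%d')}.{extension}"


names = [
    make_file_name("intro computer skills", date(2020, 4, 13)),
    make_file_name("intro computer skills", date(2019, 12, 1)),
    make_file_name("intro computer skills", date(2020, 1, 30)),
]
print(sorted(names))
```

Sorting the three names alphabetically puts the file from December 2019 first and the one from April 2020 last, which is exactly the point of writing the year before the month and the day.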

For Beginners

  • Let’s look at some easy naming conventions for your data files and documents. Dates are best stored as YYYY-MM-DD. Try to avoid spaces in your file names.

For Intermediates

For Advanced file namers

  • Do you have a policy in your team/section/cluster around naming conventions? If not, this is a great way of getting everyone on the same page.

3.2 Documentation and the Bus Factor

Documentation is the idea of documenting your work so that outsiders (or yourself after a long time) can understand what you did and how you did it. As a general rule, you should document where to find what as if you were explaining to someone else how to find things on your computer.

Documentation can include where your results and data are saved but it can also go far beyond this. Documentation does not have to be complex and it can come in different forms, depending on your needs and your work. Documenting what you do can also include photos, word documents with descriptions, or websites that detail how you work.

The idea behind documentation is to keep a log of the contents of folders so that you (at a later point in time) or someone else can continue the work in case the person responsible is run over by a bus (hence the Bus Factor).

In fact, documentation is all about changing the Bus Factor - how many people on a project would need to be hit by a bus to make a project fail. Many times, projects can have a bus factor of one. Adding documentation means when someone goes on leave, needs to take leave suddenly or finishes their study, their work is preserved for your project.

If you work in a collaborative project, it is especially useful to have a log of where one can find relevant information and who to ask for help with what. Ideally you want to document anything that someone coming on board would need to know. Thus, if you have not created a log for on-boarding, the perfect person to create a log would be the last person who joined the project. Although there is no fixed rule, it is advisable to store the log either as a ReadMe document or in a ReadMe folder on the top level of the project.

For Beginners

  • Read this first: How to start Documenting and more by CESSDA ERIC

  • Start with documenting in a text file or document - any start is a good start

  • Have this document automatically synced to the cloud with your data or keep this in a shared place such as Google docs, Microsoft teams, or Owncloud. If you collaborate on a project and use UQ’s RDM, you should store a copy of your documentation there.

For Intermediates

  • Once you have the basics in place, go into detail on how your workflow goes from your raw data to the finished results. This can be anything from a detailed description of how you analyse your data, over R Notebooks, to downloaded function lists from Virtual Lab.

For Advanced documentarians

  • Learn about Git Repositories and wikis.

3.3 Keep copies

Keeping a copy of all your data (working, raw and completed) in the cloud is incredibly important. This ensures that if you have a computer failure, accidentally delete your data or your data is corrupted, your research is recoverable and restorable.

The 3-2-1 backup rule

Try to have at least three copies of your project that are in different locations. The rule is: keep at least three (3) copies of your data, and store two (2) backup copies on different storage media, with one (1) of them located offsite. Although this is just a personal preference, I always save my projects

  • on my personal notebook

  • on at least one additional hard drive (that I keep in a secure location)

  • in an online repository (for example, UQ’s Research Data Management system (RDM), OneDrive, MyDrive, GitHub, or GitLab)

Using online repositories ensures that you do not lose any data in case your computer crashes (or in case you spill lemonade over it - don’t ask…) but it comes at the cost that your data can be more accessible to (criminal or other) third parties. Thus, if you are dealing with sensitive data, I suggest storing it on an additional external hard drive and not keeping cloud-based back-ups. If you trust tech companies with your data (or think that they are not interested in stealing your data), cloud-based solutions such as OneDrive, Google’s MyDrive, or Dropbox are ideal and easy-to-use options (however, UQ’s RDM is a safer option).
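If you prefer to automate the copying step, the following Python sketch writes one date-stamped zip archive of a project folder into each backup location. It is an illustration under assumptions rather than a finished backup tool: the function name back_up_project is made up, packing the project into a zip archive is my choice for this sketch, and the destination folders are whatever locations you pass in (for instance an external hard drive and a folder that is synced to an online repository).

```python
import shutil
from datetime import date
from pathlib import Path
from typing import List


def back_up_project(project: Path, destinations: List[Path], day: date) -> List[Path]:
    """Write one date-stamped zip archive of `project` into each destination."""
    archives = []
    for destination in destinations:
        # Create the backup folder if it does not exist yet.
        destination.mkdir(parents=True, exist_ok=True)
        # make_archive adds the ".zip" ending to the base name itself.
        base = destination / f"{project.name}_{day.strftime('%Y%m%d')}"
        archive = shutil.make_archive(str(base), "zip", root_dir=project)
        archives.append(Path(archive))
    return archives
```

Together with the working copy on your notebook, two destinations on different storage media (one of them offsite) would give you the three copies that the 3-2-1 rule asks for. The destinations should lie outside the project folder itself.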

The UQ library also offers additional information on complying with ARC and NHMRC data management plan requirements and on how UQ RDM meets these requirements for sensitive data (see here).

For Beginners

  • Get your data into UQ’s RDM or Cloud Storage - If you need help, talk to the library or your tech/eResearch/QCIF Support

For Advanced backupers

  • Build a policy for your team or group on where things are stored. Make sure the location of your data is saved in your documentation

3.4 Avoid duplicates

Many of the files on our computers have several duplicates or copies on the same machine. Optimally, each file should only be stored once (on one machine). Thus, to minimize the number of superfluous files on your computer, you can delete any duplicates. You can, of course, do that manually but a better way to do this is to use programs that detect files that are identical in name, file size, and date of creation. One of these programs is the Duplicate Cleaner. A tutorial on how to use it can be found here.
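To give an idea of how duplicate detection can work, here is a minimal Python sketch. Note that it uses a different criterion from the one described above: instead of comparing name, file size, and date of creation, it compares a checksum of the file contents, so it also finds copies that have been renamed. The function name find_duplicates is made up for this sketch, and I make no claim that any particular program works this way. The sketch only reports duplicates and never deletes anything.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(folder: Path) -> list:
    """Return groups of files under `folder` whose contents are identical."""
    by_digest = defaultdict(list)
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            # Reading the whole file at once is fine for a sketch, but very
            # large files would be better read in chunks.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    # Only checksums shared by two or more files indicate duplicates.
    return [group for group in by_digest.values() if len(group) > 1]
```

Each group in the result lists the paths of files with the same content, so you can decide for yourself which copy to keep.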

4 How to organize projects

This section focuses on how to organize your projects and how to use your computer in doing so.

Project folder design

Having a standard folder structure can keep your files neat and tidy and save you time looking for data. It can also help if you are sharing files with colleagues and need a standard place to put working data and documentation.

Store your projects in a separate folder. For instance, if you are creating a folder for a research project, create the project folder within a projects folder that is within a research folder. If you are creating a folder for a course, create the course folder within a courses folder within a teaching folder, etc.

Whenever you create a folder for a new project, try to have a set of standard folders. For example, when I create research project folders, I always have folders called archive, datatable, and images. When I create course folders, I always have folders called slides, assignments, exam, studentmaterials, and correspondence. However, you are, of course, free to modify this or come up with your own basic project design. Also, by prefixing the folder names with numbers, you can force your folders to be ordered by the steps in your workflow.

  • Having different sub folders allows you to avoid having too many files and many different file types in a single folder. Folders with many different files and file types tend to be chaotic and can be confusing. In addition, I have one ReadMe file on the highest level (which only contains folders except for this one single file) in which I describe very briefly what the folder is about and which folders contain which documents as well as some basic information about the folder (e.g. why and when I created it). This ReadMe file is intended both for me as a reminder what this folder is about and what it contains but also for others in case I hand the project over to someone else who continues the project or for someone who takes over my course and needs to use my materials.
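If you set up new projects often, creating the standard folders and the ReadMe file can be scripted. The following Python sketch shows one possible way to do this. The folder names are the ones mentioned above for research projects, while the function name create_project, the numbering scheme, and the headings in the ReadMe file are merely assumptions made for this illustration.

```python
from pathlib import Path

# Standard folders for a research project, as described above.
RESEARCH_FOLDERS = ["archive", "datatable", "images"]


def create_project(parent: Path, name: str, folders=RESEARCH_FOLDERS) -> Path:
    """Create a project folder with numbered sub folders and a ReadMe file."""
    project = parent / name
    for index, folder in enumerate(folders, start=1):
        # A numeric prefix forces the folders to sort in workflow order.
        (project / f"{index:02d}_{folder}").mkdir(parents=True, exist_ok=True)
    readme = project / "ReadMe.txt"
    # Never overwrite a ReadMe file that has already been filled in.
    if not readme.exists():
        readme.write_text(
            f"Project: {name}\nCreated: \nPurpose: \nFolders: {', '.join(folders)}\n"
        )
    return project
```

Calling create_project with a parent folder of your choice and a project name such as MyStudy (a made-up name) would create the sub folders 01_archive, 02_datatable, and 03_images plus an empty ReadMe template on the top level.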

If you work in a team or share files and folders regularly, it makes sense to develop a logical structure for your team. When doing so, you need to consider the following points:

  • Check to make sure there are no pre-existing folder structure agreements

  • Name folders appropriately and in a meaningful manner. Don’t use staff names and consider using the type of work

  • Consistency - make sure you use the agreed structure/hierarchy

  • Structure folders hierarchically - start with a limited number of folders for the broader topics, and then create more specific folders within these

  • Separate ongoing and completed work - as you start to create lots of folders and files, it is a good idea to start thinking about separating your older documents from those you are currently working on

  • Backup - ensure folders and files are backed up and retrievable in the event of a disaster. UQ, like most universities, has safe storage solutions.

  • Clean up folders and files post project.

For Beginners

  • Pick some of your projects and illustrate how you currently organize your files. See if you can devise a better naming convention or note one or two improvements you could make to how you name your files.

  • There are some really good folder template shapes around. Here is one you can download.

For Advanced Folder designers

  • Come up with a policy for your team for folder structures. You could create a template and put it in a downloadable location for them to get them started.

4.1 Encryption and Computer Security

Ensuring that your computer and network are secured means that you have a far lower chance of a data breach or hack.

As some information is sensitive (especially when it comes to exams and attendance in courses), I encrypt folders and files that contain such information. To encrypt a file or folder, I right-click on the file or folder and go to properties > advanced, then I check encrypt contents to secure data and confirm the changes by clicking OK. Then I back up the encryption key, where I check enable certificate privacy and create password, and store the encrypted file in the original folder. You can find a step-by-step guide on how to encrypt files in this video.

You can also encrypt your entire computer. Information about how to do this can be found here and tips specific for

  • Macs can be found here

  • Windows 10 can be found here

  • Windows 7 can be found here

It is also advisable to use or create strong passwords. Here are some tips for creating secure passwords:

  • Don’t just use one password - use a different password for every account

  • Use a pass phrase - instead of a single word, try a sequence of words, for instance DogsandCatsareawesome (do not use this as your password)

  • Include numbers, capital letters and symbols

  • The longer the password, the better

  • Don’t write passwords down

  • Turn on two-factor authentication
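The pass phrase tip can be illustrated with a short Python sketch that picks words at random. Please treat it as an illustration only: the function name make_passphrase is made up, and the tiny word list is a placeholder that I invented for this example (a real word list should contain thousands of words, otherwise the resulting pass phrases are easy to guess).

```python
import secrets

# Made-up mini word list for illustration only; far too short for real use.
WORDS = ["river", "candle", "orbit", "meadow", "copper", "lantern", "pepper", "glacier"]


def make_passphrase(word_count: int = 5, separator: str = "-") -> str:
    """Join randomly chosen words into a pass phrase."""
    # The secrets module draws from the operating system's secure random
    # source, which is what you want for anything related to passwords.
    return separator.join(secrets.choice(WORDS) for _ in range(word_count))


print(make_passphrase())
```

Every call produces a different sequence of words. Adding numbers, capital letters, and symbols, as suggested in the list above, is left out here to keep the sketch short.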

An alternative is to use a password manager. Again, the Digital Essentials module has a lot of information about password management (password managers are explained in detail in section 4).

Password managers provide a similar level of convenience to “Login with Facebook” but are much safer. Password managers create an encrypted database of all your usernames and passwords, that only you can access with a master password. This means you only need to remember one password to have access to all of your accounts. Most password managers will include the ability to generate secure passwords that you can use for new or existing account logins. Because you only need to remember one master password, you can generate and store complex passwords for your needs. This way, you are not relying on your memory and easy passwords to remember many different account login details.

Also, to find out if your email has been compromised, you can check this here

Recently, UQ has adopted Multi-Factor Authentication, which is more secure than simple authentication. You should use it when the option is available (for example, signing in with a password plus a PIN that is emailed to your account).

As a general tip, avoid unsecured wifi and, if it’s available, Eduroam is usually a better option than free wifi/cafe wifi.

For Beginners

  • Have good strong passwords and encrypt your computer’s hard drive

For Intermediates

  • Get set up on a password manager

For Advanced passwordists

  • Try to ensure that your team/cluster is encrypted and practicing safe habits.

5 How to organize data

This section will elaborate on how to organize and handle (research) data and introduce some basic principles that may help you to keep your data tidy.

Never delete data!

  • The first thing to keep in mind when dealing with data is to keep at least one copy of the original data. The original data should never be deleted but, rather, you should copy the data and delete only sections of the copy while retaining the original data intact.

Tips for sensitive data

  • Sensitive data are data that can be used to identify an individual, species, object, or location that introduces a risk of discrimination, harm, or unwanted attention. Major, familiar categories of sensitive data are personal data, health and medical data, and ecological data that may place vulnerable species at risk.

Separating identifying variables from your data

  • Separating or deidentifying your data serves to protect an individual’s privacy. According to the Australian Privacy Act 1988, “personal information is deidentified if the information is no longer about an identifiable individual or an individual who is reasonably identifiable”. Deidentified information is no longer considered personal information and can be shared. More information on the Commonwealth Privacy Act can be located here.

  • Deidentifying aims to allow data to be used by others for publishing, sharing and reuse without the possibility of individuals or locations being re-identified. It may also be used to protect the location of archaeological findings, cultural data, or the location of endangered species.

  • Any identifiers (name, date of birth, address or geospatial locations etc.) should be removed from the main data set and replaced with a code/key. The code/key is then preferably encrypted and stored separately. By storing deidentified data in a secure solution, you are meeting safety, controlled, ethical, privacy and funding agency requirements.

  • Re-identifying an individual is possible by recombining the deidentified data set and the identifiers.
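The separation of identifiers from the main data set can be sketched in Python as follows. This is a minimal illustration under assumptions: the function name deidentify, the column names, and the two example records are invented for this sketch and do not come from any real data set. Keep in mind that removing direct identifiers may not be sufficient on its own, because combinations of the remaining variables can sometimes still point to an individual, so the guides listed below remain the reference.

```python
import secrets


def deidentify(records, identifier_fields):
    """Split records into deidentified rows and a separate key table."""
    deidentified, key_table = [], []
    for record in records:
        # A random code that carries no information about the person.
        code = secrets.token_hex(8)
        # The key table links the code to the identifiers; it should be
        # encrypted and stored separately from the deidentified data.
        key_table.append({"code": code, **{f: record[f] for f in identifier_fields}})
        # The main data set keeps the code and all remaining variables.
        rest = {f: v for f, v in record.items() if f not in identifier_fields}
        deidentified.append({"code": code, **rest})
    return deidentified, key_table


# Invented example records, for illustration only.
records = [
    {"name": "Person A", "date_of_birth": "1990-01-01", "score": 12},
    {"name": "Person B", "date_of_birth": "1985-06-15", "score": 17},
]
data, key = deidentify(records, ["name", "date_of_birth"])
```

After this step, data contains only the code and the score, while key contains the code together with the name and date of birth. Recombining the two via the code is what makes re-identification possible, which is why the key has to be stored separately.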

Australian practical guidance for Deidentification (ARDC)

  • The Australian Research Data Commons (ARDC), formerly known as the Australian National Data Service (ANDS), released a fabulous guide on Deidentification. The Deidentification guide is intended for researchers who own a data set and wish to share it safely with fellow researchers or to publish the data. The guide can be located here.

Nationally available guidelines for handling sensitive data

  • The Australian Government’s Office of the Australian Information Commissioner (OAIC) and CSIRO Data61 have released a Deidentification Decision Making Framework, which is a “practical guide to deidentification, focusing on operational advice”. The guide will assist organisations that handle personal information to deidentify their data effectively.

  • The OAIC also provides high-level guidance on deidentification of data and information, outlining what deidentification is, and how it can be achieved.

  • The Australian Government’s guidelines for the disclosure of health information include techniques for making a data set non-identifiable and example case studies.

  • Australian Bureau of Statistics’ National Statistical Service Handbook. Chapter 11 contains a summary of methods to maintain privacy.

  • med.data.edu.au gives information about anonymisation

  • The Office of the Information Commissioner Queensland’s guidance on deidentification techniques can be found here

Tips for managing deidentification (ARDC)

  • Plan deidentification early in the research as part of your data management planning

  • Retain original unedited versions of data for use within the research team and for preservation

  • Create a deidentification log of all replacements, aggregations or removals made

  • Store the log separately from the deidentified data files

  • Identify replacements in text in a meaningful way, e.g. in transcribed interviews indicate replaced text with [brackets] or use XML mark-up tags.

Management of identifiable data (ARDC)

Data may often need to be identifiable (i.e. contains personal information) during the process of research, e.g. for analysis. If data is identifiable then ethical and privacy requirements can be met through access control and data security. This may take the form of:

  • Control of access through physical or digital means (e.g. passwords)

  • Encryption of data, particularly if it is being moved between locations

  • Ensuring data is not stored in an identifiable and unencrypted format when on easily lost items such as USB keys, laptops and external hard drives.

  • Taking reasonable actions to prevent the inadvertent disclosure, release or loss of sensitive personal information.

Safely sharing sensitive data guide (ARDC)

  • ANDS’ Deidentification Guide collates a selection of Australian and international practical guidelines and resources on how to deidentify data sets. You can find more information about deidentification here and information about safely sharing sensitive data here.

5.1 Data as publications

More recently, regarding data as a form of publication has gained a lot of traction. This has the advantage that it rewards researchers who put a lot of work into compiling data and it has created an incentive for making data available, e.g. for replication. The UQ RDM and UQ eSpace can help with the process of publishing a dataset.

Digital Object Identifier (DOI) and Persistent identifier (PiD)

Once you’ve completed your project, help make your research data discoverable, accessible and possibly re-usable using a PiD such as a DOI! A Digital Object Identifier (DOI) is a unique alphanumeric string assigned by either a publisher, organisation or agency that identifies content and provides a PERSISTENT link to its location on the internet, whether the object is digital or physical. It might look something like this http://dx.doi.org/10.4225/01/4F8E15A1B4D89.

DOIs are also considered a type of persistent identifier (PiD). An identifier is any label used to name something uniquely (whether digital or physical). URLs are an example of an identifier. So are serial numbers and personal names. A persistent identifier is guaranteed to be managed and kept up to date over a defined time period.

Journal publishers assign DOIs to electronic copies of individual articles. DOIs can also be assigned by organisations, research institutes or agencies and are generally managed by the relevant organisation in line with its policies. DOIs not only uniquely identify research data collections, they also support citation and citation metrics.

A DOI will also be given to any data set published in UQ eSpace, whether added manually or uploaded from UQ RDM. For information on how to cite data, have a look here.

Key points

  • DOIs are a persistent identifier and as such carry expectations of curation, persistent access and rich metadata

  • DOIs can be created for DATA SETS and associated outputs (e.g. grey literature, workflows, algorithms, software etc) - DOIs for data are equivalent with DOIs for other scholarly publications

  • DOIs enable accurate data citation and bibliometrics (both metrics and altmetrics)

  • Resolvable DOIs provide easy online access to research data for discovery, attribution and reuse

For Beginners

  • Ensure data you associate with a publication has a DOI - your library is the best group to talk to for this.

For Intermediates

  • Learn more about how your DOI can potentially increase your citation rates by watching this 4m:51s video

  • Learn more about how your DOI can potentially increase your citation rate by reading the ANDS Data Citation Guide

For Advanced identifiers

5.2 Tidy data principles

The same (underlying) data can be represented in multiple ways. The following three tables show the same data but in different ways.

Table 1.

country      continent  2002    2007
Afghanistan  Asia       42.129  43.828
Australia    Oceania    80.370  81.235
China        Asia       72.028  72.961
Germany      Europe     78.670  79.406
Tanzania     Africa     49.651  52.517

Table 2.

year  Afghanistan (Asia)  Australia (Oceania)  China (Asia)  Germany (Europe)  Tanzania (Africa)
2002  42.129              80.370               72.028        78.670            49.651
2007  43.828              81.235               72.961        79.406            52.517

Table 3.

country      year  continent  lifeExp
Afghanistan  2002  Asia       42.129
Afghanistan  2007  Asia       43.828
Australia    2002  Oceania    80.370
Australia    2007  Oceania    81.235
China        2002  Asia       72.028
China        2007  Asia       72.961
Germany      2002  Europe     78.670
Germany      2007  Europe     79.406
Tanzania     2002  Africa     49.651
Tanzania     2007  Africa     52.517

Table 3 should be the easiest to parse and understand. This is so because only Table 3 is tidy. Unfortunately, however, most data that you will encounter will be untidy. There are two main reasons:

  • Most people aren’t familiar with the principles of tidy data, and it’s hard to derive them yourself unless you spend a lot of time working with data.

  • Data is often organised to facilitate some use other than analysis. For example, data is often organised to make entry as easy as possible.

This means that for most real analyses, you’ll need to do some tidying. The first step is always to figure out what the variables and observations are. Sometimes this is easy; other times you’ll need to consult with the people who originally generated the data. The second step is to resolve one of two common problems:

  • One variable might be spread across multiple columns.

  • One observation might be scattered across multiple rows.

To avoid structuring data in ways that make them harder to parse, there are three interrelated principles which make a data set tidy:

  • Each variable must have its own column.

  • Each observation must have its own row.

  • Each value must have its own cell.

An additional advantage of tidy data is that it can be transformed more easily into any other format when needed.
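To make the tidying step more concrete, the following Python sketch converts the data from Table 1 (where the variable year is spread across two columns) into the tidy layout of Table 3. The values are taken from the tables above, whereas the function name to_tidy and its parameters are made up for this illustration. The sketch uses plain Python only. Dedicated data analysis libraries offer ready-made functions for this kind of reshaping.

```python
# Table 1: the variable "year" is spread across the columns 2002 and 2007.
wide = [
    {"country": "Afghanistan", "continent": "Asia", "2002": 42.129, "2007": 43.828},
    {"country": "Australia", "continent": "Oceania", "2002": 80.370, "2007": 81.235},
    {"country": "China", "continent": "Asia", "2002": 72.028, "2007": 72.961},
    {"country": "Germany", "continent": "Europe", "2002": 78.670, "2007": 79.406},
    {"country": "Tanzania", "continent": "Africa", "2002": 49.651, "2007": 52.517},
]


def to_tidy(rows, id_fields, variable_name, value_name, convert=str):
    """Turn every non-identifying column into its own row (one observation)."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column in id_fields:
                continue
            # Copy the identifying columns, then add the former column
            # header as a variable and the cell content as its value.
            new_row = {field: row[field] for field in id_fields}
            new_row[variable_name] = convert(column)
            new_row[value_name] = value
            tidy.append(new_row)
    return tidy


tidy = to_tidy(wide, ["country", "continent"], "year", "lifeExp", convert=int)
for row in tidy[:2]:
    print(row)
```

The result has ten rows, one for each combination of country and year, which matches Table 3 (apart from the order of the columns): each variable has its own column, each observation has its own row, and each value has its own cell.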

5.3 How to minimize storage space

Most people use or rely on data that comes in spreadsheets and use software such as Microsoft Excel or OpenOffice Calc. However, spreadsheets produced by these software applications take up a lot of storage space.

  • One way to minimize the space that your data takes up is to copy the data and paste it into a simple editor or txt file. The good thing about txt files is that they take up very little space and can be viewed easily, so that you can open the file to see what the data looks like. You could then delete the spreadsheet because you can copy and paste the content of the txt file right back into a spreadsheet when you need it.
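Instead of copying and pasting by hand, tabulated data can also be written to and read from a plain text file with a few lines of Python. The sketch below is a minimal illustration: the function names save_as_text and load_from_text are made up, and using tabs to separate the cells is simply one common choice. Note that only the cell values are stored, so formulas and formatting from the spreadsheet are not preserved.

```python
import csv
from pathlib import Path


def save_as_text(rows, path: Path) -> None:
    """Write rows (lists of cell values) as tab-separated plain text."""
    with path.open("w", newline="", encoding="utf-8") as handle:
        csv.writer(handle, delimiter="\t").writerows(rows)


def load_from_text(path: Path) -> list:
    """Read the tab-separated file back; every cell comes back as text."""
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle, delimiter="\t"))
```

A file written this way can be opened in any simple editor and can also be imported back into spreadsheet software when needed.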

How to cite this tutorial

Schweinberger, Martin. 2020. General Tips on Computering. Brisbane: The University of Queensland. url: https://slcladal.github.io/introcomputer.html.