Data Management in Research

Research data can live in a Word document, a notebook, a stack of papers in a folder, a note-taking program such as Evernote or OneNote, or bibliographic software such as EndNote.

Good data management allows you to identify, locate, and effectively use your data. With it, you can:

Manage large amounts of various data files.

Locate and search for files easily.

Distinguish different files and file versions within a folder.

Avoid confusion when working in a team or sharing files.

Avoid data loss due to overwriting or accidental deletion of files.

Provide context for data retrieval and storage.

What does Data Management in Research cover?

File management includes:

Structuring the hierarchy of file folders logically and clearly;

Planning the syntax and vocabulary of individual file names;

Consistently using the agreed conventions.
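As one way to picture a naming convention, the sketch below builds a file name from a date, a project label, a short description, and a version number. The pattern itself (YYYYMMDD_project_description_vN.ext), the helper names `slugify` and `build_filename`, and the example labels are assumptions chosen for illustration, not a convention this article prescribes.

```python
import re
from datetime import date


def slugify(text):
    """Lowercase the text and replace runs of non-alphanumerics with a hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(project, description, version, ext, when):
    """Assemble YYYYMMDD_project_description_vN.ext (a hypothetical convention)."""
    return (
        f"{when:%Y%m%d}_{slugify(project)}_{slugify(description)}"
        f"_v{version}.{ext.lstrip('.')}"
    )


name = build_filename("Soil Survey", "Site A raw counts", 2, "csv", date(2024, 3, 5))
print(name)  # 20240305_soil-survey_site-a-raw-counts_v2.csv
```

Putting the date first in year-month-day order means an alphabetical listing of the folder is also a chronological one, which is one reason a team might agree on a pattern like this.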

Proprietary vs. Open Formats

Whenever possible, you should save the data in a non-proprietary (open) file format. If converting to an open data format is going to result in some data loss from your files, you can consider saving the data in both the proprietary format and an open format. Having at least some of the information available later will be better than having none available.

When it is necessary to save the files in a proprietary format, consider including a readme.txt file in your directory that documents the name and version of the software used to generate the file, as well as the company that created it. This could help you in the future if you need to figure out how to open these files again.
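A readme of this kind can be written by hand, but a short script keeps the layout consistent across folders. The sketch below is only an illustration: the function name `write_readme`, the field labels, and the software, vendor, and file names in the example call are all made up.

```python
import tempfile
from pathlib import Path


def write_readme(directory, software, version, vendor, files):
    """Write a readme.txt recording which software produced the listed files."""
    lines = [
        "About the proprietary files in this folder",
        f"Software: {software}",
        f"Version: {version}",
        f"Vendor: {vendor}",
        "Files:",
    ]
    lines += [f"  - {name}" for name in files]
    readme = Path(directory) / "readme.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme


# Example run in a temporary folder; all values are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    path = write_readme(tmp, "ExampleStats", "12.1", "Example Corp", ["survey.xyz"])
    print(path.read_text(encoding="utf-8"))
```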

Guidelines for choosing formats

Ideally, the file formats you select are:

Non-proprietary

Unencrypted

Uncompressed

Commonly used by the research community

Adhering to an open and documented standard, such as that described by the State of California (see AB 1668, 2007)

Interoperable across platforms and applications

Fully published and available royalty-free

Fully and independently implementable by multiple software vendors on multiple platforms, without any intellectual property restrictions on the necessary technology

Developed and maintained by an open standards organization with a well-defined inclusive process for standard evolution.

Some preferred file formats

Containers: TAR, GZIP, ZIP

Databases: XML, CSV

Geospatial: SHP, DBF, GeoTIFF, NetCDF

Moving images: MOV, MPEG, AVI, MXF

Sounds: WAVE, AIFF, MP3, MXF

Statistics: ASCII, DTA, POR, SAS, SAV

Still images: TIFF, JPEG 2000, PDF, PNG, GIF, BMP

Tabular data: CSV

Text: XML, PDF/A, HTML, ASCII, UTF-8

Web Archive: WARC
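A list like this can be turned into a quick check of a project folder. The sketch below flags file names whose extension is not on a preferred list. Note the assumptions: the list above names formats, not extensions, so the extension spellings here are illustrative, only a subset of the formats is covered, and each extension is filed under a single category even where the list mentions it twice. The names `PREFERRED_EXTENSIONS`, `classify`, and `flag_non_preferred` are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Assumed extension spellings for a subset of the formats listed above.
PREFERRED_EXTENSIONS = {
    ".tar": "Containers", ".gz": "Containers", ".zip": "Containers",
    ".csv": "Tabular data",
    ".xml": "Text", ".html": "Text", ".txt": "Text",
    ".tif": "Still images", ".tiff": "Still images", ".png": "Still images",
    ".wav": "Sounds", ".aiff": "Sounds",
    ".warc": "Web archive",
}


def classify(filename: str) -> Optional[str]:
    """Return the category for a file name, or None if its extension is not listed."""
    return PREFERRED_EXTENSIONS.get(Path(filename).suffix.lower())


def flag_non_preferred(filenames):
    """Return the file names whose final extension is not on the preferred list."""
    return [name for name in filenames if classify(name) is None]


print(flag_non_preferred(["results.csv", "scan.TIFF", "notes.docx", "model.sav.bak"]))
# ['notes.docx', 'model.sav.bak']
```

The check looks only at the final extension, so it says nothing about whether a file's contents really follow the format.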

Data versioning

Version control refers to saving new copies of files when changes are made, so that you can go back and recover specific versions of the files later.

Name versions

When you create new versions of your files, record the changes made to them and give each new file a unique name. Follow general file-naming guidelines, and also note the following:

Include a version number, for example “v1”, “v2”, or “v2.1”.

Include information about the status of the file, such as “draft” or “final”, as long as you don’t end up with confusing names like “final2” or “final_revised”.

Include information about the changes made, such as “cropped” or “normalized”.
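The version-number tip above can be applied mechanically. The sketch below finds a tag such as `_v2` or `_v2.1` just before the file extension and returns the name of the next version. The `_vN` placement, the requirement that the file has an extension, and the function name `bump_version` are assumptions made for this example.

```python
import re

# Matches "_v2" or "_v2.1" when it sits directly before the final extension.
VERSION_PATTERN = re.compile(r"_v(\d+)(?:\.(\d+))?(?=\.[^.]+$)")


def bump_version(filename, minor=False):
    """Return the file name with its version tag advanced by one step."""
    match = VERSION_PATTERN.search(filename)
    if match is None:
        raise ValueError(f"no version tag found in {filename!r}")
    major = int(match.group(1))
    minor_part = int(match.group(2) or 0)
    if minor:
        new_tag = f"_v{major}.{minor_part + 1}"  # small change: v2 -> v2.1
    else:
        new_tag = f"_v{major + 1}"               # significant change: v2.1 -> v3
    return filename[:match.start()] + new_tag + filename[match.end():]


print(bump_version("figure1_cropped_v2.png"))              # figure1_cropped_v3.png
print(bump_version("figure1_cropped_v2.png", minor=True))  # figure1_cropped_v2.1.png
```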

Simple file versions

A simple way to version files is to manually save new versions when significant changes are made. This works well if:

You don’t need to save many different versions.

Only one person works with the files.

Files are always accessed from a single location.

Saving multiple versions allows you to decide later that an earlier version is preferred. Then you can immediately go back to that version instead of having to retrace your steps to recreate it.

This versioning method requires you to remember to save the new versions when appropriate. This method can be confusing when you collaborate on a document with multiple people.
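The manual approach described above amounts to copying the working file to a new, uniquely named file at each milestone. The sketch below is one way to do that while refusing to overwrite an existing version. The function name `save_version`, the `_vN_note` suffix, and the example file names are assumptions for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def save_version(source, version, note=""):
    """Copy the working file to a versioned name; never overwrite an existing copy."""
    source = Path(source)
    suffix = f"_v{version}" + (f"_{note}" if note else "")
    target = source.with_name(f"{source.stem}{suffix}{source.suffix}")
    if target.exists():
        raise FileExistsError(f"{target.name} already exists; pick a new version number")
    shutil.copy2(source, target)  # copy2 also preserves timestamps where possible
    return target


# Example run in a temporary folder with placeholder content.
with tempfile.TemporaryDirectory() as tmp:
    working = Path(tmp) / "analysis.txt"
    working.write_text("first pass\n", encoding="utf-8")
    save_version(working, 1)
    working.write_text("second pass\n", encoding="utf-8")
    save_version(working, 2, note="normalized")
    print(sorted(p.name for p in Path(tmp).iterdir()))
    # ['analysis.txt', 'analysis_v1.txt', 'analysis_v2_normalized.txt']
```

The script does not remove the need to remember when to save a version; it only makes each saved copy consistent and hard to overwrite by accident.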

Google Drive

Google Drive’s word processing, spreadsheet, and presentation tools automatically save new versions as you edit files created in Google Drive.

Version information includes who edited the file and the date and time the new version was created.

You can also view changes made from one version to another (or between the current version and any previous version) and revert to a previous version at any time.

Pros: The real-time editing feature means Google Drive works well for collaborating on files with multiple people. And since the files are in Google Drive, they are accessible from anywhere.

Cons: You’re restricted to software provided by Google, which may not have all the bells and whistles of your desktop word processor, spreadsheet, or presentation software.

Data Management Plans

A data management plan (DMP) is a written document that describes the data you expect to acquire or generate during the course of a research project, how you will manage, describe, analyze, and store that data, and what mechanisms you will use at the end of your project to share and preserve your data.

You may have already considered some or all of these issues in connection with your research project, but writing them down helps you formalize the process, identify the weak points of your plan, and keep a record of what you intend to do.

Data management is best approached in the early stages of a research project, but it’s never too late to develop a data management plan.

Creating a DMP

A data management plan is a living document. Research is based on discovery, and the research process sometimes requires you to change course and rethink your intended path. Whenever your research plans change, review your DMP to make sure it still meets your needs.

Preparing to write a DMP

Before you sit down to write your data management plan, take some time to think through what it needs to cover. The following documents provide guidance on the types of issues you may need to consider as you begin writing your DMP.

Including IT costs in research grants

Data storage and backups

Data best practices

Creating metadata

Working with sensitive data

Sharing data

Data licensing

Data retention

Some Final Recommendations

Copy and paste your search strings (especially the ones that gave good results) into a document, and note which databases gave you the best results, since you may need to run the searches again.

Whenever you come across something even remotely interesting, write it down so you can come back to it later. This will help you in the wee hours of the morning when you are desperately trying to remember where you read something.

Remember that if you use someone else’s information, you have to cite it. If you can’t cite it, you can’t use it, so take note of who said what.

The simplest (and best) habit is to record the full reference details every time you read something, along with a few notes on what it was about. Also note anything that catches your eye, even if it does not seem especially useful at the time. You will very likely find yourself trying to recall it in the middle of the night, when you need to finish some task.
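The habit of recording search strings and databases could be kept in a plain CSV file, which is itself one of the open formats recommended earlier. The sketch below appends one row per search. The column names, the function name `log_search`, and the example database and query are assumptions for illustration.

```python
import csv
import tempfile
from datetime import date
from pathlib import Path

FIELDS = ["date", "database", "search_string", "notes"]  # assumed column names


def log_search(log_path, database, search_string, notes="", when=None):
    """Append one search to the CSV log, writing the header if the file is new."""
    log_path = Path(log_path)
    is_new = not log_path.exists()
    with log_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "date": (when or date.today()).isoformat(),
            "database": database,
            "search_string": search_string,
            "notes": notes,
        })


# Example run in a temporary folder; the database and query are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "search_log.csv"
    log_search(log, "ExampleDB", '"data management" AND research', "good results",
               when=date(2024, 3, 5))
    print(log.read_text(encoding="utf-8"))
```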

Data Management in Research. Photo: Unsplash. Credits: Brooke Cagle
