Author: Raidell Avello Martínez – Translation: Erika-Lucia Gonzalez-Carrion
Scientific research is undergoing a historic transformation. The push for open science, linked to advances in ICT, and the sharp increase in the volume of data are now constant factors in planning, executing and communicating research. Research data management is emerging as a new element of scientific communication that affects researchers as well as the bodies that publish, index and evaluate science. In this environment, the analysis of large volumes of data (big data) is more valuable than ever, and its value continues to grow.
In scientific research in particular, there are different types of data, classified according to the methodology used to obtain them. The main types include observational data, which correspond to historical records (they can only be captured at a single time and place); experimental data, generated by running different types of experiments; computational data, which may include input data or application activity logs; and simulation data, generated from test models. Increasingly, these datasets are published in purpose-built data repositories so that they can be accessed and, in turn, cited.
Although this practice is not yet widespread, there is an international need for public funding agencies, universities, foundations, journals and similar bodies to offer these storage services and to require researchers to publish their data. Above all, they should insist that researchers provide links to their datasets alongside their published electronic documents. They should also help researchers see the value of their data and its potential to earn recognition for their work, both within their professional circles and through the additional citations that the discovery and reuse of the data can generate.
Given the importance that data publication has gained, major publishers such as Elsevier and Springer, as well as institutions and universities, have created services of this kind, organized by subjects, descriptors and metadata that allow efficient indexing and search. Some of the most popular are Harvard Dataverse, Open Science Framework and Mendeley Data.
Recently, the journal Nature published a fairly comprehensive list, “Recommended Data Repositories”, organized by area of science, which can be very useful to researchers.
Similarly, search engines specialized in datasets have proliferated, allowing the search and retrieval of datasets associated with scientific research. Here are two interesting initiatives:
Google Dataset Search
Google Dataset Search (https://toolbox.google.com/datasetsearch) allows users to search by keyword for datasets stored on the Internet. The tool displays information about datasets hosted in thousands of repositories, so that any user can access them and take advantage of the information they contain.
The project also has other intended advantages: a) it will create a data exchange ecosystem that encourages data publishers to follow recommended practices for storing and publishing data, and b) it will offer scientists a way to show the impact of their work through citations of the datasets they have produced.
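One recommended practice that dataset search engines such as Google Dataset Search rely on is machine-readable metadata published with the dataset, commonly written with the schema.org Dataset vocabulary. The following Python sketch shows what such a description might look like. It is only an illustration, not an official template: the helper name build_dataset_jsonld, the dataset title, the description, the author name and the example.org URL are all hypothetical placeholders.

```python
import json


def build_dataset_jsonld(name, description, url, creator, license_url, keywords):
    """Return a schema.org Dataset description serialized as JSON-LD.

    Hypothetical helper for illustration; only a handful of common
    schema.org properties are included.
    """
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "creator": {"@type": "Person", "name": creator},
        "license": license_url,
        "keywords": list(keywords),
    }
    return json.dumps(record, indent=2, ensure_ascii=False)


# All values below are invented placeholders, not a real dataset.
example_jsonld = build_dataset_jsonld(
    name="River temperature observations",
    description="Daily water temperature readings from a hypothetical survey.",
    url="https://example.org/datasets/river-temperatures",
    creator="A. Researcher",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    keywords=["hydrology", "temperature", "observational data"],
)
print(example_jsonld)
```

A repository would typically embed this JSON-LD in the dataset's landing page so that crawlers can read the title, author and license without parsing the data files themselves.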
DataSearch from Elsevier
DataSearch (https://datasearch.elsevier.com/) is an Elsevier search engine, associated with Scopus, dedicated to primary research data. It is the first search engine that can search not only the description or metadata of the articles, but also the data itself. It is also possible to preview the data directly from the search results and to download the complete dataset. Thanks to the new collaboration with DataSearch, a search run in Scopus is run simultaneously in DataSearch. If data results are found, a link showing the number of results appears on the Scopus search results page.
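The difference between searching only the metadata and also searching the data itself can be illustrated with a toy example. The Python sketch below is not how DataSearch is implemented. It is a minimal in-memory illustration, and the DatasetRecord class, both search functions and the sample records are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    title: str
    description: str
    rows: list  # the data itself, as rows of cell values


def search_metadata(records, term):
    """Return titles of records whose title or description contains the term."""
    term = term.lower()
    return [
        r.title
        for r in records
        if term in r.title.lower() or term in r.description.lower()
    ]


def search_metadata_and_data(records, term):
    """Return titles of records matching the term in metadata or in any data cell."""
    term = term.lower()
    hits = []
    for r in records:
        in_metadata = term in r.title.lower() or term in r.description.lower()
        in_data = any(term in str(cell).lower() for row in r.rows for cell in row)
        if in_metadata or in_data:
            hits.append(r.title)
    return hits


# Invented sample records for illustration only.
records = [
    DatasetRecord(
        "Lake survey 2018",
        "Water quality measurements",
        [["site", "species"], ["A1", "Daphnia magna"]],
    ),
    DatasetRecord(
        "Soil cores",
        "Carbon content of soil samples",
        [["depth_cm", "carbon_pct"], ["10", "2.4"]],
    ),
]

metadata_hits = search_metadata(records, "daphnia")
full_hits = search_metadata_and_data(records, "daphnia")
print(metadata_hits)  # []
print(full_hits)      # ['Lake survey 2018']
```

In this example the term appears only inside the data rows, so a metadata-only search misses the dataset while the combined search finds it.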
On the DataSearch site, Elsevier also proposes 10 steps (recommendations) for managing data effectively throughout its life cycle.
In conclusion, open science data is a type of open data focused on publishing the observations and results of scientific methods and activities so that they are available for anyone to analyze and reuse. One of the main objectives of making open data available in science is to enable transparency and the verification of scientific claims, by allowing others to check the reproducibility of the results and by allowing data from many sources to be integrated to produce new knowledge. That is why this practice should now be part of the scientific research process.