Digital Science: reproducibility and visibility in Astronomy

Most of the science done in Astronomy is digital science: from observing proposals to final publications, including the data and software used, each element and action in the overall process can be recorded in electronic format. Even so, the final result of an experiment is often still difficult to reproduce. At the same time, we have a rich infrastructure of observational data and publications. It could be used more efficiently if greater visibility of the scientific production were achieved and seamless reproducibility guaranteed, which would avoid duplication of effort and reinvention.

In J.E. Ruiz et al. 2013, we presented the results achieved up to that time by the Wf4Ever project. In particular, we showed how the use of scientific workflows as a digital characterization of the methodology may boost the visibility and reproducibility of the scientific outcome, and hence its discovery and re-use, as well as a more efficient exploitation of present astronomical archives, computational infrastructures and observational facilities.

Workflow-centric Research Objects as a tool to boost reproducibility

Scientific workflows provide a comprehensive view and a clear scientific interpretation of the experiment, as well as the automation of the method. In this they go beyond the usual pipelines, which normally amount to automated data processing.
In Wf4Ever, the Research Object (RO) concept encompasses the digital workflows involved in the scientific experiment, the provenance of their executions and links to all the related resources upon which they depend. To ensure the long-term preservation of the scientific methodology, ROs should be stored in semantic repositories that facilitate their discovery, access, inspection, exploitation and distribution among the community.
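
As an illustration only, the aggregation described above can be sketched as a small data structure. The class and field names (Resource, WorkflowRun, ResearchObject, unresolved_inputs) and the example URIs are hypothetical and do not reproduce the actual Wf4Ever RO model; the sketch merely assumes that a Research Object bundles workflows, the provenance of their runs and the resources they depend on.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    uri: str   # location of a dataset, script or document
    role: str  # hypothetical label, e.g. "input-data" or "workflow"


@dataclass
class WorkflowRun:
    """Provenance of one execution: what ran, when, on what, producing what."""
    workflow_uri: str
    started: str        # ISO 8601 timestamp
    inputs: List[str]
    outputs: List[str]


@dataclass
class ResearchObject:
    title: str
    workflows: List[Resource] = field(default_factory=list)
    runs: List[WorkflowRun] = field(default_factory=list)
    resources: List[Resource] = field(default_factory=list)

    def unresolved_inputs(self) -> List[str]:
        """Inputs used by a run that are neither aggregated nor produced by a run."""
        known = {r.uri for r in self.resources}
        produced = {out for run in self.runs for out in run.outputs}
        missing: List[str] = []
        for run in self.runs:
            for uri in run.inputs:
                if uri not in known and uri not in produced and uri not in missing:
                    missing.append(uri)
        return missing


# Hypothetical example: one dependency (the flat field) was never aggregated.
ro = ResearchObject(
    title="Example reduction",
    workflows=[Resource("wf/reduce", "workflow")],
    resources=[Resource("data/raw.fits", "input-data")],
    runs=[
        WorkflowRun("wf/reduce", "2013-01-01T00:00:00Z",
                    inputs=["data/raw.fits", "data/flat.fits"],
                    outputs=["out/reduced.fits"]),
    ],
)
print(ro.unresolved_inputs())
```

A check of this kind hints at why linking every dependent resource matters: a run whose inputs are not part of the object cannot be re-executed from the object alone.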

Scientific workflows as a tool to improve the transparency of the experiment

Automation of tasks is a pressing concern that has been successfully solved in Astronomy with scripting in different programming languages and environments, depending on the specific astronomical domain of research. Consequently, the added value of migrating existing scripts into workflows is not the automation of the process, but the improved transparency of the experimental protocol. A workflow lets the astronomer know precisely how to execute the experiment, what datasets are needed and how to set up the execution environment. In scripts this knowledge remains hidden, which prevents the reproducibility of the experiment and hampers the replicability of digital science.
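
A minimal sketch of what this transparency can look like, assuming a workflow is declared as named steps with explicit inputs and outputs. The step names, file names and the ENVIRONMENT entries are hypothetical, and the layout is not that of any particular workflow system. Once the protocol is declared rather than buried in a script, the required datasets and a valid execution order can be derived mechanically.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical declaration of an experiment: every step names what it
# reads and what it writes.
STEPS = {
    "calibrate": {"inputs": ["raw_frames.fits"], "outputs": ["calibrated.fits"]},
    "extract": {"inputs": ["calibrated.fits", "catalogue.csv"], "outputs": ["sources.csv"]},
    "fit": {"inputs": ["sources.csv"], "outputs": ["model.json"]},
}

# Hypothetical description of the execution environment.
ENVIRONMENT = {"python": "3.11", "packages": ["numpy", "astropy"]}


def external_datasets(steps):
    """Datasets the user must supply: read by some step, produced by none."""
    produced = {out for step in steps.values() for out in step["outputs"]}
    needed = {inp for step in steps.values() for inp in step["inputs"]}
    return sorted(needed - produced)


def execution_order(steps):
    """Order the steps so that each one runs after the steps it depends on."""
    producer = {out: name for name, step in steps.items() for out in step["outputs"]}
    graph = {
        name: {producer[inp] for inp in step["inputs"] if inp in producer}
        for name, step in steps.items()
    }
    return list(TopologicalSorter(graph).static_order())


print(external_datasets(STEPS))  # ['catalogue.csv', 'raw_frames.fits']
print(execution_order(STEPS))    # ['calibrate', 'extract', 'fit']
```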

Scientific workflows as living tutorials

Scientists may visualize the actions performed by the workflows as their executions progress. This allows them to learn by example, which expedites training and avoids reinvention.

Information on authoring and credit attribution in Research Objects

Information on authoring and credit attribution is useful to achieve long-sought citation rates but, most importantly, it entails responsibility. We consider it highly relevant to be able to record user-annotation tuples, as well as the provenance indicators who, when and why, in order to know whom to hold responsible and whom to ask about specific issues related to their contribution. This practice should in principle modify the existing citation system, enabling credit attribution to specific parts of the experiment as well as to different roles in the contribution.
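
The user-annotation tuples and the who, when and why indicators could be recorded along the following lines. This is a sketch under stated assumptions: the Annotation record, its target and role fields, the helper credit_by_target and the example people are all hypothetical and are not part of any Wf4Ever vocabulary.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Annotation:
    who: str        # contributor who made the annotation
    when: datetime  # time at which it was made
    why: str        # stated reason for the contribution
    target: str     # part of the experiment being annotated (hypothetical field)
    role: str       # contributor's role for this part (hypothetical field)


def credit_by_target(annotations):
    """Group contributors and their roles by the part of the experiment they touched."""
    credit = defaultdict(set)
    for a in annotations:
        credit[a.target].add((a.who, a.role))
    return {target: sorted(people) for target, people in credit.items()}


# Hypothetical annotation log.
log = [
    Annotation("A. Astronomer", datetime(2013, 3, 1, tzinfo=timezone.utc),
               "fixed sky subtraction", "workflow/reduce", "developer"),
    Annotation("B. Observer", datetime(2013, 3, 2, tzinfo=timezone.utc),
               "added calibration frames", "data/calibration", "data provider"),
    Annotation("B. Observer", datetime(2013, 3, 5, tzinfo=timezone.utc),
               "validated output", "workflow/reduce", "reviewer"),
]

for target, people in credit_by_target(log).items():
    print(target, people)
```

Keeping the role alongside the contributor is what would allow credit to be attributed to specific parts of the experiment, and it also identifies whom to contact about each part.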