Principles of Data Sharing
We, as Human Brain Project
principal investigators, support data sharing, and offer guidelines to promote
effective and rewarding policies and methods. These proposals, based on our
individual and collaborative efforts, are designed to reduce technological and
sociological barriers to data sharing. We offer these guidelines as
investigators, neither as representatives of our several institutions nor as
NIH policy. Although this perspective is based on our experience advancing neuroinformatics (information
technology for exchange and analysis of neuroscience data), these proposals are
designed to serve the spectrum of biomedical investigations.
• Data sharing is a fundamental
and vital component of science, and research data produced with public funding
should in general be available to the community as a scientific resource.
• Sharing of data has great
potential for advancing our understanding. Sharing will both enhance the
utility of existing data and promote competition in the marketplace of
scientific ideas. It will permit reanalyses and meta-analyses beyond the focus
or time constraints of the original data collectors. Informed by shared data,
new hypotheses can be advanced; current hypotheses can be re-tested on new
data. Archived data can also be used to develop or validate new analytic
methods or technology.
• Current NIH policy mandates
data sharing for high-direct-cost grantees. We urge broader, voluntary sharing
of data, and adoption of norms for the proper presentation and use of
shareable data.
• Such norms should properly
extend those of publication, universally recognized as an essential component
of research:
- Just as results are published
freely and openly, without restrictions, so most data should be made available
for sharing, consonant with appropriate privacy or proprietary restrictions.
- Just as the rewards of
publication are universally recognized to outweigh the risks, with active
competition for placement in high-citation journals, so data sharing should be
encouraged and sought, and rewarded.
- Just as publication is timely,
so data should be made available without delay.
- Just as publications are
citable archives, so shared data and its locators should be maintained.
- Just as citation of others' publications
is essential to scientific communication, so citation and acknowledgment of
shared data should be required.
- Just as publication costs are
recognized as appropriate direct costs of research, so expenses of data sharing
should be supported.
• Sharing should not imply
relinquishing. Re-use of data should require clear and prominent attribution of
the data and acknowledgement of the originator; a standard citation and credit
paradigm should be developed and adopted by the bioinformatics community. For
extensive or targeted re-use of some classes of data, a collaboration between
submitter and user, or an accompanying comment by the submitter, may be
appropriate. Investigators and grantee institutions, not third-party
repositories, should retain ownership of shareable data. Commercial
exploitation of shared data without appropriate recognition, including possible
compensation, may represent misuse of intellectual property.
• The scope of shareable data should
be acknowledged as variable and dependent upon the standards and practices of
different fields or techniques. Consequently, a variety of models for data
sharing may be adopted, including both central databases and peer-to-peer
solutions.
• The magnitude of the efforts
needed to convert local research data into a form that can be distributed and
shared must be recognized. The volume of data produced by some techniques can
be immense, and even mid-scale data storage imposes requirements for cataloging
or indexing as well. Methods are needed to let potential users know that data
are available for sharing, what the data represent, and how they may be
selected, obtained, and used.
• Detailed metadata descriptions
including protocols and analytic specifications are required to annotate many
classes of biomedical data. Submitters have the responsibility to supply, and
recipients to interpret, such metadata. Without sharing of metadata as well as
data, post-hoc analyses are open to misinterpretation. Such misreading could
lead to the publication of unwarranted results that might improperly cast doubt
upon the conclusions of the original work, or impugn unfairly the competence or
scientific integrity of the original investigators.
• Recognized technological and
descriptive standards for data and metadata description, archiving, and
exchange will enable the technological and sociological infrastructure needed
to implement data sharing. Developing and adopting standards is desirable as
well for interoperability: coordinating disparate data resources without the need
for individual database-to-database negotiation to link types of data and
descriptors.
• Appropriate de-identification
techniques should allow sharing of human data while maintaining privacy
required by both HIPAA and the Common Rule.
• Finally, we urge continued
encouragement and support for the development of informatics methods enabling
investigators to share data with accuracy, accountability, responsibility, and
recognition.
Daniel Gardner
Giorgio A. Ascoli
Jackson Beatty
James F. Brinkley
Anders M. Dale
Peter T. Fox
Esther P. Gardner
John S. George
Nigel Goddard
Kristen M. Harris
Edward H. Herskovits
Michael Hines
Gwen A. Jacobs
Russell E. Jacobs
Edward G. Jones
David N. Kennedy
Daniel Y. Kimberg
John C. Mazziotta
Perry Miller
Susumu Mori
David C. Mountain
Allan L. Reiss
Glenn D. Rosen
David A. Rottenberg
Gordon M. Shepherd
Neil R. Smalheiser
Kenneth P. Smith
Tom Strachan
Arthur W. Toga
David C. Van Essen
Robert W. Williams
Stephen T.C. Wong