As discussed in the editorial of the May 2023 issue of C&RL, “Is C&RL Ready for a Data Sharing Policy?”, after a few years of research on the current state of journal data sharing policies and an engagement survey, the C&RL Editorial Board and Editor decided not to implement a data sharing policy in the foreseeable future due to multiple concerns. Instead, C&RL planned to develop and maintain a guide to help authors become familiar with the concepts and practices of research data management and sharing. The contents of the guide are presented below.
Research data refers to primary resources underlying the results described in the manuscript and is needed to reproduce and validate the results. It includes, but is not limited to, spreadsheets, text files, interview recordings or transcripts, images, videos, and computer code or scripts. Data files are normally accompanied by documentation describing the file contents (e.g., data dictionaries, codebooks, and ReadMe files) for the purpose of understanding and reuse.
Over the past two decades, research data management and sharing have been recognized as important scholarly activities in the research lifecycle. Besides meeting funding agencies' requirements, responsible data management increases the transparency of the research process and facilitates reproducibility and innovative reuse of data, and it is becoming an essential part of research excellence in this digital and open scholarship era. Data sharing is conducive to safeguarding research integrity and strengthening the public's trust in science.
The Association of College and Research Libraries (ACRL) has been providing information professionals with training in research data management and sharing since the beginning of the open science/scholarship movement. Academic librarianship as a profession has the unique mission to address scholarly communication issues. Practitioners in academic and research libraries who are also researchers and authors thus ought to put the new standard of scholarship into practice when publishing in C&RL.
Research data sharing goes beyond posting a data file on a website or emailing data to interested colleagues. Researchers should prepare and publish their data in a repository as a scholarly product by following established standards, such as the FAIR (Findability, Accessibility, Interoperability, and Reusability) guiding principles and other data sharing best practices.
Authors are encouraged to develop a research data management plan (DMP) before they start a research project and to take the necessary steps to ensure their data is sharable and FAIR at the end. These steps may include obtaining research participants’ consent to data sharing, negotiating and deciding on data ownership and sharing licenses or agreements, budgeting for data de-identification and documentation, and contacting data repositories for depositing, curation, and preservation support. Authors from the US and Canada may use DMPTool and DMP Assistant, respectively, to facilitate their data management planning.
To enable data reuse, data files should be properly structured, saved in open formats, and consistently named. They should be accompanied by a ReadMe file and other documentation that makes the data understandable, and deposited in a trusted data repository. For detailed guidance on quantitative and qualitative data preparation for long-term access, see the Guide to Social Science Data Preparation and Archiving by the Inter-university Consortium for Political and Social Research (ICPSR).
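As an illustration of the ReadMe file mentioned above, the following is a minimal Python sketch that assembles a plain-text ReadMe from a few descriptive fields. The function name `build_readme` and the chosen fields (title, creators, license, file descriptions) are assumptions made for this example; they are not a template prescribed by C&RL or ICPSR, and a repository may ask for more detail.

```python
def build_readme(title, creators, license_name, files):
    """Return plain-text ReadMe content for a dataset.

    `creators` is a list of names; `files` maps each file name to a
    one-line description. All field names here are illustrative.
    """
    lines = [
        f"Title: {title}",
        f"Creators: {'; '.join(creators)}",
        f"License: {license_name}",
        "",
        "Files:",
    ]
    # List files alphabetically so the ReadMe stays stable across revisions.
    for name in sorted(files):
        lines.append(f"  {name}: {files[name]}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical dataset used only to show the output layout.
    print(build_readme(
        title="Library instruction survey",
        creators=["A. Author", "B. Author"],
        license_name="CC BY 4.0",
        files={
            "survey_responses.csv": "De-identified survey responses",
            "codebook.txt": "Variable names, labels, and value codes",
        },
    ))
```

The returned text can be saved alongside the data files, for example as `README.txt`, before deposit.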
While considering sharing data with the research community and making research reproducible, authors need to handle data ethically, responsibly, and with integrity, especially when providing access to sensitive data, e.g., library user data and Indigenous data. The general principle is that research data should be "as open as possible and as closed as necessary." Authors are required to carefully assess potential data exposure risks and to apply the necessary ethical considerations and technical procedures to protect the privacy of research participants throughout the research data lifecycle and at the end of the research project. Some recommended practices include, but are not limited to:
Archive highly sensitive or regulated data on a secured server or encrypt it on a computer. Such data should not be shared publicly or stored on unsecured cloud servers or unsecured computers;
Be aware that even de-identified data might still have a certain level of re-identification risk. Therefore, it is better to deposit it in a secured repository that supports restricted access;
When reporting results in the manuscript, carefully examine whether individuals could be identified through small numbers in the summary statistics or through a combination of indirect identifiers.
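The concern about small numbers in summary statistics can be illustrated with a short Python sketch that masks category counts below a threshold before they are reported. The function name `suppressed_counts` and the threshold of 5 are assumptions chosen for illustration; this guide does not set a threshold, and the appropriate value depends on the data and on institutional or ethics board requirements.

```python
from collections import Counter


def suppressed_counts(values, threshold=5):
    """Count each category, masking any count below `threshold`.

    Masked cells are reported as the string "<threshold" (e.g. "<5")
    instead of the exact number. The default of 5 is an assumption.
    """
    counts = Counter(values)
    return {
        category: (n if n >= threshold else f"<{threshold}")
        for category, n in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical respondent roles from a small library survey.
    roles = ["faculty"] * 6 + ["staff"] * 2
    print(suppressed_counts(roles))
```

Masking single cells is only a first step. If row or column totals are also published, a masked value can sometimes be recovered by subtraction, so totals and combinations of indirect identifiers need the same scrutiny.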
Authors who need assistance should consult with the Office of Ethics, data management librarians, and/or the Information Technology Department at their institutions (if available).
Authors can consult the NIH Guide on Protecting Participant Privacy When Sharing Scientific Data for further guidance on providing secure access to human participant data. ACRL’s Learning Analytics Toolkit: Privacy and Ethics is especially helpful for handling library user data. The CARE Principles for Indigenous Data Governance are widely recognized guidance for respecting Indigenous communities and engaging with Indigenous Peoples’ rights and interests in data management decisions and practices. Authors may also want to consult the Qualitative Data Repository’s Human Participants - General Guide and a brief Data De-Identification guide (LDbase) for quantitative data.
Providing access to the underlying data of a manuscript shows the rigor of the research by allowing for validation of the results, replication of the analysis, and potential reuse of the data by a broader research community and even by the public. This requires authors to present data completely and to a high standard of quality. Authors should not intentionally withhold data, fabricate data, or manipulate data to support only positive findings.
In addition, because sharing research data is a complex social action, authors have ethical and legal obligations to:
Take necessary steps to evaluate risks and avoid harms to research participants and their communities, for example, by obtaining informed and transparent data sharing permissions from the participants, honoring Indigenous communities' ownership and authority, and properly de-identifying data before sharing.
Verify all data co-authors and co-contributors, and reach agreement on data authorship, contribution recognition, and author order. When depositing data into a repository, the depositor or curator should provide accurate metadata concerning authorship, declare competing interests during the submission process, and record changes and dynamics of contributions in the data documentation.
Ensure that authors have the right or permission to deposit the data if the material is copyright-protected. Authors should check their institutions’ policy or collective agreement on data ownership and the researchers' rights and responsibilities concerning research data.
Respect and adhere to relevant commercial data use agreements, national or regional legislation, international treaties and frameworks, and local laws that might restrict or regulate certain types of data redistribution and licensing practices.
Authors are encouraged to curate and deposit research data in repositories with the desirable characteristics SPARC has identified. In particular, the selected repository should:
Be managed and maintained to ensure free, long-term access to the deposited data
Assign a persistent identifier, such as a digital object identifier (DOI), to the dataset to enhance its online discoverability
Collect essential metadata (e.g., licensing information) to facilitate reuse of the data
In general, repositories suitable for archiving research data include:
Repository managed by the author’s affiliated institution
Domain-specific repository, especially if it already provides access to similar research data, e.g., Inter-university Consortium for Political and Social Research (ICPSR) for both quantitative and qualitative data and Qualitative Data Repository for qualitative data, both of which could provide restricted and controlled access if needed and requested by authors
Generalist repository that is commonly used for research data sharing, e.g., Zenodo or Harvard Dataverse. Please note that a generalist repository may charge a fee for its curation service
Authors can use the Registry of Research Data Repositories to find appropriate options. If they have questions about depositing research data in a particular repository, they should contact the repository administrator and request guidance. Alternatively, they can consult the research data librarian at their institutions (if available).
Authors are encouraged to provide a data availability statement in the manuscript submission process. It is recommended that the statement specify:
Whether research data was generated or used in the research process
Where or how to access the data (if applicable)
Reason(s) why the data is not shared (if applicable)
Below are some examples of the data availability statement. For additional guidance on writing the statement, please consult the following resources:
Sample statement for shared research data:
“The data and analysis files for this article can be found at: https://doi.org/10.3886/E161561V1”
[Source: Allensworth, E., Cashdollar, S., & Cassata, A. (2022). Supporting change in instructional practices to meet the common core mathematics and next generation science standards: How are different supports related to instructional change? AERA Open, 8, 233285842210880. https://doi.org/10.1177/23328584221088010]
Sample statement for research data with access restrictions:
“The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.”
[Source: Wu Young, M. Y., Garza, R. M., & Chang, D. W. (2022). Immediate versus delayed autologous breast reconstruction in patients undergoing post‐mastectomy radiation therapy: A paradigm shift. Journal of Surgical Oncology, 126(6), 949–955. https://doi.org/10.1002/jso.27005]
Sample statement for research without use of data:
“No data was used for the research described in the article.”
[Source: Frank, J., Foster, R., & Pagliari, C. (2023). Open access publishing – noble intention, flawed reality. Social Science & Medicine, 317, 115592. https://doi.org/10.1016/j.socscimed.2022.115592]
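The three sample statements above cover the points recommended for a data availability statement: whether data was used, where it can be accessed, and why it is not shared. As a rough sketch, the Python function below picks among wordings modeled on those samples. The function name, its parameters, and the exact phrasing of the restricted-access sentence are assumptions for illustration and not wording required by C&RL.

```python
def data_availability_statement(data_used, location=None, reason_not_shared=None):
    """Compose a one-sentence data availability statement.

    data_used: whether research data was generated or used.
    location: DOI or URL where the data can be accessed, if shared.
    reason_not_shared: short phrase explaining why data is not public.
    """
    if not data_used:
        return "No data was used for the research described in the article."
    if location:
        return (
            "The data and analysis files for this article can be found at: "
            f"{location}"
        )
    if reason_not_shared:
        return (
            "The data that support the findings of this study are not "
            f"publicly available due to {reason_not_shared}."
        )
    # Data was used, but neither an access point nor a reason was given.
    raise ValueError("State where the data can be accessed or why it is not shared.")


if __name__ == "__main__":
    print(data_availability_statement(True, location="https://doi.org/10.3886/E161561V1"))
    print(data_availability_statement(True, reason_not_shared="privacy or ethical restrictions"))
    print(data_availability_statement(False))
```

Raising an error when data was used but neither a location nor a reason is supplied reflects the recommendation that a statement address each applicable point.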
Authors who have reused existing research datasets in their projects should specify the creator, title, and source of the datasets in their manuscripts. Additionally, they should provide a citation of the datasets in APA style in the reference section. Guidance on citing research datasets is available from this APA style guide. ZoteroBib and the DOI Citation Formatter are free online tools that generate dataset citations based on the datasets' digital object identifiers (DOIs). Below are some examples of dataset citation for reference.
Della Libera, K., Strandburg-Peshkin, A., Griffith, S., & Leu, S. T. (2023). Data from: Fission-fusion dynamics in sheep: The influence of resource distribution and temporal activity patterns [Data set]. Dryad. https://doi.org/10.5061/DRYAD.59ZW3R2D6
Kaplan, S. (2023). COEP replication package for “Leveling the playing field: The distributional impact of maximum- and minimum-level contracts on player compensation” [Data set]. ICPSR. https://doi.org/10.3886/E192724V1
Lin, T. (2021). A quantitative approach to study the adaptation of rhythmic eye movements in larval zebrafish [Data set]. Zenodo. https://doi.org/10.5281/ZENODO.5529443
Ridge, H. (2023). Replication data for: Democratic commitment in the Middle East: A conjoint analysis [Data set]. Harvard Dataverse. https://doi.org/10.7910/DVN/GBJVAF
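The dataset citations above share one pattern: author(s), year, title, the bracketed label "[Data set]", the repository name, and the DOI as a URL. The Python sketch below assembles that pattern from its parts. The function name `apa_dataset_citation` is hypothetical, author names are assumed to be supplied already in "Surname, I." form, and special cases such as very long author lists or version numbers are not handled, so the output should be checked against the APA style guide.

```python
def apa_dataset_citation(authors, year, title, repository, doi):
    """Format a dataset reference following the pattern of the examples above.

    authors: list of names already in "Surname, I." form.
    doi: bare DOI (no "https://doi.org/" prefix).
    """
    if len(authors) == 1:
        names = authors[0]
    else:
        # Join all but the last author with commas, then add ", & Last".
        names = ", ".join(authors[:-1]) + f", & {authors[-1]}"
    return (
        f"{names} ({year}). {title} [Data set]. "
        f"{repository}. https://doi.org/{doi}"
    )


if __name__ == "__main__":
    print(apa_dataset_citation(
        authors=["Ridge, H."],
        year=2023,
        title=(
            "Replication data for: Democratic commitment in the "
            "Middle East: A conjoint analysis"
        ),
        repository="Harvard Dataverse",
        doi="10.7910/DVN/GBJVAF",
    ))
```

Plain text cannot carry the italics that APA style applies to a dataset title, so that formatting still has to be added in the manuscript.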