Data Sources


Citing data


Data should be cited for the same reasons journal articles are cited: to give credit where credit is due (to the original author or producer) and to help other researchers find the material. Using data without citing it undermines both academic integrity and reproducibility. Pay attention to licenses (here's a page on those) and give attribution!

A data citation includes the typical components of other citations:

  • Author or creator: the entity or entities responsible for creating the data
  • Date of publication: the date the data was published or otherwise released to the public
  • Title: the title of the dataset, or a brief description of it if it has no title
  • Publisher: the entity responsible for hosting the data (like a repository or archive)
  • URL or, preferably, a DOI: a link that points to the data
  • Date accessed: since most data are published without versions, it's important to note when you accessed the data in case newer releases are made over time
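For readers who track citations in a script, the components above can be sketched as a small record type. This is only an illustration: the class name, field names, and sample values are assumptions made for this sketch and do not come from any style manual or citation manager.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DataCitation:
    """One field per component listed above (names are illustrative)."""
    creator: Optional[str] = None
    date_published: Optional[str] = None
    title: Optional[str] = None
    publisher: Optional[str] = None
    locator: Optional[str] = None        # a URL or, preferably, a DOI
    date_accessed: Optional[str] = None


def missing_components(citation: DataCitation) -> List[str]:
    """Return the names of components that have not been filled in yet."""
    return [f.name for f in fields(citation) if not getattr(citation, f.name)]


# Hypothetical dataset, used only to show the check in action.
draft = DataCitation(
    creator="Example Research Group",
    date_published="2020",
    title="Example household survey",
    publisher="Example Data Archive",
    locator="https://example.org/datasets/1",
)
print(missing_components(draft))  # ['date_accessed']
```

A check like this is a reminder rather than a formatter: it flags what still needs to be looked up before the citation is arranged in a particular style.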

Citation standards for data sets differ by journal, publisher, and conference, but you have a few options generally (depending on the situation):

  • Use the format of a style manual as determined by a publisher or conference, such as IEEE or ACM. If you use a citation manager like Zotero (highly recommended for organizing research reading, and supported at NYU; check out our Zotero guide), it can export your citations in whatever format you need.
  • Use the author or repository's preferred citation that they list on the page where you downloaded the data initially.

Here's an example of how to find the citation information for a dataset hosted on Zenodo , a generalist repository that houses data, code, and more:

All scholarly or academic work requires that you cite your sources, whether you are writing a long paper or a quick report. Why is citing your research so important?

Researching and writing a paper ideally involves a process of exploring and learning. By citing your sources, you are showing your reader how you came to your conclusions and acknowledging the other people's work that brought you to your conclusions. Citing sources:

  • Documents your research and scholarship
  • Acknowledges the work of others whose scholarship contributed to your work
  • Helps your reader understand the context of your argument
  • Provides information for your reader to use to locate additional information on your topic
  • Establishes the credibility of your scholarship
  • Provides you with an opportunity to demonstrate your own integrity and understanding of academic ethics

Partially adapted from "When and Why to Cite Sources." SUNY Albany. 2008. Retrieved 14 Jan 2009.

  • Data-Planet Data Basics: a module in Data-Planet that provides resources and examples for citing datasets and statistics when incorporating them into research.
  • IASSIST Quick Guide to Data Citation: includes examples from APA, MLA, and Chicago styles.
  • How to Cite Data: a comprehensive guide with examples from Michigan State University Libraries.
  • Last Updated: Jun 28, 2024 3:45 PM
  • URL: https://guides.nyu.edu/datasources

APA Style 7th Edition: Citing Your Sources


Standard Format

Formatting rules, various examples.


 

  • Author: Author, A. A., & Author, B. B. or Name of Group
  • Date: (year). or (range of years).
  • Title: Title of data set (Version #) [Data set]. or Title of data set [Unpublished raw data]. or [Description of untitled data set] [Unpublished raw data].
  • Publisher: Publisher Name. or Source of Unpublished Data.
  • URL: https://doi.org/xxxx.... or https://xxxx... or Retrieved Month date, year, from https://xxxx

Adapted from American Psychological Association. (2020). Publication manual of the American Psychological Association (7th ed.). https://doi.org/10.1037/0000165-000

  • Provide a retrieval date only if the data set is designated to change over time
  • Date for published data is the year of publication
  • Date for unpublished data is the year(s) of collection
  • If a version number exists, include it in parentheses after the title

Data Set

Cohen, M. A., & Miller, T. R.  (1991). (ICPSR 6581, Version V1) [Data set].  ICPSR. https://doi.org/10.3886/ICPSR06581.v1

Unpublished raw data

Evans, S. K. (2014).  [Personnel survey] [Unpublished raw data].  University of Southern California.
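The standard format and rules above can be sketched as a small helper that assembles a reference string. This is a rough illustration under stated assumptions, not an official APA tool: the function name and parameters are invented for this sketch, italics cannot be represented in a plain string, and the second sample call uses made-up values.

```python
def apa7_dataset_reference(author, year, title=None, version=None,
                           description="Data set", publisher=None,
                           url=None, retrieved=None):
    """Assemble a plain-text APA 7th style data set reference.

    For untitled data, pass the bracketed description as `title`, e.g.
    "[Personnel survey]", and set `description` to "Unpublished raw data".
    Pass `retrieved` only when the data set is designed to change over time.
    """
    # Personal author lists already end in a period (the last initial);
    # group names need one added.
    author_part = author if author.endswith(".") else author + "."
    parts = [f"{author_part} ({year})."]

    title_part = title or ""
    if version:
        # The version goes in parentheses after the title.
        title_part = f"{title_part} ({version})".strip()
    parts.append(f"{title_part} [{description}].".strip())

    if publisher:
        parts.append(f"{publisher}.")
    if url:
        parts.append(f"Retrieved {retrieved}, from {url}" if retrieved else url)
    return " ".join(parts)


# The unpublished raw data example from above.
print(apa7_dataset_reference(
    "Evans, S. K.", 2014,
    title="[Personnel survey]",
    description="Unpublished raw data",
    publisher="University of Southern California",
))

# Hypothetical versioned data set that changes over time.
print(apa7_dataset_reference(
    "Example Research Group", 2020,
    title="Example survey", version="Version 2",
    publisher="Example Archive",
    url="https://example.org/data",
    retrieved="January 5, 2021",
))
```

The retrieval date is an explicit argument so that it is omitted by default, matching the rule that it appears only for data sets designed to change.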

See Ch. 10 pp. 313-352 of APA Manual for more examples and formatting rules

  • Last Updated: Aug 8, 2024 10:42 AM
  • URL: https://libguides.usc.edu/APA7th

University of Michigan Library

Citation Help

  • Citing Data & Statistics

Introduction


Data requires citations for the same reasons journal articles and other types of publications require citations: to acknowledge the original author/producer and to help other researchers find the resource.

A dataset citation includes the same components as any other citation. Data citation practices are still emerging, but including the data you use (or create) in your references section allows others to locate it and ensures that its use becomes part of the scholarly record. The components are:

  • author,
  • title,
  • year of publication,
  • publisher (for data this is often the archive where it is housed),
  • edition or version, and
  • access information (a URL or other persistent identifier).

Unfortunately, standards for the citation of data are not uniformly agreed upon and have yet to be codified by the National Information Standards Organization (an organization that sets technical standards for other bibliographic materials).  However, many data providers and distributors and some style manuals do provide guidelines.  Some of these instructions are listed on this guide.

Be sure to follow the general citation format for the style manual your professor has asked you to use.  It is always better to provide more information about a resource rather than less!

General Rules

Some style manuals do provide instructions for the citation of data, and selected examples are listed on the Data Citations tab.  If the style manual you are using does not address data citations, you can follow these general rules.

Usually a style manual will lay out basic rules for the order of citation elements, regardless of the type of work.  This is what you will need to pay close attention to in order to format your citation correctly.  If you can’t find a generic list of rules, then look at how the citation for a book is formatted. 

These are the citation elements you need to consider when building a data citation:

Author

Who is the creator of the data set?  This can be an individual, a group of individuals, or an organization.

Title

What is the data set called, or what is the name of the study?

Edition or Version

Is there a version or edition number associated with the data set?

Date

What year was the data set published?  When was the data set posted online?

Editor

Is there a person or team responsible for compiling or editing the data set?

Publisher and Publisher Location

What entity is responsible for producing and/or distributing the data set?  Also, is there a physical location associated with the publisher? 

In some cases, the publisher of a data set is different than how we think of the publisher of a book.  A data set can have both a producer and a distributor.

The producer is the organization that sponsored the author’s research and/or the organization that made the creation of the data set possible, such as codifying and digitizing the data.

The distributor is the organization that makes the data set available for downloading and use. 

You may need to distinguish the producer and the distributor in a citation by adding explanatory brackets, e.g., [producer] and [distributor].
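As a minimal sketch of the bracket convention just described, the helper below labels a producer and a distributor. The function name is hypothetical, and whether a given style wants both entries is still a matter for the style manual; the sample values are the producer and distributor of the Milberger data set cited in this guide.

```python
def publisher_statement(producer=None, distributor=None):
    """Label producer and distributor with explanatory brackets."""
    parts = []
    if producer:
        parts.append(f"{producer} [producer].")
    if distributor:
        parts.append(f"{distributor} [distributor].")
    return " ".join(parts)


print(publisher_statement(
    producer="Detroit: Wayne State University",
    distributor=("Ann Arbor, MI: Inter-university Consortium for "
                 "Political and Social Research"),
))
```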

Some citation styles (e.g., APA) do not require listing the publisher if an electronic retrieval location is available.  However, you may consider including the most complete citation information possible and retaining publisher information even in the case of electronic resources.

Material Designator

What type of file is the data set?  Is it on CD or online? 

This may or may not be a required field depending on the style manual.  Often this information is added in explanatory brackets, e.g. [computer file].

Electronic Retrieval Location

What web address is the data set available at?  Is there a persistent identifier available?  If a DOI or other persistent identifier is associated with the data set it should be used in place of the URL.
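The preference for a persistent identifier over a URL can be sketched as a small selection rule. The function name is an assumption for this sketch, and the "doi:" output prefix simply mirrors the examples in this guide; other styles write the DOI as a full https://doi.org/ link.

```python
def electronic_location(doi=None, url=None):
    """Prefer a DOI when one exists; fall back to the URL otherwise."""
    if doi:
        bare = doi
        # Accept a bare DOI, a "doi:" form, or a resolver link.
        for prefix in ("https://doi.org/", "doi:"):
            if bare.startswith(prefix):
                bare = bare[len(prefix):]
        return f"doi:{bare}"
    return url


print(electronic_location(doi="10.3886/ICPSR03414",
                          url="https://example.org/landing-page"))
# doi:10.3886/ICPSR03414
print(electronic_location(url="https://example.org/landing-page"))
# https://example.org/landing-page
```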

Examples using the General Rules

APA (6th edition)

Minimum requirements based on instructions and example for dataset reference:

Milberger, S. (2002). Evaluation of violence against women with physical disabilities in Michigan, 2000-2001 (ICPSR version) [data file and codebook]. doi:10.3886/ICPSR03414

With optional elements:

Milberger, S. (2002). Evaluation of violence against women with physical disabilities in Michigan, 2000-2001 (ICPSR version) [data file and codebook]. Detroit: Wayne State University [producer]. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor]. doi:10.3886/ICPSR03414

MLA (7th edition)

Minimum requirements based on instructions and examples for books and web publications:

Milberger, Sharon. Evaluation of Violence Against Women With Physical Disabilities in Michigan, 2000-2001. ICPSR version. Inter-university Consortium for Political and Social Research, 2002. Web. 19 May 2011.

With optional elements:

Milberger, Sharon. Evaluation of Violence Against Women With Physical Disabilities in Michigan, 2000-2001. ICPSR version. Detroit: Wayne State U [producer]. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2002. Web. 19 May 2011. doi:10.3886/ICPSR03414

Chicago (16th edition)

Bibliography style (based on documentation for books):

Milberger, Sharon. Evaluation of Violence Against Women With Physical Disabilities in Michigan, 2000-2001. ICPSR version. Detroit: Wayne State University, 2002. Distributed by Ann Arbor, MI: Inter-University Consortium for Political and Social Research, 2002. doi:10.3886/ICPSR03414.

Author-Date style:

Milberger, Sharon. 2002. Evaluation of Violence Against Women With Physical Disabilities in Michigan, 2000-2001. ICPSR version. Detroit: Wayne State University. Distributed by Ann Arbor, MI: Inter-University Consortium for Political and Social Research. doi:10.3886/ICPSR03414.

Citing Census Data and Maps

When you make a table in data.census.gov, you can click "More Tools" in the upper right corner and then select "Cite".

Basic Format for APA 6th edition

U.S. Census Bureau (year data was published).  Name of data or report.  Retrieved from [URL].

Chicago Style 16th ed.

United States Census Bureau. Name of Table. Data.Census.Gov <URL> (the date the table was generated)

  • Citing Census Data in Social Explorer: guidance on citing census data in Social Explorer for APA, Chicago, and MLA styles


APA 7th Edition Citation Style Guide


Technical & Research Reports

Tests, scales, & inventories.


General Rule:

Author. (Year). Title of report  (Report No. if given). Publisher. DOI or URL

  • If the author and the publishing agency are the same, omit the publisher from the citation.

Federal Interagency Forum on Child and Family Statistics. (2013). America’s children: Key national indicators of well-being. http://childstats.gov/americaschildren/index2.asp

Author or name of group. (Year). Title of data set [description of form]. Publisher Name or Source of unpublished data. Retrieved month day, year, from DOI or URL

  • Include a retrieval date only if the data set is designed to change over time.
  • If a version number and/or database number is available, include it with the data set title.
  • There is no need to include a publisher name if it is the same as the author.
  • If the data is unpublished, provide the source (e.g. university) if known.
  • If the data set is untitled, give a description of the data and its publication status in square brackets.

Pew Internet & American Life Project. (2012). November 2012- library services [Data file and code book]. http://www.pewinternet.org/Shared-Content/Data-Sets/2012/November-2012--Library-Services.aspx

Jeffri, J., Schriel, A., & Throsby, D. (2003). The aDvANCE Project: A study of career transition for professional dancers (ICPSR 35598; Version V1) [Data set]. ICPSR. https://doi.org/10.3886/ICPSR35598.v1

Whenever possible, give a citation for the measurement's supporting literature (e.g. manual, book, or journal article). If the supporting literature is unavailable, cite the test itself or its database record using the following rule.

Author name. (year).  Title of the test. URL

Author name. (year).  Title of the test database record [Database record] . Test Database Name. URL

Hofstede, G., & Hofstede, G. J. (2013). Values Survey Module 2013. https://geerthofstede.com/research-and-vsm/vsm-2013/

Castellanos, I., Kronenberger, W. G., & Pisoni, D. B. (2018). Learning, Executive, and Attention Function Scale (LEAF) [Database record]. PsycTESTS. https://doi.org/10.1037/t66008-000

  • Last Updated: Aug 1, 2024 11:13 AM
  • URL: https://libguides.wvu.edu/apa

UC San Diego


How to Cite - Tools, Tricks, & Tips for Managing Citations: Cite Data or Statistics


Citing Data

What Do I Need to Cite Data or Statistics?

Data requires citations for the same reasons journal articles and other types of publications require citations: to acknowledge the original author/producer and to help other researchers find the resource.

Some style manuals provide instructions for the citation of data, and selected examples are listed below. If the style manual you are using does not address data citations, you can follow the general rules below. Be sure to follow the general citation format for the style manual your professor has asked you to use. It is always better to provide more information about a resource rather than less!

These are the citation elements you need to consider when building a data citation:

Author : Who is the creator of the data set?  This can be an individual, a group of individuals, or an organization.

Title : What is the data set called, or what is the name of the study?

Edition or Version : Is there a version or edition number associated with the data set?

Date : What year was the data set published?  When was the data set posted online?

Editor : Is there a person or team responsible for compiling or editing the data set?

Publisher and/or Distributor : What entity is responsible for producing and/or distributing the data set?  Also, is there a physical location associated with the publisher? 

In some cases, the publisher of a data set is different than how we think of the publisher of a book.  A data set can have both a producer and a distributor.

The producer is the organization that sponsored the author’s research and/or the organization that made the creation of the data set possible, such as codifying and digitizing the data.

The distributor is the organization that makes the data set available for downloading and use. 

You may need to distinguish the producer and the distributor in a citation by adding explanatory brackets, e.g., [producer] and [distributor].

Some citation styles (e.g., APA) do not require listing the publisher if an electronic retrieval location is available.  However, you may consider including the most complete citation information possible and retaining publisher information even in the case of electronic resources.

Material Designation : What type of file is the data set? 

For example, is it on CD-ROM or online?

This may or may not be a required field depending on the style manual.  Often this information is added in explanatory brackets, e.g. [computer file].

Electronic Location or Identifier : What web address is the data set available at?  Is there a persistent identifier available? 

If a DOI or other persistent identifier is associated with the data set it should be used in place of the URL.

Examples with the General Rules

APA 7th Citation Style

Minimum requirements based on instructions and example for dataset reference:

Bibliography:

Milberger, S. (2002). Evaluation of violence against women with physical disabilities in Michigan, 2000-2001 (ICPSR version) [data file and codebook]. doi:10.3886/ICPSR03414

In-Text Citation: (Milberger, 2002)

With optional elements:

Bibliography:

Milberger, S. (2002). Evaluation of violence against women with physical disabilities in Michigan, 2000-2001 (ICPSR version) [data file and codebook]. Detroit: Wayne State University [producer]. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor]. doi:10.3886/ICPSR03414

In-Text Citation: (Milberger, 2002)

Chicago Citation Style

Bibliography style (based on documentation for books):

Bibliography:

Milberger, Sharon. Evaluation of Violence Against Women With Physical Disabilities in Michigan, 2000-2001. ICPSR version. Detroit: Wayne State University, 2002. Distributed by Ann Arbor, MI: Inter-University Consortium for Political and Social Research, 2002. doi:10.3886/ICPSR03414.

In-Text Citation: (Milberger)

Author-Date style:

Bibliography:

Milberger, Sharon. 2002. Evaluation of Violence Against Women With Physical Disabilities in Michigan, 2000-2001. ICPSR version. Detroit: Wayne State University. Distributed by Ann Arbor, MI: Inter-University Consortium for Political and Social Research. doi:10.3886/ICPSR03414.

In-Text Citation: (Milberger, 2002)

MLA 7th Citation Style

Minimum requirements based on instructions and examples for books and web publications:

Bibliography:

Milberger, Sharon. Evaluation of Violence Against Women With Physical Disabilities in Michigan, 2000-2001. ICPSR version. Inter-University Consortium for Political and Social Research. 2002. Web. 19 May 2011.

In-Text Citation: (Milberger, 2002)

With optional elements:

Bibliography:

Milberger, Sharon. Evaluation of Violence Against Women With Physical Disabilities in Michigan, 2000-2001. ICPSR version. Detroit: Wayne State U [producer]. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2002. Web. 19 May 2011. doi:10.3886/ICPSR03414

In-Text Citation: (Milberger, 2002)

More Examples of Citing Data and Statistics

  • Data - Style Manual Examples
  • Data - Archive Examples
  • Data - Publisher Examples
  • Statistical Tables - Style Manual Examples
  • Statistical Tables - Government Reports Examples

APA 6th edition

Basic form: Author/Rightsholder. (Year). Title of data set (Version number) [Description of form]. Location: Name of producer.

or

Author/Rightsholder. (Year). Title of data set (Version number) [Description of form]. Retrieved from http://

Example:   Pew Hispanic Center. (2008).  2007 Hispanic Healthcare Survey  [Data file and code book]. Retrieved from http://pewhispanic.org/datasets/

Unpublished raw data from study, untitled work

Basic form:   Author, F. N. (Year). [Description of study topic]. Unpublished raw data.

Example: Smith, J.A. (2006). [Personnel survey]. Unpublished raw data.

APA Style Guide to Electronic References

Pew Hispanic Center. (2008).  2007 Hispanic Healthcare Survey  [Data file and code book]. Available from Pew Hispanic Center Web site: http://pewhispanic.org/datasets/

Note: Available from, rather than Retrieved from, indicates that the URL takes you to a download site, rather than directly to the data set file itself.

Graphic Representation of Data

Centers for Disease Control and Prevention. (2005). [Interactive map showing percentage of respondents reporting "no" to, During the past month, did you participate in any physical activities?]. Behavioral Risk Factor Surveillance System. Retrieved from http://apps.nccd.cdc.gov/gisbrfss/default.aspx

APA 5th edition

APSA Style Manual for Political Science

For a complete description of citation guidelines refer to the  APSA Style Manual for Political Science .

Data Archived and Available at the Inter-university Consortium for Political and Social Research (ICPSR)

Eldersveld, Samuel J., John E. Jackson, M. Kent Jennings, Kenneth Lieberthal, Melanie Manion, Michael Oksenberg, Zhefu Chen, Hefeng He, Mingming Shen, Qingkui Xie, Ming Yang, and Fengchun Yang. 1996. Four-County Study of Chinese Local Government and Political Economy, 1990 [computer file] (Study #6805). ICPSR version. Ann Arbor, MI: University of Michigan/Beijing, China: Beijing University [producers], 1994. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 1996.

American Sociological Association Style Guide

Machine-Readable Data Files

CBS News. 2009. CBS News Poll: Energy USCBS2009-02A Version 2 [MRDF]. New York: CBS News [producer]. Storrs, CT: The Roper Center for Public Opinion Research, University of Connecticut [distributor].

ICPSR Data Archive

Duncan, Otis D., and Howard Schuman. Detroit Area Study, 1971: Social Problems and Social Change in Detroit [Computer file]. ICPSR07325-v2. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 1997. doi:10.3886/ICPSR07325

Read the FAQ page, Why and How Should I Cite Data?, for additional information on citing ICPSR datasets, as well as this Quick Guide to Data Citation.

Manuscripts and dissertations based on ICPSR data should be submitted for inclusion in the ICPSR  Bibliography of Data-Related Literature .

Roper Center for Public Opinion Research Data Archive

Cable News Network & USA Today. Gallup/CNN/USA Today Poll: Aftermath of Hurricane Katrina [computer file]. 1st Roper Center for Public Opinion Research version. Lincoln, NE: Gallup Organization [producer], 2006. Storrs, CT: The Roper Center, University of Connecticut [distributor], 2006.

Read the  How to Cite Roper Center data  page for additional information.

Papers published based on Roper Center data may be submitted to the  Bibliography of publications using data from the Roper Center .

Dataverse Network

Gary King; Langche Zeng, 2006, "Replication Data Set for 'When Can History be Our Guide? The Pitfalls of Counterfactual Inference'"  hdl:1902.1/DXRXCFAWPK  UNF:3:DaYlT6QSX9r0D50ye+tXpA== Murray Research Archive [distributor]

Read the  Academic Credit  page at Dataverse for additional information.

National Center for Education Statistics

Kroe, E. (2002).  Data File (Public-Use): Public Libraries Survey, Fiscal Year 1994  (NCES 2003–304). U.S. Department of Education, National Center for Education Statistics. Washington, DC: 2002.

Holton, B., and George, A. (2007). Data File and Documentation, Public Use: Academic Libraries Survey (ALS): Fiscal Year 1996 (NCES 2008-318). U.S. Department of Education. Washington, DC: National Center for Education Statistics. Retrieved [date] from http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2008318.

Centers for Disease Control/National Center for Health Statistics

National Center for Health Statistics. National Ambulatory Medical Survey, 1994. Public-use data file and documentation. ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/. 1996.

Read the Citations for  NCHS Publications and Electronic Media  page for more information.

Centers for Disease Control and Prevention. (2005). [Interactive map showing percentage of respondents reporting "no" to, During the past month, did you participate in any physical activities?].  Behavioral Risk Factor Surveillance System . Retrieved from http://apps.nccd.cdc.gov/gisbrfss/default.aspx

Citing Specific Parts of a Source

For in-text citations, indicate the page, chapter, figure, or table within the parenthetical citation.

Basic form:

(Author, Year, Table #)

(National Center for Education Statistics, 2008, Table 3)
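The basic form above can be sketched as a one-line formatter. The function and parameter names are assumptions made for this illustration; the sample values are the ones from the example just shown.

```python
def in_text_citation(author, year, part=None):
    """Build a parenthetical citation, optionally naming a specific part
    of the source (a page, chapter, figure, or table)."""
    pieces = [author, str(year)]
    if part:
        pieces.append(part)
    return "(" + ", ".join(pieces) + ")"


print(in_text_citation("National Center for Education Statistics", 2008, "Table 3"))
# (National Center for Education Statistics, 2008, Table 3)
```

Leaving `part` out gives the plain author and year form used when the whole source is cited.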

Entry in a Reference Work

APA does not provide specific information on how to cite a statistical table, but you can use this general format to cite part of a source (e.g. a statistical table) in the bibliography.

Author. (Year). Title of entry. In Editor (Eds.),  Title of reference book  (pp. xxx-xxx). Retrieved from http:// OR Location: Publisher OR doi:xxxx.

U.S. Department of Education, National Center for Education Statistics. (2009).  Table 151: Percentage of public and private high school graduates taking selected mathematics and science courses in high school, by sex and race/ethnicity: Selected years, 1982 through 2005. In U.S. Department of Education, National Center for Education Statistics (Ed.),  Digest of Education Statistics  (2009 ed.). Retrieved from http://nces.ed.gov/programs/digest/d09/tables/dt09_151.asp.

American Veterinary Medical Association. (2010). Table 1204: Household Pet Ownership: 2006. In U.S. Census Bureau (Ed.),  Statistical Abstract of the United States  (129th ed.). Retrieved from http://www.census.gov/compendia/statab/2010/tables/10s1204.pdf

A work in a Reference

MLA does not provide specific information on how to cite a statistical table, but you can use this general format, adapted from the rules for citing a work in an anthology (p. 157), an article in a reference work (p. 160), and guidelines for citing electronic materials (p. 181).

Author. "Title of entry."  Title of book . Edition. Ed. Editor's name(s). Place of publication: Publisher, Year. Page range. Medium of publication.

For web publications, add date of access.  URL is optional (MLA 7th no longer requires the use of URLs as an acknowledgement that they change often).

American Veterinary Medical Association. "Table 1204: Household Pet Ownership: 2006."  Statistical Abstract of the United States . 129th ed. Ed. U.S. Census Bureau. Washington D.C.: U.S. Census Bureau, 2010. Web. 14 July 2010. <http://www.census.gov/compendia/statab/2010/tables/10s1204.pdf>.

Data Table in an Online Statistical Volume

Basic Form:

"Title of Table." In  Title of Statistical Volume . Available at: http://some.url.gov; Accessed: mo/da/yr.

"Table 385: Unemployment Rate of Persons 16 Years Old and Over, by Age, Sex, Race/Ethnicity, and Highest Degree Attained: 1996, 1997, and 1998" (PDF file; 13 kb). In  Digest of Education Statistics, 1999 . Available at: http://nces.ed.gov/pubs2000/-Digest99/tables/PDF/Table385.pdf; Accessed: 11/25/01.

American FactFinder Table

"Commuting to Work (1990 QT)—State College, PA" Part of:  Quick Table: DP-3—Labor Force Status and Employment Characteristics: 1990 . Data Set: Census of Population and Housing, 1990 (STF 3). Available at American FactFinder (Census Bureau), http://factfinder.census.gov; Accessed: 1/28/01.

" PCT5. Sex by Age:2000—Race or Ethnic Group: Black or African American—Rhode Island ." Data Set: Census, 2000 (SF2). Available at American FactFinder (Census Bureau), http://factfinder.census.gov; Accessed: 1/28/01.

Acknowledgement

This guide is adapted from the How to Cite Data LibGuide  by Hailey Mooney and Scout Calvert (Michigan State University Libraries).

  • Last Updated: Jun 7, 2024 8:35 PM
  • URL: https://ucsd.libguides.com/howtocite

University Library

Cite data and statistics.


General Guidelines

When you use numeric datasets or a prepared statistical table you must cite where you retrieved the information.  Data and statistical tables contain unique elements not specifically addressed by most citation styles.  Citations for data or statistical tables should include at least the following pieces of information, which you will need to arrange according to the citation style you use.  

  • Author or creator - the person(s), organization, issuing agency or agencies responsible for creating the dataset
  • Date of publication  - the year the dataset was published, posted or otherwise released to the public (not the date of the subject matter).
  • Title or description - the complete title or, if no title exists, a brief description of the data that you create, including the time period covered if applicable
  • Publisher  - entity (organization, database, archive, journal) responsible for hosting the data 
  • URL or DOI   - the unique identifier if the data set is online

Certain styles may also ask for additional information such as:

  • Edition or version
  • Date accessed online (Note: APA does not require this)
  • Format description e.g. data file, database, CD-ROM, computer software

Tips for finding additional citation guidance:

  • Check to see if the publisher or distributor of your dataset provides suggestions for citing their data.  For example, data providers like OECD and repositories like ICPSR and Dryad offer guidance for formatting citations to the hundreds of data files they host or produce.
  • Look through your style manual for instructions on using a similar format such as citation styles for electronic resources, electronic references, web pages, or tables.

This guide provides information for citing data and tables to include in your bibliography.  Consult the Purdue OWL for guidance on incorporating data and statistics in the body of your paper.

This guide is intended as a guideline only. Check your citation manual, ask a librarian, or confer with your professor if your specific data set does not contain the elements needed to draft a useful citation. In general, it’s better to include more information than called for than to leave out information that could help the reader locate the data you cite.

Examples - APA Style

Unless otherwise noted, the basic elements and guidelines described here are from the Publication Manual of the American Psychological Association, 6th edition (McHenry Reference Desk BF 76.7 .P83 2010).  You may also wish to consult the Purdue OWL  or How to Cite Data from Michigan State University for MLA examples and explanations.

1. Include the format type in brackets [ ] to describe the format, not title information (e.g. data set, data file and codebook).  [See APA guidelines for "Nonroutine information in titles" (p. 186)]

2. Use “Available from” if the URL or DOI points you to a website or to information on how to obtain or download data at a general site that houses data sets. Use “Retrieved from” if the URL or DOI takes you directly to the data table or database. (APA Style Manual, 2001 ed., p. 281 or Purdue OWL Electronic Sources: Data Sets)
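The choice in item 2 comes down to one question: does the link open the data itself? A minimal sketch of that rule follows. The function and parameter names are hypothetical, and deciding whether a link points directly to the data remains a judgment the writer makes; the two sample locations are the ones used in the examples in this guide.

```python
def retrieval_statement(location, points_directly_to_data):
    """Pick the retrieval phrase for an APA (6th edition) data reference.

    "Retrieved from" is for links that open the data table or database
    itself; "Available from" is for links to a general download site.
    """
    phrase = "Retrieved from" if points_directly_to_data else "Available from"
    return f"{phrase} {location}"


# Direct link to an indicator's data.
print(retrieval_statement(
    "http://data.worldbank.org/indicator/NY.GNP.PCAP.CD", True))

# General site where an interactive table has to be generated.
print(retrieval_statement("BEA.gov/iTable", False))
```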

Basic Elements:  [Follow APA guidelines for "Data set" (pp. 210-211) or online from MSU ] 

Author/Rightsholder, A. A. (Year). Title of publication or data set  (Version number if available) [Data File]. Retrieved from (or available from) http://xxxx

The title of the data set should be italicized unless the data set is included as part of a larger work or volume

The World Bank, World Development Indicators (2012). GNI per capita, Atlas method  [Data file]. Retrieved from http://data.worldbank.org/indicator/NY.GNP.PCAP.CD

Example of a table generated from an interactive data set:

Bureau of Economic Analysis, U.S. Department of Commerce (2013). U.S. Direct Investment Abroad, All U.S. Parent Companies 2009-2010 [Data file]. Available from BEA.gov/iTable
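The basic-elements template above can be sketched as a small helper. This is only an illustration, not an official APA tool: the function name `format_apa6_dataset` and its parameters are hypothetical, and plain text cannot show the italics that the title would normally carry.

```python
from typing import Optional


def format_apa6_dataset(
    author: str,
    year: int,
    title: str,
    location: str,
    version: Optional[str] = None,
    form: str = "Data file",
    direct_link: bool = True,
) -> str:
    """Assemble a data set reference from the basic elements listed above.

    Hypothetical helper: all names here are illustrative assumptions.
    """
    # The version number is optional and goes in parentheses after the title.
    version_part = f" ({version})" if version else ""
    # "Retrieved from" when the link leads straight to the data table or
    # database, "Available from" when it leads to a general site that houses it.
    verb = "Retrieved from" if direct_link else "Available from"
    return f"{author} ({year}). {title}{version_part} [{form}]. {verb} {location}"


print(
    format_apa6_dataset(
        "The World Bank, World Development Indicators",
        2012,
        "GNI per capita, Atlas method",
        "http://data.worldbank.org/indicator/NY.GNP.PCAP.CD",
    )
)
```

Passing `direct_link=False` switches the wording to "Available from", which matches the interactive-table example.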

II. Table from a publication  

Basic Elements: [Follow APA guidelines for "entry in a reference work" (p. 205)] 

Author. (Year). Title of entry. In Editor (Edition),  Title of publication  (pp. xxx-xxx). Retrieved from http:// OR Location: Publisher OR doi:xxxx.

Example: (Note: Editor & Edition elements are not applicable in this example)

World Trade Organization. (2012). Table I.3: World merchandise trade and trade in commercial services by region and selected economy, 2005-2011.  In International Trade Statistics, 2012  (p. 22).  Retrieved from: http://www.wto.org/english/res_e/statis_e/its2012_e/its12_toc_e.htm

The title of the data set should be italicized unless the data set is included as part of a larger work or volume , as in the example above.  

Quick Guides to Citing Data

  • ICPSR: How to Cite Data
  • IASSIST Quick Guide to Citing Data
  • How to Cite Data - Michigan State University: A longer guide with many examples of how to cite datasets and statistical tables
  • Writing with Statistics - Purdue OWL: Explains how to properly incorporate statistics into a paper, including inferential and descriptive statistics, and using visuals: tables, graphs, and charts
  • Census Data & Tables (American Factfinder)

What is a DOI?

DOI stands for Digital Object Identifier and is a unique number used to precisely locate electronic items like webpages, articles, files, etc.  A DOI is persistent, which means it does not "break" the way a URL can when a website is updated.

  • See: What is a DOI? (ICPSR)

Creative Commons Attribution 3.0 License except where otherwise noted.

American Psychological Association

Database Information in References

Database information is seldom provided in reference list entries. The reference provides readers with the details they will need to perform a search themselves if they want to read the work—in most cases, writers do not need to explain the path they personally used.

Think of it this way: When you buy a book at a bookstore or order a copy off the internet, you do not write the name of the (online) bookstore in the reference. And when you go to the library and get a book off the shelf, you do not write the name of the library in the reference. It is understood that readers will go to their bookstore or library of choice to find it.

The same is true for database information in references. Most periodicals and books are available through a variety of databases or platforms as well as in print. Different readers will have different methods or points of access, such as university library subscriptions. Most of the time, it does not matter what database you used, so it is not necessary to provide database information in references.

However, there are a few cases when it is necessary for readers to retrieve the cited work from a particular database or archive, either because the database publishes original, proprietary content or because the work is of limited circulation. This page explains how to write references for works from academic research databases and how to provide database information in references when it is necessary to do so.

Database information in references is covered in the seventh edition APA Style manuals in the Publication Manual Section 9.30 and the Concise Guide Section 9.30.

Related handout

  • Creating an APA Style Reference List (PDF, 179KB)

Works from academic research databases

Do not include database information for works obtained from most academic research databases or platforms because works in these resources are widely available. This includes journal articles, books, and book chapters from academic research databases.

  • Examples of academic research databases and platforms include APA PsycNet, PsycInfo, Academic Search Complete, CINAHL, Ebook Central, EBSCOhost, Google Scholar, JSTOR (excluding its primary sources collection because these are works of limited distribution), MEDLINE, Nexis Uni, Ovid, ProQuest (excluding its dissertations and theses databases because dissertations and theses are works of limited circulation), PubMed Central (excluding authors’ final peer-reviewed manuscripts because these are works of limited circulation), ScienceDirect, Scopus, and Web of Science.
  • When citing a work from one of these databases or platforms, do not include the database or platform name in the reference list entry unless the work falls under one of the exceptions described next ( databases with original, proprietary content and works of limited circulation ).
  • Likewise, do not include URLs from these academic research databases in reference list entries because these URLs will not resolve for readers.
  • Instead of a database URL, include a DOI if the work has one. If a widely available work (e.g., journal article, book, book chapter) from an academic research database does not have a DOI, treat the work as a print version. See the guidelines for how to include DOIs and URLs in references for more information.

The following example shows how to create a reference list entry for a journal article with a DOI from an academic research database.

Hallion, M., Taylor, A., Roberts, R., & Ashe, M. (2019). Exploring the association between physical activity participation and self-compassion in middle-aged adults. Sport, Exercise, and Performance Psychology , 8 (3), 305–316. https://doi.org/10.1037/spy0000150

  • Parenthetical citation: (Hallion et al., 2019)
  • Narrative citation: Hallion et al. (2019)

If the article did not have a DOI, the reference would simply end after the page range, the same as the reference for a print work.
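DOIs turn up in several written forms, such as a bare `10.xxxx/...` string, a `doi:` prefix, or an older `http://dx.doi.org/` link, while the reference above uses the `https://doi.org/` form. A minimal sketch of normalizing them to that form follows. The function name `normalize_doi` is hypothetical, and the check that a DOI starts with `10.` is a simplification rather than full validation.

```python
import re

# Prefixes commonly seen in front of a DOI (matched case-insensitively).
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def normalize_doi(raw: str) -> str:
    """Return the DOI as an https://doi.org/ link (hypothetical helper)."""
    doi = _DOI_PREFIX.sub("", raw.strip())
    # Simplified sanity check: DOIs begin with the directory indicator "10."
    if not doi.startswith("10."):
        raise ValueError(f"does not look like a DOI: {raw!r}")
    return f"https://doi.org/{doi}"


print(normalize_doi("doi: 10.3886/ICPSR31521.v1"))
print(normalize_doi("http://dx.doi.org/10.3886/ICPSR02751"))
```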

Databases with original, proprietary content

Provide the name of the database or archive when it publishes original, proprietary works available only in that database or archive (e.g., UpToDate or the Cochrane Database of Systematic Reviews). Readers must retrieve the cited work from that exact database or archive, so include information about the database or archive in the reference list entry.

References for works from proprietary databases are similar to journal article references. The name of the database or archive is written in italic title case in the source element, the same as a periodical title, and followed by a period. After the database or archive information, also provide the DOI or URL of the work . If the URL is session-specific (meaning it will not resolve for readers), provide the URL of the database home page or login page instead.

The following example shows how to create a reference list entry for an article from the UpToDate database:

Stein, M. B., & Taylor, C. T. (2019). Approach to treating social anxiety disorder in adults. UpToDate . Retrieved September 13, 2019, from https://www.uptodate.com/contents/approach-to-treating-social-anxiety-disorder-in-adults

  • Parenthetical citation: (Stein & Taylor, 2019)
  • Narrative citation: Stein and Taylor (2019)

Works of limited circulation

Provide the name of the database or archive for works of limited circulation, such as dissertations and theses, manuscripts posted in a preprint archive, and monographs in ERIC. The database may also contain works of wide circulation, such as journal articles—only the works of limited circulation need database information in the reference.

References for works of limited circulation from databases or archives are similar to report references. The name of the database or archive is provided in the source element (in title case without italics ), the same as a publisher name, and followed by a period. After the database or archive information, also provide the DOI or URL of the work. If the URL is session-specific (meaning it will not resolve for readers), provide the URL of the database home page or login page instead.

The following are examples of works of limited circulation from databases or archives (for additional examples, see Section 9.30 of the Publication Manual ):

  • dissertations and theses published in ProQuest Dissertations and Theses Global

Risto, A. (2014). The impact of social media and texting on students’ academic writing skills (Publication No. 3683242) [Doctoral dissertation, Tennessee State University]. ProQuest Dissertations and Theses Global.

  • Parenthetical citation: (Risto, 2014)
  • Narrative citation: Risto (2014)
  • manuscripts posted in a preprint archive such as PsyArXiv

Inbar, Y., & Evers, E. R. K. (2019). Worse is bad: Divergent inferences from logically equivalent comparisons . PsyArXiv. https://doi.org/10.31234/osf.io/ueymx

  • Parenthetical citation: (Inbar & Evers, 2019)
  • Narrative citation: Inbar and Evers (2019)
  • monographs published in ERIC

Riegelman, R. K., & Albertine, S. (2008). Recommendations for undergraduate public health education (ED504790). ERIC. https://files.eric.ed.gov/fulltext/ED504790.pdf

  • Parenthetical citation: (Riegelman & Albertine, 2008)
  • Narrative citation: Riegelman and Albertine (2008)

If you are in doubt as to whether to include database information in a reference, refer to the template for the reference type in question (see Chapter 10 of the Publication Manual ).

The Sheridan Libraries

Citing Data and Statistics

Guide to data citation.

Whether you use a numeric dataset or a prepared statistical table from an existing source (print or electronic) you need to cite the source of your information.  

It is critical to correctly cite data and statistics. This ensures that research data and statistics can be:

  • replicated for verification
  • credited for recognition
  • tracked to measure usage and impact

By citing your dataset or statistics, you ensure that your work can be reproduced, and you also attribute credit to those who provided the data or statistics.

Elements of Data Citation

It is important to identify the elements of your data and statistics, as these elements are organized into a properly formatted citation in accordance with your association's preferred style guide.

Citation Element: Description

  • Author: Name(s) of each individual or organizational entity responsible for the creation of the dataset.
  • Date of publication: Year the dataset was published or disseminated.
  • Title: Complete title of the dataset, including the edition or version number, if applicable.
  • Publisher and/or distributor: Organizational entity that makes the dataset available by archiving, producing, publishing, and/or distributing the dataset.
  • Electronic location or identifier: Web address or unique, persistent, global identifier used to locate the dataset (such as a DOI). Append the date retrieved if the title and locator are not specific to the exact instance of the data you used.

These are the minimum elements required for dataset identification and retrieval. Fewer or additional elements may be requested by author guidelines or style manuals. Be sure to include as many elements as needed to precisely identify the dataset or statistics you have used.

Arrange these elements following the order and punctuation specified by your style guide. If examples for datasets are not provided, the format for books is generally considered a generic format that can be modified for other source types.
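As a rough illustration of treating these as minimum elements, the sketch below records them in a small data class, reports which ones are still missing, and arranges them in a generic book-like order. The class name `DataCitation`, its field names, and the exact punctuation are assumptions made for this example and are not prescribed by any style guide.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataCitation:
    """Minimum elements for identifying a dataset (hypothetical structure)."""

    author: Optional[str] = None
    date: Optional[str] = None
    title: Optional[str] = None
    publisher: Optional[str] = None
    locator: Optional[str] = None    # URL or persistent identifier such as a DOI
    retrieved: Optional[str] = None  # only needed if the locator is not version-specific

    def missing_elements(self) -> List[str]:
        """List the minimum elements that have not been filled in yet."""
        required = ("author", "date", "title", "publisher", "locator")
        return [name for name in required if not getattr(self, name)]

    def generic(self) -> str:
        """Arrange the elements in a generic, book-like order."""
        missing = self.missing_elements()
        if missing:
            raise ValueError(f"missing elements: {', '.join(missing)}")
        text = f"{self.author} ({self.date}). {self.title}. {self.publisher}. {self.locator}"
        if self.retrieved:
            text += f" (retrieved {self.retrieved})"
        return text


draft = DataCitation(author="Smith, T.W., Marsden, P.V., & Hout, M.", date="2011")
print(draft.missing_elements())
```

Checking `missing_elements()` before formatting makes it obvious which details still need to be looked up in the study record or codebook.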

Source: Quick Guide to Data Citation - IASSIST Special Interest Group on Data Citation (SIGDC)

  • Last Updated: Jul 15, 2024 12:55 PM
  • URL: https://guides.library.jhu.edu/data-stats

Data Citation

Benefits of citing data.

Proper citation of data sources has both immediate and long term benefits to users and producers of data. “Data citation is the practice of referencing data products used in research. A data citation includes key descriptive information about the data, such as the title, source, and responsible parties.” ( USGS )

Benefits for data producers

  • provides proper attribution and credit
  • creates a bibliographic “trail”, connecting publications and supporting data, and establishing a timeline of publication and usage
  • demonstrates the impact of their work and establishes research data as an important contribution to the scholarly record

Benefits for data users

  • citation makes it easier to find datasets
  • supports persistence of datasets
  • encourages the reuse of data for new research questions

Benefits for everyone

  • increases transparency and reproducibility

Components of a data citation

Citing data is very similar to citing publications; there are many “correct” formats to use, but we suggest including the following important information:

  • creator(s) or contributor(s)
  • date of publication
  • title of dataset
  • identifier (e.g. Handle, ARK, DOI) or URL of source
  • version, when appropriate
  • date accessed, when appropriate

The order of the information is not as important as having sufficient information to find the data set(s) used. Consider the style guidelines of the research domain or lab group, data source, or preferred publisher (see related information ).

A suggested citation format may be specified by some publishers, with specific additional information (e.g. resource type, retrieval data, funder/sponsor). They may also request citation of related publication(s) along with the data. Be sure to review citation style guides carefully. When citation formats are not specified, you can follow your discipline’s scholarly citation style. The next section provides examples of common repository styles, as well as APA/MLA/Chicago styles.

Examples of data citation styles

Style: Example(s)

  • APA (6th edition): Smith, T.W., Marsden, P.V., & Hout, M. (2011). General social survey, 1972-2010 cumulative file (ICPSR31521-v1) [data file and codebook]. Chicago, IL: National Opinion Research Center [producer]. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor]. doi: 10.3886/ICPSR31521.v1
  • Chicago: Smith, Tom W., Peter V. Marsden, and Michael Hout. 2011. General Social Survey, 1972-2010 Cumulative File. ICPSR31521-v1. Chicago, IL: National Opinion Research Center. Distributed by Ann Arbor, MI: Inter-university Consortium for Political and Social Research. doi:10.3886/ICPSR31521.v1
  • DataCite: Barclay, Janet Rice (2013) Stream Discharge from Harford, NY. Cornell University Library eCommons Repository.
  • DRYAD: Yannic G, Pellissier L, Dubey S, Vega R, Basset P, Mazzotti S, Pecchioli E, Vernesi C, Hauffe HC, Searle JB, Hausser J (2012) Data from: Multiple refugia and barriers explain the phylogeography of the Valais shrew, Sorex antinorii (Mammalia: Soricomorpha). Dryad Digital Repository.
  • ESIP: Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston. 2003. CLPX-Ground: ISA snow depth transects and related measurements ver. 2.0. Edited by M. A. Parsons and M. J. Brodzik. NASA National Snow and Ice Data Center Distributed Active Archive Center. Accessed 2008-05-14.
  • ICPSR: Jacob, Philip, and Henry Teune. International Studies of Values in Politics, 1966. ICPSR07006-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 1978.
  • Figshare: Rodriguez, Tommy (2013): 17,170 Base Pair Alignment of Thirteen Time-Extended Lineages [data: (complete) mtDNA; format: ClustalW]. figshare. Retrieved: 16:26, Jan 04, 2016 (GMT)
  • MLA (7th edition): Smith, Tom W., Peter V. Marsden, and Michael Hout. General Social Survey, 1972-2010 Cumulative File. ICPSR31521-v1. Chicago, IL: National Opinion Research Center [producer]. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2011. Web. 23 Jan 2012. doi:10.3886/ICPSR31521.v1

The Digital Curation Centre (DCC) provides additional guidance on  how to cite datasets and link to publications .

Related information

  • The Austin Principles  (Linguistics Data Citation)
  • Dataset and Software References and Citation Examples  (American Meteorological Society (AMS)) 
  • Data Citation  (Australian National Data Service (ANDS))
  • Data Citation  (United States Geological Survey (USGS)) 
  • Data Citation Guidelines for Earth Science Data Version 2  (Earth Science Information Partners (ESIP), 2019)
  • Data Citation Standards and Practices  (CODATA-ICSTI)
  • Data Citation Synthesis Group: Joint Declaration of Data Citation Principles  (FORCE11) 
  • Get Recognition: Data Citation  (The DataVerse Network) 
  • Data Citation (DataONE) 

WashU Libraries

Citing your sources & writing styles.

Why cite your data

Just as you cite journal articles, websites, and any books you reference in your publication, so too do you need to cite any data your publication uses. 

Citing datasets, such as spreadsheets, interview transcripts, images, etc., is crucial in providing context for your research and giving credit to the individual whose data you've used.

Generic data citation

A dataset citation includes many of the same components of a traditional citation.

Many style manuals have not developed specific instructions for citing data. If the style guide you are using does not address data citations, you may use the basic citation elements, regardless of the type of work.

  • author(s) (Who created the data? An organization, individual, or group of individuals?),
  • title (name of the study or title of the dataset),
  • year of publication,
  • publisher (or location of where the data was found)
  • edition/version
  • access information (URL/doi where data was found)

Joint Declaration of Data Citation Principles, Feb. 2014

APA 6th edition formatting and examples

For a complete description of data citation guidelines refer to pp. 210-211 of the Publication Manual of the American Psychological Association, 6th edition

  • Author Last Name, First Initial. (Year). Title of data set (Version number) [Description of form]. Location: Name of producer.

OR

  • Author Last Name, First Initial. (Year). Title of data set [Description of form].  Retrieved from http://xxx
  • Pew Hispanic Center. (2004).  Changing channels and crisscrossing cultures: A survey of Latinos on the news media [Data file and code book].  Retrieved from http://pewhispanic.org/datasets

MLA 9th edition examples using general rules

Since MLA has not developed a specific citation style for datasets, the general rules for citing a web document may be applied. 

  • Author Last Name, First Name. Title of data set. (Version). Publisher location: Publisher name, Date of publication. Medium of publication. Date accessed. doi/url of data
  • Pew Hispanic Center. Changing channels and crisscrossing cultures: A survey of Latinos on the news media. (Data file and code book). Washington, DC: Pew Research Center, 2004. Web. 19 Sep 2011. <http://pewhispanic.org/datasets/signup.php?DatasetID=5>

More about citing sources

  • DCC cite datasets and link to publications
  • Why and how should I cite data? from Inter-university Consortium for Political and Social Research (ICPSR)
  • How to cite data (including datasets) with more examples; from Michigan State University
  • Citing data from MIT Libraries
  • Elements of a Data Citation - examples in several citation styles from How to Cite Datasets and Link to Publications
  • Last Updated: Aug 2, 2024 11:29 AM
  • URL: https://libguides.wustl.edu/citestyles

Citing sources: Cite data

Manage your references

Use these tools to help you organize and cite your references:

  • Citation Management and Writing Tools

If you have questions after consulting this guide about how to cite, please contact your advisor/professor or the writing and communication center .

Cite data in your paper/presentation so that you can:

  • Give the data producer appropriate credit
  • Enable readers of your work to access the data, for their own use and to replicate your results
  • Fulfill some publisher requirements

Include in your citation:

  • Year of publication
  • Publisher or distributor
  • URL, identifier, or other access location

Using citation software or style guides? In EndNote, use the reference type for "dataset." If you're using Mendeley or Zotero, make do with a more generic reference type template and fill in the essentials for your dataset.

Cite data: examples

Want detailed guidelines for citing data?  See:

  • Quick Guide to Data Citation (IASSIST)
  • How to Cite Data (MSU)
  • How to Cite Datasets and Link to Publications (DCC)

Examples of data citations include:

  • Bachman, Jerald G., Lloyd D. Johnston, and Patrick M. O'Malley. Monitoring the Future: A Continuing Study of American Youth (12th-Grade Survey), 1998 [Computer file]. Conducted by University of Michigan, Survey Research Center. ICPSR02751-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [producer and distributor], 2006-05-15. http://dx.doi.org/10.3886/ICPSR02751.
  • ASTER Global Digital Elevation Model, version 1, ASTGTM_N11E122_num.tif, ASTGTM_N11E123_num.tif, Ministry of Economy, Trade, and Industry (METI) of Japan and NASA, downloaded from https://wist.echo.nasa.gov/api/ , October 27, 2009
  • Cite a subject archive entry, e.g.: Genbank accession number, available at: http://www.ncbi.nlm.nih.gov .

Data archives may provide guidelines on how to cite the data, e.g.:

  • Data catalogs like the Harvard Dataverse Network and ICPSR have standard citations included in the study record.
  • ICPSR: Why and how should I cite data?
  • How to Cite Roper Center Data
  • Dryad Good Data Practices
  • Earth Science Information Partner Federation Data Stewardship/Citations
  • NOAA Paleoclimatology Program: Data Citation
  • PANGAEA Citation
  • Citing and linking to the Gene Expression Omnibus (NCBI) database

Cite data using Zotero

Because Zotero lacks an "item type" for datasets, enter the citation in the system as a "Document." How you do so depends on whether and how the data producer provides a recommended citation:

  • Export an RIS file and import this file into Zotero
  • Copy and paste the information from a recommended citation into a new Zotero item with the type "Document"
  • Otherwise, use the "Document" item type to add the components of the citation
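When a data producer does not offer an RIS export, a record can be written by hand and then imported. The sketch below builds a minimal RIS record using standard RIS tags (`TY`, `AU`, `PY`, `TI`, `PB`, `DO`, `UR`, `ER`) and the RIS `DATA` reference type. The function name `dataset_to_ris` is hypothetical, and how a particular Zotero version maps the `DATA` type on import is not verified here, so check the imported item afterwards.

```python
from typing import List, Optional


def dataset_to_ris(
    authors: List[str],
    year: str,
    title: str,
    publisher: str,
    doi: Optional[str] = None,
    url: Optional[str] = None,
) -> str:
    """Build a minimal RIS record for a dataset (hypothetical helper)."""
    lines = ["TY  - DATA"]                      # RIS reference type for a data file
    lines += [f"AU  - {name}" for name in authors]  # one AU line per author
    lines.append(f"PY  - {year}")
    lines.append(f"TI  - {title}")
    lines.append(f"PB  - {publisher}")
    if doi:
        lines.append(f"DO  - {doi}")
    if url:
        lines.append(f"UR  - {url}")
    lines.append("ER  - ")                      # end-of-record marker
    return "\n".join(lines) + "\n"


record = dataset_to_ris(
    authors=["Bachman, Jerald G.", "Johnston, Lloyd D.", "O'Malley, Patrick M."],
    year="2006",
    title="Monitoring the Future: A Continuing Study of American Youth (12th-Grade Survey), 1998",
    publisher="Inter-university Consortium for Political and Social Research",
    doi="10.3886/ICPSR02751",
)
print(record)
```

Saving the returned text to a file with an `.ris` extension gives something that reference managers can generally import.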
  • Last Updated: Jan 16, 2024 7:02 AM
  • URL: https://libguides.mit.edu/citing
  • Penn Libraries
  • Research Data & Digital Scholarship

Data Management Resources

Data Citation Resources

  • DOI Citation Formatter  -  a fantastic tool that allows you to put in a DOI and get out a dataset citation in your desired style
  • Digital Curation Center (DCC), How to Cite Datasets and Link to Publications
  • University of Illinois Data Nudge, Data Citation
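One way to get similar output programmatically is DOI content negotiation, which the Crossref and DataCite registration agencies support: a request to `https://doi.org/<DOI>` with an `Accept: text/x-bibliography` header asks for a formatted citation. The sketch below is an illustration only. The function names are hypothetical, the set of accepted `style` values depends on the service, and `fetch_citation` needs network access.

```python
import urllib.request


def build_citation_request(doi: str, style: str = "apa") -> urllib.request.Request:
    """Prepare a content-negotiation request for a formatted citation."""
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        # Ask the DOI resolver for a bibliography entry in the given style.
        headers={"Accept": f"text/x-bibliography; style={style}"},
    )


def fetch_citation(doi: str, style: str = "apa", timeout: float = 10.0) -> str:
    """Send the request and return the citation text (requires network access)."""
    request = build_citation_request(doi, style)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

For example, `fetch_citation("10.3886/ICPSR31521.v1")` would request a citation for the General Social Survey file cited on this page. Treat whatever comes back as a draft and compare it with your style manual.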

Why Cite Data?

Datasets used during the research process should be cited as you would cite an article: in the references, cited sources, or bibliography section of your work. The practice of citing research data has developed as researchers and stakeholders have realized that including data is necessary for a complete scholarly record connecting a research product to the evidence it is based on.

Citing data:

  • attributes credit to the responsible researchers
  • allows those sharing the data to measure its impact
  • supports the research infrastructure by connecting data and published research 
  • improves access to data
  • provides opportunities for verifying data and enables reuse
  • promotes data as an equal scholarly output to a written work

How To Cite Data

While citing data has become an expectation, scholarly communities and communities of practice have largely struggled to develop data citation standards within their existing citation styles. This leaves the burden on research data communities to create a data citation format that conforms to existing style rules as closely as possible. When citing a dataset in a paper, follow the citation style required by the publisher. If the publisher does not have a format for datasets, collect all the core elements and match the citation format for textual publications. You can also follow DataCite's citation style for a dataset and adapt it to match the citation style you are using.

Core Elements:

  • author/creator
  • date of publication
  • title, including version or edition
  • publisher or distributor (such as the name of the repository where the data was found)
  • URL, DOI or other persistent identifier 

Example Citations from IASSIST's Quick Guide to Data Citation

APA (6th edition)

Smith, T.W., Marsden, P.V., & Hout, M. (2011). General social survey, 1972-2010 cumulative file (ICPSR31521-v1) [data file and codebook]. Chicago, IL: National Opinion Research Center [producer]. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor]. doi: 10.3886/ICPSR31521.v1

MLA (7th edition)

Smith, Tom W., Peter V. Marsden, and Michael Hout. General Social Survey, 1972-2010 Cumulative File . ICPSR31521-v1. Chicago, IL: National Opinion Research Center [producer]. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2011. Web. 23 Jan 2012. doi:10.3886/ICPSR31521.v1

Chicago (16th edition) (author-date)

Smith, Tom W., Peter V. Marsden, and Michael Hout. 2011. General Social Survey, 1972-2010 Cumulative File . ICPSR31521-v1. Chicago, IL: National Opinion Research Center. Distributed by Ann Arbor, MI: Inter-university Consortium for Political and Social Research. doi:10.3886/ICPSR31521.v1

  • Last Updated: Apr 29, 2024 10:21 AM
  • URL: https://guides.library.upenn.edu/datamgmt

How to Cite Data: Key Components

Why cite data, key components of a data citation, other possible elements, citing scraped data.

Library Data Services

Library Data Services  caters to researchers interested in working with data, mapping, texts, visualization, and technology. Many of these services are available online. Davis Library Data Services, located on the second floor of Davis Library, offers:

  • A computing lab with  specialized software  for GIS and data visualization & analysis.
  • Walk-in assistance provided by knowledgeable student consultants during set  hours . 
  • Consultations with  specialists  for more in-depth inquiries (by appointment).
  • Spaces  for collaboration and presentation, complete with white boards and external displays.
  • Technology short courses and programs that promote digital scholarship.

When you collect your own data, citing its location makes it possible for others to find them and extend your research, raising your profile as a researcher. ICPSR provides a good overview of the importance of data citation :

"Citing data files in publications based on those data is important for several reasons:

  • Other researchers may want to replicate research findings and need the bibliographic information provided in citations to identify and locate the referenced data.
  • Citations appearing in publication references are harvested by key electronic social sciences indexes, such as Web of Science, providing credit to the researchers.
  • Data producers, funding agencies, and others can track citations to specific collections to determine types and levels of usage, thus measuring impact."

If you're using data you didn't gather yourself, citing your source is just as important as citing your other research sources. For other scholars to be able to examine and extend your work, they must be able to  find the original data .

Consequently, although most style guides do not include examples for citing data, consider the key components and other elements below and work them into the style you're using.

Author

The original researcher(s) who collected the data

Study name/Title

What did the original researcher call it?

Producer

The organization that sponsored the research, usually the author's institution. This takes the place of a publisher in an ordinary citation, so be prepared to list the place of publication as well. It may be useful to add a designation like [producer] if it is not actually a publisher.

Year Data Produced

When did the Producer first release the data? Treat this like the publication date.

Unique Identifier, like a Digital Object Identifier (DOI)

If you got the data from a repository like ICPSR, note their unique identifier as part of the title. If the data file has a DOI, include it as you would a URL for a web site. Check   for information on how to obtain a DOI.

Distributor

The organization that makes the data available. From what organization did you get it? If directly from the author, listing the author's institution/organization once (as the publisher) is sufficient. However if the distributor is different from the producer, it's important to list it separately; it may be useful to add a designation like “[distributor]” to clarify its role.

Year Data Collected

When did the original researcher collect the data? You may choose how specific to be--it may only be important to list the years, or you may want to provide more specific date ranges if it would be important for subsequent users to know the periodicity (months, weeks, days, etc.).

Note that the elements provided here all refer to datasets that have been either published in some way, or deposited in a repository.  It is more difficult to cite data that have not been preserved or fixed in some way. 

If you plan to scrape data, FIRST CONTACT DIGITAL RESEARCH SERVICES to be sure you are not violating the legal license terms under which we operate.  You will also need to explore if copyright and licensing terms allow you to preserve and/or share the data you obtain in this manner.

Once you are sure you have permission to scrape, preserve and/or share, make a plan for how to share this information with other researchers.

You may want to

  • Deposit and cite the data you scraped, and
  • Deposit the script(s) you used to scrape them in figshare or Zenodo, and cite them. (Both of these repositories can assign Digital Object Identifiers (DOIs) to both software [i.e., scripts] and datasets, making them easier and more reliable to cite.)

If you are scraping web pages (as opposed to database content), you should cite a list of all the URLs you scraped. You may also wish to make sure all scraped pages are archived by the Wayback Machine so that they remain accessible in the form you encountered them despite later changes.
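
One way to keep such a list is a small manifest recording each URL and the date it was accessed. The sketch below is only an illustration: the function names and CSV columns are invented for this example, and the `https://web.archive.org/save/` prefix is assumed to be the Wayback Machine's "Save Page Now" address, which you should confirm against the Internet Archive's current documentation. The code only builds the links; it makes no network requests.

```python
import csv
import io
from datetime import date

# Assumed "Save Page Now" prefix; verify before relying on it.
WAYBACK_SAVE_PREFIX = "https://web.archive.org/save/"


def build_manifest(urls, accessed):
    """Return one row per unique scraped URL, in first-seen order."""
    rows = []
    seen = set()
    for url in urls:
        if url in seen:
            continue  # cite each page only once
        seen.add(url)
        rows.append({
            "url": url,
            "accessed": accessed.isoformat(),
            "wayback_save_url": WAYBACK_SAVE_PREFIX + url,
        })
    return rows


def manifest_to_csv(rows):
    """Serialize the manifest so it can be deposited alongside the data."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["url", "accessed", "wayback_save_url"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = build_manifest(
    ["https://example.org/a", "https://example.org/b", "https://example.org/a"],
    accessed=date(2024, 7, 9),
)
print(manifest_to_csv(rows))
```

The resulting CSV can be deposited with the scraped data and the scraping script so that all three can be cited together.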

Thanks to Sebastian Karcher of the Qualitative Data Archive for much of this advice.

  • Last Updated: Jul 9, 2024 4:52 PM
  • URL: https://guides.lib.unc.edu/citedata

How to Cite Statistics

If you’re quoting or referring to statistics in your academic papers, the short and simple answer is, yes, of course, you should always cite your sources. This will allow your reader—usually your lecturer—to check the statistic for themselves with a clear point of reference for reviewing the relevant study in more detail if they wish.

But What If It’s Common Knowledge?

Even if you’re referring to something that’s often quoted and could be considered common knowledge—for example, that 50% of marriages end in divorce, or that 80% of businesses fail in their first year—you should still back this up by quoting a study from a reputable source. You might even find that your “common knowledge” statistic isn’t as reliable as you originally thought.

So How Do I Cite a Statistic?

Assuming that you’ve found a reliable source for the statistic that you’re quoting or referring to, you now need to create a citation to point to that source. How you do this will depend on the citation format that you’re required to use and the actual source type of the statistic. Once you know whether you’re expected to cite your sources in MLA or APA style, citing a statistic is essentially no different from citing anything else from that particular type of source.

For example, if you took the statistic from a website, you cite it as you would any other website. The same goes for statistics found in books, journals, magazines, or databases—simply follow the usual citation method for each source.

Here is an example of a statistic found online and cited in MLA style (9th edition):

Full Citation Structure

Author’s last name, First name. “Title of Document/Webpage: Subtitle.” Title of Website, Publisher/Affiliated organization, Date published, URL.

More females than males attended college/university in the US in 2017.

Full Citation in Works Cited List

“Table 105.20: Enrollment in Elementary, Secondary, and Degree-Granting Postsecondary Institutions, by Level and Control of Institution, Enrollment Level, and Attendance Status and Sex of Student: Selected Years, Fall 1990 through Fall 2026.” Digest of Education Statistics, National Center for Education Statistics, February 2017, https://nces.ed.gov/programs/digest/d16/tables/dt16_105.20.asp?current=yes.

Note the above example does not have author information available, so the citation starts with the title of the document. Also, the citation provides information for an individual data table on a webpage, rather than simply the webpage itself.
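
The citation structure above can be sketched as a small helper that skips the author when none is available. The function name `mla_web_citation` and the sample values are hypothetical; italics for the website title cannot be shown in plain text and would need to be applied in your word processor.

```python
def mla_web_citation(title, website, publisher, date_published, url, author=None):
    """Assemble an MLA-style Works Cited entry for a web page or online table."""
    parts = []
    if author:
        # Author is written "Last name, First name" and ends with a period.
        parts.append(author.rstrip(".") + ".")
    # With no author, the entry simply starts with the quoted title.
    parts.append(f"“{title}.”")
    parts.append(f"{website}, {publisher}, {date_published}, {url}.")
    return " ".join(parts)


entry = mla_web_citation(
    title="Example Table: Enrollment by Level",
    website="Example Statistics Digest",
    publisher="Example Statistics Agency",
    date_published="Feb. 2017",
    url="https://example.org/tables/1",
)
print(entry)
```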

In-text Citation

An in-text citation in MLA is a parenthetical citation. The standard format for this citation is:

(Author page number)

Creating an in-text citation for a webpage can be tricky due to the absence of page numbers (and, in this case, the absence of an author). The advice for MLA format is to include the first item of your full citation, whatever that may be. This will enable the reader to easily identify the full citation, which is, of course, the point of an in-text citation. You can condense the item if necessary.

So, for the above example, the in-text citation would be:

(“Table 105.20”)
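
The fallback described above (use the author if there is one, otherwise a condensed form of the title) can be sketched as follows. The function name and the two-word cutoff are assumptions chosen for this example; MLA leaves the exact shortening to your judgment, and it places no comma between the author and the page number.

```python
def mla_in_text(author_last_name=None, title=None, page=None, max_title_words=2):
    """Build a parenthetical citation from the first item of the full entry."""
    if author_last_name:
        label = author_last_name
    elif title:
        # Condense the title to its first few words and drop trailing punctuation.
        short = " ".join(title.split()[:max_title_words]).rstrip(":,;")
        label = f"“{short}”"
    else:
        raise ValueError("An author or a title is required.")
    return f"({label} {page})" if page else f"({label})"


print(mla_in_text(title="Table 105.20: Enrollment in Elementary, Secondary, and Degree-Granting Postsecondary Institutions"))
print(mla_in_text(author_last_name="Smith", page=42))
```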

That Seems Like A Lot of Information!

Generally, the more information you can give on a resource the better—so if it’s available, include it. However, it’s understood that sometimes you might have to leave some components out. If you follow the format as instructed by your lecturer and include enough information to enable them to find your sources, your citation should be correct.

Remember! Don’t rely on “common knowledge” to cite statistics. Your lecturer will want to see firm sources to validate those statistics. Citation Machine can help with quick citation creation, making it easy to back up those statistics with properly referenced sources. There are thousands of styles including the Chicago Manual of Style and many others.


How to Cite a Report in APA Style | Format & Examples

Published on November 6, 2020 by Jack Caulfield. Revised on December 1, 2023.

Reports may be published by governments, task groups, or other organizations. To reference a report with an individual author, include the author’s name and initials, the report title (italicized), the report number, the organization that published it, and the URL (if accessed online, e.g. as a PDF).

APA format: Author last name, Initials. (Year). Report title (Report No. number). Publisher name. URL
Bedford, D. A. D. (2017). (Report No. WA-RD 896.4). Washington State Department of Transportation. https://www.wsdot.wa.gov/research/reports/fullreports/896-4.pdf
(Bedford, 2017, p. 12)

Note that brochures are cited in a similar format.

Table of contents

  • Report with multiple authors
  • Report with organization as author
  • Where to find the report number
  • Frequently asked questions about APA Style citations

When a report has multiple authors, up to 20 should be listed in the reference.

If the report has 21 or more authors, list the first 19, then an ellipsis, then the last listed author.

With in-text citations, list up to two authors. For three or more, list the first followed by “et al.”

(Bedford & Caulfield, 2012)
(Davis et al., 2015)
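
The author-count rules above can be sketched in a few lines. The function names are hypothetical, and the input is assumed to be already formatted as "Last name, Initials." strings for the reference list and as bare last names for the in-text citation.

```python
def apa_reference_authors(authors):
    """Join authors for a reference entry: all of them up to 20, else first 19 ... last."""
    if len(authors) == 1:
        return authors[0]
    if len(authors) <= 20:
        return ", ".join(authors[:-1]) + ", & " + authors[-1]
    # 21 or more: first 19, an ellipsis, then the final listed author.
    return ", ".join(authors[:19]) + ", . . . " + authors[-1]


def apa_in_text(last_names, year):
    """Parenthetical citation: one or two names in full, three or more as et al."""
    if len(last_names) == 1:
        names = last_names[0]
    elif len(last_names) == 2:
        names = f"{last_names[0]} & {last_names[1]}"
    else:
        names = f"{last_names[0]} et al."
    return f"({names}, {year})"


print(apa_in_text(["Bedford", "Caulfield"], 2012))
print(apa_in_text(["Davis", "Smith", "Caulfield"], 2015))
```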

Sometimes, reports do not list individual authors, only the organization responsible. In these cases, list the organization in the author position.

Europeana Task Force on Metadata Quality. (2015). . Europeana. https://pro.europeana.eu/files/Europeana_Professional/Europeana_Network/metadata-quality-report.pdf
(Europeana Task Force on Metadata Quality, 2015)

This sometimes results in the name of the author and publisher being identical. Omit the second mention of the organization in this case.

Kellogg Company. (2019). . https://www.annualreports.com/HostedData/AnnualReports/PDF/NYSE_K_2019.pdf
(Kellogg Company, 2019)

Many reports are associated with a specific number. If a report has a number, it will typically be listed in the database where you found the report.

It will also generally appear on the cover or title page of the report itself.

A report number should always be included when available, but if a report doesn’t have one, you can just leave this part out.
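
The report reference pattern, including the optional report number and the rule about not repeating an organization as both author and publisher, can be sketched as below. The function name and the sample titles are placeholders invented for this example; the title would be italicized in a real reference list.

```python
def apa_report_reference(author, year, title, publisher=None, url=None, report_number=None):
    """Assemble an APA-style reference entry for a report (plain text, no italics)."""
    # Individual authors already end in a period ("Bedford, D. A. D."); organizations do not.
    author_part = author if author.endswith(".") else author + "."
    number = f" (Report No. {report_number})" if report_number else ""
    parts = [f"{author_part} ({year}).", f"{title}{number}."]
    # Omit the publisher when it is the same organization as the author.
    if publisher and publisher != author:
        parts.append(f"{publisher}.")
    if url:
        parts.append(url)
    return " ".join(parts)


print(apa_report_reference(
    "Bedford, D. A. D.", 2017, "Example report title",
    publisher="Washington State Department of Transportation",
    url="https://example.org/report.pdf",
    report_number="WA-RD 896.4",
))
print(apa_report_reference("Kellogg Company", 2019, "Example annual report", publisher="Kellogg Company"))
```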

When no individual author name is listed, but the source can clearly be attributed to a specific organization (e.g., a press release by a charity, a report by an agency, or a page from a company’s website), use the organization’s name as the author in the reference entry and APA in-text citations.

When no author at all can be determined—e.g. a collaboratively edited wiki or an online article published anonymously—use the title in place of the author. In the in-text citation, put the title in quotation marks if it appears in plain text in the reference list, and in italics if it appears in italics in the reference list. Shorten it if necessary.

The abbreviation “et al.” (meaning “and others”) is used to shorten APA in-text citations with three or more authors. Here’s how it works:

Only include the first author’s last name, followed by “et al.”, a comma and the year of publication, for example (Taylor et al., 2018).

You may include up to 20 authors in a reference list entry.

When a source has more than 20 authors, list the first 19, then replace the remaining names before the final listed author with an ellipsis, but do not omit the final author:

Davis, Y., Smith, J., Caulfield, F., Pullman, H., Carlisle, J., Donahue, S. D., James, F., O’Donnell, K., Singh, J., Johnson, L., Streefkerk, R., McCombes, S., Corrieri, L., Valck, X., Baldwin, F. M., Lorde, J., Wardell, K., Lao, W., Yang, P., . . . O’Brien, T. (2012).

Cite this Scribbr article

Caulfield, J. (2023, December 01). How to Cite a Report in APA Style | Format & Examples. Scribbr. Retrieved August 5, 2024, from https://www.scribbr.com/apa-examples/report/

Purdue Online Writing Lab Purdue OWL® College of Liberal Arts

Tables and Figures

This page is brought to you by the OWL at Purdue University. When printing this page, you must include the entire legal notice.

Copyright ©1995-2018 by The Writing Lab & The OWL at Purdue and Purdue University. All rights reserved. This material may not be published, reproduced, broadcast, rewritten, or redistributed without permission. Use of this site constitutes acceptance of our terms and conditions of fair use.

Note: This page reflects the latest version of the APA Publication Manual (i.e., APA 7), which was released in October 2019. The equivalent resources for the older APA 6 style can be found at this page as well as at this page (our old resources covered the material on this page on two separate pages).

The purpose of tables and figures in documents is to enhance your readers' understanding of the information in the document; usually, large amounts of information can be communicated more efficiently in tables or figures. Tables are any graphic that uses a row and column structure to organize information, whereas figures include any illustration or image other than a table.

General guidelines

Visual material such as tables and figures can be used quickly and efficiently to present a large amount of information to an audience, but visuals must be used to assist communication, not to use up space, or disguise marginally significant results behind a screen of complicated statistics. Ask yourself this question first: Is the table or figure necessary? For example, it is better to present simple descriptive statistics in the text, not in a table.

Relation of Tables or Figures and Text

Because tables and figures supplement the text, refer in the text to all tables and figures used and explain what the reader should look for when using the table or figure. Focus only on the important point the reader should draw from them, and leave the details for the reader to examine on their own.

Documentation

If you are using figures, tables and/or data from other sources, be sure to gather all the information you will need to properly document your sources.

Integrity and Independence

Each table and figure must be intelligible without reference to the text, so be sure to include an explanation of every abbreviation (except the standard statistical symbols and abbreviations).

Organization, Consistency, and Coherence

Number all tables sequentially as you refer to them in the text (Table 1, Table 2, etc.), likewise for figures (Figure 1, Figure 2, etc.). Abbreviations, terminology, and probability level values must be consistent across tables and figures in the same article. Likewise, formats, titles, and headings must be consistent. Do not repeat the same data in different tables.

Data in a table that would require only two or fewer columns and rows should be presented in the text. More complex data is better presented in tabular format. In order for quantitative data to be presented clearly and efficiently, it must be arranged logically, e.g. data to be compared must be presented next to one another (before/after, young/old, male/female, etc.), and statistical information (means, standard deviations, N values) must be presented in separate parts of the table. If possible, use canonical forms (such as ANOVA, regression, or correlation) to communicate your data effectively.

A generic example of a table with multiple notes formatted in APA 7 style.

Elements of Tables

Number all tables with Arabic numerals sequentially. Do not use suffix letters (e.g. Table 3a, 3b, 3c); instead, combine the related tables. If the manuscript includes an appendix with tables, identify them with capital letters and Arabic numerals (e.g. Table A1, Table B2).

Like the title of the paper itself, each table must have a clear and concise title. Titles should be written in italicized title case below the table number, with a blank line between the number and the title. When appropriate, you may use the title to explain an abbreviation parenthetically.

Comparison of Median Income of Adopted Children (AC) v. Foster Children (FC)

Keep headings clear and brief. The heading should not be much wider than the widest entry in the column. Use of standard abbreviations can aid in achieving that goal. There are several types of headings:

  • Stub headings describe the lefthand column, or stub column, which usually lists major independent variables.
  • Column headings describe entries below them, applying to just one column.
  • Column spanners are headings that describe entries below them, applying to two or more columns which each have their own column heading. Column spanners are often stacked on top of column headings and together are called decked heads.
  • Table Spanners cover the entire width of the table, allowing for more divisions or combining tables with identical column headings. They are the only type of heading that may be plural.

All columns must have headings, written in sentence case and using singular language (Item rather than Items) unless referring to a group (Men, Women). Each column’s items should be parallel (i.e., every item in a column labeled “%” should be a percentage and does not require the % symbol, since it’s already indicated in the heading). Subsections within the stub column can be shown by indenting headings rather than creating new columns:

Chemical Bonds

     Ionic

     Covalent

     Metallic

The body is the main part of the table, which includes all the reported information organized in cells (intersections of rows and columns). Entries should be center aligned unless left aligning them would make them easier to read (longer entries, usually). Word entries in the body should use sentence case. Leave cells blank if the element is not applicable or if data were not obtained; use a dash in cells and a general note if it is necessary to explain why cells are blank. In reporting the data, consistency is key: Numerals should be expressed to a consistent number of decimal places that is determined by the precision of measurement. Never change the unit of measurement or the number of decimal places in the same column.
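
As a small illustration of the consistency rule, the sketch below formats every value in a column to the same number of decimal places and leaves missing values blank. The function name and the use of `None` for a missing value are assumptions made for this example.

```python
def format_column(values, decimals):
    """Format a numeric column to a fixed number of decimal places; blank if missing."""
    return ["" if value is None else f"{value:.{decimals}f}" for value in values]


print(format_column([1.5, None, 2.25], decimals=2))
```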

There are three types of notes for tables: general, specific, and probability notes. All of them must be placed below the table in that order.

General  notes explain, qualify or provide information about the table as a whole. Put explanations of abbreviations, symbols, etc. here.

Example: Note. The racial categories used by the US Census (African-American, Asian American, Latinos/-as, Native-American, and Pacific Islander) have been collapsed into the category “non-White.” E = excludes respondents who self-identified as “White” and at least one other “non-White” race.

Specific notes explain, qualify or provide information about a particular column, row, or individual entry. To indicate specific notes, use superscript lowercase letters (e.g., a, b, c), and order the superscripts from left to right, top to bottom. Each table’s first footnote must be the superscript a.

a  n = 823.  b  One participant in this group was diagnosed with schizophrenia during the survey.

Probability  notes provide the reader with the results of the tests for statistical significance. Asterisks indicate the values for which the null hypothesis is rejected, with the probability ( p value) specified in the probability note. Such notes are required only when relevant to the data in the table. Consistently use the same number of asterisks for a given alpha level throughout your paper.

* p < .05. ** p < .01. *** p < .001.

If you need to distinguish between two-tailed and one-tailed tests in the same table, use asterisks for two-tailed p values and an alternate symbol (such as daggers) for one-tailed p values.

* p < .05, two-tailed. ** p < .01, two-tailed. † p < .05, one-tailed. †† p < .01, one-tailed.
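
Keeping the same number of asterisks for a given alpha level is easier if the mapping is defined once and reused for every table. The sketch below shows one way to do that; the names and the three conventional thresholds are choices made for this example, not requirements of APA style.

```python
# One shared mapping from alpha level to marker, reused for every table.
ALPHA_LEVELS = [(0.05, "*"), (0.01, "**"), (0.001, "***")]


def significance_marker(p_value, levels=ALPHA_LEVELS):
    """Return the marker for the strictest alpha level that p_value falls below."""
    for alpha, marker in sorted(levels):  # strictest (smallest) alpha first
        if p_value < alpha:
            return marker
    return ""


def probability_note(levels=ALPHA_LEVELS):
    """Build the probability note, writing values without a leading zero."""
    ordered = sorted(levels, reverse=True)  # most lenient alpha first
    return " ".join(f"{marker} p < {f'{alpha:g}'.lstrip('0')}." for alpha, marker in ordered)


print(significance_marker(0.03))
print(probability_note())
```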

Borders 

Tables should only include borders and lines that are needed for clarity (i.e., between elements of a decked head, above column spanners, separating total rows, etc.). Do not use vertical borders, and do not use borders around each cell. Spacing and strict alignment are typically enough to clarify relationships between elements.

Example of a table in the text of an APA 7 paper. Note the lack of vertical borders.

Tables from Other Sources

If using tables from an external source, copy the structure of the original exactly, and cite the source in accordance with  APA style .

Table Checklist

(Taken from the Publication Manual of the American Psychological Association, 7th ed., Section 7.20)

  • Is the table necessary?
  • Does it belong in the print and electronic versions of the article, or can it go in an online supplemental file?
  • Are all comparable tables presented consistently?
  • Are all tables numbered with Arabic numerals in the order they are mentioned in the text? Is the table number bold and left-aligned?
  • Are all tables referred to in the text?
  • Is the title brief but explanatory? Is it presented in italicized title case and left-aligned?
  • Does every column have a column heading? Are column headings centered?
  • Are all abbreviations; special use of italics, parentheses, and dashes; and special symbols explained?
  • Are the notes organized according to the convention of general, specific, probability?
  • Are table borders correctly used (top and bottom of table, beneath column headings, above table spanners)?
  • Does the table use correct line spacing (double for the table number, title, and notes; single, one and a half, or double for the body)?
  • Are entries in the left column left-aligned beneath the centered stub heading? Are all other column headings and cell entries centered?
  • Are confidence intervals reported for all major point estimates?
  • Are all probability level values correctly identified, and are asterisks attached to the appropriate table entries? Is a probability level assigned the same number of asterisks in all the tables in the same document?
  • If the table or its data are from another source, is the source properly cited? Is permission necessary to reproduce the table?

Figures include all graphical displays of information that are not tables. Common types include graphs, charts, drawings, maps, plots, and photos. Just like tables, figures should supplement the text and should be both understandable on their own and referenced fully in the text. This section details elements of formatting writers must use when including a figure in an APA document, gives an example of a figure formatted in APA style, and includes a checklist for formatting figures.

Preparing Figures

In preparing figures, communication and readability must be the ultimate criteria. Avoid the temptation to use the special effects available in most advanced software packages. While three-dimensional effects, shading, and layered text may look interesting to the author, overuse, inconsistent use, and misuse may distort the data, and distract or even annoy readers. Design properly done is inconspicuous, almost invisible, because it supports communication. Design improperly, or amateurishly, done draws the reader’s attention from the data, and makes him or her question the author’s credibility. Line drawings are usually a good option for readability and simplicity; for photographs, high contrast between background and focal point is important, as well as cropping out extraneous detail to help the reader focus on the important aspects of the photo.

Parts of a Figure

All figures that are part of the main text require a number using Arabic numerals (Figure 1, Figure 2, etc.). Numbers are assigned based on the order in which figures appear in the text and are bolded and left aligned.

Under the number, write the title of the figure in italicized title case. The title should be brief, clear, and explanatory, and both the title and number should be double spaced.

The image of the figure is the body, and it is positioned underneath the number and title. The image should be legible in both size and resolution; fonts should be sans serif, consistently sized, and between 8-14 pt. Title case should be used for axis labels and other headings; descriptions within figures should be in sentence case. Shading and color should be limited for clarity; use patterns along with color and check contrast between colors with free online checkers to ensure all users (people with color vision deficiencies or readers printing in grayscale, for instance) can access the content. Gridlines and 3-D effects should be avoided unless they are necessary for clarity or essential content information.

Legends, or keys, explain symbols, styles, patterns, shading, or colors in the image. Words in the legend should be in title case; legends should go within or underneath the image rather than to the side. Not all figures will require a legend.

Notes clarify the content of the figure; like tables, notes can be general, specific, or probability. General notes explain units of measurement, symbols, and abbreviations, or provide citation information. Specific notes identify specific elements using superscripts; probability notes explain statistical significance of certain values.

A generic example of a figure formatted in APA 7 style.

Figure Checklist 

(Taken from the Publication Manual of the American Psychological Association, 7th ed., Section 7.35)

  • Is the figure necessary?
  • Does the figure belong in the print and electronic versions of the article, or is it supplemental?
  • Is the figure simple, clean, and free of extraneous detail?
  • Is the figure title descriptive of the content of the figure? Is it written in italic title case and left aligned?
  • Are all elements of the figure clearly labeled?
  • Are the magnitude, scale, and direction of grid elements clearly labeled?
  • Are parallel figures or equally important figures prepared according to the same scale?
  • Are the figures numbered consecutively with Arabic numerals? Is the figure number bold and left aligned?
  • Has the figure been formatted properly? Is the font sans serif in the image portion of the figure and between sizes 8 and 14?
  • Are all abbreviations and special symbols explained?
  • If the figure has a legend, does it appear within or below the image? Are the legend’s words written in title case?
  • Are the figure notes in general, specific, and probability order? Are they double-spaced, left aligned, and in the same font as the paper?
  • Are all figures mentioned in the text?
  • Has written permission for print and electronic reuse been obtained? Is proper credit given in the figure caption?
  • Have all substantive modifications to photographic images been disclosed?
  • Are the figures being submitted in a file format acceptable to the publisher?
  • Have the files been produced at a sufficiently high resolution to allow for accurate reproduction?
Citing Sources: What are citations and why should I use them?

What is a citation?

Citations are a way of giving credit when certain material in your work came from another source. They also give your readers the information necessary to find that source again, providing an important roadmap to your research process. Whenever you use sources such as books, journals or websites in your research, you must give credit to the original author by citing the source.

Why do researchers cite?

Scholarship is a conversation and scholars use citations not only to give credit to original creators and thinkers, but also to add strength and authority to their own work. By citing their sources, scholars are placing their work in a specific context to show where they “fit” within the larger conversation. Citations are also a great way to leave a trail intended to help others who may want to explore the conversation or use the sources in their own work.

In short, citations

(1) give credit

(2) add strength and authority to your work

(3) place your work in a specific context

(4) leave a trail for other scholars

"Good citations should reveal your sources, not conceal them. They should honestly reflect the research you conducted." (Lipson 4)

Lipson, Charles. "Why Cite?" Cite Right: A Quick Guide to Citation Styles--MLA, APA, Chicago, the Sciences, Professions, and More. Chicago: U of Chicago, 2006. Print.

What does a citation look like?

Different subject disciplines call for citation information to be written in very specific order, capitalization, and punctuation. There are therefore many different style formats. Three popular citation formats are MLA Style (for humanities articles) and APA or Chicago (for social sciences articles).

MLA style (print journal article):  

Whisenant, Warren A. "How Women Have Fared as Interscholastic Athletic Administrators Since the Passage of Title IX." Sex Roles Vol. 49.3 (2003): 179-182.

APA style (print journal article):

Whisenant, W. A. (2003). How women have fared as interscholastic athletic administrators since the passage of Title IX. Sex Roles, 49(3), 179-182.

Chicago style (print journal article):

Whisenant, Warren A. "How Women Have Fared as Interscholastic Athletic Administrators Since the Passage of Title IX." Sex Roles 49, no. 3 (2003): 179-182.
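
Because the styles differ only in order, capitalization, and punctuation, one record of the basic information can be rendered in more than one of them. The sketch below covers APA and Chicago only; the function names and dictionary keys are hypothetical, and the sentence-case title is supplied by hand because APA capitalizes only the first word and proper nouns of an article title, which code cannot reliably detect.

```python
article = {
    "last": "Whisenant",
    "first": "Warren",
    "middle": "A.",
    "title": "How Women Have Fared as Interscholastic Athletic Administrators Since the Passage of Title IX",
    "title_sentence_case": "How women have fared as interscholastic athletic administrators since the passage of Title IX",
    "journal": "Sex Roles",
    "volume": 49,
    "issue": 3,
    "year": 2003,
    "pages": "179-182",
}


def chicago_article(a):
    """Chicago style: full first name, quoted title-case title, 'no.' before the issue."""
    return (f'{a["last"]}, {a["first"]} {a["middle"]} "{a["title"]}." '
            f'{a["journal"]} {a["volume"]}, no. {a["issue"]} ({a["year"]}): {a["pages"]}.')


def apa_article(a):
    """APA style: initials only, year after the author, sentence-case title."""
    initials = f'{a["first"][0]}. {a["middle"]}'
    return (f'{a["last"]}, {initials} ({a["year"]}). {a["title_sentence_case"]}. '
            f'{a["journal"]}, {a["volume"]}({a["issue"]}), {a["pages"]}.')


print(chicago_article(article))
print(apa_article(article))
```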

No matter which style you use, all citations require the same basic information:

  • Author or Creator
  • Container (e.g., Journal or magazine, website, edited book)
  • Date of creation or publication
  • Publisher 

You are most likely to have easy access to all of your citation information when you find it in the first place. Take note of this information up front, and it will be much easier to cite it effectively later.

  • Last Updated: May 1, 2024 12:48 PM
  • URL: https://guides.lib.uw.edu/research/citations

How to Cite Statistics

Last Updated: May 1, 2024

This article was co-authored by Gerald Posner and by wikiHow staff writer, Jennifer Mueller, JD. Gerald Posner is an Author & Journalist based in Miami, Florida. With over 35 years of experience, he specializes in investigative journalism, nonfiction books, and editorials. He holds a law degree from UC College of the Law, San Francisco, and a BA in Political Science from the University of California-Berkeley. He’s the author of thirteen books, including several New York Times bestsellers, the winner of the Florida Book Award for General Nonfiction, and has been a finalist for the Pulitzer Prize in History. He was also shortlisted for the Best Business Book of 2020 by the Society for Advancing Business Editing and Writing.

When you're working on a research paper, citing datasets and statistics you used is just as important as citing articles and other references from your research. It allows your readers to independently examine the data and verify the methodology used in collecting it. The basic information in your citation is similar, but the format may differ depending on whether you're using the Modern Language Association (MLA), American Psychological Association (APA), or Chicago citation style.

Step 1 Start your Works Cited entry with the author of the statistical document.

  • Example: New York City Department of Health and Mental Hygiene.
  • Individual author example: Sunshine, Sally.
  • If there are 2 authors, place a comma after the first author's name, then type the word "and" and list the second author's name in first name-last name order. For example: Sunshine, Sally, and Luna Wolfe.
  • For more than 2 authors, type the first author's name and a comma followed by the abbreviation "et al." For example: Sunshine, Sally, et al.
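The author rules above can be sketched as a small helper. This is only an illustration: the function name `mla_authors` and the `(first, last)` tuple input are assumptions made for this example, and an institutional author would simply be written out as-is rather than passed through it.

```python
def mla_authors(authors):
    """Format individual authors for an MLA Works Cited entry.

    `authors` is a list of (first_name, last_name) tuples (a hypothetical
    input shape chosen for this sketch).
    """
    if not authors:
        return ""
    first, last = authors[0]
    lead = f"{last}, {first}"           # first author is inverted
    if len(authors) == 1:
        return f"{lead}."
    if len(authors) == 2:
        second_first, second_last = authors[1]
        # comma after the first author, then "and" plus the second author
        return f"{lead}, and {second_first} {second_last}."
    # more than two authors: first author followed by "et al."
    return f"{lead}, et al."


print(mla_authors([("Sally", "Sunshine")]))
print(mla_authors([("Sally", "Sunshine"), ("Luna", "Wolfe")]))
print(mla_authors([("Sally", "Sunshine"), ("Luna", "Wolfe"), ("Max", "Star")]))
```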

Step 2 Provide the title of the statistical document in quotation marks.

  • Example: New York City Department of Health and Mental Hygiene. "Community Health Profiles 2015, Brooklyn Community District 17: East Flatbush."

Step 3 List publication information for the document.

  • Example: New York City Department of Health and Mental Hygiene. "Community Health Profiles 2015, Brooklyn Community District 17: East Flatbush." NYC.gov, New York City Department of Health and Mental Hygiene, 2015.
  • If a specific date is provided, use day-month-year format, abbreviating months with names longer than 4 letters. For example: 22 Feb. 2016.
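The day-month-year rule can be sketched as follows. The abbreviation table is an assumption based on common MLA practice (in particular "Sept." for September); the function name `mla_date` is made up for this example.

```python
import datetime

# Months with names longer than four letters are abbreviated;
# May, June, and July are left as they are.
MLA_MONTHS = {
    1: "Jan.", 2: "Feb.", 3: "Mar.", 4: "Apr.", 5: "May", 6: "June",
    7: "July", 8: "Aug.", 9: "Sept.", 10: "Oct.", 11: "Nov.", 12: "Dec.",
}


def mla_date(d):
    """Return a date in day-month-year order, e.g. '22 Feb. 2016'."""
    return f"{d.day} {MLA_MONTHS[d.month]} {d.year}"


print(mla_date(datetime.date(2016, 2, 22)))  # 22 Feb. 2016
```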

Step 4 Include a direct URL or DOI for the statistical document.

  • URL example: New York City Department of Health and Mental Hygiene. "Community Health Profiles 2015, Brooklyn Community District 17: East Flatbush." NYC.gov, New York City Department of Health and Mental Hygiene, 2015. www1.nyc.gov/assets/doh/downloads/pdf/data/2015chp-bk17.pdf.
  • DOI example: "Hazardous Drinking Rates, Drinkers Only, Population Aged 15-74." Tackling Harmful Alcohol Use: Economics and Public Health Policy, Organization for Economic Co-operation and Development, 24 Dec. 2015. OECD iLibrary, doi:10.1787/9789264181069-graph7-en.

Step 5 Close with the access date for online documents.

  • Example: New York City Department of Health and Mental Hygiene. "Community Health Profiles 2015, Brooklyn Community District 17: East Flatbush." NYC.gov, New York City Department of Health and Mental Hygiene, 2015. www1.nyc.gov/assets/doh/downloads/pdf/data/2015chp-bk17.pdf. Accessed 24 Jan. 2017.

MLA Works Cited Entry Format:

Author Last Name, First Name. "Title of Document: Subtitle if Any." Title of Website or Publication, Name of Publisher, Day-Month-Year published or last modified. URL/DOI. Accessed Day-Month-Year.
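As a rough illustration, the format above could be assembled from its parts like this. The function name `mla_entry` and its parameters are hypothetical, and plain text cannot show the italics normally applied to the website or publication title.

```python
def mla_entry(author, title, container, publisher, date,
              location=None, accessed=None):
    """Assemble an MLA Works Cited entry as plain text (no italics)."""
    parts = [
        author.rstrip(".") + ".",          # avoid a doubled period after "et al."
        f'"{title}."',                     # document title in quotation marks
        f"{container}, {publisher}, {date}.",
    ]
    if location:                           # URL or DOI, if any
        parts.append(f"{location}.")
    if accessed:                           # access date for online documents
        parts.append(f"Accessed {accessed}.")
    return " ".join(parts)


print(mla_entry(
    author="New York City Department of Health and Mental Hygiene",
    title="Community Health Profiles 2015, Brooklyn Community District 17: East Flatbush",
    container="NYC.gov",
    publisher="New York City Department of Health and Mental Hygiene",
    date="2015",
    location="www1.nyc.gov/assets/doh/downloads/pdf/data/2015chp-bk17.pdf",
    accessed="24 Jan. 2017",
))
```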

Step 6 Use the author's last name and page number for in-text citations.

  • For example, you might write: Statistics show 30 percent of the adult residents of East Flatbush are obese (New York City Department of Health and Mental Hygiene 9).
  • If the source isn't paginated, you only need to provide the author's last name in the parenthetical citation.
  • If you mention the author's name in your text, provide a page number in the parenthetical. For example, you might write: According to the New York City Department of Health and Mental Hygiene, 30 percent of the adult residents of East Flatbush are obese (9). If the source is not paginated, you don't need a parenthetical at all if you mention the author's name in your text.
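The three in-text cases above can be expressed as one small decision function. This is a sketch only; `mla_in_text` and its arguments are names invented for the example.

```python
def mla_in_text(author, page=None, author_named_in_text=False):
    """Return the MLA parenthetical citation, or '' when none is needed."""
    if author_named_in_text:
        # Author already named in the sentence: page number only, if there is one.
        return f"({page})" if page is not None else ""
    if page is not None:
        return f"({author} {page})"
    return f"({author})"                   # unpaginated source


print(mla_in_text("New York City Department of Health and Mental Hygiene", 9))
print(mla_in_text("New York City Department of Health and Mental Hygiene", 9,
                  author_named_in_text=True))
```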

APA

Step 1 Start with the name of the author or rights holder.

  • Example: National Center for Health Statistics.
  • If there are 2 to 7 authors, list each name using the same last name-initials format. Place a comma between names and an ampersand before the final author's name. If there are more than 7 authors listed, place an ellipsis after the 6th author's name, then provide the last author's name. Never list more than 7 authors in an APA reference list entry. [4]
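A minimal sketch of the author-list rule exactly as stated above follows. The name `apa_authors` is made up, the input is assumed to be names already written in last name-initials form, and the ellipsis is rendered as three plain periods. Editions of the APA manual differ on the author limit, so check the edition you are required to use.

```python
def apa_authors(names):
    """Join authors already written as 'Last, A. A.' per the rule above."""
    if not names:
        return ""
    if len(names) == 1:
        return names[0]
    if len(names) <= 7:
        # commas between names, ampersand before the final author
        return ", ".join(names[:-1]) + ", & " + names[-1]
    # more than 7: first six authors, an ellipsis, then the last author
    return ", ".join(names[:6]) + ", ... " + names[-1]


print(apa_authors(["Sunshine, S. K.", "Wolfe, L."]))
```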

Step 2 Provide the year the document was published in parentheses.

  • Example: National Center for Health Statistics. (2016).

Step 3 Include the title of the document followed by a brief description.

  • Example: National Center for Health Statistics. (2016). Health, United States, 2015: With special feature on racial and ethnic health disparities [Statistical report].
  • Examples of possible descriptions include "statistical report," "data file," "dataset," "preliminary report," or "statistical analysis."
  • If there is a version number, include it in parentheses between the title and the description.

Step 4 Close with the permalink URL or DOI for the document.

  • URL example: National Center for Health Statistics. (2016). Health, United States, 2015: With special feature on racial and ethnic health disparities [Statistical report]. Retrieved from https://www.cdc.gov/nchs/data/hus/hus15.pdf
  • DOI example: Organization for Economic Co-operation and Development. (2015). Hazardous drinking rates, drinkers only, population aged 15-74 [Statistical report]. Retrieved from doi: 10.1787/9789264181069-graph7-en

APA Reference List Format:

Author Last Name, A. A. (Year). Title of document: Subtitle if any (Version # if available) [Description of document]. Retrieved from URL/DOI
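The reference list format above can be sketched as a template function. The name `apa_entry` and its parameters are illustrative assumptions, and italics on the title are not representable in plain text.

```python
def apa_entry(author, year, title, description, location, version=None):
    """Assemble an APA reference list entry following the format above."""
    author = author.rstrip(".") + "."      # exactly one period after the author
    version_part = f" (Version {version})" if version else ""
    return (f"{author} ({year}). {title}{version_part} "
            f"[{description}]. Retrieved from {location}")


print(apa_entry(
    author="National Center for Health Statistics",
    year=2016,
    title=("Health, United States, 2015: With special feature on "
           "racial and ethnic health disparities"),
    description="Statistical report",
    location="https://www.cdc.gov/nchs/data/hus/hus15.pdf",
))
```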

Step 5 Use the author's last name and the publication year for in-text citations.

  • For example, you might write: In 2014, life expectancy for males increased 1.4 years (National Center for Health Statistics, 2016).
  • If you mention the author in the text of your paper, include the year in parentheses immediately after the author's name. For example, you might write: According to the National Center for Health Statistics (2016), life expectancy for males increased by 1.4 years in 2014.
  • If you happen to mention both the author's name and the year of publication in the text of your paper, there's no need for a parenthetical citation unless you have directly quoted the source. In that case, you would include the page number in parentheses at the end of the sentence.

Chicago

Step 1 Start with the name of the individual or institutional author.

  • Institutional author example: National Center for Health Statistics.
  • Individual author example: Sunshine, Sally K.
  • For 2 or 3 authors, list each author's name separated by commas with the word "and" before the final author's name. All authors other than the first author are listed in first name-last name format. For example: Sunshine, Sally K., and Luna Wolfe.
  • If there are more than 3 authors, type the first author's name followed by a comma and the abbreviation "et al." For example: Sunshine, Sally K., et al. [10]

Step 2 Include the title of the statistical document in italics.

  • Example: National Center for Health Statistics. Health, United States, 2015: With Special Feature on Racial and Ethnic Health Disparities.

Step 3 List the location and name of the publisher.

  • Example: National Center for Health Statistics. Health, United States, 2015: With Special Feature on Racial and Ethnic Health Disparities. Washington, D.C.: U.S. Government Printing Office.

Step 4 Provide the distributor of the statistics if different from the publisher.

  • Example: National Center for Health Statistics. Health, United States, 2015: With Special Feature on Racial and Ethnic Health Disparities. Washington, D.C.: U.S. Government Printing Office. Distributed by Hyattsville, MD: National Center for Health Statistics, 2016.

Step 5 Close with a permalink URL or DOI, if applicable.

  • URL example: National Center for Health Statistics. Health, United States, 2015: With Special Feature on Racial and Ethnic Health Disparities. Washington, D.C.: U.S. Government Printing Office. Distributed by Hyattsville, MD: National Center for Health Statistics, 2016. https://www.cdc.gov/nchs/data/hus/hus15.pdf.
  • DOI example: Organization for Economic Co-operation and Development. Hazardous Drinking Rates, Drinkers Only, Population Aged 15-74. Paris, France: OECD iLibrary. doi: 10.1787/9789264181069-graph7-en.

Chicago Bibliography Format:

Author Last Name, First Name. Title of Document: Subtitle if Any. Location: Publisher. Distributed by Location: Distributor (if different from publisher), Year. URL/DOI.
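The bibliography format above might be assembled as in the sketch below. The function name `chicago_entry`, its parameters, and the example author, title, publisher, and URL are all hypothetical. Placing the year directly after the publisher when there is no separate distributor is an assumption of this sketch, not something the format line spells out.

```python
def chicago_entry(author, title, place, publisher, year,
                  distributor_place=None, distributor=None, location=None):
    """Assemble a Chicago bibliography entry as plain text (no italics)."""
    author = author.rstrip(".") + "."      # exactly one period after the author
    if distributor:
        pub = (f"{place}: {publisher}. Distributed by "
               f"{distributor_place}: {distributor}, {year}.")
    else:
        # Assumption: with no distributor, the year follows the publisher.
        pub = f"{place}: {publisher}, {year}."
    parts = [author, f"{title}.", pub]
    if location:                           # permalink URL or DOI
        parts.append(f"{location}.")
    return " ".join(parts)


# Hypothetical report, used only to show the shape of the output.
print(chicago_entry(
    author="Sunshine, Sally K.",
    title="Example Statistical Report",
    place="Washington, D.C.",
    publisher="Example Press",
    year=2016,
    distributor_place="Hyattsville, MD",
    distributor="Example Distributor",
    location="https://example.org/report.pdf",
))
```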

Step 6 Use the same information with different punctuation in footnotes.

  • Example: National Center for Health Statistics, Health, United States, 2015: With Special Feature on Racial and Ethnic Health Disparities (Washington, D.C.: U.S. Government Printing Office, distributed by Hyattsville, MD: National Center for Health Statistics, 2016), 65-86, https://www.cdc.gov/nchs/data/hus/hus15.pdf.

Tips

  • If the statistics are reported in a book or journal article, rather than a statistical report or data set, cite the book or journal article. No special format is required just because you're citing statistics rather than other content.

References

  • https://www.icpsr.umich.edu/files/ICPSR/enewsletters/iassist.html
  • https://owl.purdue.edu/owl/research_and_citation/mla_style/mla_formatting_and_style_guide/mla_in_text_citations_the_basics.html
  • https://guides.library.ucsc.edu/citedata
  • https://owl.purdue.edu/owl/research_and_citation/apa_style/apa_formatting_and_style_guide/reference_list_author_authors.html
  • https://owl.purdue.edu/owl/research_and_citation/apa_style/apa_formatting_and_style_guide/in_text_citations_the_basics.html
  • https://guides.lib.umich.edu/c.php?g=282964&p=3285995
  • https://library.ulethbridge.ca/chicagostyle/books/multiple
  • https://research.wou.edu/c.php?g=551307&p=3785233

University of Iowa

University Libraries - Research Data Services

How to Cite Data and Code


Cite Your Own Data

Are you publishing a paper referencing your research data? Include a reference to your data in the text of the paper with a data availability statement and add a data citation to your references section.

This ensures that the data citation becomes part of the scholarly record and gives others a pathway to find your work. Research funders also expect you to share data, and a citation serves as evidence that your data have been shared.

If you are depositing data with the UI, we can reserve a DOI for your dataset, so you can include it in the article submission. We can also assist with sharing and publishing data.

Cite Others’ Data

Give credit to other data sources when you use them, just as you do when using published literature. Whether for a paper or a presentation, it’s important to cite the data files used.

Citation Elements for Data

A data citation should include at least the following elements. The specific information will depend on established practices in your research field, as well as the type of data, the repository you use, and the citation style of the publication.

  • Responsible party (i.e., investigator, sample collector, creator)
  • Title of dataset
  • Date of publication of the dataset
  • Version, when appropriate
  • Name of data center, repository, and/or publication
  • Analysis software, if required
  • Date accessed
  • Identifier (e.g., DOI or other persistent link)

Tip: Citation formatters. If you have a DOI, you can use the CrossCite DOI data citation formatter or the DataCite citation formatter to create citations corresponding to a variety of citation styles.
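A formatted citation can also be requested programmatically. The sketch below assumes the DOI content negotiation offered through doi.org by registration agencies such as Crossref and DataCite, using the `text/x-bibliography` media type with a `style` parameter; whether a given DOI and style are supported depends on the registering agency, so treat this as an illustration and not as a description of how the web formatters work. The function names are made up for this example.

```python
import urllib.parse
import urllib.request


def build_citation_request(doi, style="apa"):
    """Build (but do not send) a request for a formatted citation."""
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    headers = {"Accept": f"text/x-bibliography; style={style}"}
    return urllib.request.Request(url, headers=headers)


def fetch_citation(doi, style="apa", timeout=10):
    """Send the request and return the citation text (needs network access)."""
    request = build_citation_request(doi, style)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


req = build_citation_request("10.5334/dsj-2020-005")
print(req.full_url)
print(req.get_header("Accept"))
```

Compare any generated citation against the style guide you are required to follow before using it.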

Most data repositories will provide a suggested citation for their datasets. Some will also request that you cite the related publication(s) along with the data. Follow the most appropriate format while meeting the requirements of the data creators and repositories.

Guidelines and Examples

Citation style guides and manuals are beginning to include data as a resource type. The citation formatters mentioned above produce output that only approximates each style's requirements, so check generated citations against the relevant style guide before using them.

Here are some examples of guidelines:

  • American Geophysical Union (AGU) author guidelines for citing data sets
  • Federation of Earth Science Information Partners (ESIP) Interagency Data Stewardship/Citations
  • Citing and linking to the Gene Expression Omnibus (NCBI) database
  • Using data in Dryad
  • The Inter-university Consortium for Political and Social Research (ICPSR) provides recommended citation procedures
  • DataCite citation examples

Data Availability Statements

If you’re publishing an article using your research data, the journal may require a data availability statement that briefly describes if and how readers can access the data that informs the research. The chart below shows some sample language you might use for a data availability statement. More template data availability statements, including examples for openly available and restricted access datasets, are available from several publishers, including Taylor & Francis and Cambridge University Press.

  • The datasets generated during and/or analyzed during the current study are available in the [repository name, e.g. “Iowa Research Online”] at [http://doi.org/[doi]]
  • The datasets generated during and/or analyzed during the current study are not publicly available due to [explanation of restrictions, e.g. “their containing private information”] but are available from the corresponding author on reasonable request.
  • The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
  • Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.
  • All data generated or analyzed during this study are included in this published article [and/or] its supplementary information files.
  • The data that support the findings of this study are available from [third party name] but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are, however, available from the authors upon reasonable request and with permission of [third party name].

The chart above is adapted from the article cited below and licensed under a Creative Commons Attribution license (CC-BY): Hrynaszkiewicz, I, Simons, N, Hussain, A, Grant, R and Goudie, S. 2020. “Developing a Research Data Policy Framework for All Journals and Publishers.” Data Science Journal, DOI: http://doi.org/10.5334/dsj-2020-005
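If you prepare statements for several manuscripts, the sample language can be kept as fill-in templates. The sketch below covers four of the templates from the chart; the dictionary keys, placeholder names, and the DOI shown are hypothetical and chosen only for illustration.

```python
# Template wording is taken from the chart; keys and placeholders are made up.
DAS_TEMPLATES = {
    "repository": (
        "The datasets generated during and/or analyzed during the current "
        "study are available in the {repository} at {doi_url}"
    ),
    "restricted": (
        "The datasets generated during and/or analyzed during the current "
        "study are not publicly available due to {reason} but are available "
        "from the corresponding author on reasonable request."
    ),
    "on_request": (
        "The datasets generated during and/or analyzed during the current "
        "study are available from the corresponding author on reasonable request."
    ),
    "not_applicable": (
        "Data sharing not applicable to this article as no datasets were "
        "generated or analyzed during the current study."
    ),
}


def data_availability_statement(kind, **fields):
    """Fill in one of the sample statements; raises KeyError if a field is missing."""
    return DAS_TEMPLATES[kind].format(**fields)


print(data_availability_statement(
    "repository",
    repository="Iowa Research Online",
    doi_url="http://doi.org/10.1234/example",  # placeholder DOI, not a real dataset
))
```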

Citing Code

Citing code (your own and that of others) is as important as citing data, and for similar reasons: you’re providing appropriate credit, facilitating reproducibility, and ensuring future researchers can find and use the code.

Citation Elements for Code

  • Creator (i.e., authors or organization who developed the software)
  • Date of publication
  • Publisher (e.g., repository name)

The Force11 Software Citation Implementation Working Group has created principles for software citation. Their GitHub page shows examples of citing software in both APA and Chicago Style.

  • Open access
  • Published: 20 November 2018

A data citation roadmap for scientific publishers

  • Helena Cousijn (ORCID: orcid.org/0000-0001-6660-6214),
  • Amye Kenall,
  • Emma Ganley (ORCID: orcid.org/0000-0002-2557-6204),
  • Melissa Harrison (ORCID: orcid.org/0000-0003-3523-4408),
  • David Kernohan,
  • Thomas Lemberger (ORCID: orcid.org/0000-0002-2499-4025),
  • Fiona Murphy (ORCID: orcid.org/0000-0003-1693-1240),
  • Patrick Polischuk (ORCID: orcid.org/0000-0002-6866-1628),
  • Simone Taylor,
  • Maryann Martone &
  • Tim Clark (ORCID: orcid.org/0000-0003-4060-7360)

Scientific Data volume 5, Article number: 180259 (2018)

  • Data publication and archiving
  • Research data

This article presents a practical roadmap for scholarly publishers to implement data citation in accordance with the Joint Declaration of Data Citation Principles (JDDCP), a synopsis and harmonization of the recommendations of major science policy bodies. It was developed by the Publishers Early Adopters Expert Group as part of the Data Citation Implementation Pilot (DCIP) project, an initiative of FORCE11.org and the NIH BioCADDIE program. The structure of the roadmap presented here follows the “life of a paper” workflow and includes the categories Pre-submission, Submission, Production, and Publication. The roadmap is intended to be publisher-agnostic so that all publishers can use this as a starting point when implementing JDDCP-compliant data citation. Authors reading this roadmap will also better know what to expect from publishers and how to enable their own data citations to gain maximum impact, as well as complying with what will become increasingly common funder mandates on data transparency.

Introduction

Over the past several years many authoritative science policy bodies have recommended robust archiving and citation of primary research data to resolve problems in reproducibility, robustness and reusability. Studies by CODATA (https://www.codata.org), the U.S. National Academy of Sciences, the Royal Society, and other groups recommend that scholarly articles now treat the primary data upon which they rely as first-class research objects [1–5]. Primary data should be robustly archived and directly cited as support for findings, just as literature is cited; and where data are re-used for subsequent analysis, they should be cited as well, thus recognising the value of the data and ensuring credit to those who generated them. It is strongly recommended, as a matter of good scientific practice, that the archived data be “FAIR” (Findable, Accessible, Interoperable, and Reusable) [6] and accessible from the primary article. A widely recommended method for establishing this accessibility is data citation.

The Joint Declaration of Data Citation Principles (JDDCP) summarizes the recommendations of these studies and has been endorsed by over 100 scholarly organizations, funders and publishers [7]. Further elaboration on how to implement the JDDCP was provided in Starr et al. [8], with an emphasis on accessibility practices for digital repositories. There is a clear emerging consensus in the scholarly community, including researchers, funders, and publishers, supporting the practice of data archiving and citation. This is reflected not only in the broad endorsement of the JDDCP, but also in the increasing proliferation of workshops on this topic. At least one journal, which had earlier published a widely discussed editorial by clinical trialists arguing against openly sharing data, is now leading an effort to help provide institutional incentives for authors to share and cite data [9].

There is also evidence to suggest that researchers, primarily to enhance the visibility and impact of their work, but also to facilitate transparency and encourage re-use, are increasingly sharing their own data, and are making use of shared data from other researchers [10]. Researchers with funding that requires open data therefore need to know what to expect from publishers that support data citation. Which databases or repositories are acceptable places to archive their data? Should they deposit in institutional repositories, general-purpose repositories such as Dataverse, Dryad, or Figshare, or domain-specific repositories? How are embargoes handled? Can confidential data or data requiring special license agreements for sharing be archived and cited? How should the citation itself be formatted? We attempt to answer these and other key questions in this article.

While intellectual property and confidentiality remain important considerations that can inhibit researchers from sharing, researchers are also concerned about receiving appropriate citation credit or attribution for major data production efforts. We hope to provide a standardized route to clear and accessible data citation practices, which should help to alleviate most authors’ concerns and clear up any potential confusion about sharing.

The growing expectation that authors support the assertions made in their research articles, in order to maintain transparency and reproducibility, requires depositing the underlying data in openly accessible locations (which are not Supplementary files to the research article). There is evidence that this benefits authors by increasing citations and usage [11, 12], and that it benefits scientific progress itself [13]. An additional benefit to authors is that their data are more likely to sit alongside appropriate material in a site-specific repository, and that they receive guidance from the repository managers and curators. Many databases are well equipped to ensure publication of datasets is timed with publication of the associated research article.

Funders and research institutions will increasingly require full primary data archiving and citation. Publishers must therefore adapt their workflows to enable data citation practices and provide tools and guidelines that improve the implementation process for authors and editors, and relieve stress points around compliance. One approach to recognising data as a first-class research object is to create data journals, such as Scientific Data, Data in Brief, and Gigascience. These journals oversee peer review of a publication about the dataset itself and its generation; the result is a published data descriptor paper. This article, however, presents a path for other journals to implement data citation, developed by a team of experts from leading early adopters of data citation in the publishing world who have collectively outlined a standard model. It covers all phases of the publishing life cycle, from instructions to authors, through internal workflows and peer reviewing, down to digital and print presentation of content.

Implementing data citation is not meant to replace or bypass citation of the relevant literature, but rather to ensure we provide verifiable and re-usable data that supports published conclusions and assertions. Data citation is aimed at significantly improving the robustness and reproducibility of science; and enabling FAIR data at the point of its production. The present document is a detailed roadmap to implementing JDDCP-compliant data citation, prepared by publishers, for an audience of publishers and authors, as part of a larger effort involving roadmap and specification development for and by repositories, informaticians, and identifier / metadata registries [14, 15]. We hope, in the long run, that open data will become a common enough practice so that all authors will eventually expect to provide it and cite it, and that this practice will be supported by all publishers as a matter of course.

This section briefly explains data citations and presents implementation recommendations for publishers, editors and scholarly societies. Although throughout this roadmap we refer to implementation falling under the remit of the publisher, due to the diversity of publishing models, this might not always be the case. Where an aspect of implementation falls to another party (e.g., a society journal where journal policy would often be set by the society), approval of and participation in implementation from that party would be needed.

Data citations are formal ways to ground the research findings in a manuscript upon their supporting evidence, when that evidence consists of externally archived datasets. They presume that the underlying data have been robustly archived in an appropriate long-term-persistent repository. This approach supersedes “Supplemental Data” as a home for article-associated datasets. It is designed to make data fully FAIR (Findable, Accessible, Interoperable and Reusable).

Publishers implementing data citation will provide domain-specific lists of acceptable repositories for this purpose, or guide authors to sites that maintain these lists. We provide examples of some of these lists further along in the manuscript. Guidance on why and how to cite data for authors can be found in Table 1. Formatting guidance will differ by publisher and by journal, but some examples of data citation reference styles can be found in Box 1. Figure 1 illustrates a data citation. Figure 2 shows the ideal resolution structure from data citations, to dataset landing pages, and to archived data.

Figure 1: (1) Data citation in text; (2) Reference; (3) Globally resolvable unique identifier. Example from Beresford NA, et al. (2016). Available at https://doi.org/10.1016/j.jenvrad.2015.03.022 [16].

Figure 2: Articles (1) link to datasets in appropriate repositories, on which their conclusions are based, through citation to a dataset (a), whose unique persistent identifier (PID) resolves (b) to a landing page (2) in a well-supported data repository. The data landing page contains human- and machine-readable metadata, to support search and to resolve (c) back to the citing article, and (d) a link to the data itself (3).

Both the dataset reference in the primary article, including its globally resolvable unique persistent identifier (PID), and the archival repository, should follow certain conventions. These are ultimately based upon the JDDCP’s eight principles. Initial conventions for repositories were developed in Starr et al. [8] and are presented in more depth and detail in Fenner et al. [15].

The remainder of this article is organized as a set of proposed actions for publishers, and linked to author responsibilities, applicable to each point in the lifecycle of a research article: Pre-submission, Submission, Production, and Publication.

Pre-submission

Revise editor training and advocacy material

Editor advocacy and training material should be revised. This may differ by journal or discipline, and whether there are in-house editors, academic editors, or both. For example, this might involve updates to the editor training material (internally maintained, for example, on PowerPoint or PDFs, or externally on public websites) or updates to advocacy material. The appropriate material should be revised to enable editors to know what data citation is, why it should be done, what data to cite, and how to cite data. This should equip editors to instruct reviewers and authors on journal policy around data citation.

Revise reviewer training material

Reviewer training material should be revised so that reviewers know which data authors should cite in the manuscript, how those data should be cited, and how to access the data underlying a manuscript. Training material should also communicate expectations around data review. Several other projects are underway focusing on defining criteria for data peer review.

Provide guidance on author responsibilities

Data citation is based on the idea that the data underlying scientific findings or assertions should be treated as first-class research objects. This begins with author responsibility to properly manage their own data prior to submission. The Corresponding Author should have ultimate oversight responsibility to ensure this is done in a transparent, robust and effective way [17]. Researchers are also increasingly required by funders to submit data management plans. No later than the time of submission (and ideally at the time of data generation), researchers should take responsibility for determining an appropriate repository that supports data citation (with landing pages, PID, and versioning) and provides support to ensure appropriate metadata are present. The publisher’s responsibility in this regard is to provide or refer authors to a definitive list of such repositories in the Guide for Authors.

Specify a policy for data citation

Data citation should be implemented at a journal policy level, as part of a journal’s wider policy on data sharing. It is recommended that this policy, since it is discipline-specific, should be determined by the journal community (editor, reviewers, etc.) as well as the publisher. Relevant work in this area is currently being carried out by the RDA Data Policy Standardization and Implementation Working Group.

There are multiple options for a data policy. For example, Springer Nature, Wiley and Elsevier have all rolled out multi-level policies depending on specific journal needs. This means that they offer their journals a range of policy options, from encouragement of data sharing, to strong encouragement, to mandatory data sharing. Data policies can also be defined at the domain level, as was done by COPDESS (the Coalition for Publishing Data in the Earth & Space Sciences), an initiative within the geosciences. Another approach, taken by the Public Library of Science (PLOS), was to have a single policy for all of their journals requiring that all underlying data be made available at the time of publication, with rare exceptions [18]. Whatever the level of the policy, it should specify which datasets to cite (whether only underlying data or also relevant data not used for analysis) and how to format data citations. Authors should provide details of previously published major datasets used and also major datasets generated by the work of the paper. Where at all possible, data citations should appear either in the standard reference list or (less preferably) in a separate list of cited data, formatted similarly to standard literature references. Regardless of where citations appear in the manuscript, they should be in readily parsable form and therefore machine readable.

Ask authors for a Data Availability Statement (DAS)

It is recommended that as part of data citation implementation publishers adopt standardized Data Availability Statements (DASs). DASs provide a statement about where data supporting the results reported in a published article can be found, including, where applicable, unique identifiers linking to publicly archived datasets analyzed or generated during the study. In addition, DASs can increase transparency by providing a reason why data cannot be made immediately available (such as the need for registration, due to ethical or legal restrictions, or because of an embargo period). Some research funders, including Research Councils UK, require data availability statements to be included in publications so it is an important element of a publisher’s data policy. It is recommended that publicly available datasets referred to in DASs are also cited in reference lists.

Specify how to format data citations

Whilst there are many referencing style guides, including formal standards managed by ISO/BS (ISO 690-2010) and ANSI/NISO (NISO Z39.29-2005 R2010), several of the key style guides provide guidance on how to cite datasets in the reference list. In addition, the reference should also include the tag “[dataset]” within the reference citation so that it becomes easily recognizable within the production process. This additional tag does not have to be visible within the reference list of the article. It is critical to ensure the recommended format of the data citation also adheres to the Joint Declaration of Data Citation Principles. Publishers should provide an example of the in-text citation and of the reference to a dataset in their references formatting section (see Box 1 for examples).

Similar to article references, key elements for data citation include, but may not be limited to: author(s), title, year, version, data repository, PID. Researchers should refer to journal-specific information for authors on publisher websites for definitive guidance on how to cite data when submitting their manuscript for publication.
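As an illustration of how the key elements listed above combine into a reference, here is a minimal Python sketch. The `DataCitation` class, the `format_reference` function and the author-year layout are assumptions made for this example; they are not a format prescribed by the roadmap or by any journal, and the journal's own reference style takes precedence.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataCitation:
    """Key elements of a data citation (hypothetical container for this sketch)."""
    authors: List[str]
    title: str
    year: int
    repository: str
    pid: str                       # resolvable PID, e.g. a DOI URL
    version: Optional[str] = None  # not every repository versions its datasets


def format_reference(c: DataCitation, tag: bool = True) -> str:
    """Render one reference in a simple author-year layout (illustrative only)."""
    repo = c.repository + (f", {c.version}" if c.version else "")
    ref = f"{', '.join(c.authors)}, {c.year}. {c.title}. {repo}. {c.pid}"
    # The "[dataset]" tag helps production recognise the entry as a data reference.
    return f"[dataset] {ref}" if tag else ref


# Elements taken from the Harvard-style example in Box 1.
example = DataCitation(
    authors=["Farhi, E.", "Maggiori, M."],
    title="Replication Data for: 'A Model of the International Monetary System'",
    year=2017,
    repository="Harvard Dataverse",
    pid="https://doi.org/10.7910/DVN/8YZT9K",
    version="V1",
)
reference = format_reference(example)
print(reference)
```

Keeping the elements in a structured record rather than a free-text string makes it easier to re-render the same citation in another reference style.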

Provide guidance around suitable repositories (general, institutional, and subject-specific) and how to find one

Publishers should provide or point to a list of recommended repositories for data sharing. Many publishers already maintain such a list. The Registry of Research Data Repositories (Re3Data, https://www.re3data.org ) is a full-scale resource of registered repositories across subject areas. Re3Data provides information on an array of criteria to help researchers identify the ones most suitable for their needs (licensing, certificates & standards, policy, etc.). A list of recommended repositories is provided by FAIRsharing.org, where some publishers also maintain collections of recommended resources. FAIRsharing started out as a resource within the life sciences but has recently expanded and now includes repositories within all disciplines.

Where a suitable repository does not exist for a given discipline or subject area, publishers should provide guidance for the use of a general purpose or institutional repository where these meet the recommendations of the repository roadmap 15 (briefly, by providing authors’ datasets with a globally resolvable unique identifier - ideally a DataCite DOI where possible, or other PID, providing a suitable landing page, using open licenses, and ensuring longevity of the resource).

Some research funders may stipulate that data must be deposited in a domain-specific repository where possible, which aligns well with publishers providing lists of recommended repositories.

Examples of publisher- or consortium-maintained recommended repositories lists include:

PLOS: http://journals.plos.org/plosbiology/s/data-availability#loc-recommended-repositories

SpringerNature: http://www.springernature.com/gp/group/data-policy/repositories

EMBO Press: http://emboj.embopress.org/authorguide#datadeposition

Elsevier: https://www.elsevier.com/authors/author-services/research-data/data-base-linking/supported-data-repositories

COPDESS: https://copdessdirectory.osf.io

FAIRsharing.org is currently working with several publishers to develop and host a common list. Once that list is available, participating publishers hope to link directly to the single recommended list from their author instructions.

Provide specific guidance on in-text accessions or other identifiers, particularly in citing groups of datasets reused in meta-analyses

Publishers should provide guidance to authors on dealing with lists of accessions or other identifiers in text. This is especially relevant for re-used datasets in meta-analysis studies. In the biomedical sciences in particular, meta-analyses may reuse a large number of datasets from archives such as the Gene Expression Omnibus (GEO) 20 , 21 . When many input datasets need to be cited, authors should use the EMBL-EBI’s BioStudies database 22 , or similar, to group the input accessions under a single master accession for the meta-analysis, and they should then cite the master accession for the BioStudies entry in their reference list. Supplements or Appendices should not be employed for this purpose.

In general, any lists of accessions or other identifiers appearing in the text should be accompanied by appropriate data citations, mapped to appropriate entries in the Reference list, or grouped in a BioStudies or similar entry and cited as a group. Editors should receive guidance from the Publisher on how to promote this approach.

Consider licensing included under “publicly accessible” and implications (e.g. automated reuse of data)

Publishers should consider the types of licensing allowed under their data policy. It is recommended that data submitted to repositories with stated licensing policies should have licensing that allows for the free reuse of that data, where this does not violate protection of human subjects or other overriding subject privacy concerns. Many publishers use Creative Commons licenses as a guide for equivalence criteria that repository licenses should meet.

Update guidelines for internal customer services queries and provide author FAQs

Publishers will need to include a support service around their data policy. This might include a list of author-focused FAQs. Internal FAQs should also be provided to customer services. Alternatively, or in addition, publishers might set up a specific email address for queries concerning data. PLOS, Springer Nature and Elsevier provide such email addresses.

Submission and review

Cite datasets in text of manuscript, and present full data citations in the reference list.

At the submission stage it is important that all the required elements are captured to create a data citation: author(s), title, year, version, data repository, PID. The recommended way to capture data citations is to have authors include these in the reference list of the manuscript. Instructions for data citation formatting can be found in the pre-submission section above and will depend on the reference style of the journal. In all cases, datasets should be cited in the text of the manuscript and the reference should appear in the reference list. To ensure data references are recognized, authors should indicate with the addition of “[dataset]” that this is a data reference (see examples in Box 1 above).
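The “[dataset]” tag is meant to let the production process recognize data references, and it does not have to be visible in the published reference list. A minimal sketch of how a production script might detect and strip the tag is shown below; the function name `split_dataset_tag` is hypothetical, and the sketch assumes the tag appears at the very start of the reference, as in the Box 1 examples.

```python
import re
from typing import Tuple

# Matches a leading "[dataset]" tag, ignoring case and surrounding whitespace.
DATASET_TAG = re.compile(r"^\s*\[dataset\]\s*", re.IGNORECASE)


def split_dataset_tag(reference: str) -> Tuple[bool, str]:
    """Return (is_data_reference, reference_without_tag)."""
    match = DATASET_TAG.match(reference)
    if match:
        return True, reference[match.end():]
    return False, reference


# First entry is the numbered-style example from Box 1; second is an article reference.
refs = [
    "[dataset] [27] M. Oguro, S. Imahiro, S. Saito, T. Nakashizuka, Mortality data "
    "for Japanese oak wilt disease and surrounding forest compositions, Mendeley Data, "
    "v1, 2015. https://doi.org/10.17632/xwj98nb39r.1",
    "Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data "
    "management and stewardship. Sci. Data 3, 160018 (2016).",
]
results = [split_dataset_tag(r) for r in refs]
for is_data, text in results:
    print("data" if is_data else "other", "|", text[:60])
```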

Data availability should be captured in a structured way

At the time of submission, authors should be requested to include a DAS about the availability of their data. In situations where data cannot be made publicly available, this should be explained here. This statement can be used to detail any other relevant data-related information. The JATS for Reuse (JATS4R) group has produced a draft recommendation for tagging data availability statements ( http://jats4r.org/data-availability-statements ). This group recommends the statement is separate and not displayed as part of the acknowledgements.

Editors and reviewers are enabled to check the data citation and underlying data

Through the data citation, editors and reviewers should be able to access underlying datasets. Datasets on which any claims in an article are based should always be available to peer reviewers. If researchers do not want their data to be public ahead of the manuscript’s publication, some repositories can provide a reviewer access link. If available, this should be provided at the time of submission. If the data are not available from the repository during review, authors should be willing to work with the Publisher to provide access in another mutually agreeable manner. Reviewer forms should be updated with information on how to access the data and a question about whether data sharing standards/policies have been met. Publishers should be mindful that they do not reveal the identity of the authors in cases where peer-review is double-blind.

Processing Data citations

When data citations are present in the reference list of the manuscript, these should be processed in the same way as other references by the publisher. This means that formatting and quality control should take place at the production stage (see JATS4R data citation recommendations: https://jats4r.org/data-citations ).

DOIs and Compact Identifiers

Digital Object Identifiers (DOIs) are well understood by publishers as identifiers. DOIs are also assigned by many repositories to identify datasets. When available, they should be included in the data reference similarly to the use of DOIs for article references. An advantage to DOIs for data is that the associated metadata are centrally managed by the DataCite organization, similarly to how Crossref manages article metadata. DataCite and Crossref collaborate closely.

However, many domain-specific repositories in biomedical research do not issue DOIs; instead, they issue locally assigned identifiers (“accessions”, “accession numbers”). Funders of biomedical research may require data to be deposited in domain-specific repositories, e.g. GEO, dbGaP, and SRA, many of which use such locally resolvable accession numbers in lieu of DOIs.

Prior informal practice had been to qualify these by a leading prefix, so that the identifier became unique. In 2012 the European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI) began tracking and issuing formal namespace prefixes to avoid collisions and support formal resolution on the Web 23 . Subsequent efforts developed a collaborative curation model 24 .

Now EMBL-EBI and the California Digital Library (CDL) maintain a common shared namespace registry and resolvers capable of interpreting and resolving PREFIX:ACCESSION patterns, or “compact identifiers”, hosted at these leading institutions 14 . Technical work to develop the common repository and resolution rules was coordinated with the work of our Publishers Roadmap team.

This means that compact identifiers have now been formalized, are institutionally supported in the U.S. and in Europe, and may be used in place of DOIs. We recommend this be done (1) where the repository does not issue DOIs for deposited datasets and (2) where the repository’s prefix has been registered. Similar to DOIs, compact identifiers are dereferenced by resolvers hosted at well-known resolver web addresses: http://identifiers.org (EMBL-EBI) and http://n2t.net (CDL). For example, both resolvers resolve the Gene Expression Omnibus local accession number GDS5157 (as https://identifiers.org/GEO:GDS5157 or https://n2t.net/GEO:GDS5157 ) to a primary expression dataset generated on the Illumina MouseWG-6 v2.0 expression beadchip, supporting findings on genetics of fear expression in an article by Andero et al. 25 .
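To make the PREFIX:ACCESSION pattern concrete, the following Python sketch builds the resolver URLs for a compact identifier. The function names and the `RESOLVERS` table are assumptions for this example; only the two resolver addresses and the GEO:GDS5157 identifier come from the text. The sketch does not check whether a prefix is actually registered, and it makes no network requests.

```python
from typing import Dict, Tuple

# Resolver base addresses named in the text (EMBL-EBI and CDL).
RESOLVERS = {
    "identifiers.org": "https://identifiers.org/",
    "n2t.net": "https://n2t.net/",
}


def parse_compact_identifier(cid: str) -> Tuple[str, str]:
    """Split 'PREFIX:ACCESSION' at the first colon into (prefix, accession)."""
    prefix, sep, accession = cid.partition(":")
    if not sep or not prefix or not accession:
        raise ValueError(f"not a PREFIX:ACCESSION compact identifier: {cid!r}")
    return prefix, accession


def resolver_urls(cid: str) -> Dict[str, str]:
    """Return one resolvable URL per resolver for a compact identifier."""
    prefix, accession = parse_compact_identifier(cid)
    return {name: f"{base}{prefix}:{accession}" for name, base in RESOLVERS.items()}


urls = resolver_urls("GEO:GDS5157")
for name, url in urls.items():
    print(name, "->", url)
```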

While these resources are still under active development to resolve an increasing number of identifiers, ensuring that either a DOI or a compact identifier is associated with each data reference will be important to support automatic resolution of these identifiers by software tools, which benefits authors, data providers and service providers. Other efforts are underway to build the infrastructure for accurate and expedient resolvable linking between publishers, referenced datasets, and repositories: within the Research Data Alliance (RDA), for example in the Scholarly Link Exchange (Scholix) project ( https://www.rd-alliance.org/groups/rdawds-scholarly-link-exchange-scholix-wg ), and in projects such as THOR and FREYA (funded by the European Commission).

The main relevant components of the production process are the input from the peer review process (typically the author manuscript in Word or LaTeX files), conversion of this to XML and other formats (such as PDF, ePub), and the author proofing stage. Following all the preceding recommendations for the editorial process, the production process needs to identify relevant content and convert it to XML.

Data citations

The production department and its vendor(s) must ensure all data citations provided by the author in the reference list are processed appropriately using the correct XML tags. Typesetters must be provided with detailed instructions to achieve this. It is out of the scope of this paper to provide tools to identify datasets that are alluded to in a manuscript but are not present in the reference list; however, simple search and find commands could be executed using common terms and common database names.

XML requirements for data citations

For publishers using the NISO standard JATS, version 1.1 and upwards, the JATS4R recommendation on data citations should be followed. The other main publisher-specific DTDs contain similar elements to allow for correct XML tagging.

eLife recommendation: https://github.com/elifesciences/XML-mapping/blob/master/elife-00666.xml

JATS4R recommendation and examples: http://jats4r.org/data-citations
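As a rough illustration of what a tagged data citation might look like, the Python sketch below builds a reference element with the standard library. The element and attribute names (`element-citation` with `publication-type="data"`, `data-title`, `source`, `version`, `pub-id`) reflect our reading of JATS 1.1 and should be verified against the JATS4R recommendation linked above; the function name and the `ref_id` value are hypothetical.

```python
import xml.etree.ElementTree as ET


def jats_data_citation(ref_id, authors, year, title, repository, doi, version=None):
    """Build a <ref> element holding one data citation (element names assumed, see lead-in)."""
    ref = ET.Element("ref", {"id": ref_id})
    cit = ET.SubElement(ref, "element-citation", {"publication-type": "data"})
    group = ET.SubElement(cit, "person-group", {"person-group-type": "author"})
    for surname, given in authors:
        name = ET.SubElement(group, "name")
        ET.SubElement(name, "surname").text = surname
        ET.SubElement(name, "given-names").text = given
    ET.SubElement(cit, "year").text = str(year)
    ET.SubElement(cit, "data-title").text = title
    ET.SubElement(cit, "source").text = repository  # the data repository
    if version:
        ET.SubElement(cit, "version").text = version
    ET.SubElement(cit, "pub-id", {"pub-id-type": "doi"}).text = doi
    return ref


# Values taken from the first numbered-style example in Box 1.
ref = jats_data_citation(
    ref_id="bib27",  # hypothetical id
    authors=[("Oguro", "M."), ("Imahiro", "S."), ("Saito", "S."), ("Nakashizuka", "T.")],
    year=2015,
    title="Mortality data for Japanese oak wilt disease and surrounding forest compositions",
    repository="Mendeley Data",
    doi="10.17632/xwj98nb39r.1",
    version="v1",
)
print(ET.tostring(ref, encoding="unicode"))
```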

Data availability statement (DAS)

Output format from the editorial process will inform the production department as to how to identify and handle this content. For instance, some publishers require authors to provide the details within the submission screens and thus can output structured data from the submission system to production, others require a separate Word file to be uploaded, and others request the authors include this information in the manuscript file. Depending on the method used, production will need to process and convert this content accordingly.

Where the DAS will be contained/displayed within the PDF/ePub format of the article is decided by the individual publisher and this group will not provide recommendations for this.

Publication

Display data citations in the article.

There are two primary methods of displaying data citations in a manuscript: in a separate data citations section or in the main references section. A separate data citations section promotes visibility, but inclusion in the main references section helps establish equal standing between data citations and standard references and aids machine-readable recovery, so it is recommended.

Data citations should include a PID (such as a DOI) and should ideally include the minimum information recommended by DataCite and the FORCE11 data citation principles (Author, year, title, PID, repository). Where possible, PIDs should be favored over URLs, and they should function as links that resolve to the landing page of the dataset. Optionally, some publishers may choose to highlight the datasets on which the study relies by visualizing these.

Data Availability Statements

Data Availability Statements (DAS) should be rendered in the article (see Fig. 3 ).

Figure 3. Taken from Ma et al. 19 . Available at https://doi.org/10.1186/s13059-018-1435-z .

Downstream delivery to Crossref

Crossref ensures that links to scholarly literature persist over time through the Digital Object Identifier (DOI). They also provide infrastructure to the community by linking the publications to associated works and resources through the metadata that publishers deposit at publication, making research easy to find, cite, link, and assess. Links to data resources (i.e., data citations) are part of this service.

There are two main ways publishers can deposit data citations to Crossref, and both are part of the existing content registration process/metadata deposit. They can deposit data citations as bibliographic references and/or as relation-type components:

Bibliographic references: Publishers include the data citation into the deposit of bibliographic references, following the normal process for depositing references (citations) by applying tags to structure the metadata, as applicable.

Relation type: Publishers assert the data citation in an existing section of the metadata deposit dedicated to connecting the publication to a variety of research objects associated with it (e.g., data and software, supporting information, protocols, videos, published peer reviews, preprints, conference papers). In addition to providing structured information about the data, this method allows publishers to identify whether the data are a direct output of the research results or are referenced from elsewhere. Also, if the publisher has not opened their references (see https://i4oc.org/ ), this is the only way this information will be publicly available for data mining.

Each method has its own benefits, but using both is encouraged where possible. Once these data citations are sent to Crossref, they become available in a Scholix-compliant way ( http://www.scholix.org/ ), which enables their retrieval through ScholeXplorer or Event Data, an easy way for both publishers and repositories to retrieve information about associations between articles and datasets.

More detail can be found in the Data & Software Citations Deposit Guide 26 .

Downstream delivery to PubMed

Metadata about data linked as a direct output of the research results can be deposited with the PubMed record for a research article for inclusion within PubMed, which maintains a controlled list of allowed databases here:

https://www.ncbi.nlm.nih.gov/books/NBK3828/#publisherhelp.Object_O

Here is a tagging example:

<Object Type="Dryad">
  <Param Name="id">10.5061/dryad.2f050</Param>
</Object>
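For publishers who generate this markup programmatically, a minimal Python sketch using the standard library is shown below. The function name `pubmed_object` is hypothetical; the element and attribute names are those of the tagging example above, and the `Type` value should come from PubMed's controlled list of allowed databases.

```python
import xml.etree.ElementTree as ET


def pubmed_object(object_type: str, identifier: str) -> str:
    """Serialize an <Object> element with a single <Param Name="id"> child."""
    obj = ET.Element("Object", {"Type": object_type})
    param = ET.SubElement(obj, "Param", {"Name": "id"})
    param.text = identifier
    return ET.tostring(obj, encoding="unicode")


xml_fragment = pubmed_object("Dryad", "10.5061/dryad.2f050")
print(xml_fragment)
```

Building the element with an XML library rather than string concatenation ensures that special characters in identifiers are escaped and that straight quotes are used for attributes.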

Box 1: Reference style examples for citing data

Numbered style:

[dataset] [27] M. Oguro, S. Imahiro, S. Saito, T. Nakashizuka, Mortality data for Japanese oak wilt disease and surrounding forest compositions, Mendeley Data, v1, 2015. https://doi.org/10.17632/xwj98nb39r.1

[dataset] [28] D. Deng, C. Xu, P.C. Sun, J.P. Wu, C.Y. Yan, M.X. Hu, N. Yan, Crystal structure of the human glucose transporter GLUT1, Protein Data Bank, 21 May 2014. https://identifiers.org/pdb:4pyp

Harvard style:

[dataset] Farhi, E., Maggiori, M., 2017. “Replication Data for: ‘A Model of the International Monetary System’“, Harvard Dataverse, V1. https://doi.org/10.7910/DVN/8YZT9K

[dataset] Aaboud, M, Aad, G, Abbott, B, Abdallah, J, Abdinov, O, Abeloos, B, AbouZeid, O, Abraham, N, Abramowicz, H, Abreu, H., 2017. Dilepton invariant mass distribution in SRZ. HEPData, 2017-02-08. https://doi.org/10.17182/hepdata.76903.v1/t1

Vancouver style:

[dataset] [52] Wang G, Zhu Z, Cui S, Wang J. Data from: Glucocorticoid induces incoordination between glutamatergic and GABAergic neurons in the amygdala. Dryad Digital Repository, August 11, 2017. https://doi.org/10.5061/dryad.k9q7h

[dataset] [17] Polito VA, Li H, Martini-Stoica H, Wang B et al. Transcription factor EB overexpression effect on brain hippocampus with an accumulation of mutant tau deposits. Gene Expression Omnibus, December 19, 2013. https://identifiers.org/GEO:GDS5303

[dataset] Golino, H., Gomes, C. (2013). Data from the BAFACALO project: The Brazilian Intelligence Battery based on two state-of-the-art models: Carroll’s model and the CHC model. Harvard Dataverse, V1, https://doi.org/10.7910/DVN/23150

[dataset] Justice, L. (2017). Sit Together and Read in Early Childhood Special Education Classrooms in Ohio (2008-2012). ICPSR 36738. https://doi.org/10.3886/ICPSR36738.v1

[dataset] 12. Kory Westlund, J. Measuring children’s long-term relationships with social robots. Figshare, v2; 2017. https://doi.org/10.6084/m9.figshare.5047657

[dataset] 34. Frazier, JA, Hodge, SM, Breeze, JL, Giuliano, AJ, Terry, JE, Moore, CM, Makris, N. CANDI Share Schizophrenia Bulletin 2008 data; 2008. Child and Adolescent NeuroDevelopment Initiative. https://doi.org/10.18116/C6159Z

Several publishers are now in the process of implementing the JDDCP in line with the steps described in this roadmap. More work is still needed, both by individual publishers and by this group. This document describes basic steps that should be taken to enable authors to cite datasets. As a next step, improved workflows and tools should be developed to automate data citation further. In addition, authors need to be made aware of the importance of data citation and will require guidance on how to cite data. Ongoing coordination amongst publishers, data repositories, and other important stakeholders will be essential to ensure data are recognized as a primary research output. Table 2 outlines the implementation timelines of the different publishers that participated in this project. To be clear, data citation as described in this article will not be possible at a given publisher until the “planned go-live date”. Until that time, authors are able to cite data in their articles, but this will not necessarily be captured through XML tagging and the other technical processes described in this article.

This roadmap originated through the implementation phase of a project aimed at enhancing the reproducibility of scientific research and increasing credit for and reuse of data through data citation. The project was organized as a series of Working Groups in FORCE11 ( https://force11.org/ ), an international organization of researchers, funders, publishers, librarians, and others seeking to improve digital research communication and eScholarship.

The effort began with the Joint Declaration of Data Citation Principles 7 , 27 , which distilled and harmonized conclusions of significant prior studies by science policy bodies on how research data should be made available in digital scholarly communications. In the implementation phase (the Data Citation Implementation Pilot, https://www.force11.org/group/dcip ), repositories, publishers, and data centers each formed one of three Expert Groups, with the aim of creating clear recommendations for implementing data citation in line with the JDDCP.

Once the steps outlined in this roadmap are implemented, authors will be able to cite datasets in the same way as they cite articles. In addition to ‘author’, ‘year’, and ‘title’, they will need to add the data repository, version and persistent unique identifier to ensure other researchers can unambiguously identify datasets. Publishers will be able to recognize the references as data references and process these accordingly, so that it becomes possible for data citations to be counted and for researchers to get credit for their work. These are essential steps for substantially increasing the FAIRness 6 of research data. We believe this will in turn lead to better, more reproducible, and re-usable science and scholarship, with many benefits to society.

In a series of teleconferences over a period of a year, major publishers compared current workflows and processes around data citation. Challenges were identified and recommendations structured according to the publisher workflows were drafted. In July 2016 this group met with additional representatives from publishers, researchers, funders, and not-for-profit open science organizations in order to resolve remaining challenges, validate recommendations, and to identify future tasks for development. From this the first full draft of the Publisher Roadmap was created. Feedback was then solicited and incorporated from other relevant stakeholders in the community as well as the other Data Citation Implementation Pilot working groups.

Additional Information

How to cite this article : Cousijn, H. et al . A data citation roadmap for scientific publishers. Sci. Data . 5:180259 doi: 10.1038/sdata.2018.259 (2018).

Publisher’s note : Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Uhlir, P. (ed.) For attribution: developing data attribution and citation practices and standards: summary of an international workshop. (National Academies: Washington DC, 2012).

2. CODATA-ICSTI Task Force on Data Citation. Out of cite, out of mind: the current state of practice, policy and technology for data citation. Data Sci Journal 12, 1–75 https://doi.org/10.2481/dsj.OSOM13-043 (2013).

3. Hodson, S. & Molloy, L. Current best practice for research data management policies. Zenodo https://doi.org/10.5281/zenodo.27872 (2015).

4. Committee on Ensuring the Utility and Integrity of Research Data in a Digital Age. Ensuring the integrity, accessibility, and stewardship of research data in the digital age. (The National Academies Press, 2009).

5. Royal Society. Science as an open enterprise. (The Royal Society Science Policy Center: London, 2012).

6. Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3, 160018 https://doi.org/10.1038/sdata.2016.18 (2016).

7. Data Citation Synthesis Group. Joint declaration of data citation principles. FORCE11 https://doi.org/10.25490/a97f-egyk (2014).

8. Starr, J. et al. Achieving human and machine accessibility of cited data in scholarly publications. PeerJ Comput. Sci. 1, e1 https://doi.org/10.7717/peerj-cs.1 (2015).

9. Bierer, B. E., Crosas, M. & Pierce, H. H. Data authorship as an incentive to data sharing. N Engl J Med 377, 402 https://doi.org/10.1056/NEJMc1707245 (2017).

10. Vocile, B. Open science trends you need to know about. in Discover the Future of Research. (The Wiley Network, 2017).

11. Michener, W. K. Ecological data sharing. Ecol Inform 29, 33–44 https://doi.org/10.1016/j.ecoinf.2015.06.010 (2015).

12. Piwowar, H. A., Day, R. S. & Fridsma, D. B. Sharing detailed research data is associated with increased citation rate. PLOS ONE 2, e308 https://doi.org/10.1371/journal.pone.0000308 (2007).

13. McKiernan, E. C. et al. How open science helps researchers succeed. eLife 5, e16800 https://doi.org/10.7554/eLife.16800 (2016).

14. Wimalaratne, S. M. et al. Uniform resolution of compact identifiers for biomedical data. Sci. Data 5, 180029 https://doi.org/10.1038/sdata.2018.29 (2018).

15. Fenner, M. et al. A data citation roadmap for scholarly data repositories. bioRxiv 097196 https://doi.org/10.1101/097196 (2017).

16. Beresford, N. A. et al. Making the most of what we have: application of extrapolation approaches in radioecological wildlife transfer models. Journal of Environmental Radioactivity 151, 373–386 https://doi.org/10.1016/j.jenvrad.2015.03.022 (2016).

17. McNutt, M. et al. Transparency in authors’ contributions and responsibilities to promote integrity in scientific publication. bioRxiv 140228 https://doi.org/10.1101/140228 (2017).

18. Bloom, T., Ganley, E. & Winker, M. Data access for the open access literature: PLOS’s data policy. PLOS Biol 12, e1001797 https://doi.org/10.1371/journal.pbio.1001797 (2014).

19. Ma, C. et al. RNA m6A methylation participates in regulation of postnatal development of the mouse cerebellum. Genome Biol 19, 68 https://doi.org/10.1186/s13059-018-1435-z (2018).

20. Edgar, R., Domrachev, M. & Lash, A. Gene expression omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 30, 207–210 https://doi.org/10.1093/nar/30.1.207 (2002).

21. Barrett, T. et al. NCBI GEO: archive for functional genomics data sets—10 years on. Nucleic Acids Res 39, D1005–D1010 https://doi.org/10.1093/nar/gkq1184 (2011).

22. Sarkans, U. et al. The BioStudies database—one stop shop for all data supporting a life sciences study. Nucleic Acids Res 46, D1266–D1270 https://doi.org/10.1093/nar/gkx965 (2018).

23. Juty, N., Le Novère, N. & Laibe, C. Identifiers.org and MIRIAM registry: community resources to provide persistent identification. Nucleic Acids Res 40, D580–D586 https://doi.org/10.1093/nar/gkr1097 (2012).

24. Juty, N., Le Novère, N., Hermjakob, H. & Laibe, C. Towards the collaborative curation of the Registry underlying identifiers.org. Database 2013, bat017 https://doi.org/10.1093/database/bat017 (2013).

25. Andero, R., Dias, B. G. & Ressler, K. J. A role for Tac2, NkB, and Nk3 receptor in normal and dysregulated fear memory consolidation. Neuron 83, 444–454 https://doi.org/10.1016/j.neuron.2014.05.028 (2014).

26. Crossref. Crossref data & software citation deposit guide for publishers https://support.crossref.org/hc/en-us/articles/215787303-Crossref-Data-Software-Citation-Deposit-Guide-for-Publishers (2018).

27. Altman, M., Borgman, C., Crosas, M. & Martone, M. An introduction to the joint principles for data citation. Bull Am Soc Inf Sci 41, 43–45 https://doi.org/10.1002/bult.2015.1720410313 (2015).

Acknowledgements

Research reported in this publication was supported in part by the National Institutes of Health under award number U24HL126127. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The authors gratefully acknowledge the following members of the FORCE11/bioCADDIE Data Citation Pilot Publishers Expert Group, who participated in workshops and/or telecons to assist in developing this Roadmap: Helen Atkins (Public Library of Science); Paul Donohoe (SpringerNature); Scott Edmunds (GigaScience); Martin Fenner (DataCite); Ian Fore (National Cancer Institute, National Institutes of Health); Carole Goble (University of Manchester); Florian Graef (European Bioinformatics Institute); Iain Hrynaszkiewicz (SpringerNature); Johanna McEntyre (European Bioinformatics Institute); Ashlynne Merrifield (Taylor and Francis); Eleonora Presani (Elsevier); Perpetua Socorro (Frontiers); Caroline Sutton (Taylor and Francis); Michael Taylor (Digital Science).

Author information

Helena Cousijn

Present address: DataCite, Hannover, Germany.

Helena Cousijn and Amye Kenall: These authors contributed equally to this work.

Authors and Affiliations

Elsevier, Amsterdam, 1043 NX, Netherlands

Springer Nature, London, N1 9XW, UK

Amye Kenall

Public Library of Science, San Francisco, 94111, CA, USA

Emma Ganley

eLife Sciences Publications, Ltd, Cambridge, CB4 1YG, UK

Melissa Harrison

JISC, Bristol, BS2 0JA, UK

David Kernohan

EMBO Press, Heidelberg, 69117, Germany

Thomas Lemberger

University of Reading, Reading, RG6 6AH, UK

Fiona Murphy

John Wiley & Sons, Inc., Hoboken, 07030, NJ, USA

Simone Taylor

University of California San Diego, La Jolla, 92093, CA, USA

Maryann Martone

University of Virginia, School of Medicine, Charlottesville VA, 22908, USA

University of Virginia, Data Science Institute, Charlottesville VA, 22904, USA

Crossref, Lynnfield, MA, 01940, USA

Patrick Polischuk


Contributions

H.C. and A.K. co-chaired the DCIP Publishers Expert Group which produced this article. They had primary responsibility for leading regular telecons as well as a face-to-face meeting of participants (see Acknowledgements) at the Springer Nature London campus in July of 2016. H.C. and A.K. provided the article structure; organized their Expert Group to collect and integrate information from the participating publishers, including their own organizations; and did the majority of writing for this article. They made equal contributions to the work. E.G., P.P., M.H., D.K., T.L., S.T. and F.M. participated in the work of the Publishers Expert Group and co-authored this article. They provided knowledgeable content and input to the work from the perspectives of their respective organizations. In addition, M.H. coordinated and informed this work with the perspective of the JATS4R group (Journal Article Tag Suite for Reuse), which she chairs. T.C. coordinated the work of the Publishers Expert Group with the other DCIP participants (Repositories, Identifiers, JATS, and Primer/FAQ), co-authored sections of this article and edited the whole. T.C. and M.M. co-led the Data Citation Implementation Pilot as a whole.

Corresponding author

Correspondence to Helena Cousijn .

Ethics declarations

Competing interests.

Scientific Data is published by Springer Nature, one of the participating publishers and authors.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

About this article

Cite this article.

Cousijn, H., Kenall, A., Ganley, E. et al. A data citation roadmap for scientific publishers. Sci Data 5, 180259 (2018). https://doi.org/10.1038/sdata.2018.259

Received: 18 September 2018

Accepted: 04 October 2018

Published: 20 November 2018

DOI: https://doi.org/10.1038/sdata.2018.259

How to Cite

Repositories

Dataset repositories, also known as research data repositories, provide researchers with a stable place to store and provide others with access to their research data.

Depending on the research discipline, data can often be deposited in one or more data centers (or repositories) that will provide access to the data. These repositories may have specific requirements:

  • subject/research domain
  • data re-use and access

(UO Libraries. "Data Repositories" . Research Data Management)

Some dataset repositories also have their own guidelines and suggestions for how to construct a data citation, which elements to include, and where to find those on the site. Look carefully around the repository's website to see if you can find any information about citing their data.
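
When a repository offers no recommended citation, one can be assembled from the typical elements (author, date, title, publisher, identifier). The sketch below is only an illustration of that assembly: the `DataCitation` class, the `format_citation` function, the element order, and every value in the example are hypothetical placeholders, not the format of any particular style manual or repository.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    """Elements of a dataset citation (field names are illustrative)."""
    author: str                     # person or organization that created the data
    year: str                       # year of publication or release
    title: str                      # dataset title or a brief description
    publisher: str                  # repository or archive hosting the data
    locator: str                    # DOI (preferred) or URL
    version: Optional[str] = None   # included after the title when present
    accessed: Optional[str] = None  # only needed if the data change over time


def format_citation(c: DataCitation) -> str:
    """Join the elements in one plausible order; adapt to your style guide."""
    title = c.title if c.version is None else f"{c.title} ({c.version})"
    text = f"{c.author} ({c.year}). {title} [Data set]. {c.publisher}. {c.locator}"
    if c.accessed:
        text += f" (accessed {c.accessed})"
    return text


# Placeholder values only; this is not a real dataset or DOI.
example = DataCitation(
    author="Example Research Group",
    year="2024",
    title="Example survey responses",
    publisher="Example Repository",
    locator="https://doi.org/10.1234/example-placeholder",
    version="Version 2",
)
print(format_citation(example))
```

Keeping the elements in a structured record and formatting them last makes it easy to switch to whatever order and punctuation the repository or style manual actually prescribes.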

If you would like to look at some research data repositories, here is a very short list. Databib maintains a far more extensive list of research data repositories if you would like to explore further.

  • Abacus: the Research Data Collection of the British Columbia Research Libraries' Data Services, a collaboration involving the Data Libraries at Simon Fraser University (SFU), the University of British Columbia (UBC), the University of Northern British Columbia (UNBC) and the University of Victoria (UVic).
  • figshare: A cloud-based storage system which "allows researchers to publish all of their data in a citable, searchable and sharable manner." NOTE: "all figures, media, poster, papers and multiple file uploads (filesets) are published under a CC-BY license... All datasets are published under CC0"
  • UK Data Service: Provides "single point of access to a wide range of secondary data including large-scale government surveys, international macrodata, business microdata, qualitative studies and census data from 1971 to 2011." Mostly UK data, but also includes some data from IGOs like the IMF, OECD and the World Bank.
  • ICPSR: "An international consortium of more than 700 academic institutions and research organizations.... ICPSR maintains a data archive of more than 500,000 files of research in the social sciences. It hosts 16 specialized collections of data in education, aging, criminal justice, substance abuse, terrorism, and other fields."

source: http://wiki.ubc.ca/Library:How_to_Cite_Data

  • Last Updated: Jun 11, 2024 1:10 PM
  • URL: https://guides.library.ubc.ca/howtocite

Modernizing the Data Infrastructure for Clinical Research to Meet Evolving Demands for Evidence

  • 1 Verily Life Sciences, South San Francisco, California
  • 2 Center for Biostatistics & Qualitative Methodology, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania
  • 3 Bakar Computational Health Sciences Institute, University of California, San Francisco
  • 4 Center for Data-Driven Insights and Innovation, University of California Health, Oakland
  • 5 Faculty of Health Sciences, McMaster University, Hamilton, Ontario, Canada
  • 6 Departments of Surgery and Radiology and Institute for Health Policy Studies, University of California, San Francisco
  • 7 Anesthesiology and Critical Care, University of Pennsylvania Perelman School of Medicine, Philadelphia
  • 8 Biogen, Boston, Massachusetts
  • 9 Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland
  • 10 Yale University School of Medicine, New Haven, Connecticut
  • 11 National Institute for Health and Care Research (NIHR) Health and Social Care Delivery Research Programme, London, United Kingdom
  • 12 Intensive Care National Audit & Research Centre (ICNARC), London, United Kingdom
  • 13 Highlander Health, Dallas, Texas

Importance   The ways in which we access, acquire, and use data in clinical trials have evolved very little over time, resulting in a fragmented and inefficient system that limits the amount and quality of evidence that can be generated.

Observations   Clinical trial design has advanced steadily over several decades. Yet the infrastructure for clinical trial data collection remains expensive and labor intensive and limits the amount of evidence that can be collected to inform whether and how interventions work for different patient populations. Meanwhile, there is increasing demand for evidence from randomized clinical trials to inform regulatory decisions, payment decisions, and clinical care. Although substantial public and industry investment in advancing electronic health record interoperability, data standardization, and the technology systems used for data capture have resulted in significant progress on various aspects of data generation, there is now a need to combine the results of these efforts and apply them more directly to the clinical trial data infrastructure.

Conclusions and Relevance   We describe a vision for a modernized infrastructure centered on 2 related concepts: first, allowing the collection and rigorous evaluation of multiple data sources and types, and second, enabling the reuse of health data for multiple purposes. We address the need for multidisciplinary collaboration and suggest ways to measure progress toward this goal.

Franklin JB, Marra C, Abebe KZ, et al. Modernizing the Data Infrastructure for Clinical Research to Meet Evolving Demands for Evidence. JAMA. Published online August 05, 2024. doi:10.1001/jama.2024.0268

Venezuela opposition says its victory is irreversible, citing 73% of vote tallies

  • Both government and opposition claim victory
  • Opposition says it has 73% of voting tallies
  • Independent exit polls point to landslide opposition win
  • US, Brazil and others call for transparency in count
  • Maduro victory would extend socialist rule

Reporting by Mariela Nava in Maracaibo, Mircely Guanipa in Maracay and Vivian Sequera, Deisy Buitrago, Mayela Armas and Julia Symmes Cobb in Caracas; additional reporting by Keren Torres in Barquisimeto, Tibisay Romero in Valencia, Tathiana Ortiz in San Cristobal and Maria Ramirez in Puerto Ordaz Writing by Julia Symmes Cobb and Oliver Griffin; Editing by Angus MacSwan, Peter Graff, Rosalba O'Brien and Michael Perry

Title: A Comparison of LLM Finetuning Methods & Evaluation Metrics with Travel Chatbot Use Case

Abstract: This research compares large language model (LLM) fine-tuning methods, including Quantized Low Rank Adapter (QLoRA), Retrieval Augmented fine-tuning (RAFT), and Reinforcement Learning from Human Feedback (RLHF), and additionally compares LLM evaluation methods including End to End (E2E) benchmark method of "Golden Answers", traditional natural language processing (NLP) metrics, RAG Assessment (Ragas), OpenAI GPT-4 evaluation metrics, and human evaluation, using the travel chatbot use case. The travel dataset was sourced from the Reddit API by requesting posts from travel-related subreddits to get travel-related conversation prompts and personalized travel experiences, and augmented for each fine-tuning method. We used two pretrained LLMs for fine-tuning: LLaMa 2 7B and Mistral 7B. QLoRA and RAFT are applied to the two pretrained models. The inferences from these models are extensively evaluated against the aforementioned metrics. The best model according to human evaluation and some GPT-4 metrics was Mistral RAFT, so this underwent a Reinforcement Learning from Human Feedback (RLHF) training pipeline, and ultimately was evaluated as the best model. Our main findings are that: 1) quantitative and Ragas metrics do not align with human evaluation, 2) OpenAI GPT-4 evaluation most aligns with human evaluation, 3) it is essential to keep humans in the loop for evaluation because 4) traditional NLP metrics are insufficient, 5) Mistral generally outperformed LLaMa, 6) RAFT outperforms QLoRA, but still needs postprocessing, 7) RLHF improves model performance significantly. Next steps include improving data quality, increasing data quantity, exploring RAG methods, and focusing data collection on a specific city, which would improve data quality by narrowing the focus, while creating a useful product.
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Catalysis Science & Technology

Generating knowledge graphs through AI-assisted text mining of catalysis research related literature

Structured research data management in catalysis is crucial, especially for large amounts of data, and should be guided by FAIR principles for easy access and compatibility of data. Ontologies help to organize knowledge in a structured and FAIR way. The increasing numbers of scientific publications call for automated methods to preselect and access the desired knowledge while minimizing the effort to search for relevant publications. While ontology learning can be used to create structured knowledge graphs, named entity recognition makes it possible to detect and categorize important information in text. This work combines ontology learning and named entity recognition for automated extraction of key data from publications and organization of the implicit knowledge in a machine- and user-readable knowledge graph and data. CatalysisIE is a pre-trained model for such information extraction for catalysis research. This model is used and extended in this work based on a new data set, increasing the precision and recall of the model with regard to the data set. The workflow is validated on two datasets regarding catalysis research. Preformulated SPARQL-queries are provided to show the usability and applicability of the resulting knowledge graph for researchers.

  • This article is part of the themed collection: Digital Catalysis

A. S. Behr, D. Chernenko, D. Koßmann, A. Neyyathala, S. Hanf, S. A. Schunk and N. Kockmann, Catal. Sci. Technol., 2024, Accepted Manuscript, DOI: 10.1039/D4CY00369A

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.

COMMENTS

  1. Research Guides: Data Sources: How to Cite Data & Statistics

    A data citation includes the typical components of other citations: Author or creator: the entity/entities responsible for creating the data. Date of publication: the date the data was published or otherwise released to the public. Title: the title of the dataset or a brief description of it if it's missing a title.

  2. Data Sets

    Provide a retrieval date only if the data set is designated to change over time; Date for published data is the year of publication; Date for unpublished data is the year(s) of collection; If version number exists, include in parentheses after the title

  3. Citing Data & Statistics

    General Rules. Some style manuals do provide instructions for the citation of data, and selected examples are listed on the Data Citations tab. If the style manual you are using does not address data citations, you can follow these general rules. Usually a style manual will lay out basic rules for the order of citation elements, regardless of ...

  4. Data & Reports

    Whenever possible, give a citation for the measurements' supporting literature (e.g. manual, book, or journal article). If the supporting literature is unavailable, cite the test itself or database record using the following rule. General Rule: Author name. (year). Title of the test.

  5. Cite Data or Statistics

    Data requires citations for the same reasons journal articles and other types of publications require citations: to acknowledge the original author/producer and to help other researchers find the resource. Some style manuals provide instructions for the citation of data, and selected examples are listed below.

  6. How to Cite Data and Statistics

    Unless otherwise noted, the basic elements and guidelines described here are from the Publication Manual of the American Psychological Association, 6th edition (McHenry Reference Desk BF 76.7 .P83 2010). You may also wish to consult the Purdue OWL or How to Cite Data from Michigan State University for MLA examples and explanations. Notes: 1. Include format type in brackets [ ] to describe ...

  7. Database Information in References

    Database information is seldom provided in reference list entries. The reference provides readers with the details they will need to perform a search themselves if they want to read the work—in most cases, writers do not need to explain the path they personally used. Think of it this way: When you buy a book at a bookstore or order a copy off ...

  8. Citing Data

    Citing Data and Statistics. Whether you use a numeric dataset or a prepared statistical table from an existing source (print or electronic) you need to cite the source of your information. It is critical to correctly cite data and statistics. This ensures that research data and statistics can be discovered, reused, and replicated for verification.

  9. Data Citation

    Benefits of citing data. Proper citation of data sources has both immediate and long term benefits to users and producers of data. "Data citation is the practice of referencing data products used in research. A data citation includes key descriptive information about the data, such as the title, source, and responsible parties."

  10. Citing your Data

    Just as you cite journal articles, websites, and any books you reference in your publication, so too do you need to cite any data your publication uses. Citing datasets, such as spreadsheets, interview transcripts, images, etc., is crucial in providing context for your research and giving credit to the individual whose data you've used.

  11. How to Cite Sources

    To quote a source, copy a short piece of text word for word and put it inside quotation marks. To paraphrase a source, put the text into your own words. It's important that the paraphrase is not too close to the original wording. You can use the paraphrasing tool if you don't want to do this manually.

  12. Cite data

    As Zotero lacks an "item type" for datasets, enter the citation in the system as a "Document," depending upon if/how the data producer provides a recommended citation; either: Export an RIS file and import this file into Zotero. Copy and paste the information from a recommended citation into a new Zotero item with the type "Document".

  13. Citing Data

    Citing data: attributes credit to the responsible researchers; allows those sharing the data to measure its impact; supports the research infrastructure by connecting data and published research; improves access to data; provides opportunities for verifying data and enabling reuse; promotes data as an equal scholarly output to a written work.

  14. How to Cite Data: Key Components

    Deposit and cite the data you scraped, and deposit the script(s) you used to scrape them in figshare or Zenodo, and cite them. (Both of these repositories can assign Digital Object Identifiers (DOIs) to both software [i.e., scripts] and datasets, making them easier and more reliable to cite.) If you are scraping web pages (as opposed to ...

  15. Citing Data

    How to Cite Data. The most important thing to remember is that you want your citation to include enough information so that a reader could find the same dataset again in the future, even if the link you provide no longer works. It's necessary to include a mixture of general and specific information to help them be certain that they've found the ...

  16. How to Cite Statistics

    The advice for MLA format is to include the first item of your full citation, whatever that may be. This will enable the reader to easily identify the full citation, which is, of course, the point of an in-text citation. You can condense the item if necessary. So, for the above example, the in-text citation would be: ("Table 105.20")

  17. How to Cite a Report in APA Style

    To reference a report with an individual author, include the author's name and initials, the report title (italicized), the report number, the organization that published it, and the URL (if accessed online, e.g. as a PDF). Author last name, Initials. (Year). Report title: Subtitle (Report No. number).

  18. APA Tables and Figures

    Cite your source automatically in APA. The purpose of tables and figures in documents is to enhance your readers' understanding of the information in the document; usually, large amounts of information can be communicated more efficiently in tables or figures. Tables are any graphic that uses a row and column structure to organize information ...

  19. Citing Sources: What are citations and why should I use them?

    Different subject disciplines call for citation information to be written in very specific order, capitalization, and punctuation. There are therefore many different style formats. Three popular citation formats are MLA Style (for humanities articles) and APA or Chicago (for social sciences articles). MLA style (print journal article):

  20. 3 Easy Ways to Cite Statistics

    3. Include the title of the document followed by a brief description. Type the title of the document in italics. Use sentence-case, capitalizing only the first word and any proper nouns in the title. If there is a subtitle, place a colon at the end of the title and then type the subtitle, also in sentence-case.

  21. How to Cite Data and Code

    Citation Elements for Data. A data citation should include at least the following elements. The specific information will depend on established practices in your research field, as well as the type of data, the repository you use, and the citation style of the publication. Responsible party (i.e., investigator, sample collector, creator) Title ...

  22. A data citation roadmap for scientific publishers

    The present document is a detailed roadmap to implementing JDDCP-compliant data citation, prepared by publishers, for an audience of publishers and authors, as part of a larger effort involving ...

  23. Data: Repositories

    Repositories. Dataset repositories, also known as research data repositories, provide researchers with a stable place to store and provide others with access to their research data. Depending on the research discipline, data can often be deposited in one or more data centers (or repositories) that will provide access to the data.

  24. Modernizing the Data Infrastructure for Clinical Research to Meet

    Importance The ways in which we access, acquire, and use data in clinical trials have evolved very little over time, resulting in a fragmented and inefficient system that limits the amount and quality of evidence that can be generated.. Observations Clinical trial design has advanced steadily over several decades. Yet the infrastructure for clinical trial data collection remains expensive and ...

  25. Venezuela opposition says its victory is irreversible, citing 73% of

    Venezuela opposition leader Maria Corina Machado said on Monday the country's opposition has 73.2% of the voting tallies from Sunday's election, allowing it to prove election results it says give ...

  26. PDF Global Macro ISSUE 129

    data centers and companies' green energy commitments. The data centers that power AI models must essentially run 24/7 given the nature of AI workloads, and so require a constant energy source like natural gas that can be dispatched on demand rather than renewables, which are more intermittent in nature.

  27. [2408.03562] A Comparison of LLM Finetuning Methods & Evaluation

    This research compares large language model (LLM) fine-tuning methods, including Quantized Low Rank Adapter (QLoRA), Retrieval Augmented fine-tuning (RAFT), and Reinforcement Learning from Human Feedback (RLHF), and additionally compared LLM evaluation methods including End to End (E2E) benchmark method of "Golden Answers", traditional natural language processing (NLP) metrics, RAG Assessment ...

  28. Genomic data reveal a north-south split and introgression ...

    The human parasitic fluke, Schistosoma haematobium hybridizes with the livestock parasite S. bovis in the laboratory, but the extent of hybridization in nature is unclear. We analyzed 34.6 million single nucleotide variants in 162 samples from 18 African countries, revealing a sharp genetic discontinuity between northern and southern S. haematobium. We found no evidence for recent ...

  29. Generating knowledge graphs through AI-assisted text mining of

    Structured research data management in catalysis is crucial, especially for large amounts of data, and should be guided by FAIR principles for easy access and compatibility of data. Ontologies help to organize knowledge in a structured and FAIR way. The increasing numbers of scientific publications call for Digital Catalysis
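
Item 12 in the list above mentions exporting an RIS file and importing it into a reference manager. As a rough sketch of what such a file might contain for a dataset, the snippet below writes a minimal RIS record. The `to_ris` helper and all field values are hypothetical; the choice of tags (`TY`, `AU`, `PY`, `TI`, `PB`, `DO`, `UR`, `ER`) and the `DATA` reference type are assumptions based on the commonly documented RIS format, and how a particular tool maps them on import is not guaranteed.

```python
def to_ris(fields: dict) -> str:
    """Serialize a dataset record as RIS text (tag, two spaces, hyphen, space, value)."""
    lines = ["TY  - DATA"]  # assumed reference type for a data file
    for tag in ("AU", "PY", "TI", "PB", "DO", "UR"):
        values = fields.get(tag, [])
        if isinstance(values, str):
            values = [values]  # allow a single value or a list (e.g. several authors)
        lines.extend(f"{tag}  - {value}" for value in values)
    lines.append("ER  - ")  # end-of-record marker
    return "\n".join(lines) + "\n"


# Placeholder values only; this is not a real dataset.
record = {
    "AU": ["Example, A.", "Sample, B."],
    "PY": "2024",
    "TI": "Example survey responses",
    "PB": "Example Repository",
    "DO": "10.1234/example-placeholder",
}
print(to_ris(record))
```

If the data producer already supplies a recommended citation or an RIS download, that should take precedence over a hand-built record like this one.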