Ethics of Data with Human Subjects

Conducting social science research typically requires using data from human research subjects (also called participants). This type of research can do a great deal to improve lives across the globe, but it is also full of ethical conundrums. It carries major risks to human safety and well-being that must be carefully considered before conducting research on or about humans. This guide gives a brief overview of the key issues, along with resources for more detail.

What qualifies as human subject research?

The 1979 Belmont Report defines research as follows:

the term "research" designates an activity designed to test a hypothesis, permit conclusions to be drawn, and thereby to develop or contribute to generalizable knowledge (expressed, for example, in theories, principles, and statements of relationships). Research is usually described in a formal protocol that sets forth an objective and a set of procedures designed to reach that objective (Belmont Report, 1979, Part A).

Using that definition, the Belmont Report also states that “the general rule is that if there is any element of research in an activity, that activity should undergo review for the protection of human subjects” (Belmont Report, 1979, Part A).

The bottom line is that any activity that tests a hypothesis using humans (or their data) is considered research, and it must first be reviewed and approved by the relevant review body, an Institutional Review Board (IRB), to ensure appropriate protections for the people involved.

Data Scientists need to remember that the data they use often comes from real people, and they should strive to protect those people even when the data has been de-identified.

Who governs human subject research in the U.S.? Globally?

Within the U.S., human subject research is regulated by federal agencies, most notably the Department of Health and Human Services (DHHS) through its Office for Human Research Protections (OHRP). OHRP oversees the protection of human subjects in research conducted or supported by DHHS, at both public and private institutions.

Individual universities also have their own internal Office for the Protection of Research Subjects (OPRS) and Institutional Review Board (IRB). The IRB is made up of two panels of research experts, including faculty, staff, and community representatives. One panel focuses on social/behavioral research and the other on biomedical research. OPRS staff and the IRB panels determine what qualifies as human subject research, and they facilitate the review and approval process for research proposals. Depending on the proposed study, this process can include a full review by the IRB, which may require significant modifications to protect the research participants.

Data scientists in Data Science DISCOVERY are considered students using publicly available data sets for classroom purposes, not for research. In addition, the data sets used in DISCOVERY have been approved for public use and have been de-identified, which means that personal identifiers (e.g., name and address) have been removed. This allows DISCOVERY students to ethically practice basic data science without IRB approval. However, if you choose to conduct research outside of the course, you will need to seek IRB approval first.

What are the key guiding ethical principles for human subject research?

The Belmont Report sets out three principles that guide ethical research conduct:

Respect for Persons

  • Individuals should be treated as autonomous agents, and anyone with diminished autonomy is entitled to protection.
  • This means people have the right to make decisions for themselves about themselves (self-determination), including whether or not to participate in a research study. People whose ability to make such decisions is limited (e.g., prisoners, children under 18, pregnant and nursing people, and people with learning disabilities) require additional protections before they can reasonably decide whether to consent to research participation.

Beneficence

  • Do not harm.
  • Maximize benefit and minimize possible harm from research.
  • This means your study design should never intentionally harm people. Your design must also take every step to maximize benefits for the research participants while taking every step to minimize any possible harm, including less obvious harms such as damage to a person’s reputation.

Justice

  • Who should receive the benefits and bear the burdens of research?
  • Equals should be treated equally.
  • This principle is informed by a long history of unethical research in which marginalized populations were used as research subjects and bore the burdens of research (e.g., the Tuskegee Syphilis Study, the U.S. Plantation Physician Studies, and the Nazi violations of human rights that led to the development of the Nuremberg Code).

Why do these issues matter for Data Scientists?

These ethical issues are especially important for Data Scientists who use data sets that include personal information about research participants. In most cases, publicly available data sets are required to be “de-identified” so that researchers (and the general public) cannot identify the participants.

Even publicly available data sets come with use agreements and security protocols to keep the data secure (e.g., the NHANES User Agreement and NHS Digital Data Security Information). In some cases, de-identified data sets still include enough information that an analyst could attempt to identify the participants. Doing so would violate the ethical principles outlined in the Belmont Report as well as federal regulations protecting human research subjects. Ethical researchers using these data sets agree not to attempt to identify participants.

This means ethical Data Scientists do not “doxx” research participants, even if they think they could figure out how to do so. Those who do are violating federal law and can be held legally liable.
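To make “de-identified” a little more concrete, here is a minimal Python sketch, assuming made-up records and hypothetical field names (name, address, email, age, zip, score), of two things a data steward might do before sharing data: drop direct identifiers, and count records whose remaining combination of indirect details (here, age and ZIP code) appears only once. It is an illustration under those assumptions, not a de-identification standard. A unique row signals that a data set may need more protection before release; it is never a license to try to identify anyone.

```python
from collections import Counter

# Hypothetical field names for illustration only; real data sets (e.g. NHANES)
# document their own variables and de-identification rules.
DIRECT_IDENTIFIERS = {"name", "address", "email"}


def drop_direct_identifiers(records, identifiers=DIRECT_IDENTIFIERS):
    """Return new records with the direct identifier fields removed."""
    return [{k: v for k, v in rec.items() if k not in identifiers} for rec in records]


def count_unique_rows(records, quasi_identifiers):
    """Count records whose combination of the given fields appears only once.

    A unique combination (e.g. a rare age and ZIP code pair) is a warning sign
    that a record could be linked back to a person even without a name.
    """
    keys = [tuple(rec.get(field) for field in quasi_identifiers) for rec in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1)


# Made-up example records (not real participants).
participants = [
    {"name": "A", "address": "1 Main St", "email": "a@example.com", "age": 34, "zip": "61801", "score": 7},
    {"name": "B", "address": "2 Main St", "email": "b@example.com", "age": 34, "zip": "61801", "score": 5},
    {"name": "C", "address": "3 Oak Ave", "email": "c@example.com", "age": 91, "zip": "61820", "score": 9},
]

released = drop_direct_identifiers(participants)
print(released[0])  # {'age': 34, 'zip': '61801', 'score': 7}
print(count_unique_rows(released, ["age", "zip"]))  # 1: the 91-year-old's row is unique
```

Even after names, addresses, and emails are gone, the third record is the only one with its age and ZIP code, which is exactly the kind of leftover detail that makes re-identification possible and why data use agreements forbid attempting it.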

Where can I learn more or get certified?

At the University of Illinois Urbana-Champaign, the OPRS office uses CITI Training, which is available for free to all UIUC students, staff, and faculty. Many other universities across the U.S. use this same training. We encourage you to take the required core Basic Course certifications and the Social and Behavioral Responsible Conduct of Research Course. These certifications are valid for 3 years.

Students interested in joining on-campus faculty research labs that conduct human subject research will need to complete this training anyway before they are allowed access to lab materials. Completing the training before applying to be a research assistant is a great way to demonstrate your interest and commitment on your resume!