Westat Data Scientists Detail Best Coding Practices

Data scientists and statisticians who want to make their results accessible to others face a number of challenges in ensuring that their analyses are reproducible. These obstacles, along with recommendations for addressing them, are spotlighted in Best Coding Practices to Ensure Reproducibility (PDF), an Issue Brief by Westat data scientists Gonzalo Rivero, Ph.D., and Kristin Chen.

Gonzalo Rivero explains that the lack of specific training in reproducibility is a significant challenge.

“As professionals in the collection and analysis of data, we face distinct challenges due to the nature of the artifacts with which we interact, the type of output we produce, and our own technical backgrounds and priorities,” says Dr. Rivero. “Chief among these hurdles are lack of specific training in reproducibility, the competing pressure of deadlines, and the subjective and social nature of the problem itself.”

To overcome these hurdles, Dr. Rivero stresses that scientists who write code to share results must ensure that end users can understand what the code means, so that they can verify it and contribute to it as well.

Kristin Chen notes that for code to be reproducible it first must be easy to understand.

Co-author Kristin Chen explains that for code to be reproducible, it must be stable, portable, and easily understood: “Good code leads to a transparent, consistent, readable product so that the analyses and the thinking process can be communicated between users or between the statisticians and data scientists and future users.”

Although reproducibility is a matter of communication, workflow, and process, Dr. Rivero and Ms. Chen offer technical recommendations using examples from the ecosystem of the R language. They emphasize that what separates good code from bad code is largely how the information is organized and conveyed.

Their recommendations include:

  • Embracing conventions in naming functions or objects
  • Adopting a style guide that relies on idioms
  • Avoiding assumptions about the execution environment
  • Structuring the code in predictable ways
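
As a rough illustration of how these recommendations can look in practice, here is a minimal sketch. The Issue Brief draws its examples from R; this sketch uses Python purely as an analogy, and every name in it (draw_sample, summarize_sample, the seed value) is hypothetical rather than taken from the brief. It uses consistent verb_noun function names, passes the random seed explicitly instead of relying on hidden state in the execution environment, and follows a predictable layout of imports, functions, and a single entry point.

```python
import random
import statistics


def draw_sample(population, sample_size, seed):
    """Return a random sample; the seed is an explicit argument, not hidden global state."""
    rng = random.Random(seed)  # local generator, unaffected by other code
    return rng.sample(population, sample_size)


def summarize_sample(sample):
    """Return the mean and standard deviation of a sample."""
    return {"mean": statistics.mean(sample), "sd": statistics.stdev(sample)}


def main():
    # Hypothetical inputs, chosen only for illustration.
    population = list(range(1, 101))
    sample = draw_sample(population, sample_size=10, seed=2024)
    print(summarize_sample(sample))


if __name__ == "__main__":
    main()
```

Because the seed is part of the function call, two readers who run draw_sample with the same arguments on the same version of Python should obtain the same sample, which makes the result easier to verify.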

The authors also address other challenges, such as code dependencies that change from one version to the next and situations in which the statistical environment itself changes in a way that can affect the original intention of the code. “We cannot assume that we will have access to the same computational environment in which data processing and data analysis originally took place,” says Dr. Rivero. “We must ensure that we can replicate in the future the exact network of dependencies we used today to run our analysis.”
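
One modest first step toward that goal is to record which versions were in use when an analysis ran. The sketch below is an assumption-laden illustration in Python, not the method described in the Issue Brief (whose examples use R); the function name snapshot_environment and the package name passed to it are hypothetical. It relies only on the standard library's importlib.metadata, sys, and platform modules.

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_environment(package_names):
    """Return the interpreter version, platform, and installed versions of the named packages."""
    packages = {}
    for name in package_names:
        try:
            packages[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            packages[name] = None  # not installed in this environment
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": packages,
    }


if __name__ == "__main__":
    # "pip" is only an example; list the packages the analysis actually imports.
    print(json.dumps(snapshot_environment(["pip"]), indent=2))
```

Saving this snapshot next to the results documents the environment but does not restore it. Recreating the exact network of dependencies later would still require a tool that pins and reinstalls those versions.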

Dr. Rivero says he understands the challenges that statisticians and data scientists face: “We are squarely in the terrain of software engineers, but we can all learn how to write good, usable code, especially if we put ourselves in the shoes of the end users.”

The bottom line, notes Dr. Rivero, is that writing reproducible code is an evolving and collaborative enterprise among research scientists, one that requires good tools to support good practices and processes. For that reason, Dr. Rivero offers this article as a “starting point for a wider conversation about computational reproducibility within the community of researchers.”

