
Data Investigation Processes: Connected, Iterative, and Cyclic

By Hollylynne Lee, Gemma Mojica and Emily Thrasher from the Hub for Innovation and Research in Statistics and Data Science Education at the Friday Institute for Educational Innovation at NC State


Authors’ Note: In this blog post we use both text and videos to help communicate why and how data investigation processes are a critical aspect of learning data science in K-12.


The Data Investigation Processes in Data Science


Developing the ability to do data science, and to make sense of the data science work done by others, involves building skills and knowledge that span all phases of a data life cycle and draw on many subject areas.

Science tends to put a heavy emphasis on reasoning from and with data to understand scientific phenomena. English and Social Studies incorporate the use of data and data visualizations as evidence to support arguments, interpret information, and evaluate claims in social structures of our world. Mathematics has emphasized working with measurement data, graphical representations, and developing statistical and probabilistic reasoning. Throughout K-12, students should develop a practice of using data in investigations of real-world phenomena through processes that will prepare them to be data-literate citizens and open doors for data-intensive career pathways in sciences, technology, engineering, journalism, medicine, sports analytics, business, data science, and so many more.

The K-12 Data Literacy and Data Science Learning Progressions provide some guidance on what skills, knowledge, and dispositions may be applicable for students to learn across different grade bands.

An important sub-strand of the data science learning progressions is investigative dispositions, which is within Strand A of Essential Core Habits that include dispositions and responsibilities. According to the learning progressions, students begin in K-2 by recognizing the concept of an investigative process for exploring questions about our world and, by 12th grade, develop the ability to conduct investigations independently (Concept A3.1).

The importance of investigative dispositions reflects a long history of guidelines and K-12 curricula suggesting that investigating statistical problems involves working through the phases of a data investigation. In 1999, Wild and Pfannkuch described a five-phase investigative cycle: Problem, Plan, Data, Analysis, and Conclusion (PPDAC; image 1). A four-phase cycle has commonly been labeled Pose, Collect, Analyze, and Interpret (PCAI; e.g., Franklin et al., 2007; Graham, 1987) and was further expanded in the Guidelines for Assessment and Instruction in Statistics Education (GAISE) reports in 2007 and 2020 (image 2).

In addition, dispositions crucial to productively investigating data have been identified by many (e.g., Wild & Pfannkuch, 1999; Lee & Tran, 2015; EDC, 2014), such as: imagination, curiosity and awareness, openness, engagement, being logical, propensity to seek deeper meaning, and perseverance.

As you look across the diagrams below, you will notice many similarities as well as differences:


Image 1: The PPDAC Investigative Cycle, 1999


Image 2: GAISE II Statistical Problem Solving as an Investigation Process, 2020


In 2019, the International Data Science in Schools Project (IDSSP) released a curriculum framework to guide K-12 schools in introducing students to learning with and from data. Its data cycle included problem elicitation and formulation, getting the data, exploring data, analyzing data, and communicating results (image 3).


Image 3: IDSSP Data Cycle framework, 2019


In our 2022 article, we proposed a six-phase Data Investigation Process that brings together the work of data scientists and many of the cycles and processes that have been used over the years to frame statistics and data science learning. The six phases fit together like pieces of a puzzle and are all needed for a holistic and productive approach to data investigations (image 4).


Image 4: Data Investigation Process, 2022


In the 2025 release of the K-12 Data Literacy and Data Science learning progressions, five strands, including one on dispositions and responsibility, are proposed and depicted as part of a pizza-making metaphor (image 5).


Image 5: K-12 Data Literacy and Data Science Learning Progression Strands, 2025


Data Investigation Process as an Example


In this section, we take a deeper dive into the Data Investigation Process (2022) as an example of how an investigative process is addressed within the learning progressions. It is one of several models of an investigative process that teachers could use to support the development of investigative dispositions when enacting the learning progressions.


For a quick visual reference, download a one-page handout/poster describing the six phases and the critical habits of mind and dispositions.


The jigsaw-like diagram illustrates that the phases come together as essential parts of a Data Investigation Process, completing the “entire picture” needed when exploring a real-world phenomenon and making evidence-based claims with data. The video below illustrates how we developed the Data Investigation Process framework.



The second video includes an illustration of how a teacher can create a meaningful learning opportunity for high school students to engage in a data investigation with CODAP.



Data Investigations as Connected, Iterative, and Cyclic


Here we share more details and make explicit how the Data Investigation Process connects with, and can support implementing, the K-12 learning progressions. The six-phase Data Investigation Process framework can help organize learners’ work with data in classrooms.


While some investigations may proceed linearly through a cycle, not all investigations emerge and proceed in this way. This is why the learning progressions include two concepts, Iteration and Dynamic Inferences, with related competencies. Learners may begin with a set of data that has already been collected for them (called secondary data) and do some preliminary exploration and visualization of the data, often called exploratory data analysis (EDA; Tukey, 1980).
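
For readers who work in a programming language rather than a tool like CODAP, a first exploratory pass over secondary data might look something like the minimal Python sketch below. Everything in it is an assumption made for illustration: the sleep data are made-up values, and the helper names (`five_number_summary`, `whole_hour_counts`, `text_dot_plot`) are hypothetical and do not come from the learning progressions or the Data Investigation Process materials.

```python
import statistics
from collections import Counter

# Hypothetical secondary data (made-up values): hours of sleep
# reported by ten students in a class survey.
sleep_hours = [6.5, 7.0, 8.0, 5.5, 7.5, 9.0, 6.0, 7.0, 8.5, 4.0]


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def whole_hour_counts(values):
    """Count how many values fall in each whole-hour bin (7.5 goes in bin 7)."""
    return Counter(int(v) for v in values)


def text_dot_plot(values):
    """Build a rough text dot plot: one row per bin, one '*' per value."""
    counts = whole_hour_counts(values)
    rows = []
    for hour in range(min(counts), max(counts) + 1):
        rows.append(f"{hour:>2} | {'*' * counts[hour]}")
    return "\n".join(rows)


if __name__ == "__main__":
    print("Five-number summary:", five_number_summary(sleep_hours))
    print(text_dot_plot(sleep_hours))
```

A quick summary and a rough plot like these are only a starting point. Noticing a cluster or an unusually low value is the kind of observation that can send an investigation back to questions about how the data were gathered and what each measure means.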


From what is noticed, you may go back to Consider and Gather Data to consider the data source, make sense of different measures, and decide to use different strategies to Process Data in meaningful ways (see the strand on Creation and Curation in the learning progressions).



You may then dive into resources to Frame the Problem by making sense of the bigger context that the data represent and pose a targeted statistical question involving only a few variables in the data set (see the substrand Problem Identification and Question Formation).



From there, you would select the appropriate data for the variables of interest and may proceed to Consider Models, which may in turn require additional work in the Explore and Visualize Data phase (see Analysis and Modeling Techniques).



Deciding how to Communicate and Propose Actions may spark new questions that require further investigation with the data at hand, or additional data collection and processing (see Visualization and Communication).



Sense-making and interpretation occur throughout the entire process, as noted in Interpreting Problems and Results in the learning progressions. As illustrated, the learning progressions provide opportunities for students to learn durable data literacy and data science skills by engaging in a data investigation process throughout their K-12 experiences.


As you work to implement the strands, concepts, and competencies from the K-12 data literacy and data science learning progressions, remember that not every experience with data will involve all phases; instead, students should experience every phase over the course of their K-12 work with data investigations. Learning to teach data science is an exciting opportunity to tap into the interests of your students, help them learn about the world in which they live, and expand your instructional repertoire with new strategies and tools.


Ready to Learn More?


We hope what we have shared helps you develop a vision for how a data investigation process underpins and frames the way learners develop strong dispositions, skills, and competencies in K-12 data literacy and data science. Over the past 3 years, our team has translated research into classroom-ready artifacts that support the learning of statistics, data literacy, and data science. They are available in InSTEP (instepwithdata.org), a free online professional learning platform built to support self-guided teacher learning! If you are looking for ways to learn more about teaching data science, come learn with us in InSTEP!


Questions? Feel free to reach out to Hollylynne Lee at hollylynne@ncsu.edu

----------------------------------------------------------------

The ideas and videos presented in this blog are adapted from:

Lee, H. S., Mojica, G. F., Thrasher, E., & Vaskalis, Z. (2020). The data investigation process. In Invigorating Statistics Teacher Education through Professional Online Learning (http://instepwithdata.org). Friday Institute for Educational Innovation, NC State University. Available at: http://cdn.instepwithdata.org/DataInvestigationProcess.pdf


And

ESTEEM Curriculum Team. (2025). Foundations in Data Science and Statistics Teaching Module Set. In Enhancing Statistics Teacher Education Through E-Modules. Available at https://lor.instructure.com/resources/562b21d653624ec0a0ccf92738c38a35.

Funding Acknowledgement

The ideas described in this blog and the video artifacts were partially supported by the National Science Foundation under Grants DRL 1908760 and DUE 2141727 awarded to NC State University, DUE 2141716 awarded to Eastern Michigan University, and DUE 2141724 awarded to University of Southern Indiana. Any opinions, findings, and conclusions or recommendations expressed herein are those of the authors and do not necessarily reflect the views of the National Science Foundation.

 
 
