Search Results
- There’s More to Data Science than Math and Programming
By Aaron R. Williams and Claire McKay Bowen of the Urban Institute 🎥 Video Interview: DS4E Communications Specialist Shea Stripling speaks with Aaron R. Williams, Lead Data Scientist for Statistical Computing at the Urban Institute, and Claire McKay Bowen, Senior Fellow at the Urban Institute, about why data science requires more than math and programming, highlighting the real-world skills that shape meaningful and ethical data work and explaining how educators can introduce these skills to students in their classrooms. Data-driven decision-making and the data that fuels it are important in research, business, and government. Since September 2025, we’ve chronicled the importance of federal statistics, where federal statistics come from, the many daily uses of federal statistics, and more. Our posts have highlighted how data scientists and statisticians play a critical role in data-driven decision-making. People interested in becoming data scientists often hear a great deal about the importance of math and programming. Math and programming are important skills, but many other, often overlooked skills can also lead to success in the field. 1. Answering Useful Questions George E. P. Box famously wrote in 1976, “All models are wrong, but some are useful.” Nearly 50 years later (and with the aphorism said to predate Box’s initial writings), this statement remains relevant in the age of AI. Math and programming can answer many questions, but not always useful ones. An example is Zillow’s house-flipping[1] venture, which ended in 2021 after losing over $300 million in a single quarter due to inaccurate price predictions from its Zestimate models. Once boasting a median absolute percentage error of about 5% across 110 million homes, these models deteriorated over time, leading Zillow to overpay for properties and incur massive losses.
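For readers unfamiliar with the metric mentioned above, here is a minimal Python sketch of how a median absolute percentage error can be computed. The function name and the prices are hypothetical and purely illustrative; this is not Zillow's data or code.

```python
from statistics import median


def median_absolute_percentage_error(actual, predicted):
    """Return the median of |predicted - actual| / |actual|, as a percentage."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    errors = [abs(p - a) / abs(a) * 100 for a, p in zip(actual, predicted)]
    return median(errors)


# Hypothetical sale prices and model estimates (illustrative values only).
sale_prices = [200_000, 350_000, 500_000, 275_000, 420_000]
estimates = [210_000, 343_000, 450_000, 275_000, 441_000]

mdape = median_absolute_percentage_error(sale_prices, estimates)
print(f"Median absolute percentage error: {mdape:.1f}%")
```

Because the metric is a median, half of the estimates can miss by more than the reported figure. In this toy data one estimate is off by 10%, which is $50,000 on a single home.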
The failure underscores five critical lessons for data science to answer useful questions with more useful models: · Data quality matters: small inaccuracies in data (e.g., number of rooms and distance from schools) can “snowball” into massive financial impacts. · Ensure humans are in the loop: algorithms shouldn’t be the sole decision-makers, especially in high-stakes domains. · Anticipate people gaming the system: market participants may manipulate data, so fraud detection is crucial. · Adopt holistic modeling: forecasting should encompass not just property attributes, but also buyer behavior and demand dynamics. · Consider external factors: cost inflation, labor shortages, and market shifts must be factored in. 2. Questionnaire Design Data are not found; they are created. Gathering responses to questionnaires is a widespread process for creating data. The design and wording of questionnaires are incredibly important. For example, in January 2003, Pew Research Center found a major impact of questionnaire wording. When asked whether they “favor or oppose taking military action in Iraq to end Saddam Hussein’s rule,” 68% of respondents said they favored military action while 25% said they opposed it. However, when asked whether they “favor or oppose taking military action in Iraq to end Saddam Hussein’s rule even if it meant that U.S. forces might suffer thousands of casualties,” responses were dramatically different: only 43% said they favored military action, while 48% said they opposed it. The history of questionnaire design is full of these examples. In response, data scientists and statisticians have developed a rich set of tools for building questionnaires, including evidence-based ways of designing questions, cognitive interviews, and validation. 3. Subject Matter Expertise Subject matter expertise (i.e., topical knowledge) remains essential to understanding how models will perform in the real world.
Data scientists follow a series of best practices when developing models to ensure their models perform well on new data. How well a model performs on new data compared to the original data is called generalization. Math and programming are still necessary for building models, but, as with our first point, subject matter expertise is necessary for knowing when the model will be useful or not. Google Flu Trends is a famous example of what can go wrong when applying a model to unseen data. Using historical data and 50 million common Google searches, Google developed a model that could predict flu-like illnesses. Their model could report data almost immediately while the CDC’s method took about two weeks. Speeding up information about flu-like illnesses could be crucial to understanding and remedying flu outbreaks. This was an incredible development and showed how massive corporate data could benefit the public good. Google launched Google Flu Trends (GFT) in 2008 but eventually closed it down in 2015 after it failed to predict the 2009 flu pandemic and consistently overestimated illnesses in later years. So, what happened? Changes in people’s search behaviors and changes to the Google search algorithm meant the model, which was trained on historical data, did not make sense on new data. Data science requires subject matter expertise, attention to detail, and persistence to ensure that what worked in the past works in the future.[2] 4. Ethics Finally, just because a data scientist can do something doesn’t mean a data scientist should. Government statistical agencies like the U.S. Census Bureau and Statistics of Income Division at the IRS use statistical disclosure control and disclosure review boards to protect individual privacy. Public-sector data science organizations use institutional review boards to ensure that any projects using human subjects data are ethical and responsible.
In contrast, many private companies use data science to learn customers’ secrets to grow their businesses . Math and programming are important skills for data science, but successful data science requires more than technical skills. It requires a clear understanding of the question being asked, understanding the process used to create the data, the subject matter expertise to ensure the solutions meet the needs of users, and the ethical knowledge to decide if the project is appropriate. Closing note from DS4E : These four skills map directly onto the K–12 Data Science Learning Progressions. Asking “useful” questions is exactly what Substrand D2: Problem Identification & Question Formation is about, making sure the question is worth answering before you build a model. Questionnaire design lives in Substrand B2: Designing for Data Collection and Substrand B3: Measurement & Datafication because the way we ask, measure, and record shapes everything that follows. Subject matter expertise shows up in Substrand C5: Models of Data , where context determines whether a model will actually hold up as the world changes. And ethics is central to Concept A2.1: Data Use Risks & Benefits , reminding us that data work always carries real consequences for real people. [1] “In finance, flipping is purchasing an asset to quickly resell (or “flip”) it for profit. Within the real estate industry, the term is used by investors to describe the process of buying, rehabbing, and selling properties for profit.” Accessed January 2, 2026. https://en.wikipedia.org/wiki/Flipping [2] Check out the CDC FluSight Challenge that encouraged academic, industry, and government forecasting teams to develop models to forecast the influenza season. https://www.cdc.gov/flu-forecasting/evaluation/2024-2025-report.html
- Strengthening the Data Science Thread in Indiana
Inside Indiana’s Call to Action Summit on data science, workforce readiness, and the future of learning Across Indiana, educators are reimagining what it means to prepare students for a data-driven world. There is a growing recognition that today’s learners need more than traditional math sequences alone. They need opportunities to reason with data, ask meaningful questions, interpret information, and apply quantitative thinking to real-world problems. From elementary classrooms to high school pathways, teachers are leading the shift toward learning experiences that connect math, science, social studies, and career-ready skills through real-world data. That is why the Indiana Department of Education and Data Science 4 Everyone came together December 11, 2025 at the Garrison Event Center in Indianapolis to host the Indiana Call to Action Summit: Strengthening the Data Science Thread . This convening brought together educators, higher education leaders, policymakers, and industry partners from across the state with one shared goal: imagining a future where every Indiana high school graduate can confidently navigate a data-driven world. As Rick Hudson, Professor of Mathematics at the University of Southern Indiana, reflected, “At the end of the day, it was clear we have a lot of work to do to ensure employers in Indiana have the data-literate employees they need. As educators, we need to prepare students for the data-rich world they’ll encounter.” The day began with an opening message from Dr. Hudson, who emphasized the importance of data literacy for all students and the role of the data process in helping learners make sense of the world around them. To bring this idea to life, he led participants through a hands-on demonstration using a “slow reveal” graph from the New York Times’ What’s Going On in This Graph? series. Attendees explored two related graphs—one showing the percentage of U.S. 
weather stations breaking all-time temperature records and another mapping where those records were set—without initially being told how the datasets were related. Participants were challenged to look for relationships, ask questions, and form hypotheses. Once participants formed their own hypotheses, Dr. Hudson showed them how to test those ideas by collecting the same weather data and using CODAP to build a map of median daily maximum temperatures by month, modeling the kind of inquiry-driven, data-rich learning experiences that can transform classrooms. The program then continued with a presentation from Zarek Drozda, Executive Director of Data Science 4 Everyone, who expanded on what data science can look like across K–12 education. Drozda introduced the newly developed K–12 Data Science Learning Progressions , the first comprehensive set of national guidance designed to build a data science mindset in students throughout their education, and highlighted how this work can dovetail with the efforts already underway in Indiana to integrate data science across subjects and grade levels. This was followed by the Data Science Beyond the Classroom panel, moderated by Shellie Hartford, Director of Curriculum & Instruction at the Indiana Department of Education, and featuring Mike Steele, Chair of the Department of Educational Studies at Ball State University; Hong Gao, Managing Director at ClearPath Insights LLC; and Mark Daniel Ward, Executive Director of The Data Mine at Purdue University. Together, the panelists explored why data science is becoming a foundational skill across K–12 education, higher education, and industry, with a focus on the needs of Indiana’s future workforce. Reflecting on the significance of this gathering, Mike Steele shared, “It was exciting to see such a diverse set of teachers, faculty, and leaders from across the state in the same room to talk about the importance of data science and data literacy. 
We have long known that data is an important idea for our students, but too often these efforts have not been fully realized.” He also emphasized the urgency of rethinking secondary mathematics, noting, “The customary high school mathematics sequence is not working for most of our students. Integrating data science across the secondary spectrum has great potential to make mathematics more meaningful and to better prepare students for their futures.” Throughout the day, participants also engaged in hands-on work that went far beyond traditional conference sessions. Educators explored the national learning progressions, examined models from other states, reviewed Indiana’s current standards and programs, and worked in grade-band breakout groups to identify where data science already lives in their classrooms and where new opportunities could emerge. In the K–8 breakout, educators discussed why data science education is especially crucial in early learning environments and explored next steps for classroom integration. In the 6–12 breakout, secondary educators connected the learning progressions to their content areas and helped shape Indiana’s next steps for high school data science education. Lin Chu, PhD candidate in Instructional Systems Technology at Indiana University Bloomington and an event participant, described her experience in the K–8 grade-band breakout session, where educators analyzed the data science learning progression competencies and explored how they connect across subject areas: “At the end of the summit, our K–8 group divided into three smaller groups and reviewed the data science learning progression competencies across different grade levels. Our goal was to determine whether each competency could be connected to specific subject areas… For me, this experience sparked a strong interest in exploring integration ideas in non-STEM subject areas with other educators at the summit.
I shared my contact information and connected with an outstanding STEM specialist to continue these conversations and pursue future collaboration.” The summit concluded with a clear call to action, a shared sense of urgency, and next steps for the IDOE and all stakeholders present. Educators and higher education partners committed to one pedagogical change or implementation plan in their coursework, and IDOE is using feedback from the summit to create a report highlighting Indiana’s Data Science priorities to guide implementation in 2026 and 2027. Sarah Wegener, an 8th Grade English/Language Arts teacher and summit participant said of her experience, “Before attending [the Call to Action Summit] I was looking to see how I could implement data science in my classroom. After attending I have a much deeper understanding of data science, and I am even more excited about presenting it to my fellow staff.” As Hong Gao, Managing Director of ClearPath Insights LLC, reflected, “The future of education isn’t just about teaching students to use technology. It’s about empowering them to think logically, creatively, and responsibly in a world shaped by it.” Mike Steele also echoed this urgency: “Change to these traditional [educational] structures will take a sustained effort and a lot of political will, and this is what I see as the next important step for Indiana.” With new policy, growing momentum, and a shared commitment across sectors, Indiana is laying the groundwork for a future where every student graduates prepared to think critically with data and thrive in an increasingly data-driven world.
- Making Data Moves: The Prep Work Behind Every Good Analysis
By Tim Erickson, Epistemological Engineering I didn’t expect to learn anything about data science from repainting my porch steps, but that’s exactly what happened. It had been 15 years since my front steps were last painted. They were dirty, and you could see bare wood on the treads. After I borrowed a friend’s power washer (a sure way to discover just how filthy something has gotten), it was clear I needed to repaint them. I am no home-maintenance expert, so I asked Jan at the local hardware store how I should prepare. Power washing was a good start, she said. But let it dry, spackle the holes and big cracks, then use 80-grit sandpaper everywhere to make a good surface. Next, clean off the dust and blue-tape all the edges. Then, put on a coat of primer and two coats of deck-quality paint. It sounded simple, but if you’ve done this, you know: sanding, cleaning, and taping easily take three times as long as painting. This process is surprisingly analogous to data science. If you ask a data scientist, they will tell you that a huge proportion of their work is preparing the data: figuring out what the raw values actually mean, cleaning the data, organizing it, and getting it into shape for analysis. As educators, we often feel tempted to do all the prep work ourselves and just hand students the roller, ready to paint. There are times when that’s a good idea. But with modern tools for teaching data science, we can empower students to do a judiciously chosen part of the prep work themselves. And in contrast to painting, this data-munging process (making data moves) is fun and rewarding in itself, in a way that sanding and taping never are. I hope by the end of this post, you’ll agree that learning data moves can actually help students become more critical consumers and producers of data. I hope you also see that data moves are for more than just prep work. Data moves can give you insight into the data and become an essential part of your data-analysis toolbox.
Let’s see an extended example. Suppose we want to explore gender differences in income. We might begin with the assumption that males earn more, but we want to verify that and even determine how much more males earn. So, we get a bunch of U.S. Census data using the CODAP Microdata Portal (I will use CODAP here, but you can easily do this sort of thing using any modern data-analysis platform). We begin (foolishly, as we will see) by grouping the data by sex (the Census uses “sex” rather than “gender”) and computing the mean of total personal income. Here is what we see on the screen: That is, males earn an average of about $2.2 million, while females earn $1.7 million. Done. Problem solved. Just kidding. One foolish decision was to use mean instead of, say, median. But the real problem was that we didn’t explore our data first, for example, by graphing: Who are all those people making $10 million a year? We click on one of those points in the graph, and a case highlights in the table. It’s a ten-year-old girl. What? We select a different point and it’s a two-year-old boy. Further investigation (more selections, perhaps a graph) reveals that every case marked as having a total income of 9,999,999 is a child under 15. That is, the Census Bureau uses that number as a flag to indicate that we shouldn’t use that data. So we remove those data values, using some technique that will depend on your software, and proceed. Now we have a much more sensible result (we’re showing the median in the graph): We see that the median income for males is $11,100 more than the median for females. Interesting! If you use data straight out of the box (or directly from an AI; ask me how I know) you can get very wrong answers. To get a result that better reflects reality, you need to look critically at the dataset and alter it responsibly. When my colleagues and I thought about what we naturally did when we analyzed data, we saw ourselves doing the same kinds of things over and over.
We called these common data-analysis actions data moves. Here are three: Filtering: We “slice” the dataset to show only a subset; here, that means cases that do not have 9999999 for total income. Filtering restricts the dataset to those cases that are relevant to the investigation. Grouping: The original dataset was not separated by sex. If you imagine the dataset as a deck of cards, this is like sorting the cards into two piles. Of course, students have to learn how to do that with their software. But more importantly, they must understand the need to group the data at all, that grouping is a necessary step in the process of assessing income differences. Summarizing: To compare the groups, we summarized them using the mean, and then the median, income. Again, students need to know how to create that measure, and how to apply it separately to the groups. We also need to know what measures are appropriate. Note: sometimes the best measure is one you invent yourself! What makes something a data move? According to our paper, a data move is an action that alters a dataset’s contents, structure, or values. In our example, we altered the dataset’s contents (by filtering out irrelevant cases), structure (by grouping the data into two subsets, Female and Male), and values (by calculating means and medians). To be sure, not noticing that all the children are making almost 10 million dollars was an egregious mistake. But data moves do much, much more than fix problems like these. A cycle of analysis and reflection often uncovers other issues or opportunities we might want to address. For example, in that last graph, did you notice that there is a big pile of points near zero income? And that the pile is taller for females than males? Is that because many women do not get paid for their work? This gives us additional ideas for analysis. For example, we could calculate what percentage of men and women of working age have, in fact, no income.
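For readers working outside CODAP, the three moves above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the rows, column names, and incomes are hypothetical stand-ins, not the Census microdata used in the post.

```python
from collections import defaultdict
from statistics import median

FLAG_VALUE = 9_999_999  # the "do not use" flag described in the post

# Hypothetical microdata rows (illustrative values only).
people = [
    {"sex": "Female", "age": 10, "income": FLAG_VALUE},
    {"sex": "Male", "age": 2, "income": FLAG_VALUE},
    {"sex": "Female", "age": 34, "income": 42_000},
    {"sex": "Female", "age": 51, "income": 0},
    {"sex": "Female", "age": 45, "income": 30_000},
    {"sex": "Male", "age": 29, "income": 48_000},
    {"sex": "Male", "age": 62, "income": 0},
    {"sex": "Male", "age": 40, "income": 55_000},
]

# Filtering: keep only cases whose income is a real value, not the flag.
valid = [p for p in people if p["income"] != FLAG_VALUE]

# Grouping: sort the "cards" into one pile of incomes per sex.
groups = defaultdict(list)
for p in valid:
    groups[p["sex"]].append(p["income"])

# Summarizing: compute one value per group for each measure.
summary = {
    sex: {
        "median_income": median(incomes),
        "pct_zero_income": 100 * sum(i == 0 for i in incomes) / len(incomes),
    }
    for sex, incomes in groups.items()
}

for sex, stats in summary.items():
    print(sex, stats)
```

In this toy data, removing the flagged rows also removes all the children, so no separate working-age filter is needed. With real data that would be one more filtering move.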
Those numbers might be good to know in an investigation about income inequality. That would require an additional “summarizing” data move, to calculate that percentage for each group. It also makes us wonder, is the difference in medians we see between men and women simply because fewer women are working? That is, are women who work paid as much as men who work? We can investigate that by filtering again, leaving only people currently working. Fortunately, our dataset includes a column called Employment_status that we can use to set up that filter: Interesting! Now we have no piles at zero, and the difference in median incomes has narrowed from $11,100 to $9,000—but it did not go away. That is, a difference in employment status cannot completely explain the gender gap we see in median income. Something else is going on. Let’s reflect: I think, for lack of a better term, that an investigation like this, and the way we’re working with the data, “smells like” data science. How is this any different from the data manipulation we teach, for example, in elementary or middle school, or in a formal statistics class in high school or college? There, students learn the (very important) difference between mean and median, or how to find the interquartile range, or how to fit a least-squares line, or how to perform some statistical test. When we teach those skills, we often give our students pre-digested datasets that are set up to highlight the particular procedure that’s the focus of the lesson. For me, those skills, while vital, do not smell like data science. A task that passes the data science sniff test often has larger, more wide-ranging datasets, datasets where it’s not obvious what you’re supposed to do at first. We often say that in a data science task, you might feel “awash” in data. Even when we know what we’re trying to accomplish—to study gender differences in income—we discover that the situation is more complicated than we thought, that we need more nuance. 
And how do we get more nuance out of an ocean of confusing data? Often, that involves data moves. Consider this: a filtering move, by its very nature, reduces the size of the dataset, which can reduce that “awash” feeling. Even more importantly, looking at a carefully chosen subset of the data will often give you insight into the larger world, or give you an idea about what analysis to apply. Therefore, if you’re feeling awash, consider filtering. For example, if you’re worried that differing education levels have scrambled your income analysis, temporarily filter out everyone who has a college education, and see what the no-college data look like. Or go even further: use grouping and summarizing to find the median incomes of each gender for each level of education. Now instead of our 470 employed individuals, we have data on twelve subgroups. We are no longer awash and can start to tell a compelling story: Pretty great, huh? (And maybe a bit depressing.) Let’s look at two more data moves. One, which we call calculating (mutating in the tidyverse), is where you make a new variable whose values depend on some existing variable(s). This is like making a new column with a formula in a spreadsheet. A simple example would be unit conversion: you get weather data in Celsius but you know you want to think and communicate in Fahrenheit. Calculating is like summarizing, except that when you calculate, you’re creating a new value for every case. When you summarize, you’re aggregating the data, creating a new value for each group (or for the whole dataset). A more complicated example is recoding data. Suppose you wanted to further collapse the analysis about the effect of education on income and simply compare people who had gone to college with those who had not (rather than asking what degree they got or whether they finished high school). You would create a new column in the table that had only two values: college and no college.
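Both flavors of the calculating move can be sketched in plain Python as follows. The column names, education labels, and values here are hypothetical and chosen only for illustration. Each case gets exactly one new value, which is what distinguishes calculating from summarizing.

```python
def celsius_to_fahrenheit(temp_c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return temp_c * 9 / 5 + 32


# Calculating: add a new column whose value depends on an existing column.
readings = [
    {"day": "Mon", "temp_c": -10.0},
    {"day": "Tue", "temp_c": 0.0},
    {"day": "Wed", "temp_c": 25.0},
]
readings_f = [{**r, "temp_f": celsius_to_fahrenheit(r["temp_c"])} for r in readings]

# Recoding: collapse many education categories into two.
# These labels are assumptions; a real dataset will use its own codes.
COLLEGE_LEVELS = {"Associate degree", "Bachelor's degree", "Graduate degree"}

people = [
    {"education": "High school diploma", "income": 31_000},
    {"education": "Bachelor's degree", "income": 52_000},
    {"education": "No diploma", "income": 22_000},
    {"education": "Graduate degree", "income": 70_000},
]
recoded = [
    {**p, "college": "college" if p["education"] in COLLEGE_LEVELS else "no college"}
    for p in people
]

print(readings_f)
print(recoded)
```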
That’s recoding, conceptually just like converting Celsius to Fahrenheit; you can find out more about this particular move here. Finally, let’s look at joining (merging in the tidyverse). The point of a join is to connect two sources of data. For example: suppose we have an idea that in Texas, with lots of pickup trucks, people will have more vehicles than in other states. So we find a dataset (a table) with the number of car registrations in each state. Sure enough, Texas (selected in the graph) has a lot. But we realize that Texas also has a large population, so really we should find out how many cars there are per person in each state. We want to do a calculation with a formula that will be something like (registrations / population). The trouble is, population is not a column in our table. So we get a second table with the state populations. To make the formula work, we need both the registrations and the population in the same table. That’s what requires a “join.” Again, the precise process depends on your software. The result is a single table with both columns; you then make a new column and perform the calculation data move. The result? Texas actually has the fourth-smallest number of cars per person! Pretty cool. Five data moves: filtering, grouping, summarizing, calculating, and joining. I hope you see by now that these skills are for more than just data preparation; they are an essential part of your data-science toolbox. Most of us have never explicitly taught these skills; perhaps we’re under the impression that students will just sort of learn them as they go along, as we focus on the “real” curriculum. But I think we can make them part of what we teach. Data moves are accessible over a wide range of grade levels and can help prepare our students for more open-ended and complex data science investigations. I organized an introduction to data science for high-school juniors and seniors around data moves, and it felt good.
Data moves lent needed structure to the investigations and to how I ended up assessing the student work. Another reason to include data moves in high school is this: we are preparing our students to be citizens in a world where, for better and for worse, we are all both beneficiaries and victims of data science. Data moves are an essential part of what data scientists do, so understanding them helps students evaluate arguments and decisions based on data. Many students have trouble with concepts like filtering, and the actions and formulas that make filtering happen. But if we include filtering in our instruction, and our students actually do it themselves, they will be more critical about what data appear in a media report. And a bonus: while knowing data moves will make our students better equipped to be critical consumers, it also makes them ready to learn more data science if it lights their fire. Of course, this post is just a quick taste of data moves. You can probably see how learning these moves can foster both independence and inquiry. For more detail and more ideas, read the original paper or this paper by our colleagues in the ESTEEM project that expands upon the original. If you want more detailed descriptions, bits of high-school curriculum and assessment, and live online opportunities to try all of this in CODAP, see my e-book, Awash in Data . Finally, of course, look to the data science learning progressions! B.1.3 and B.1.4 are dripping with data moves, but you will find connections throughout the site.
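As a closing illustration, the joining move described earlier in this post (vehicle registrations per person) can be sketched in plain Python. The state names and numbers below are hypothetical placeholders, not real registration or population figures. The join is done with a dictionary lookup on the shared key, which is one simple approach among several.

```python
# Two separate tables that share a "state" column (hypothetical values).
registrations = [
    {"state": "State A", "registrations": 8_000_000},
    {"state": "State B", "registrations": 600_000},
    {"state": "State C", "registrations": 2_000_000},
    {"state": "State D", "registrations": 900_000},  # no population row below
]
populations = [
    {"state": "State A", "population": 30_000_000},
    {"state": "State B", "population": 1_000_000},
    {"state": "State C", "population": 5_000_000},
]

# Joining: look up each state's population by the shared key.
population_by_state = {row["state"]: row["population"] for row in populations}

joined = []
for row in registrations:
    population = population_by_state.get(row["state"])
    if population is None:
        continue  # unmatched case; a real analysis should report these
    # Calculating: with both columns in one table, the formula now works.
    joined.append({
        **row,
        "population": population,
        "vehicles_per_person": row["registrations"] / population,
    })

ranked = sorted(joined, key=lambda r: r["vehicles_per_person"])
for r in ranked:
    print(f'{r["state"]}: {r["vehicles_per_person"]:.2f} vehicles per person')
```

In this toy data the state with the most registrations ends up with the fewest vehicles per person, the same kind of reversal the Texas example shows.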
- Data Investigation Processes: Connected, Iterative, and Cyclic
By Hollylynne Lee, Gemma Mojica and Emily Thrasher from the Hub for Innovation and Research in Statistics and Data Science Education at the Friday Institute for Educational Innovation at NC State Authors’ Note: In this blog post we use both text and videos to help communicate why and how data investigation processes are a critical aspect of learning data science in K-12. The Data Investigation Processes in Data Science Developing the ability to do data science and make sense of the data science work done by others involves building skills and knowledge that span across all phases of a data life cycle and draw on many subject areas. Science tends to put a heavy emphasis on reasoning from and with data to understand scientific phenomena. English and Social Studies incorporate the use of data and data visualizations as evidence to support arguments, interpret information, and evaluate claims in social structures of our world. Mathematics has emphasized working with measurement data, graphical representations, and developing statistical and probabilistic reasoning. Throughout K-12, students should develop a practice of using data in investigations of real-world phenomena through processes that will prepare them to be data-literate citizens and open doors for data-intensive career pathways in sciences, technology, engineering, journalism, medicine, sports analytics, business, data science, and so many more. The K-12 Data Literacy and Data Science Learning Progressions provide some guidance on what skills, knowledge, and dispositions may be applicable for students to learn across different grade bands. An important sub-strand of the data science learning progressions is investigative dispositions , which is within Strand A of Essential Core Habits that include dispositions and responsibilities . 
According to the learning progressions, students start in K-2 recognizing the concept of an investigative process to explore questions about our world and develop by 12th grade the ability to conduct investigations independently (Concept A3.1). The importance of the investigative dispositions reflects a long history of guidelines and K-12 curricula suggesting that investigating statistical problems involves using phases of a data investigation. In 1999, Wild and Pfannkuch described a five-phase investigative cycle: Problem, Plan, Data, Analysis, and Conclusion (image 1). A four-phase cycle has been commonly labeled as Pose, Collect, Analyze, and Interpret (PCAI, e.g., Franklin et al., 2007; Graham, 1987) and was further expanded by the Guidelines for Assessment and Instruction in Statistics Education report in 2007 and 2020 (image 2). In addition, dispositions crucial to productively investigating data have been identified by many (e.g., Wild & Pfannkuch, 1999; Lee & Tran, 2015; EDC, 2014), such as: imagination, curiosity and awareness, openness, engagement, being logical, propensity to seek deeper meaning, and perseverance. As you look across the diagrams below, you will notice many similarities as well as differences: Image 1: The PPDAC Investigative Cycle, 1999 Image 2: GAISE II Statistical Problem Solving as an Investigation Process, 2020 In 2019, the International Data Science in Schools Project (IDSSP) released a curriculum framework for guiding K-12 schools in how to introduce students to learning with and from data that included: problem elicitation and formulation, getting the data, exploring data, analyzing data, and communicating results (image 3). Image 3: IDSSP Data Cycle framework, 2019 In our 2022 article , we proposed a six-phase Data Investigation Process that brings together the work of data scientists and many of the cycles and processes that have been used over the years to frame statistics and data science learning. 
The six phases fit together like pieces of a puzzle and are all needed for a holistic and productive approach to data investigations (image 4). Image 4: Data Investigation Process, 2022 In the 2025 release of the K-12 Data Literacy and Data Science learning progressions, five strands, including one on dispositions and responsibility, are proposed and depicted as part of a pizza making metaphor (image 5). Image 5: K-12 Data Literacy and Data Science Learning Progression Strands, 2025 Data Investigation Process as an Example In this section, we do a deeper dive into the Data Investigation Process (2022) to provide an example of how an investigative process is addressed within the Learning Progressions. This is one of several models of an investigative process that teachers could use to support developing investigative dispositions when enacting the learning progressions. For a quick visual reference, download a 1 page handout/poster describing the six phases and critical habits of mind and dispositions. The jigsaw-like diagram illustrates that each of the phases comes together as essential aspects of a Data Investigation Process to complete the “entire picture” needed when exploring a real-world phenomenon and making evidence-based claims with data. The video below illustrates how we developed the Data Investigation Process framework. The second video includes an illustration of how a teacher can create a meaningful learning opportunity for high school students to engage in a data investigation with CODAP . Data Investigations as Connected, Iterative and Cyclic We share more details here and make explicit connections to how the Data Investigation Process connects with and can support implementing the K-12 learning progressions. The six phase Data Investigation Process framework can help organize learners’ work with data in classrooms. While some investigations may proceed linearly in a cycle, not all investigations emerge and proceed in this way. 
This is why there are two concepts, Iteration and Dynamic Inferences, with related competencies in the learning progressions. Learners may begin with a set of data that has already been collected for them (called secondary data) and do some preliminary exploration and visualization of data, often called exploratory data analysis (EDA; Tukey, 1980). Based on what they notice, learners may go back to Consider and Gather Data to examine the data source, make sense of different measures, and decide to use different strategies to Process Data in meaningful ways (see the strand on Creation and Curation in the learning progressions). They may then dive into resources to Frame the Problem by making sense of the bigger context that the data represent and posing a targeted statistical question involving only a few variables in the data set (see the substrand Problem Identification and Question Formation). From there, the appropriate data for the variables of interest would be selected, and learners may proceed to Consider Models and find that additional work is required in the Explore and Visualize Data phase (see Analysis and Modeling Techniques). Deciding how to Communicate and Propose Actions may spark new or additional questions that require further investigation with the data at hand or additional data collection and processing (see Visualization and Communication). Sense-making and interpretation occur throughout the entire process, as noted in Interpreting Problems and Results in the learning progressions. As illustrated, the learning progressions provide opportunities for students to learn durable data literacy and data science skills by engaging in a data investigative process throughout their K-12 experiences.
As you work to implement the strands, concepts, and competencies from the K-12 data literacy and data science learning progressions, remember that not every experience with data needs to involve all phases; instead, students should have experiences with all phases over the course of their K-12 learning with data investigations. Learning to teach data science is an exciting opportunity to tap into the interests of your students, help them learn about the world in which they live, and expand your instructional repertoire with new strategies and tools. Ready to Learn More? We hope what we shared helps you develop a vision for how a data investigation process underpins and frames how learners can develop strong dispositions, skills, and competencies in K-12 data literacy and data science. Over the past three years, our team has translated research into classroom-ready artifacts that support the learning of statistics, data literacy, and data science. They are available at instepwithdata.org, a free online professional learning platform built to support self-guided teacher learning! If you are looking for ways to learn more about teaching data science, come learn with us in InSTEP! Questions? Feel free to reach out to Hollylynne Lee at hollylynne@ncsu.edu ---------------------------------------------------------------- The ideas and videos presented in this blog are adapted from: Lee, H. S., Mojica, G. F., Thrasher, E., & Vaskalis, Z. (2020). The data investigation process. In Invigorating Statistics Teacher Education through Professional Online Learning (http://instepwithdata.org). Friday Institute for Educational Innovation: NC State University. Available at: http://cdn.instepwithdata.org/DataInvestigationProcess.pdf And ESTEEM Curriculum Team. (2025). Foundations in Data Science and Statistics Teaching Module Set. In Enhancing Statistics Teacher Education Through E-Modules.
Available at https://lor.instructure.com/resources/562b21d653624ec0a0ccf92738c38a35 . References Bargagliotti, A., Franklin, C., Arnold, P., Gould, R., Johnson, S., Perez, L., & Spangler, D. (2020). Pre-K-12 Guidelines for assessment and instruction in statistics education (GAISE) report II. American Statistical Association and National Council of Teachers of Mathematics. https://www.amstat.org/docs/default-source/amstat-documents/gaiseiiprek-12_full.pdf Education Development Center. (2014). Big-data-enabled specialists career profile. http://oceansofdata.org/our-work/profile-big-data-enabled-specialist . Franklin, C., Kader, G., Mewborn, D., Moreno, J., Peck, R., Perry, M., & Scheaffer, R. (2007). Guidelines for assessment and instruction in statistics education (GAISE) report. American Statistical Association. https://www.amstat.org/docs/default-source/amstat-documents/gaiseprek-12_full.pdf Graham, A. T. (1987). Statistical investigations in the secondary school. Cambridge University Press. International Data Science in Schools Project Curriculum Team (2019). Curriculum Frameworks for Introductory Data Science. http://idssp.org/files/IDSSP_Frameworks_1.0.pdf . Lee, H. S., & Tran, D. (2015). Statistical habits of mind. In Teaching Statistics Through Data Investigations MOOC-Ed, Friday Institute for Educational Innovation: NC State University, Raleigh, NC. https://s3.amazonaws.com/fi-courses/tsdi/unit_2/Essentials/Habitsofmind.pdf Lee, H. S., Mojica, G. F., Thrasher, E. P., & Baumgartner, P. (2022). Investigating data like a data scientist: Key practices and processes. Statistics Education Research Journal, Special Issue: Research on Data Science Education, 21(2). https://doi.org/10.52041/serj.v21i2.41 Tukey, J. (1980). We need both exploratory and confirmatory. The American Statistician, 34(1), p. 23-25. https://doi.org/10.2307/2682991 . Wild, C. J., & Pfannkuch, M. (1999). Statistical thinking in empirical enquiry. 
International Statistical Review, 67(3), 223-248. Images available at website. Funding Acknowledgement The ideas described in this blog and the video artifacts were partially supported by the National Science Foundation under Grants DRL 1908760 and DUE 2141727 awarded to NC State University, DUE 2141716 awarded to Eastern Michigan University, and DUE 2141724 awarded to University of Southern Indiana. Any opinions, findings, and conclusions or recommendations expressed herein are those of the authors and do not necessarily reflect the views of the National Science Foundation.
- When Social Studies Takes the Mic on AI:
Highlights and insights from the NCSS–DS4E AI Ethics & Tech Conference Setting the Stage: Keynote Inspiration NCSS President Tina M. Ellsworth, Ph.D. kicks off the AI Ethics and Tech Conference When the national conversation turns to AI in schools, computer science and math often get the spotlight. But at the AI Ethics & Tech Conference in Chicago (a collaborative event hosted by NCSS and DS4E) it was civics and social studies teachers who took center stage. Their voices are critical to questions of technology, democracy, and ethics because teaching students how to use AI is not enough. We must also teach them how to question it. As NCSS President Tina M. Ellsworth, Ph.D., put it: “Incorporating data literacy into social studies helps students to think critically about data to make better informed decisions and to fully participate as responsible democratic citizens, while also encouraging them to consider diverse perspectives from various people.” Her words captured the spirit of the conference: social studies isn’t an “add-on” to the AI conversation—it’s where questions of power, citizenship, and ethics naturally live. By grounding data literacy in civic contexts, educators are equipping students not only to interpret numbers but to engage thoughtfully with the forces shaping democracy. Setting the Stage: Keynote Inspiration The conference opened with a keynote from Dr. Meredith Broussard, Data journalist and associate professor at the Arthur L. Carter Journalism Institute of New York University, who emphasized the importance of demystifying artificial intelligence in classrooms. She framed AI not just as a technical tool, but as a civic issue, prompting questions like: Who gets left out when algorithms make decisions about housing? How does misinformation spread through social platforms shaped by imperfect algorithms? 
These kinds of questions, deeply rooted in social studies, help students see AI’s social and political implications alongside its mathematical underpinnings. “AI is not magic, it’s math. When you start to unpack how these things work, it demystifies the process for students.” Her message was clear: students do not need advanced technical skills to begin engaging critically with AI. Even in middle school math, students can begin to unpack how algorithms function, how bias is built into systems, and how decisions made by technologists ripple outward into society. Thought-Provoking Panels Over the course of two days, participants joined panels that explored the ethical, political, and educational dimensions of AI. Peter Adams of the News Literacy Project warned that while students are often savvy with technology, they frequently fail to see the infrastructure and business models shaping their online experiences. Ilene Berson of the University of South Florida reminded educators that “we don’t want to outsource empathy and care to social agents in the form of AI,” stressing that human values must remain central in digital spaces. Shawn McCusker of the Bill of Rights Institute emphasized that civil discourse begins with inquiry and ends with reflection, two skills that are vital in an AI-driven society. Dr. Tamara Shreiner of the University of Michigan delivered one of the most resonant lines of the event: “Data are and always have been politically significant. People think that data just exist, but data are a series of choices.” Chéla S. Wallace, who facilitated the K–12 strand, offered encouragement that teachers already have the tools to lead in this space. “You have the ability to be innovative in your practice,” she told attendees, pointing to the ways social studies educators are already using technology, project-based learning, and inquiry-driven lessons as natural entry points for integrating data literacy and AI. 
Together, these conversations underscored how social studies classrooms are uniquely positioned to help students connect democratic participation with the ability to interrogate emerging technologies.
Breakouts: From Conversation to Design
The highlight of the conference came during breakout sessions where participants were asked to review and vote on 46 concepts from the K–12 Data Science Learning Progressions and the AI4K12 framework that might be integrated into social studies. Facilitators Chéla S. Wallace (K–12), Misha Jemison (Higher Education), and Dr. Thema Monroe-White (Policy/Nonprofit) guided the work as educators, policymakers, and researchers rolled up their sleeves. It was during these discussions that one participant quoted Dr. Ian Malcolm from Jurassic Park: “Scientists were so preoccupied with whether or not they could, they didn’t stop to think if they should.” That line captured the heart of the work. As attendees raised red and green cards to vote on proposed outcomes, they debated not only which skills and concepts fit into a social studies-specific progression, but also what role civic education should play in helping students weigh the implications of these emerging technologies. The debates centered on three essential questions: What should students know about AI by the time they graduate in 2030? What cognitive habits, skills, and civic dispositions will they need to navigate a world shaped by data? And where should computing tools, data analysis, and AI live inside social studies classrooms? As Dr. Thema Monroe-White reminded participants, data science is a social science first and foremost, because every dataset reflects human choices, contexts, and power structures. For example, decisions about how census data is collected can shape political representation, and the way algorithms flag “misinformation” can influence public discourse.
The breakout sessions reflected both optimism and caution, with participants agreeing that social studies educators have a unique responsibility to anchor these conversations in democracy, ethics, and citizenship. Zarek Drozda, Executive Director of DS4E, emphasized that this work is only the beginning. Phase 2 of the learning progressions will move beyond the cross-disciplinary foundation to subject-specific versions. “Our convening goal is to build a concrete, specific, accessible learning progression for K–12 social studies. Because we want it to feel tailored, and we want it to feel specific to social studies.” This focus on customization underscored the central role of civics educators in shaping how the nation approaches data science and AI in schools. Looking Ahead The conference concluded with a collective sense of purpose. Participants left Chicago with a renewed commitment to prepare students not only to use data, but to evaluate its sources, question its implications, and apply it responsibly in civic life. The NCSS–DS4E AI Ethics & Tech Conference was a milestone moment for bridging data science and social studies. It reminded us that preparing young people for the future is not only about technical skills, but about civic responsibility, ethical reflection, and the courage to shape technology for the common good.
- 2025 Data Science Challenge Champions: After the AP Test, the Big Data Work Begins
What happens when thousands of high school students are given a real-world problem, a national dataset, and no single right answer? They rise to the challenge! This May, nearly 10,000 students from 436 classrooms and 45 states across the U.S. participated in the third annual After the AP Data Science Challenge, a national competition hosted by Skew the Script, Data Science 4 Everyone, CourseKata, and North Carolina State University’s Data Science and AI Academy. The challenge invites AP Statistics and AP Computer Science students to put their quantitative skills to the test by building predictive models around a question that’s important to many soon-to-be college students: What factors predict student loan default rates across U.S. colleges? Students win the challenge by submitting models with the highest accuracy (highest R² value) in predicting student loan default rates on a hidden “test” set of colleges. Using R, Jupyter Notebooks, and real Department of Education data (26 variables from more than 4,400 schools), students analyzed, modeled, and submitted their predictions. Some students walked away with top scores in the challenge. The winners and runners up are listed at the end of this blog post. But, regardless of their place in the standings, every participant gained something even more valuable: a deeper understanding of how data connects to their future.
Rising to the Challenge
Brandon Thompson, an AP Statistics teacher at Summit High School in Oregon who participated in the challenge along with 160 of his students, saw that impact firsthand. “Most of my students did not have any prior programming experience, and at first many of them struggled with the precision of the coding, but the Notebooks are written in a way where the learning is effectively scaffolded and they learned to persevere as they learned the nuance of coding.
By the end, nearly all of my students were proficient and would say things like, ‘I never knew I could code.’” Six of Thompson’s students placed in the national Top 12 . But what stood out to him most was the collaboration. “Though they were competing amongst each other for that elusive high R² value, they shared strategies with each other without sharing the specific code they used. The collaboration was fun to witness.” Anthony Olakangil At the top of this year’s leaderboard was Anthony Olakangil, a rising junior at Bellarmine College Preparatory in California. His model achieved an impressive R² value of 0.8848 when predicting student loan default rates on the hidden “test” set of colleges. But Anthony’s path to the top wasn’t easy. After spending 48 hours trying to brute-force every possible variable combination, he hit a wall, but that failure quickly transformed into a turning point. Anthony soon stepped back, tested new transformations, engineered interaction terms, and wrote a custom trimming function to avoid overfitting his model. “Initially, I thought data science and statistics were boring: abstract math and visualizations,” Anthony said. “Over time, however, I became more conscious that numbers don’t speak for themselves. Knowing the context behind each variable helped me ask better questions and interpret results more responsibly.” Anthony’s experience highlights what makes this challenge unique: it pushes students not just to get the 'right' answer, but to think deeply about what the data is actually telling them and how that insight can be shaped by the questions they ask. A Path Forward The After the AP Challenge gives students a rare opportunity to work with real-world data and wrestle with complex, open-ended questions. 
While college affordability is often reduced to national headlines, this challenge encourages students to go deeper, to investigate the patterns and predictors that shape student outcomes, and to think critically about what data can and cannot explain. The experience of digging deeper into the important issue of college debt - using real data science skills - inspired many students to think about pursuing data science in their future studies and careers. For Ken Lynch, a student from Essex High School in Vermont who placed in the top 12, the experience revealed an unexpected connection between two academic paths. “This challenge has opened my eyes to pursuing something in the computation/data science realm, and I look forward to the many further applications of this in my career." Owen Gassner, the second-place winner and one of Brandon Thompson’s students at Summit High School in Oregon, echoed that sense of discovery: “Competing in the After the AP Data Science Challenge made me realize how math and statistics can combine with computer programming. The combination of two of my favorite subjects has excited me to explore data science further.” The Challenge Champions Congratulations to all students who persevered after a long school year and AP testing season to take part in this year’s challenge! The competition was fierce, and we were so impressed by all the submissions. Most importantly, we were impressed by the learning and discovery that participants demonstrated along the way. We can’t wait to see how you bring these skills to your future studies and work in the years to come! Without further ado, check out the Top 12 student model submissions, the runners up, and all of their R² scores below! Note: Student model submissions were ranked according to their accuracy (R² scores) in predicting student loan default rates on a hidden “test” set of colleges. 
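As a rough illustration of the ranking metric mentioned in the note above, R² on a held-out test set can be computed as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses Python rather than the R notebooks the challenge provided, and the default rates, predictions, and model names are made-up values for illustration only, not challenge data or the organizers' scoring code.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and the same length")
    mean_actual = sum(actual) / len(actual)
    # Squared error of the model's predictions
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Squared error of always predicting the mean (the baseline)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


# Hypothetical default rates for five "hidden" test-set colleges
test_actual = [0.04, 0.10, 0.07, 0.15, 0.02]

# Hypothetical predictions from two student models
submissions = {
    "model_a": [0.05, 0.09, 0.08, 0.13, 0.03],  # tracks the actual rates closely
    "model_b": [0.08, 0.08, 0.08, 0.08, 0.08],  # predicts the same rate everywhere
}

# Score each submission and rank from highest to lowest R-squared
scores = {name: r_squared(test_actual, preds) for name, preds in submissions.items()}
leaderboard = sorted(scores, key=scores.get, reverse=True)

for name in leaderboard:
    print(f"{name}: R^2 = {scores[name]:.4f}")
```

A model that only ever predicts one constant value scores at or below zero, which is why R² rewards models that explain variation across colleges rather than models that merely land near the average.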
Top 12 Winners
Runners Up
We’d also like to congratulate the runners up of the challenge, who submitted highly predictive and accurate models for the same task, just barely missing the leaderboard:
· Raghav Gupta, Mission San Jose High School
· Brian Wang, Hunter College High School
· Benjamin Seguin, Menomonie High School
· Jerome Brown, Baltimore Polytechnic Institute
· Rohan Reddy, Mission San Jose High School
· Tarini Sajja, Mission San Jose High School
· Fan Lin, Baltimore Polytechnic Institute
· Sean Miao, Huron High School
- The After the AP Challenge Makes Room for Teachers to Learn Too
Mary Velez, Mathematics Teacher at Roy C. Ketcham High School in New York, has been using Skew the Script since it first launched, first integrating the lessons into her statistics classroom and eventually joining the inaugural After the AP Data Science Challenge. A collaborative effort between Data Science 4 Everyone (DS4E), the North Carolina State Data Science Academy, and CourseKata, the challenge is designed for the post-AP exam stretch in AP Statistics and invites students to explore real-world datasets, write code, and tell data-driven stories about issues that matter to them. For Mary, the transition from traditional curriculum to the challenge was a natural one, but not without its nerves. “It was intimidating at first because I hadn’t coded in some time,” Mary said. Still, she leaned into the opportunity and gave her students the freedom to choose their next step. “I gave them the choice,” she recalls. “I told them you guys can do the project that we thought we were going to do all year or you can do this, or you can do both. I actually had a couple of groups choose to do both.” Mary’s students were already big fans of the curriculum. “They were the ones who were actually very onboard because they loved the curriculum already.” Mary soon found herself learning alongside her students, and sometimes even relying on them to help her through the more technical parts of the challenge. “I had no problem going up to one of my kids and saying ‘I have no idea how to do that.’” That collaborative energy shaped the culture of her classroom. “It’s not you standing up in front, it’s you sitting down next to them,” she says. “Giving them that seminar feel that you hope college classes are going to look like for them.
Ideally, a lot of these students are going to be going out and partnering with professors to do research so you’re kind of introducing them to that round table kind of scenario.” Building Confidence—Together Participating in the challenge—and later serving as an instructor for the Data Science and AI Bootcamp—gave Mary more confidence in leading future cohorts through the experience. After piloting the After the AP Challenge, she joined the bootcamp team to help guide other teachers through the process. “Now that I’ve also gone through the Data Science and AI Bootcamp and done the challenge, I feel much more confident.” But even before she felt fully prepared, Mary saw the value in learning alongside her students. “It’s okay not to be the most knowledgeable one in the room,” she says. “It’s okay to embrace it, just as I did that first year and say, ‘Alright guys, we’re learning this together, and there’s a high likelihood you’re going to learn it faster than me.’ Having that community already in your classroom where that’s already allowed and expected, that is a great atmosphere to learn in.” Mary also appreciates how the challenge reframes what it means to code—and who gets to do it. “We have this vision of coding and people learning how to code while sitting in a cubicle with no interaction, just staring at a screen. But that’s really not at all what this is about,” she says. “It’s about saying these are the statistics, these are the things we bring to the table, how can we use this technology to help us navigate this more effectively.” Even students in her class who weren’t initially excited about the prospect of coding ended up surprising themselves. “Not only did the challenge make coding accessible for students,” she says, “it made it a resource they can use as they move on. They wouldn’t have had that experience any other way.” Advice to Fellow Educators Mary’s advice for other teachers considering the challenge? 
“It’s very doable if you’re comfortable being vulnerable with your kids.” For Mary, being honest with students that you’re learning alongside them is the key to building community around the challenge. Mary’s students ultimately walked away from the After the AP Challenge proud, not just of their projects, but of their impact. “They loved knowing they were going to be providing feedback and knowing they could be involved in giving feedback to make the challenge even better for years to come.” Teachers interested in bringing the same experience to their students can sign up now for the 2025 After the AP Data Science Challenge. This free, two-week project is designed for AP Statistics and AP Computer Science classes to tackle a real-world problem using modern data science tools. This year’s challenge asks students to use Department of Education data to build predictive models and determine which colleges “pay off” the most in terms of student loan outcomes. The challenge begins on May 5, 2025, and the teacher registration deadline is April 18. An optional national competition closes on June 6, recognizing the top student models from across the country. No prior coding experience is required, just curiosity, internet access, and a week or two after AP testing to dive into real-world learning.
- The Great Connector: Why Data Literacy is Vital to Students’ Future Success
In this insightful op-ed in The 74 Million, Jon Deane, CEO of GreatSchools.org, argues that just as traditional literacy empowers individuals to navigate the world, data literacy equips students with the skills to thrive in an increasingly data-driven society. With AI shaping the future of education and beyond, Deane emphasizes the importance of integrating data literacy into K-12 curricula. From managing everyday tasks like personal finances to critically engaging with AI technologies, data literacy is essential for all students, regardless of their career interests. Deane calls for a broader push to ensure every student gains these vital skills, laying the foundation for lifelong success in a connected world. Read the full article here: https://www.the74million.org/article/the-great-connector-why-data-literacy-is-vital-to-students-future-success/
- A Landmark Gathering: The First-Ever Data Science Education K-12 Research to Practice Conference
Ivonne Martinez presents at the DSE K-12 Conference
What happens when you bring together educators, researchers, and advocates passionate about data science education? You get an explosion of ideas, dynamic discussions, and, of course, data collection! The first-ever Data Science Education K-12 Research to Practice Conference, held February 17-19, 2025, in San Antonio, Texas, was a vibrant and inspiring gathering of over 265 attendees from (literally) all around the world.
Over 265 people attended the first-ever DSE K-12 Research to Practice Conference
Spanning 55 engaging sessions, the conference explored a wide range of topics, from building data-rich lessons for elementary students to gamification through escape rooms. Over three action-packed days, participants delved into cutting-edge curriculum strategies, hands-on workshops, and discussions designed to make data science both accessible and meaningful for students of all ages and backgrounds. Between sessions, attendees even embraced their passion for data by collecting real-time insights on everything from favorite activities to caffeine consumption to the daunting number of unread emails in their inboxes, proving that data is truly everywhere (and this community loves analyzing it!).
The conference made it clear that this community loves to collect data (and has very specific thoughts about inbox management).
Setting the Stage: Opening Remarks and Early Sessions
Zarek Drozda opens the DSE K-12 Conference
The conference opened with Zarek Drozda, Executive Director of Data Science 4 Everyone, setting an energizing tone for the whole event. In his remarks, Zarek underscored the importance of collaboration, innovation, and network-building in the data science education community, emphasizing that each attendee played a vital role in shaping the future of data literacy in K-12 classrooms. With enthusiasm high, the conference launched into a full day of hands-on learning and collaboration.
Educators and policymakers engaged in deep discussions on how data science could be seamlessly integrated into existing curricula. Thought-provoking panels brought forth innovative instructional strategies, while hands-on workshops provided practical insights into incorporating data literacy across subjects and grade bands. Dr. Hollylynne Lee and Dr. Padhu Seshaiyer presenting at "National Perspectives on Supporting Data Science in K-12" One particularly compelling session, "National Perspectives on Supporting Data Science in K-12," brought together education leaders from California, Virginia, and North Carolina to discuss how they are building robust data science programs in their respective states . Panelists discussed the modernization of math education and the growing push for data science inclusion at both the state and district levels. These discussions reinforced that the demand for data literacy is growing across all grade levels, and many attendees left with concrete strategies for making data science education more accessible to students nationwide. Throughout the day, smaller breakout discussions allowed participants to share their personal experiences, challenges, and success stories. These conversations highlighted the real-world implications of data literacy, from professional development for teachers to building effective interdisciplinary lessons. A Night to Remember: The Boeing Museum Experience After an inspiring day of sessions, attendees gathered at the Boeing Museum of Science and Technology for an evening of connection and exploration. As they enjoyed a delicious buffet, conversations flowed freely, sparking new collaborations and strengthening professional networks. The evening wasn’t just about networking—it was also about having some fun in the world of science and technology! Some attendees tested their piloting skills in flight simulators, while others explored intricate exhibits like the sprawling Lego city. 
The highlight of the night came when a DJ coding music in real time took the stage, reinforcing that data science education is as much about creativity and curiosity as it is about numbers and equations (and the fact that many in our community know how to bust a move). Attendees enjoyed checking out the flight simulators the Boeing Museum had to offer They also enjoyed strolling through the Cyber City made of Lego Shaping the Future: Dr. Talitha Washington’s Keynote on Data Science Education Dr. Talitha Washington gives a moving keynote speech on Day 2 of the conference Day two began with a powerful keynote address from Dr. Talitha Washington, Executive Director of the Center for Applied Data Science and Analytics and the Sean McCleese Endowed Chair in Computer Science, Race, and Social Justice at Howard University. With humor, real-world applications, and compelling storytelling, Dr. Washington delivered a clear message: Data literacy is no longer optional—it’s essential for student success. Dr. Washington challenged educators to rethink traditional math instruction and embrace data science as a tool that transcends disciplines. Using personal finance, public health, and social justice as examples, she demonstrated the real-world impact of data literacy, leaving attendees inspired and equipped with actionable strategies for their classrooms. Dr. Washington also emphasized the importance of inclusion in data science education, calling on educators to ensure that students from all backgrounds have access to high-quality learning opportunities. Her call to action resonated deeply, sparking meaningful discussions that continued throughout the rest of the conference. 
Engaging Workshops and Hands-On Learning Kate Farrell and Jasmeen Kanwall during their workshop “Playing with Data: Gamification Through Escape Rooms, Immersive Online Challenges, and Card Games” The sessions that followed the keynote reinforced the conference’s mission of making data literacy both engaging and accessible. Attendees participated in hands-on experiences with coding tools, explored creative ways to integrate data into humanities and social sciences, and grappled with ethical considerations in data science. Some workshops showcased how data science can be leveraged for artistic expression, with one session guiding participants in using regression models and servo motors to create robotic art. Another encouraged educators to make environmental data more tangible, helping students turn real-world climate datasets into visual storytelling pieces. In discussions about data ethics and bias, teachers examined strategies for helping students critically evaluate information, question biases, and make informed decisions based on reliable data. These sessions weren’t just about acquiring knowledge—they were about equipping educators with the confidence and resources to bring data science into their own classrooms in ways that resonate with students from all backgrounds. Student Voices Take Center Stage Members of the "Title I Schools: Our Views on Teaching About Data" panel with moderator Julius Cervantes One of the most moving moments of the conference came when students themselves took the stage. The panel "Graduates of San Antonio Title I Schools: Our Views on Teaching About Data" featured students from San Antonio who shared how data literacy shaped their education and career aspirations. Their stories illustrated why access to data science education is more than just a policy issue—it’s about empowerment. As panelist Kaylin Hernandez reflected, "When Mr. Youngsaver asked us to draw a statistician, I drew a man who looked like Albert Einstein. 
Now, that person looks like me." This session left many in the audience visibly moved, reinforcing that data literacy isn’t just about skills; it’s about giving students the tools to navigate their futures with confidence and agency.

The Grand Finale: Unveiling the K-12 Learning Progressions

Kate Miller and Zarek Drozda present the first draft of the K-12 Data Science Learning Progressions

As the conference neared its close, anticipation buzzed through the room for the unveiling of the Data Science K-12 Learning Progressions, a long-awaited framework designed to provide a structured roadmap for integrating data literacy from elementary through high school. This milestone marks a significant step toward ensuring that data science is not just an add-on, but a core component of K-12 education. The framework is currently under review by a select group of educators, researchers, and state leaders, with the final version slated for public release this summer.

Leading the presentation, Zarek Drozda and Kate Miller, Research Associate from the Concord Consortium, took the stage to walk attendees through the draft framework. Both were deeply involved in shaping the learning progressions, working closely with educators, researchers, and industry professionals to craft a developmentally appropriate sequence of data science concepts. They were soon joined onstage by a panel of focus group participants: educators and education leaders who had contributed to the drafting process during an intensive convening at the University of Chicago last June. With humor and insight, panelists shared behind-the-scenes stories of how the progressions took shape, including the long hours (and strong coffee) that fueled their collaborative work. Their reflections highlighted the depth of thought and dedication poured into making this framework both practical and scalable.
Members of the focus group panel discuss their experience crafting the learning progressions Following the presentation, attendees were invited to provide feedback on the draft progressions, ensuring that the final version reflects the diverse needs of classrooms nationwide. The energy in the room was palpable—many left this session eager to help refine and implement this transformative initiative in their own schools and districts. The excitement surrounding the Learning Progressions reinforced a key takeaway from the conference: educators, policymakers, and industry leaders are united in their belief that data literacy must be a fundamental part of K-12 education. With momentum building, this framework has the potential to set the foundation for a future where all students graduate with the data fluency needed to thrive in an increasingly data-driven world. A Conference That Sparked Action and Community Impact Jocelyn Foran interviews members of the "Makerspace Youth Data Challenge Panel" The energy from the conference lingered long after the final session. Attendees left San Antonio with new ideas, expanded professional networks, and a renewed sense of purpose. Whether they were classroom teachers, district leaders, or researchers, everyone walked away with concrete strategies to bring data science to students in engaging and meaningful ways. Some sessions didn’t just inspire action—they made an immediate impact on the local San Antonio community. One of the standout moments was the first-ever DSE-K12 Youth Data Challenge, led by Jocelyn Foran, Science Consultant from Tuva. This exciting initiative brought together students and teachers from Harlandale ISD, along with volunteers from The University of Texas at San Antonio, to dive into real-world data investigations. Together, middle school students worked in teams competing against one another to analyze and model data to propose solutions to real-world problems. 
This year’s challenge focused on the question: What adjustments should Harlandale Independent School District make to its existing maker spaces and programming, and how can they best utilize the remaining open areas to serve their community's needs? After analyzing data and crafting solutions, teams presented their findings to a panel of judges, demonstrating not only their technical skills but also their ability to use data to drive meaningful change in their schools and communities. At the end of the session, the high school and college students and teachers who volunteered to assist the middle school students in the challenge came up to share how the experience had impacted them. They highlighted the enthusiasm of the students, their growing confidence in using data, and the collaborative problem-solving that took place. One major success? By the end of the challenge, every student involved could confidently explain what data science is—an incredible win for data education even before the conference wrapped up. Looking Ahead: Continuing the Momentum As attendees packed their bags and exchanged contact information, one thing was clear: this conference was just the beginning. Educators and advocates left energized and ready to take action, focusing on refining the K-12 Learning Progressions, expanding student access to high-quality data science education, and ensuring that teachers have the support and resources they need to bring data literacy into their classrooms. With new ideas, fresh perspectives, and perhaps a few extra cups of coffee, the attendees of the Data Science Education K-12 Research to Practice Conference left San Antonio ready to transform data science education for the next generation. This may have been the first conference of its kind, but judging by its impact and enthusiasm, it certainly won’t be the last.
- The Hill: AI ready to hit its stride in schools in 2025
As the new year unfolds, experts predict 2025 will be a breakthrough year for artificial intelligence (AI) in K-12 education. Following a foundational year in 2024, where strides were made in professional training, data science curriculum development, and federal guidance, AI is poised to play a transformative role in classrooms nationwide. According to Zarek Drozda, Executive Director of Data Science 4 Everyone, the education system has moved past its initial "reaction mode" with AI. Teachers have logged over 71,000 hours of professional development in data science since 2020, and nearly 300 schools have added data science courses. Yet, only two states—California and Oregon—have seen more than 3% of their students enrolled in these courses, underscoring the work still to be done. Federal efforts, like the release of a comprehensive AI toolkit for schools, have equipped educators with resources to integrate AI responsibly. Teachers now use AI to streamline lesson planning and personalize instruction, but challenges remain, including addressing issues like AI misuse and misinformation. As AI becomes more mainstream in classrooms, experts like Pati Ruiz of Digital Promise emphasize the importance of AI literacy and the need to mitigate security risks. With greater awareness, schools can responsibly harness the power of AI, not only for teaching but also to combat pressing issues like misinformation. 2025 offers an opportunity to build on last year’s progress and push AI integration in education to new heights. For more insights, read the full article by Lexi Lonas Cochran on The Hill here.
- A Collaborative Vision for Data Science Education: Insights from Six-State Northeastern Summit
Educators, policymakers, and leaders from across the Northeast gather to discuss developments in K-12 data science education in the region

The inaugural Northeastern K-12 Data Science Summit convened educators, policymakers, and leaders in New Hampshire on October 23rd, igniting a regional collaboration to advance data science education across state lines. For many participants, this was a groundbreaking opportunity; while formal programs are still evolving, each state brought forward dynamic initiatives with the potential to create impactful change nationwide.

Opening the summit, Frank Edelblut, Commissioner of the New Hampshire Department of Education, captured the urgency and shared mission that united attendees: “We are anxious to make sure our students are equipped with the mathematical reasoning and tools they need to be effective citizens in America today.”

Dr. Brendan Kelly, Director of Introductory Mathematics at Harvard University, expanded on this vision, sharing his passion for bridging math and real-world relevance. “Effective pedagogy with relevant mathematics should empower students. It should feel like having a skill set that lets you tackle real-world problems you care about. That’s the aspiration we should all hold,” Kelly emphasized. For Kelly, connecting math to real-world challenges is key. Sharing an example from his classroom, he noted, “We’ve partnered with executives at L.L. Bean who shared real operational data with our students, and they took on the role of problem-solvers. Knowing they’d pitch their strategies to real industry leaders took engagement to a new level.” This relevance, he explained, helps students “imagine themselves as problem solvers” in meaningful careers.
State to State Exchange: Many Paths for Data Science in the Northeast

The summit provided an opportunity for each state to present its distinct journey in bringing data science and data literacy into K-12 classrooms, revealing a tapestry of unique approaches and shared challenges across the Northeast.

Massachusetts: Embedding Data in Civic Engagement

In Massachusetts, data science education has taken on a civic dimension, driven by collaborative efforts from leaders like Deborah Boisvert of CSforMA and Shereen Tyrrell from Burlington High School. A legislatively mandated civics project, required as part of high school social studies coursework, has given data science advocates an opportunity to establish the Innovation Pathways to Data Careers (IPDC). Launched in 2021, IPDC is designed to introduce data literacy and build a progression of data skills in high school students.

The IPDC pathway includes the Civics+Data Module, a foundational component that allows students to use data to explore social issues within their communities. Other components include data modules in Algebra II/Math III, integrating data literacy into math coursework, and Visualization+Data and Python+Data courses, which provide students with visualization and programming skills essential for data analysis. Together, these elements form a comprehensive IPDC pathway, linking high school coursework to community college programs that prepare students for industry roles and to university programs that develop future data scientists.

This program has already impacted 55 teachers and over 1,000 students, testing a progression that guides students toward data science careers. Students have investigated issues like gentrification and housing inequality, analyzing data on housing prices and inheritance trends to understand impacts on their communities.
Tyrrell highlighted the program’s interdisciplinary appeal: “Once we brought data [into the classroom], the science teachers joined us. The business teachers joined us. The social studies teachers joined us. It was not just for math majors.” This collaborative approach has transformed classrooms, fostering a community of engaged learners excited to use data as a tool for civic engagement. Five additional districts will field-test this pathway, further strengthening connections between high school and post-secondary programs and opening doors to data-centric careers across various sectors. New Hampshire: Expanding Math Pathways with Data Science Anne Wallace, Mathematics/STEM Content Specialist, and Kris Conmy, Program Director of Math Learning Communities, shared New Hampshire’s innovative approach to data science education. Their work centers on rethinking traditional math pathways to offer students fresh avenues for developing practical, data-focused skills. “In New Hampshire, we asked ourselves a fundamental question: Do all students need the same mathematics pathways in high school?” Recognizing the value of data literacy in today’s world, New Hampshire embarked on a mission to create flexible math pathways that cater to diverse student goals and interests. “We want to bring data analysis and data science into high school,” Wallace continued, “because we live in a data-driven culture, and students must be equipped to interpret and make sense of the data all around them.” The Math Learning Communities (MLC) program is a state-funded, innovative partnership initiative designed to enhance the mathematical understanding and success of high school students in New Hampshire. Wallace described how New Hampshire’s data literacy initiatives extend beyond individual classrooms, connecting schools through a statewide network of math learning communities. This shared framework provides resources and training that empower educators across the state. 
“We’re building math pathways that all emphasize data analysis,” she said. “Students need to examine data, grasp its meaning, and understand where it comes from—skills that are essential in our data-rich world.” New Hampshire’s approach serves as an ambitious model, showing how a state can expand data science education through collaborative professional development and purposeful curriculum design. Vermont: A Flexible Path to Data Science in High School In Vermont, data science is being positioned as a compelling alternative to traditional high school statistics courses, thanks to the efforts of educators like Jessica VanDriesen from Woodstock Union Middle and High School. VanDriesen shared how Vermont offers data science as a semester-based elective alongside trigonometry, providing students with a flexible approach to math. “The curriculum we chose is a year-long course, but we’ve broken it up into two semesters—Data Science 1 and Data Science 2,” VanDriesen explained. She noted that this structure allows students who may be unfamiliar with data science to try it without committing to a full year. “We thought they would be more nervous about signing up for a year of something they didn’t know,” she said, “so we made it accessible.” VanDriesen’s vision for Vermont is to make data science a standard offering in high schools. “Rather than requiring a full year of Algebra 2, we’re considering offering students a choice between trigonometry or electives like data science in the second semester of that Algebra 2 year,” she said. This flexibility, she believes, will encourage students to pursue math beyond the state’s three-year requirement. 
“My goal is to make data science a standard offering in high schools across Vermont—not just an elective that a few schools happen to have.” Maine: Data Science Rooted in Environmental Learning Maine’s approach to data science education reflects its rich tradition of place-based learning and environmental science, as presented by Franziska Peterson and Sara Lindsay from the University of Maine and the Maine Center for Research in STEM Education. Through partnerships with organizations like the Maine Lakes Association, Maine’s K-12 students engage in hands-on data projects that connect directly to their communities. “Maine has a long history of engaging citizens in community science,” Peterson explained. “The Maine Lakes Association has been collecting water quality data since 1970, and we have similar projects in forestry, agriculture, and marine science.” These projects allow students to work with real data, analyzing issues that impact their communities while developing quantitative skills. Maine is empowering educators through authentic, place-based learning with projects like this Coastal Tracers project. The Maine Data Literacy Project, an initiative by the RISE Center in collaboration with the Kutic Institute, has trained around 15 middle and high school teachers in data science. Teachers in the project developed tools, such as a “graph choice chart,” to help students analyze and interpret data effectively. “It’s a wonderful tool that we use with our college students as well,” Lindsay noted. The project has grown into an online repository and a blog library, providing resources to help teachers integrate data literacy into classrooms across grade levels. In closing their presentation, Peterson highlighted the sustained professional learning that Maine provides, building connections between science researchers and teachers over multiple years. “It’s not just one-and-done or even just a summer,” she said. 
“We work with teachers over several years to help them bring data literacy into their classrooms.” Maine’s model is rooted in research-practice partnerships that enable teachers and scientists to collaborate on curriculum design, ensuring that data science education is relevant, accessible, and impactful for students.

Connecticut: Flexible STEM Pathways to Support Data Literacy

Andrew Hill, a STEM Curriculum Specialist from Brookfield High School

In Connecticut, flexible graduation requirements create opportunities for data science integration, even as the state grapples with significant educational disparities. Andrew Hill, a STEM Curriculum Specialist from Brookfield High School, discussed Connecticut’s recent efforts to broaden its high school math curriculum by reducing rigid course requirements. “Connecticut has one of the largest achievement gaps in the country,” Hill shared. “But recent changes in our graduation requirements give districts flexibility to customize STEM credits.” Students now need only three specific math credits to graduate, allowing schools to design broader math pathways that align with local needs. “The flexibility allows us to design a curriculum that meets students where they are, preparing them for real-world careers in a way that feels relevant.”

Connecticut’s efforts are shaped by a focus on equity, but challenges remain, including a shortage of qualified data science teachers and inconsistent math programming across districts. Still, Hill expressed optimism about the state’s career pathways initiative, which aims to retain more students in Connecticut by preparing them for in-state jobs. By connecting introductory data science opportunities to career-oriented skills, Connecticut hopes to address local workforce needs while equipping students with essential data literacy skills for a wide variety of post-secondary pathways.
Rhode Island: Creating Modern Math Pathways

Ben Hall, Math Specialist for the Rhode Island Department of Education

In Rhode Island, the focus is on aligning high school math courses with students’ post-secondary goals, ensuring that data science is viewed as a valuable and rigorous pathway. Ben Hall, a Math Specialist from the Rhode Island Department of Education, shared how the state is building consensus around data science as a recognized course for college-bound students. “We’re thinking about high school math course-taking and how it aligns with what students will be doing long-term,” Hall said. By collaborating with universities to ensure that data science is respected as a college-prep pathway, Rhode Island aims to make data literacy a viable alternative to traditional math sequences like precalculus and calculus. “Our goal is to make sure that seeing ‘data science’ on a transcript means rigor, not just an easier or more fun option,” Hall explained. Rhode Island’s efforts to shift this mindset represent an essential step in fostering a culture that values data science alongside traditional mathematics.

Building a Shared Vision for the Future of Data Science

DS4E’s Zarek Drozda wrapped up the summit with reflections on the collective purpose that brought everyone together, emphasizing the importance of addressing key challenges and motivations to advance data science education, in the Northeast and beyond. As the summit concluded, participants discussed the primary barriers and motivators specific to each of their states and considered immediate actions they could implement by January 2025. “All of you showed up today because you’re either interested in this work or you’re already doing something,” Drozda noted. “There’s a lot happening under the surface that we now have to bring up, scale, and make accessible for students in all of your states.
And so that’s why I’m really excited for today’s conversation.” “Whether you like it or not, you’re all part of the data science team now. We’re all learning this together.”
- Bringing Data to Life: How Loudoun County Teachers Are Empowering Students with Real World Data in the Classroom
Loudoun County Public Schools (LCPS) has long been at the forefront of innovative education, with a particular focus on engaging students in research and data literacy. Dr. Stephen Burton, Science Outreach Teacher at LCPS and Darielle Timothy, the Science Supervisor at LCPS, are passionate about helping students view the world through the lens of data. For this team of educators, data science isn’t just about numbers and graphs — it’s about teaching students to navigate the data-driven world we live in and learn how to unlock the stories that data can tell. “We are inundated daily with data summarized to communicate some idea,” Dr. Burton explains, pointing out just how much information students passively encounter on a regular basis. From election polls to weather forecasts, students are constantly processing information, but without the right skills, they risk being misled by how that data is presented. “Without data literacy to evaluate how the data is being communicated and understanding what the summary descriptive statistics or figures can and cannot communicate, a person can be subject to being misled into an incorrect summary.” -Dr. Stephen Burton, LCPS Science Outreach Teacher A Broader Focus on Research and Data Literacy LCPS’s commitment to fostering data literacy in students begins with its robust “research curriculum,” which includes research-focused courses in earth science, biology, and chemistry. This curriculum builds students’ critical skills and experiences in research and data literacy, preparing them to confidently explore their own research or engineering design projects. The culmination of the research curriculum is the unique Independent Science Research (ISR) class, which allows students across the county to pursue independent research and explore their own questions, giving them hands-on experience working with data that relates to their lived experiences. 
By the time they reach the ISR course, students have gained valuable experience with data analysis, making them more confident and capable in conducting their own research projects. “Science research requires data because you’re always using data to draw information from. So that was the primary goal at the very beginning,” Dr. Burton explains. The excitement about exploring data in the research curriculum courses has led the ISR class to expand from four schools in 2011 to 16 high schools in the county today, with over 400 students currently enrolled in the ISR program.

Using DataClassroom for Real-World Learning

At LCPS, students are learning how to use their data literacy skills to analyze data critically and see beyond the numbers. Through a real-world data curriculum powered by DataClassroom, students engage with authentic datasets, develop critical thinking skills, and learn how to solve problems in their daily lives. LCPS educator Jennifer Flynn updated the Early Spring in Kyoto dataset in DataClassroom, which tracks 120 years of cherry blossom “full flowering dates” in Kyoto, Japan, to include cherry blossom bloom dates in nearby Washington, D.C. This instantly rendered the dataset more locally meaningful to the LCPS students, allowing them to also explore how climate can impact ecosystems. Using this data, students are asked to explore why the bloom dates might look relatively similar when the two cities are located on opposite sides of the globe.

Cherry blossom bloom dates for Kyoto, Japan from 1912–2020 (magenta) and Washington, D.C. from 1920 to 2022 (green). Bloom dates represent when 70% of the cherry blossoms are open and are known to be largely driven by temperatures in winter and early spring. The bloom date on the y-axis is the calendar date converted to a day number, with January 1st as day 1. As winters have become milder, the bloom dates for cherry blossoms have shifted between 7 days (in D.C.)
and 10 days (in Kyoto) earlier.

“I think it’s important to capture how students are utilizing the resource to support their own scientific thinking and development.” -Darielle Timothy, LCPS Science Supervisor

This hands-on curriculum allows students to move beyond textbook learning and actively engage in data analysis. Timothy emphasizes the importance of this approach in teaching students the nuts and bolts of real data collection and analysis: “I really like that [DataClassroom] takes students and teachers through the actual process of data analyses and displaying that data in the way that’s appropriate based on the research methods that were used or the best way to display the results of the student’s research or experiment.”

Incorporating Authentic Data Collection

LCPS has also embraced the power of authentic data collection in its classrooms. Dr. Burton explains how LCPS integrated local data into student learning on microhabitats: small areas within a larger habitat that have unique environmental conditions and support a distinct community of organisms. Outside LCPS classrooms, teachers are using data sensors to measure light and temperature, building a large database of readings spanning over five years that allows students to explore environmental patterns in the area. These tools give students a unique opportunity to analyze how local conditions impact the environment. Additionally, the integration of geographic information systems (GIS) allows students to map data to their surrounding ecosystems, making their learning experience both interactive and locally relevant.

A comparison of light levels across three forest “islands” at the Academies of Loudoun. Data were collected with HOBO sensors placed along five N-S transects, each 50 m long and spaced approximately 20 m apart, with a sensor every 10 m. The central island is 50 m long in the N-S direction. The North and South “islands” start at a road (higher temperatures) and extend 50 m into the forest.
In general, the Central forest island shows higher light levels (with greater variability) compared to the North and South forest islands.

A comparison of the three forest “island” temperatures. The Central forest island has more edge compared to the North and South forest islands, allowing more light to penetrate (Fig A). The result is that the Central forest island has a higher overall temperature across the forest compared to the North and South forest islands (Fig B).

“That’s the beauty of using real data in the classroom. Students can make global-local connections and engage with data that’s meaningful to them.” -Dr. Stephen Burton, LCPS Science Outreach Teacher

Through programs like DataClassroom, LCPS encourages students to dive deeper into real-world phenomena. By providing access to real data and fostering authentic application, Loudoun County ensures students are not just learning science; they’re actively participating in it. This approach emphasizes the power of data literacy, equipping students with the skills they need to interpret the world around them and draw meaningful conclusions. Dr. Burton highlights how these real-world data opportunities help students better understand their environment: “Students can look at how quickly the air above the snow warms up, or compare micro-habitats within a specific area to see how the temperature varies due to different environmental conditions.” By teaching data collection through local projects, the school is making data-driven learning accessible, engaging, and impactful for students.

Supporting Teachers Through Professional Development

Loudoun County schools are not only focused on student learning but also on empowering their teachers through dynamic professional development opportunities. While there are designated district-wide PD days, the school district has also found success in offering smaller, focused “micro PD” sessions.
These quick, targeted training sessions allow teachers to gradually build their comfort and confidence with programs like DataClassroom, especially those teachers who are new to the platform. Dr. Stephen Burton highlights how LCPS encourages peer learning: “It’s one thing for us at the administration to say, ‘Use this resource,’ but it’s another for teachers in the trenches to say, ‘I’m using this resource, and it’s really great.’” This peer-to-peer approach has empowered educators to take ownership of their learning, with some now presenting to fellow teachers on how they successfully use DataClassroom. Timothy adds that many teachers appreciate the ability to gradually integrate these new resources into their classrooms: “How do you eat an elephant? One bite at a time.” This philosophy underpins LCPS’s approach to PD, offering small, manageable steps that help teachers develop their skills in data collection and analysis. The district’s commitment to dynamic and flexible PD opportunities ensures teachers are supported at every stage of their journey, making the integration of data science tools seamless and effective. Building Data-Confident Students For Dr. Burton, building data literacy in students is key to preparing them for a data-driven future. “When our citizens have poor data literacy, they can be more easily misled,” he explains, emphasizing why the data curriculum at LCPS is so vital. By giving students the tools to think critically and question the data they encounter, LCPS is setting them up for success in any future career they may choose to pursue. “It’s exciting to see students so engaged with the material,” Dr. Burton says. And that’s exactly what LCPS aims for: creating empowered, data-confident students ready to take on the challenges of tomorrow. As these programs continue to expand, LCPS is preparing its students to confidently navigate a data-rich future, where they will be empowered to solve complex problems and make informed decisions. 
The commitment to integrating authentic data collection and analysis in the classroom ensures that Loudoun County students are not just learning about data — they’re shaping the future with it.