MSDS Capstone Projects

 


Master of Science capstone projects in the College of Information Science provide an opportunity for students to showcase what they have mastered in the program.

The capstone project is based on a project plan that includes project goals, the master's competencies addressed by the project, system design, an implementation schedule, an assessment plan, and milestones. The project contributes to the development and reinforcement of the student's knowledge and skill sets in the fields of data science and information science.

The capstone project must exercise all competencies required for the master's degree and must also have a software development component.

Recent Capstone Projects

View recent College of Information Science master's capstone projects completed by students and student teams:

CAPSTONE PROJECT FACULTY ADVISOR

NumEval: Numeral-Aware Language Understanding and Generation
This project aims to improve the ability of NLP models to comprehend and generate numbers in text, a crucial but often overlooked aspect of language processing. It focuses on machine understanding of numerical values in context, particularly through tasks such as quantitative prediction and natural language inference.

Keywords: Numeracy, Natural Language Processing, Quantitative Inference

Steven Bethard

Visual Word Sense Disambiguation
This initiative focuses on training models to resolve word ambiguity by associating text with corresponding images, leveraging context to discern the correct meaning of homonyms. By improving how machines interpret ambiguous words through visual cues, this project aims to enhance semantic comprehension in NLP.

Keywords: Word Sense Disambiguation, Image Matching, Semantic Analysis

Steven Bethard

Plant Based Predictions
The goal is to analyze market trends in plant-based diets to identify gaps between consumer expectations and product offerings, while also considering sustainability and profitability. This study bridges dietary preferences with ethical and environmental trends to forecast market dynamics in the food industry.

GitHub Repository

Keywords: Market Research, Sustainability, Consumer Behavior

Sarah Bratt

Snorpheus
By combining audio and positional data, this project aims to detect snoring events and correlate them with body positions to better inform treatments for sleep-related breathing disorders such as obstructive sleep apnea. The project addresses the need for an integrated system that visualizes these data streams to improve diagnosis and therapy recommendations.

Keywords: Audio-Positional Data, Sleep Disorder Detection, Health Informatics

Winslow Burleson

Brawl the Shrimp
Exploring the intersection between music preferences and political leanings, this project aims to predict users' political affiliations based on their Spotify data, offering insights into the cultural influences that shape political ideologies. It challenges existing cultural stereotypes by linking personal music tastes to larger societal trends.

GitHub Repository  |  Website

Keywords: Predictive Modeling, Cultural Analysis, Music Data

Greg Chism

Terrorism Observability
This project seeks to transform the Global Terrorism Database into an accessible, analytics-driven dashboard that provides insights into global terrorism risks. The aim is to facilitate informed decision-making for organizations by analyzing and presenting critical metrics such as temporal patterns and geographic hotspots.

GitHub Repository

Keywords: Data Visualization, Predictive Analytics, Global Terrorism

Greg Chism

The Crossroads of Risk: A Comprehensive Analysis of Traffic Accidents
Analyzing traffic accident data, this project aims to identify key factors that contribute to accidents, offering actionable insights to improve traffic safety and guide policy-making. By uncovering patterns, the goal is to better predict and mitigate future accidents for safer roads.

GitHub Repository

Keywords: Traffic Safety, Predictive Analytics, Risk Assessment

Hong Cui

Leveraging Generative AI for Business Applications
This project explores the application of generative AI in business, focusing on how AI-generated content can optimize operations, marketing, and customer engagement. It also addresses critical challenges like ethical concerns and data privacy in the rapidly growing field of AI-driven solutions.

Keywords: Generative AI, Business Applications, Ethics

Hong Cui

Analyzing Trader Sentiment through Cloud-Based Data Engineering
This project develops a cloud-hosted data pipeline on AWS to analyze trader sentiment from notes shared on a platform, delivering insights from historical data to support rational decision-making. The system gives traders actionable intelligence for navigating market trends.

Keywords: Cloud Computing, Sentiment Analysis, Financial Markets

Hong Cui

Mini Twitter - Microblogging Service
This project builds a microblogging platform where users can post, comment, and interact, offering a streamlined social media experience with customizable account management. It applies fundamental web development skills to build a functional, scalable service.

Keywords: Web Development, Social Media, Microblogging

Hong Cui

Using Machine Learning to De-duplicate Massive Material Sample Records
This project tackles the challenge of identifying duplicate material samples in SESAR's repository, using machine learning to match metadata and suggest potential duplicates, thereby improving data integrity. With a focus on legacy data, it applies pattern recognition to improve the accuracy of sample records.

Keywords: Machine Learning, Data Deduplication, Metadata Analysis

Hong Cui

iVoices
iVoices processes and analyzes technology-related stories contributed by students, providing data-driven insights that help researchers understand the societal impacts of technological change. The project aims to disseminate its findings in both digital and print formats, bridging academic research and public discourse.

Keywords: Data Analysis, Technology Impact, Research Dissemination

Diana Daly

Project Apollo
This project employs topic modeling and network analysis to explore global research trends in astronomy and astrophysics, offering new insights into the evolving scientific landscape. At 40% completion, it already uses tools such as STM and Snorkel, and aims to map the discipline's knowledge base.

GitHub Repository

Keywords: Topic Modeling, Network Analysis, Astronomy Research

Charles Gomez

Identifying Leaf Phenology Patterns
This project aims to use unsupervised machine learning to predict leaf phenology in deciduous forests by analyzing PhenoCam images, building on past research on color clustering and feature extraction. The goal is to improve our understanding of environmental change through image-based pattern recognition.

GitHub Repository

Keywords: Image Segmentation, Unsupervised Learning, Environmental Analysis

Bryan Heidorn

Analyzing Federal Grant Programs: Insights from BERT
This research investigates shifts in federal research grant allocations over the past 25 years, using NLP models such as BERT and Doc2Vec to anticipate future trends. It broadens the scope beyond NSF data to include agencies such as NASA and DOE, providing a comprehensive overview of research funding.

Keywords: NLP, Grant Analysis, Predictive Modeling

Bryan Heidorn

Topic Analysis (NLP)
This project focuses on tracking the distribution of federal funding for drone projects, offering insights into the growing importance of drones and robotics. It provides key data to stakeholders on how investments are being funneled into the robotics industry, helping shape future developments.

GitHub Repository

Keywords: Topic Modeling, Drone Research, Funding Trends

Bryan Heidorn

Pima Animal Care Center - Animal Database and Adopt/Foster Look Up
By redesigning the Pima Animal Care Center's online database, this project aims to create an intuitive platform where potential foster or adoptive parents can easily search for animals based on specific traits. It addresses inefficiencies in the current system to streamline the adoption process and help more animals find homes.

Keywords: Web Development, Data Management, Animal Welfare

Bryan Heidorn

Analysis and Financial Modeling of Federal Grant Programs
This project seeks to unravel the complexities of scientific funding over the past quarter-century, using techniques such as Doc2Vec and BERT for nuanced analysis. It uses supercomputing resources to scale the analysis, offering insights into evolving research priorities and funding trends.

Keywords: Financial Modeling, NLP, Federal Funding

Bryan Heidorn

Analyzing Airline Reviews for Customer Feedback
Using sentiment analysis and topic modeling, this project automates the analysis of customer reviews on Skytrax Airline Quality to uncover key insights for improving airline services. It aims to streamline feedback processing, providing airlines with actionable data to enhance customer satisfaction.

Keywords: Sentiment Analysis, Topic Modeling, Customer Feedback

Xuan Lu

Maternal Higher Education and Child Nutrition Status: Case of Uzbekistan
This research examines the impact of maternal education on child nutrition in Uzbekistan, using newly available datasets such as UNICEF's Multiple Indicator Cluster Surveys. It aims to highlight the role of gender and education in addressing malnutrition, informing governmental efforts to reduce disparities.

Keywords: Social Impact, Health Outcomes, Educational Data

Xuan Lu

Sentiment Analysis on Social Media Comments: Evaluating Public Perception for Reforms in Uzbekistan
This project analyzes public sentiment on Uzbekistan's recent reforms using social media data, providing insights into the population's reception of policy changes. It aims to inform future governmental actions by highlighting areas where reforms are well-received and where more engagement is needed.

Keywords: Sentiment Analysis, Policy Evaluation, Social Media

Xuan Lu

Spatial Bayesian Network for the Timely Identification of Contamination Events in Water Distribution Networks
By integrating multiple data streams such as water quality monitoring, public health reports, and system maintenance data, this project aims to enhance the detection of contamination events in water distribution networks. The proposed framework improves the surveillance and response capabilities for waterborne threats.

Keywords: Bayesian Networks, Contamination Detection, Water Safety

Clayton Morrison

Deep Learning for Closed-Loop Communication Detection
This project aims to automate the detection of closed-loop communication (CLC) in team settings, using deep learning to enhance real-time feedback and improve team efficiency. By identifying CLC patterns, it seeks to optimize coordination and communication within various industries.

Keywords: Deep Learning, Communication Efficiency, Real-Time Detection

Adarsh Pyarelal

Rule-Based Detection of Closed-Loop Communication in Multi-Party Communication
The project focuses on creating a real-time system for detecting closed-loop communication in multi-party dialogues using rule-based methods, improving upon manual post-hoc analysis techniques. It aims to automate the identification of communicative events in complex conversations, enhancing dialogue systems.

Keywords: Rule-Based Systems, Multi-Party Dialogue, Communication Detection

Adarsh Pyarelal

Escherichia coli Predictions in the Upper Santa Cruz River
This project aims to improve predictive models for E. coli levels in the Upper Santa Cruz River, with the goal of better informing the surrounding community about water quality. By refining predictive accuracy, it hopes to contribute to environmental and public health efforts.

GitHub Repository

Keywords: Water Quality, Predictive Modeling, Environmental Health

Cristian Román-Palacios

Machine Learning for Phylogenetic Reconstructions
By integrating machine learning techniques, this project seeks to improve the accuracy of phylogenetic reconstruction by optimizing the tree search process. The goal is to reduce computational cost while preserving accurate evolutionary inferences.

GitHub Repository

Keywords: Phylogenetics, Tree Search, Evolutionary Biology

Cristian Román-Palacios

Analysis of Online Course Performance and Activity Metrics
This project investigates the relationship between online student activity metrics and course performance, offering educators new insights into student engagement and learning outcomes. By analyzing data from platforms like D2L, it aims to enhance teaching effectiveness and student success.

Keywords: Educational Data, Student Performance, Learning Analytics

Cristian Román-Palacios

Identifying Environmentally Comparable Cities Across the Globe for Urban Evolutionary Ecology Research
This project aims to create a quantitative framework for comparing cities based on both human and environmental factors, advancing urban evolutionary ecology research. By understanding how urban features influence ecological patterns, it seeks to inform future urban planning and sustainability efforts.

Keywords: Urban Ecology, Comparative Analysis, Sustainability

Cristian Román-Palacios

The Color Palette of Neighborhoods
This project explores the color schemes of building facades in U.S. neighborhoods, analyzing what these palettes reveal about local aesthetics and culture. The findings can contribute to discussions around urban design and community identity.

Keywords: Urban Design, Color Analysis, Cultural Insights

Cristian Román-Palacios

Using Machine Learning Models to Support the Department's Student Admission Process and to Determine the Most Effective Admissions Drivers
This project applies machine learning to streamline the admissions process by identifying key predictors of student success, reducing the manual workload for admission committees. It aims to improve the selection process while maintaining human oversight for complex cases.

Keywords: Machine Learning, Admissions Analytics, Predictive Modeling

Cristian Román-Palacios

Water Demand Dashboard for the City of Flagstaff, Arizona
This project aims to develop a predictive dashboard to help the City of Flagstaff manage future water use, based on current and projected consumption patterns. By providing real-time insights, it aids city planners in ensuring sustainable water resource management.

Keywords: Water Resource Management, Predictive Analytics, Sustainability

Cristian Román-Palacios

Effect of the Pandemic on Pollution Levels
This project compares pollution levels before, during, and after pandemic lockdowns, providing insights into the environmental impact of human activity reduction. The analysis seeks to quantify changes in air quality and other pollutant markers.

Keywords: Pollution Analysis, Pandemic Impact, Environmental Data

Cristian Román-Palacios

Applying Parallel Computing for Managing Customized Surveys Designed for a Nested Mixed-Methods Study
This project leverages parallel computing to enhance the efficiency of processing survey data for a mixed-method study on insurgent governance, significantly reducing the time required to analyze large datasets. The use of parallelism improves the handling of complex, multi-dimensional data.

Keywords: Parallel Computing, Mixed Methods, Survey Data  

Cristian Román-Palacios

Systematic Review of Models Used for Clumped Isotope Thermometry
By evaluating various regression models, this project seeks to improve methodological accuracy in clumped isotope thermometry, providing insights into the robustness of models under different conditions. The findings contribute to better understanding of paleoclimate data.

GitHub Repository

Keywords: Regression Analysis, Thermometry, Paleoclimate Models

Cristian Román-Palacios

An Analysis of the 3D Shape of Cities
This project analyzes the 3D shapes of cities worldwide to explore how urban forms relate to regulatory policies and sustainable development. By leveraging global datasets, it contributes to the broader discourse on urbanization and city planning.

Keywords: Urban Analysis, 3D Modeling, Sustainability

Cristian Román-Palacios

The Effect of Type of Scientific Communication Method on Retaining Math Information
By investigating the effectiveness of different scientific communication methods, this project aims to identify the most impactful ways of retaining mathematical information, contributing to improved educational strategies. The study uses regression and visualization techniques to analyze trends across educational institutions.

Keywords: Scientific Communication, Education Analytics, Regression Analysis

Meaghan Wetherell

Environmental Racism: Superfund Sites and Native American Reservations
This project examines the relationship between Superfund sites and Native American reservations, investigating potential environmental injustices and health risks. It aims to highlight disparities in environmental contamination and propose solutions for marginalized communities.

Keywords: Environmental Racism, Health Disparities, Data Analysis

 

Ready to transform your future in data science?

Learn more about the Master of Science in Data Science by contacting us at infosci-grad@arizona.edu, or review the admissions process and begin your application now:

Start Your Application