Master of Science capstone projects in the College of Information Science provide an opportunity for students to showcase what they have mastered in the program.
The capstone project is based on a project plan that includes project goals, master's competencies addressed by the project, system design, implementation schedule, assessment plan and milestones. The project contributes to the development and reinforcement of the student's knowledge and skill sets in the fields of data science and information science.
The capstone project must exercise all competencies required for the master's degree and must also have a software development component.
Recent Capstone Projects
View recent College of Information Science master's capstone projects completed by students and student teams (descriptions are provided by students):
PROJECT TITLE & DESCRIPTION |
FACULTY ADVISOR |
---|---|
NumEval: Numeral-Aware Language Understanding and Generation In previous SemEval competitions, the majority of tasks have primarily focused on analyzing words within a text, with scant consideration given to numerical data. But comprehension of numerical values can significantly enhance performance in certain tasks as numbers provide important information in words, especially to me working as a CPA. Numeracy seems to be one of the recent hot topics in Natural Language Processing and quantitative understanding in NLP is new to me. SemEval 2024 offers NumEval as one of the tasks and NumEval consists of 3 tasks, each of which consists of various subtasks. I am particularly interested in Task 1, which is further divided into 3 subtasks: Quantitative Prediction (QP), Quantitative Natural Language Inference (QNLI), and Quantitative Question Answering (QQA). QP is the task of predicting the correct magnitude of the masked numeral while QNLI is the task of making natural language inferences based on quantitative clues and QQA is the other format for testing whether models can understand numerals and semantics. |
Steven Bethard |
Visual Word Sense Disambiguation The goal of this project is to train a machine learning model to match words to images based on their semantic meaning within the context of a phrase or sentence. Oftentimes in Natural Language Processing, models are challenged by identical words that have different meanings depending on the context. In this project we aim to show that the model can identify the correct semantic meaning of ambiguous word tokens by having it choose the correct image representation. |
Steven Bethard |
Plant Based Predictions |
Sarah Bratt |
Snorpheus Snoring is a symptom of Sleep Related Breathing Disorder (SRBD). 57% of men and 40% of women in the US snore. If left untreated, it can lead to Obstructive Sleep Apnea (OSA), some form of which affects one in five adults in the US. Untreated OSA can result in a number of health problems including hypertension, stroke, arrhythmias, and cardiomyopathy. For mild to moderate OSA, use of oral appliances designed by qualified dentists is considered the first line of therapy in management of OSA. Depending on the severity of snoring related to body positions (supine or on the side), positional therapy in combination with an oral appliance can be a useful treatment to help improve quality of life measures. Identifying and documenting snoring events is challenging for patients, especially for patients with no bed partners. Associating the snoring events with the patient's body position over the duration of the sleep period can provide valuable information for recommending appropriate therapy. There is currently no solution that allows this combination of positional and audio data to be easily collected, and there is no software that allows this highly related data to be visualized together. |
Winslow Burleson |
Brawl the Shrimp |
Greg Chism |
Terrorism Observability |
Greg Chism |
The Crossroads of Risk: A Comprehensive Analysis of Traffic Accidents |
Hong Cui |
Leveraging Generative AI for Business Applications The project aims to explore the diverse applications of generative AI within business contexts. It will address the growing relevance of AI-generated content, its impact on business operations, marketing strategies, and customer engagement. Additionally, the project seeks to tackle challenges such as ethical considerations, data privacy, and the potential biases inherent in AI-generated content. |
Hong Cui |
Analyzing Trader Sentiment through Cloud-Based Data Engineering This capstone project will focus on the development of a data pipeline hosted on the AWS cloud platform to analyze trader sentiment based on notes shared on a note-taking platform. The primary objective is to empower traders to make rational decisions by presenting historical data insights on the project's frontend. |
Hong Cui |
Mini Twitter - Microblogging Service It creates a platform on which users can register an account, post, and interact with other posts. Registered users can comment, post, modify and delete posts. |
Hong Cui |
Using Machine Learning to De-duplicate Massive Material Sample Records SESAR is a repository hosting millions of earth science samples, like rocks, minerals, water samples from the ocean, etc. SESAR has a challenging task: Use metadata of a given sample to identify duplicates or suggest to a user that there may be a duplicate in a database. Many samples in data systems for legacy data will never have unique identifiers, so we need to use sample metadata, sample name, and related information such as authors or keywords of papers that have data for the sample to decide or at least suggest that it is the same sample. |
Hong Cui |
iVoices iVoices allows students to make stories about technology. We process these stories through research and data analysis and provide the resulting information and data to other researchers or interested groups through digital or paper magazines. Through this information, we can better understand the changes technology brings to our society and the social problems that arise, so as to better understand our society and technology, and provide some research results for better future development. |
Diana Daly |
PROJECT Apollo |
Charles Gomez |
Identifying Leaf Phenology Patterns |
Bryan Heidorn |
Analyzing Federal Grant Programs: Insights from BERT Federal research grant programs have undergone significant changes in their funding distributions across different topics in the past quarter-century. It's essential to understand these shifts, anticipate future topics of interest, and evaluate the effectiveness of these programs. This project expands on previous work that delved into National Science Foundation's data, broadening the horizon to encompass data from other influential agencies such as NASA and the Department of Energy. |
Bryan Heidorn |
Topic Analysis (NLP) |
Bryan Heidorn |
Pima Animal Care Center - Animal Database and Adopt/Foster Look Up I have fostered a dog from Pima Animal Care Center (PACC) by looking through their online database of over 500 dogs, cats, and other small animals. The online database is not user friendly and is not easy to navigate, especially if you are looking for specific animal traits and characteristics. I am proposing to create a new website/database that stores the animal information (such as breed, weight, gender, crate trained, house broken, etc.) so possible foster/adopt parents can find the animal they are looking for more efficiently. I would incorporate the skills I've learned in creating web pages (INFO 515 and 578), information presentation (INFO 578), and more with this project. The problems it will tackle go beyond the capstone project itself to helping the community foster/adopt animals in the shelter. It will allow users easy access to animal profiles and potentially help fostering/adopting become more efficient. |
Bryan Heidorn |
Analysis and Financial Modeling of Federal Grant Programs This capstone project seeks to unravel the intricacies of scientific research funding over the past 25 years. With technological advancements and evolving research priorities shaping the landscape, understanding federal funding patterns is crucial. The project aims to address key questions, including shifts in funding over time, the distribution of funds across diverse research topics, and predictions for future funding trends. Through advanced modeling techniques such as Doc2Vec, BERT, and correlated topic models, the project intends to offer a nuanced analysis beyond traditional methods like Latent Dirichlet Allocation. Additionally, by leveraging supercomputing networks, the project optimizes scalability, enabling the exploration of larger datasets for a more comprehensive understanding. The significance of this endeavor lies in informing policymakers, researchers, and funding agencies, facilitating data-driven decisions, aligning research efforts with current priorities, and optimizing resource allocation for future innovation in scientific research. |
Bryan Heidorn |
Analyzing Airline Reviews for Customer Feedback Skytrax Airline Quality (airlinequality.com) is a platform where customers can submit their experiences with airlines they have used for travel. Airlines can use this information to determine how they can improve the experiences and services they offer. Since reviews are constantly added, it can take time to go through each review manually and understand the overall issues. Instead, we can use Sentiment Analysis and Topic Modeling to build an automated system that can generate charts and other important information for better insights and precise decision making for improving their services. |
Xuan Lu |
Maternal Higher Education and Child Nutrition Status: Case of Uzbekistan As a country that has historically been closed off, Uzbekistan has lacked extensive data collection by international organizations such as the World Bank and UN clusters. This scarcity of data has limited the understanding of various social and health dynamics within the country. The recent openness of Uzbekistan to the world and the subsequent publication of new datasets, such as the Multiple Indicator Cluster Surveys (MICS) by UNICEF, present a unique opportunity. This research is among the first to utilize these new and comprehensive data sources to explore critical social issues in Uzbekistan. Moreover, understanding the impact of maternal education in a patriarchal society like Uzbekistan, where sons often receive preferential treatment in families, is crucial. This research can shed light on the broader social implications of gender and education on child health. By investigating the hypothesis that higher-educated mothers are more likely to provide healthy nutrition to their children, this research could support and inform government efforts to reduce gender disparities and improve child health outcomes. This is particularly relevant as the government is now actively working to bridge gender gaps influenced by cultural traits and systems. |
Xuan Lu |
Sentiment Analysis on Social Media Comments: Evaluating Public Perception for Reforms in Uzbekistan Over the last 7 years, Uzbekistan has undergone significant reforms in areas such as governance, economics, and social policy. While these reforms are critical at the policy level, their success is equally measured by the acceptance and perception of the general public. This project aims to analyze public sentiment on these reforms using comments extracted from social media platforms. This analysis would help identify the areas of reform that are well received and those that need more public engagement. My primary research question is: "How does the online community perceive the reforms implemented in Uzbekistan?" |
Xuan Lu |
Spatial Bayesian Network for the Timely Identification of Contamination Events in Water Distribution Networks Approximately 18% of outbreaks that occurred in the European region between 2000 and 2010 were associated with water. However, the real burden of waterborne diseases is unknown given a lack of proper surveillance protocols, as well as limited laboratory capacity. Thus, the World Health Organization (2019) has encouraged the strengthening of surveillance systems around Acute Gastrointestinal Illness (AGI) to better identify ongoing waterborne outbreaks. More specific to urban infrastructure, after 9/11, the deliberate introduction of harmful substances into Water Distribution Systems (WDS) became a threat given the potential for severe public health consequences. More recently, these concerns have been focused on unintentional events resulting from pathogenic, chemical, or microbial agents introduced into the network due to cross-contamination with non-potable water. The Surveillance Response Systems (SRS) have relied mainly on online water quality monitoring, measuring surrogate parameters that indicate abnormal water quality. However, given this specificity, some contaminants may go undetected, limiting the detection capabilities of the framework. Thus, my project proposes a framework to integrate multiple data streams that may indicate an AGI, including reports from public health, customer complaints from water utilities and the status of the system (failures and reported maintenance). The latter streams will supplement online water quality measurements to enhance and increase the detection capabilities of SRS. |
Clayton Morrison |
Deep Learning for Closed-Loop Communication Detection Good teamwork leads to a high level of productivity and job satisfaction. Effective communication among team members is crucial in facilitating cooperation, trust, and efficient problem-solving. One key aspect of effective team communication is closed-loop communication (CLC), which has been proposed in the literature as a coordinating mechanism for effective teamwork. CLC is a feedback process in which the receiver of a message sends a response or confirmation back to the sender. CLC has three components: call-out, check-back, and closing of the loop. The feedback process ensures that messages are accurately transmitted and understood and has been demonstrated to improve team efficiency in various domains. However, most existing research on CLC is conducted post-hoc, for example, by watching videos of sessions after they occur and recording only the parts that researchers are interested in (such as CLC categories and task completion time). There is a need for an automated method for detecting CLC. With the use of automated detection, real-time monitoring of communication can be achieved, allowing for immediate feedback and quick adjustments to be made and largely improving team communication. |
Adarsh Pyarelal |
Rule-based Detection of Closed-Loop-Communication in Multi-Party Communication Closed-loop communication (CLC) is often recommended in the team research literature as a communication behavior that can guarantee the accuracy of information exchange. Currently, CLC in spoken dialogue is identified via retrospective analyses involving manual transcription and annotation. Moreover, most real-time dialogue systems are limited to conversing with a single human at a time. On the other hand, there are numerous analyses of multi-participant spoken dialogue in the academic literature - however, these are primarily performed offline rather than in real-time, and the communicative events in their multi-party conversations are manually coded rather than automatically extracted using information extraction (IE) methods. To address this limitation, I propose to develop a separate downstream CLC detection component that utilizes the outputs of the existing dialog agent, but also reasons about context and state more deeply. |
Adarsh Pyarelal |
Escherichia coli Predictions in the Upper Santa Cruz River |
Cristian Román-Palacios |
Machine Learning for Phylogenetic Reconstructions |
Cristian Román-Palacios |
Analysis of Online Course Performance and Activity Metrics In 2020, a worldwide pandemic broke out. COVID-19 caused mass quarantines across the world, including in the United States. Without the ability to meet in-person, schools looked to online course structures and platforms to host their curriculum. At many universities, including the University of Arizona, the platform Desire 2 Learn, better known as D2L, is used as a central location for students to access, interact with, and submit class content. D2L also reports automated activity metrics to course instructors so they can track the progress of not only the class, but individual students. I believe these activity metrics, some tracked across time, might be worthwhile indicator metrics to estimate the level of effort and presumed subsequent performance of individual students. Other uses of the analytics could pertain to the effectiveness of different teaching styles/content organization. If the analysis provides conclusive or suggestive results, this could spark more interest in the data and serve as a reliable tool for educators to keep track of. |
Cristian Román-Palacios |
Identifying Environmentally Comparable Cities Across the Globe for Urban Evolutionary Ecology Research The goal of the project is to define a quantitative framework for comparing urban areas based on climate- and human-related features. Defining these areas and their comparability is a key task in urban ecology, the study of ecosystems in and around humans and urbanizing landscapes. There have been efforts in the field of urban geography to classify and compare cities but work from an urban ecology perspective is lacking. Previous urban ecology studies in this area have focused on using specific species of plant and animal life to compare responses to urbanization in different localities. This work will use human features such as city size, population, and infrastructure, as well as climatic features such as temperature, precipitation, and other geographic traits to provide a more general framework for comparing cities across different regions. The resulting framework will enable researchers in urban ecology to better understand the relationships between drivers of ecological and evolutionary patterns in urban areas. |
Cristian Román-Palacios |
The Color Palette of Neighborhoods This project aims to extract and analyze the color palettes of building facades in US neighborhoods, addressing urban design, cultural, and environmental insights. |
Cristian Román-Palacios |
Using Machine Learning Models to Support the Department's Student Admission Process and to Determine the most Effective Admissions Drivers Machine learning algorithms have been used in the past on admissions data to enhance the admission process, making it more efficient, and this has the potential of improving the selection process. Instead of replacing human decision makers, these algorithms can instead be used to assist them. Human oversight is very crucial to address individual cases that may not fit the model. With new high demand programs developing, such as data science, Machine Learning and Artificial Intelligence, the School of Information has been witnessing an increased number of applications. This has greatly increased the time spent going through applicant information, and the fear is that the admissions staff could soon get overwhelmed by the number of applications to the graduate college, especially during peak intake seasons. Automation of this process using historical data could be done by studying the department decision making process. By evaluating previously admitted candidates and identifying the pivotal application metrics historically employed for admissions, it is possible to automate this process. Such data can further serve as a basis for predicting both the quantity and characteristics of prospective students likely to excel in the program, as well as estimating total enrollments per semester. By focusing a significant amount of attention on promising applications, this approach ultimately minimizes waste and enhances the efficiency of the selection process. |
Cristian Román-Palacios |
Water Demand Dashboard for the City of Flagstaff, Arizona As a provider of municipal water, the City of Flagstaff, Arizona is governed by the Adequate Water Supply Program as defined by the Arizona Department of Water Resources. Part of this designation is to include both Physical Water Availability, and Continuous Water Availability. To demonstrate these criteria, the city needs to be able to predict future water use based on current consumption versus future anticipated growth, including current water use, committed water use, projected water use, and future needs. In early 2021, the City of Flagstaff received a pro bono publico demonstration dashboard from a consulting firm (EHS-Support). The dashboard attempted to summarize the water consumption of each water meter that the City of Flagstaff supplies for the past five years. The dashboard then attempted to assign an average water use, per meter, and summarized the average by multiple geographic factors. |
Cristian Román-Palacios |
Effect of the Pandemic on Pollution Levels I will look at multiple data sets of several pollutant markers and then compare the pandemic-affected years with the years before and after lockdowns took place. |
Cristian Román-Palacios |
Applying Parallel Computing for Managing Customized Surveys Designed for Nested Mixed Method Study The project is currently ongoing and is entitled "a nested mixed-methods approach to armed non-state actor governance and the rule of law" (PI: Javier Osorio). Recently, conflict scholars advanced different theoretical frameworks classifying insurgent and criminal governance structures. As there are no existing validated measures, the project relies on an online survey of hundreds of local experts. To conduct the online survey, the research will rely on the institutional license of the online survey system Qualtrics. However, as the number of local experts increases, the time required to process the data becomes substantial (up to 16 hours). The current capstone project proposes applying parallel computing to improve processing performance. |
Cristian Román-Palacios |
Systematic Review of Models Used for Clumped Isotope Thermometry |
Cristian Román-Palacios |
An Analysis of the 3D Shape of Cities This project focuses on analyzing the 3D shape of cities worldwide using a dataset that compiles building height information globally. By incorporating shapefiles, such as the one defining city limits, the project aims to conduct a regulation analysis, exploring the interplay between building shapes and urban. This project aims to contribute to the ongoing discourse on sustainable urbanization by providing a detailed examination of the 3D shapes of cities. By utilizing a globally sourced dataset and established methodologies, our exploration is poised to offer valuable insights into the intricate dynamics between urban forms, regulations, and sustainable urban development. This research is valuable because it provides a more detailed understanding of cities beyond traditional 2D studies, offering insights into three-dimensional aspects like building shapes and relationships for improved urban planning. |
Cristian Román-Palacios |
The Effect of Type of Scientific Communication Method on Retaining Math Information Colleges and high schools in the state want to know what factors are impacting graduation rates. I am going to explore graduation rates in public and private high schools in Virginia at the county and individual high school level. I will look at factors such as race, household income, number below the poverty line, size of school, student/teacher ratio, and location. I will explore these relationships with data visualizations as well as with multiple linear regression, principal component analysis, and factor reduction analysis. I will also create visualizations to explain these trends to the wider public and find an accurate way to predict high school graduation rates. |
Meaghan Wetherell |
Environmental Racism: Superfund Sites and Native American Reservations The global healthcare landscape is rapidly evolving, with a significant focus on home-based medical devices, particularly blood glucose monitors. In 2022, the market for these devices was valued at USD 12.5 billion, and it's projected to grow at an impressive 8.13% CAGR from 2023 to 2030. The primary drivers include the increasing incidence of diabetes and a growing aging population. The International Diabetes Federation warns that global diabetes cases are set to rise dramatically, from 537 million in 2021 to 643 million by 2030 and a staggering 783 million by 2045. Given this alarming trend, there's a need for a holistic application with visual capabilities to record data generated by home-based medical devices, especially for diabetes management. This project aims to provide a unified platform for individuals to collect, visualize, and manage their health data, ultimately improving the quality of life for millions affected by diabetes worldwide. This project goes beyond glucose tracking, enabling users to record their meals, mood, and personal notes in one place. It aims to simplify wellness management by offering an integrated solution, bridging the gap between data collection and actionable insights. Ultimately, our goal is to empower users to make informed decisions to improve their health. |
Ready to transform your future in information science?
Learn more about the Master of Science in Information Science by contacting us at infosci-grad@arizona.edu, or review the admissions process and begin your application now: