Biosemantic Research Group

The Biosemantic Research Group is led by Dr. Hong Cui. It focuses on (1) converting factual information from biodiversity literature into computable data, covering research in information extraction, controlled vocabulary/ontology construction, and knowledge modeling; (2) enabling authors to write and record semantically clear phenotypic descriptions and data so that the data can be harvested at the time of publication; and (3) contributing to data integration efforts under the FAIR principles.

Research Projects:

  • Collaborative Research: Frameworks: Internet of Samples: Toward an Interdisciplinary Cyberinfrastructure for Material Samples. NSF-2004562. Aug 2020 - July 2024. In collaboration with Columbia University (System for Earth Sample Registration, SESAR), UC-Berkeley, University of Kansas, Open Context (opencontext.org), and Smithsonian National Museum of Natural History. The PI role was transferred to Dr. Thomer in 2022.
  • ABI Innovation: Authors in the driver's seat: fast, consistent, computable phenotype data and ontology production. NSF DBI-1661485. July 2017 - June 2023.
  • Collaborative Research: AVATOL - Next Generation Phenomics for the Tree of Life. NSF DEB-1208567. May 2012 - May 2017.
  • Collaborative Research: ABI Development: Exploring Taxon Concepts (ETC) through analyzing fine-grained semantic markup of descriptive literature. NSF DBI-1147266. July 2012 - June 2016.
  • Collaborative Research: Building a Comprehensive Evolutionary History of Flagellate Plants. NSF DEB-1541509. Jan 2016 - Dec 2019.

Biosemantic Software Tools Online:

  • Character Recorder 
  • When authors create taxonomic descriptions, they first examine a set of specimens and document their characters in a spreadsheet (a.k.a. a taxon-by-character matrix). The Character Recorder is the first matrix editor with ontology support: it allows authors to create their characters and populate the matrix by selecting or creating ontology terms. This tool integrates research findings from the Measurement Recorder, Add Terms to Ontology, and Color Palette Mining projects. Feedback from biology students and professionals has been positive, and the cognitive demand Character Recorder places on users is comparable to that of MS Excel. This proof-of-concept system shows that authors can produce FAIR phenotype data from the beginning. We should not continue publishing new data in legacy formats and then converting it to semantic formats through costly and error-prone post-publication curation. Demo: https://www.screencast.com/t/AO3paZka2DH

     

  • Measurement Recorder
  • Allows authors to define and reuse characters that involve measurements. For example, the length of the perigynium beak may be measured, among other ways, from the summit of the achene to the summit of the perigynium beak, including the perigynium teeth. The software adds the landmark terms the user needs to define this character, and the defined character itself, to a shared ontology so that others can reuse these terms or the character for their own characters. The goal of the software is to define measurement methods clearly and to encourage reuse and convergence among community users.

 

  • Description Editor
  • It shares the same goal as the other tools for authors, but it focuses on writing taxonomic descriptions of non-measurement characters. It uses our CharaParser Web API (https://github.com/biosemantics/charaparser-web) to convert users' character descriptions into a matrix format and to add or link the entity and quality terms, or characters, to a given ontology. Description Editor (DE) features description templates that are accessible to all users, with the goal of reducing redundant work and promoting parallelism in taxonomic descriptions.

 

  • Character Recorder 
  • When authors create taxonomic descriptions, they first examine a set of specimens and document their characters in a spreadsheet. The Character Recorder is a novel spreadsheet with ontology support. The user populates a spreadsheet by selecting ontology terms but also has the freedom to use free text constraints. Demo: https://www.screencast.com/t/AO3paZka2DH 

 

  • Add a term to an ontology experiment site 
  • To evaluate different ways of adding terms to ontologies, this experiment site includes four methods, among them the wizard we designed and implemented. Demo: https://www.youtube.com/watch?v=6oMmKp4G1Js

 

  • Explorer of Taxon Concept Toolkit (ETC): http://etc.cs.umb.edu/etcsite 
  • A web-based application that includes the following tools: (1) Text Capture (CharaParser), which extracts trait/phenotype characters from taxonomic descriptions of different taxon groups; (2) Ontology Building, which facilitates the creation of a phenotype ontology using terms from taxonomic descriptions; (3) Matrix Generation, which builds a taxon-by-character matrix from the extracted character data; (4) Key Generation, which builds interactive keys using the characters; and (5) Taxonomy Comparison, which compares taxon concepts using EULER tools and the extracted characters.

 

  • Ontology Term Organizer (OTO): http://biosemantics.arizona.edu/OTO [temporarily offline]
  • A simple web application that allows multiple users to categorize a set of terms by dragging and dropping them. This tool is meant to gather consensus from a group of users in order to support the development of a formal ontology. The supported relationships are is_a, part_of, and order (follows/precedes).

 

 

  • CharaParser+EQ: not maintained at this time.
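
Several of the tools above work with a taxon-by-character matrix whose cell values come from an ontology. The following Python sketch illustrates that general idea only; it is not the implementation of Character Recorder or any other tool listed here, and every class, method, taxon, and term name in it is a hypothetical chosen for illustration. Values found in the vocabulary are recorded directly, while unknown values are still recorded but queued as proposed new terms.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy controlled vocabulary (hypothetical): a set of lowercase term labels."""
    terms: set[str] = field(default_factory=set)

    def has(self, label: str) -> bool:
        return label.lower() in self.terms


@dataclass
class Matrix:
    """Taxon-by-character matrix: rows are taxa, columns are characters."""
    ontology: Ontology
    cells: dict[tuple[str, str], str] = field(default_factory=dict)
    proposed: list[str] = field(default_factory=list)

    def record(self, taxon: str, character: str, value: str) -> None:
        value = value.lower()
        # A value missing from the ontology is kept, but queued once
        # as a candidate term for later review.
        if not self.ontology.has(value) and value not in self.proposed:
            self.proposed.append(value)
        self.cells[(taxon, character)] = value

    def row(self, taxon: str) -> dict[str, str]:
        """Return all recorded characters for one taxon."""
        return {c: v for (t, c), v in self.cells.items() if t == taxon}


# Illustrative data only; these taxa and terms are made up.
onto = Ontology({"ovate", "lanceolate", "green"})
matrix = Matrix(onto)
matrix.record("Taxon A", "leaf shape", "Ovate")
matrix.record("Taxon A", "leaf color", "glaucous")
matrix.record("Taxon B", "leaf color", "glaucous")

print(matrix.row("Taxon A"))
print(matrix.proposed)
```

In this sketch "glaucous" is absent from the vocabulary, so it appears once in the proposed-term queue even though two taxa use it. A real tool would also need term identifiers, definitions, and relationships such as is_a and part_of, which are omitted here.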

Biosemantic Software Code Repository: