Mustafa Enes Karaca

Efficient Querying of SBGN Maps Stored in a Graph Database

Graph visualization is an important research area that aims to make graphs easier to understand and analyze. In various domains, graph visualization techniques and standards have been developed to analyze the underlying graph-based data effectively. Systems Biology Graphical Notation (SBGN) is a standard language for modeling biological processes and pathways through graph visualization. SBGN maps can be stored in XML-based SBGN-ML files. libSBGN is a Java/C++ library for reading and writing SBGN-ML files and for manipulating SBGN maps in an object-oriented manner.

Graph databases store data as a graph structure consisting of nodes and the relationships between them. Computing over graph data by traversal in a graph database is typically more efficient than accessing the same data in relational tables through costly join operations. Neo4j is a prominent graph database that provides its own query language, Cypher, for querying stored graph data. Neo4j also allows user-defined procedures to be written in Java and deployed as plugins, which extends its capabilities with third-party Java libraries.

In this thesis, we enable the modeling of SBGN maps in the Neo4j graph database, with support for compound structures. Using this SBGN data model, we developed graph-based user-defined procedures in Java, packaged as a Neo4j plugin that uses libSBGN. These procedures implement graph query algorithms such as neighborhood, common stream, and paths between, along with helper functions for populating a database from an SBGN map and loading an SBGN map from a graph database. The procedures are designed to produce or consume SBGN-ML; hence, they can be used by any visualization tool that can import and export SBGN-ML text. Newt, a web-based editor for viewing and editing SBGN maps, is one such tool: it makes use of these procedures and hosts a local Neo4j instance, providing a web service to execute Cypher statements.
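
To give a rough idea of what a neighborhood query computes, the following is a minimal sketch in plain Java. It is not the thesis implementation and uses neither the Neo4j procedure API nor libSBGN. The class name `NeighborhoodSketch`, the adjacency-map representation, the hop limit parameter, and the node labels are all assumptions made for illustration, and edges are treated as undirected for simplicity.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class NeighborhoodSketch {

    // Breadth-first search that collects every node reachable from `source`
    // in at most `limit` hops. The source itself is excluded from the result.
    // The adjacency map is a hypothetical stand-in for a stored graph.
    static Set<String> neighborhood(Map<String, List<String>> adjacency,
                                    String source, int limit) {
        Map<String, Integer> distance = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        distance.put(source, 0);
        queue.add(source);

        while (!queue.isEmpty()) {
            String current = queue.remove();
            int hops = distance.get(current);
            if (hops == limit) {
                continue; // do not expand beyond the hop limit
            }
            for (String next : adjacency.getOrDefault(current, List.of())) {
                if (!distance.containsKey(next)) {
                    distance.put(next, hops + 1);
                    queue.add(next);
                }
            }
        }

        Set<String> result = new TreeSet<>(distance.keySet());
        result.remove(source);
        return result;
    }

    public static void main(String[] args) {
        // Illustrative chain: entity nodes connected through process nodes.
        Map<String, List<String>> adjacency = Map.of(
            "glucose", List.of("process1"),
            "process1", List.of("glucose", "G6P"),
            "G6P", List.of("process1", "process2"),
            "process2", List.of("G6P", "F6P"),
            "F6P", List.of("process2"));

        // Nodes within two hops of "glucose".
        System.out.println(neighborhood(adjacency, "glucose", 2));
    }
}
```

In a graph database the same idea would be expressed as a traversal over stored relationships rather than over an in-memory map; the sketch only illustrates the bounded breadth-first expansion.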