The Brain Database
- A Proof of Concept
In 1966 Stephen Kuffler established the Department of Neurobiology at Harvard Medical School, and thus, by bringing together physiologists, biochemists and anatomists to focus their efforts on the nervous system, he helped found the discipline of neuroscience. Several decades later, more than 50,000 neuroscientists worldwide study everything from individual molecules to complex behaviors in species from nematode worms to humans [3]. Together they fill more than 300 journals, having created one of the largest, most unmanageable datasets in science.
"The sharing, distribution, organization, and managing of data, models, and tools are essential elements in the path from sheer information to knowledge and understanding in neuroscience" [1]. In catering to this need, the field of neuroinformatics has emerged. Neuroinformatics is the information science infrastructure of neuroscience. It relates to the tools, databases, models and mechanisms of information flow that serve all of the clinical and research efforts in this field. There have been several attempts to create neuroinformatic systems, but none seems to have reached widespread adoption yet.
This is especially true in the area of cognitive modeling, where the goal is to develop computational models of cognitive processes. When these models claim to parallel processes in the brain, it would be useful if this could be tested automatically by invoking some form of neuroinformatic database. In principle, many structural and functional claims about a model could be automatically validated against neuroscientific data if both the model and the data were represented in a form suitable for an inference engine.
Ikaros
In the Ikaros project at Lund University Cognitive Science [2], we aim to develop an open infrastructure for system-level modeling of the brain including databases of experimental data, models and structural and functional brain data. A core component is the Ikaros kernel that allows platform independent execution of models and communication with experimental databases as well as external devices such as sensors and actuators. Simulations are controlled by experiment files in XML format that contain structural information about the models in the form of individual components and interactions between them.
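As a rough illustration of how the structural information in such an experiment file might be read, the sketch below parses a small XML fragment with Python's standard library. The element and attribute names (`group`, `module`, `connection`, `sourcemodule`, `targetmodule`) and the model itself are assumptions made for this example; actual Ikaros experiment files may be organized differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical experiment file: element and attribute names are assumed
# for illustration and may not match the real Ikaros format.
EXPERIMENT_XML = """
<group name="ExampleModel">
    <module name="Thalamus"/>
    <module name="Amygdala"/>
    <module name="HypothalamusLateral"/>
    <connection sourcemodule="Thalamus" targetmodule="Amygdala"/>
    <connection sourcemodule="Amygdala" targetmodule="HypothalamusLateral"/>
</group>
"""


def read_model(xml_text):
    """Return (module names, list of (source, target) connections)."""
    root = ET.fromstring(xml_text)
    modules = [m.get("name") for m in root.iter("module")]
    connections = [
        (c.get("sourcemodule"), c.get("targetmodule"))
        for c in root.iter("connection")
    ]
    return modules, connections


modules, connections = read_model(EXPERIMENT_XML)
for source, target in connections:
    print(f"{source} -> {target}")
```

Each extracted (source, target) pair is a structural claim made by the model, which is the kind of claim that could then be checked against a database.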
The Semantic Web
The Semantic Web is a vision in which Web resources are machine-processable, where the information can be shared and processed both by automated tools and by humans. Semantic mark-up is central to this sharing of information. For different agents to understand the terms used, ontologies are needed in which the terms are described. An ontology is basically a collection of definitions of concepts. Well-designed, well-defined, and Web-compatible ontology languages with supporting reasoning tools are needed. Recently, both the Resource Description Framework (RDF) and the Web Ontology Language (OWL) have become World Wide Web Consortium (W3C) recommendations. RDF is used for representing information and exchanging knowledge on the Web, while OWL is used for publishing and sharing ontologies.
The Semantic Web is the product of many different desires and influences, where the aim is to make better use of the Web. Marshall and Shipman [4] outline three main influences. One is anxiety over the disorder of digital documents. Another comes from Artificial Intelligence and its maturing sense of the kinds of computations that can take place given formal representations [4]. There is also a "utopian desire to offload the burden of information overload and the complexity of everyday life onto the computer" [4]. All these desires are justified, but which are realistic? What are appropriate expectations for the Semantic Web?
Testing Cognitive Models
To what extent can ideas and techniques from the still-developing Semantic Web be used for testing cognitive models against neuroscientific data? The data will consist of neuroanatomical connection data from published articles, encoded in a manner appropriate to Semantic Web techniques. The first part of this work is to decide how to encode the data. During and after that process, a suitable inference engine has to be found. Once the data is encoded and made to work with the inference engine, the technique will be tested with direct questions and with parts of cognitive models from the Ikaros project.
A combination of RDF and OWL seems to be an appropriate way of encoding the data. Protégé [5], a knowledge-base program developed at Stanford Medical Informatics, has an OWL plugin that was used. The experiment files from Ikaros can be transformed to these formats using XSLT. Initially, several reasoners were looked into. However, these were abandoned due to technical problems and shortcomings. Instead, the Jena API [6] was seen as a good option.
Jena is a Java framework for building Semantic Web applications. It was decided that Jena would be a good choice for several reasons. It has methods for reading RDF and OWL and making an internal graph representation. It has inference support, and though it is only OWL Lite, it should be enough for this application. Further, adding rules is fairly easy. The RDQL (RDF Data Query Language) in Jena was also very useful. Finally, the finished product is more self-contained in this way.
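To make the idea of a graph representation with pattern queries more concrete, here is a minimal sketch in plain Python. It is a stand-in only: it does not use Jena or RDQL, and the `match` helper, the predicate names and the sample triples are illustrative assumptions.

```python
# Plain-Python stand-in for an RDF graph: a set of
# (subject, predicate, object) triples. This is not the Jena API;
# all names and data here are illustrative.
TRIPLES = {
    ("AmygdalaMedialNucleus", "isPartOf", "Amygdala"),
    ("AmygdalaCentralNucleus", "isPartOf", "Amygdala"),
    ("AmygdalaMedialNucleus", "isConnectedTo", "HypothalamusLateral"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# Roughly analogous to asking: which ?x satisfy (?x isPartOf Amygdala)?
parts = [s for s, _, _ in match(TRIPLES, predicate="isPartOf", obj="Amygdala")]
print(parts)
```

A query language such as RDQL expresses the same kind of pattern declaratively, with variables in place of the `None` wildcards used here.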
This is an interesting and important topic of investigation for several reasons. One part of it is the relevance of testing and applying semantic web ideas in order to find out what the concept has to offer, what possible limitations exist, as well as what might be good areas of application. Further, what relevance might the semantic web have for neuroinformatics? And how can the techniques to be investigated further our understanding by showing strengths and weaknesses in cognitive models?
A Pilot Implementation
In this attempt to apply Semantic Web techniques for testing cognitive models against neuroscientific data, the ontology development process of Noy and McGuinness [5] was used as a general framework in part of the implementation process. The questions of scope and aim are to some extent already answered in the problem formulation. The domain the ontology will cover is that of brain structures and connections between them, as well as who claims that these connections exist. The ontology is going to be used for finding which connections in cognitive models have support from research about connections. Given a file describing a model, the system should be able to output a list of the proposed connections along with references to articles that provide support for these connections. The question of who will use and maintain the ontology is not within the scope of this work. If a working system were presented, it would be used by developers of cognitive models. Its maintenance is a more complicated issue, which will be discussed later.
Reusing existing ontologies was considered, but none suitable for this work was found. There are ontologies describing biological systems and human anatomy in general, but these are very large and not tuned for this task. Upon deciding against reuse, important terms were enumerated. Of course, there are brain structures, such as the cortex, amygdala, hippocampus, and hypothalamus. There are also the parts that these structures can be divided into. The list of brain structures for this ontology is not exhaustive in any way, and is largely tuned to some of the structures that the Ikaros project deals with. Also, there are articles and statements about connections.
The main classes defined are Brainstructure, ConnectionStatement, and Article. A Brainstructure can have the property of being part of another Brainstructure. A ConnectionStatement has the properties sourceModule and targetModule, which are Brainstructures. There is also a property stating which Article claims that there is such a connection. The Article has a property stating which ConnectionStatements it makes. Most of the work has been creating the instances. For example, Amygdala is a Brainstructure and AmygdalaMedialNucleus is a Brainstructure which is part of Amygdala. Likewise, there is a ConnectionStatement with sourceModule AmygdalaBasalNucleus and targetModule HypothalamusLateral, claimed by the Article Pitkanen2000.
A combination of RDF and OWL was seen early on to be an appropriate way of encoding the data. At first, Protégé and its OWL plugin were used for this. While it produced seemingly correct RDF, the output was very disorganized and unpredictable. Protégé was therefore abandoned and the produced RDF was further edited in a text editor. Examples of parts of the OWL and RDF data used in the program can be seen below.
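The class structure described above can be sketched as a small data model. This is a plain-Python illustration, not the OWL encoding used in the project; the snake_case attribute names (`part_of`, `source_module`, `target_module`, `claimed_by`, `statements`) are renderings chosen for this example, and the specific claim is made up to show the shape of the data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Brainstructure:
    name: str
    part_of: Optional["Brainstructure"] = None  # the "part of" property


@dataclass
class Article:
    key: str
    # The ConnectionStatements this article makes.
    statements: List["ConnectionStatement"] = field(default_factory=list)


@dataclass
class ConnectionStatement:
    source_module: Brainstructure
    target_module: Brainstructure
    claimed_by: Article

    def __post_init__(self):
        # Keep the Article's list of statements in sync with this claim.
        self.claimed_by.statements.append(self)


# Illustrative instances only.
amygdala = Brainstructure("Amygdala")
medial = Brainstructure("AmygdalaMedialNucleus", part_of=amygdala)
hypothalamus_lateral = Brainstructure("HypothalamusLateral")
pitkanen = Article("Pitkanen2000")
statement = ConnectionStatement(medial, hypothalamus_lateral, pitkanen)

print(statement.source_module.name, "->", statement.target_module.name,
      "claimed by", statement.claimed_by.key)
```

The two directions of the article/statement relation are kept consistent in `__post_init__`; in an ontology language this would more naturally be declared as a pair of inverse properties.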
Figure 1. Overview of the implemented system.
Initially, both RACER and Jess were investigated for use as reasoners. However, RACER was abandoned due to technical difficulties and because OWL Lite was judged to be sufficient for this initial application (RACER supports OWL DL). Jess was not used for several reasons. The query support was not as developed as was hoped, Protégé crashed more often when using Jess, and it was finally decided not to use Protégé at all. Instead, the Jena API was seen as a good option.
Jena is, as mentioned above, a Java framework for building Semantic Web applications. A fairly thorough API is provided along with several tutorials. It was decided that Jena would be a good choice for several reasons. It has methods for reading RDF and OWL and building an internal graph representation. It has inference support, and though it only covers OWL Lite, this should be enough for this application. Further, adding rules is fairly easy. The RDQL query language was also very useful. Finally, the finished product is more self-contained in this way. The program can be started at the command line with the model file as a parameter, and a list of statements with supporting articles is printed. The experiment files from Ikaros are transformed to the appropriate format using XSLT (eXtensible Stylesheet Language Transformations). The program consists of the classes BrainRuleReasoner and ModelValidator. A block schema of the process is shown in Figure 1. The BrainRuleReasoner class extends Jena's GenericRuleReasoner, combining Jena's OWL rules with a few rules specific to the brain context. The most important of these rules are those stating that
if A isConnectedTo B and B isPartOf C then A isConnectedTo C
and similarly for when A isPartOf C. The ModelValidator class creates an inference model from the reasoner and the OWL and RDF files described above. For each connection in a cognitive model file, RDQL is used to find out whether there is any support for the connection in the connection data collected from articles. For example, if the model has a connection from the amygdala to the hypothalamus, then this could be supported by an article claiming that there is a connection from a part of the amygdala to the hypothalamus, even though there is no explicit statement of a connection from the amygdala to the hypothalamus. Results for a proposed connection look like this:
The statement that Amygdala is connected to HypothalamusLateral is FOUNDED, as:
Pitkanen2000 claims that AmygdalaMedialNucleus is connected to HypothalamusLateral and AmygdalaMedialNucleus is part of Amygdala
Pitkanen2000 claims that AmygdalaCentralNucleus is connected to HypothalamusLateral and AmygdalaCentralNucleus is part of Amygdala
Pitkanen2000 claims that AmygdalaLateralNucleus is connected to HypothalamusLateral and AmygdalaLateralNucleus is part of Amygdala
If, on the other hand, no support is found, the following result is given:
The statement that Thalamus is connected to Amygdala is UNSUPPORTED by available data
It should be noted that in reality, there are connections between the thalamus and the amygdala. However, at the time of this printout, there was no data about any connection of that kind in the knowledge base.
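The part-of rule and the report format above can be imitated in a few lines of plain Python. This is a sketch under stated assumptions rather than the project's implementation: it does not use Jena, and instead of materializing inferred triples as a rule engine would, it walks the part-of chain at query time. The function names and the in-memory data layout are hypothetical; the sample claims mirror the printout above.

```python
# Facts as plain Python data; a stand-in for the RDF/OWL files, not the Jena API.
# Maps each structure to the structure it is part of (assumed to be acyclic).
PART_OF = {
    "AmygdalaMedialNucleus": "Amygdala",
    "AmygdalaCentralNucleus": "Amygdala",
    "AmygdalaLateralNucleus": "Amygdala",
}
# (article, source, target) claims collected from the literature.
CLAIMS = [
    ("Pitkanen2000", "AmygdalaMedialNucleus", "HypothalamusLateral"),
    ("Pitkanen2000", "AmygdalaCentralNucleus", "HypothalamusLateral"),
    ("Pitkanen2000", "AmygdalaLateralNucleus", "HypothalamusLateral"),
]


def ancestors(structure):
    """Yield the structure itself, then every structure it is part of."""
    while structure is not None:
        yield structure
        structure = PART_OF.get(structure)


def support_for(source, target):
    """Return claims whose endpoints equal, or are parts of, source and target."""
    return [
        (article, s, t)
        for article, s, t in CLAIMS
        if source in ancestors(s) and target in ancestors(t)
    ]


def report(source, target):
    """Format the validation result for one proposed connection."""
    found = support_for(source, target)
    if not found:
        return (f"The statement that {source} is connected to {target} "
                "is UNSUPPORTED by available data")
    lines = [f"The statement that {source} is connected to {target} is FOUNDED, as:"]
    for article, s, t in found:
        line = f"{article} claims that {s} is connected to {t}"
        if s != source:
            line += f" and {s} is part of {source}"
        if t != target:
            line += f" and {t} is part of {target}"
        lines.append(line)
    return "\n".join(lines)


print(report("Amygdala", "HypothalamusLateral"))
print(report("Thalamus", "Amygdala"))
```

Note that UNSUPPORTED here only means that no matching claim exists in the data; as with the thalamus example, it says nothing about whether the connection exists in the brain.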
Conclusion
Although we set out to develop, and did develop, a minimal system as a proof of concept, many questions remain unanswered. Who will enter data into the knowledge base for a full-scale system? One possibility arising with Semantic Web techniques is that researchers could publish metadata about their findings on their own, using some GUI adapted to the task, then either register it or have it discovered. Making this a reality would require large-scale agreement on the methods and formats used to represent data, and there would, of course, be huge problems of verification and credit. Fortunately, these problems lie outside the scope of the current investigation.
Example Files
References
[1] G. Ascoli, E. De Schutter, and D. Kennedy. An Information Science Infrastructure for Neuroscience. Neuroinformatics, 1:1-2, 2003.
[2] C. Balkenius and J. Morén. From Isolated Components to Cognitive Systems. ERCIM News, p. 16, April 2003.
[3] M. Chicurel. Databasing the Brain. Nature, 406:822, 2000.
[4] C. C. Marshall and F. M. Shipman. Which Semantic Web? In Proceedings of the Fourteenth ACM Conference on Hypertext and Hypermedia, 2003.
[5] N. F. Noy and D. L. McGuinness. Ontology Development 101: A Guide to Creating Your First Ontology. Stanford Knowledge Systems Laboratory Technical Report, 2001.
[6] Hewlett-Packard Labs Semantic Web Research. Jena. Retrieved February 20, 2004. http://www.hpl.hp.com/semweb/.
Publications
- Gustafsson, M. (2004). Using Semantic Web Techniques for Validation of Cognitive Models against Neuroscientific Data. Lund: Dept. of Computer Science. B Sci Thesis [PDF]
- Gustafsson, M., and Balkenius, C. (2004). Using semantic web techniques for validation of cognitive models against neuroscientific data (Abstract). In Proceedings of AILS '04. Lund: Dept. of Computer Science. [PDF]