Like most big biomedical research projects, DMDD is gathering huge amounts of data. The latest count is more than 5 million images of developing embryos, along with thousands of abnormalities (or phenotypes) that have been identified within them.

So we need to be organised. All that time spent collecting and analysing images is wasted if the phenotypes we find aren’t described and recorded in a consistent, unambiguous way, because it would then be hard for anyone to share, search or analyse the results. To organise our data properly, we need an ontology.



Imagine your friend invites you over for tea. You might be expecting a hot drink, and perhaps a biscuit or two. But if you grew up in the north of England you could be forgiven for expecting an evening meal. It’s possible that you could even be imagining an afternoon tea of crust-less sandwiches and cream cakes. Without more information in advance it’s hard to tell, as the word ‘tea’ has several different meanings. Then add to the mix that tea as an evening meal can also be described as ‘dinner’ or ‘supper’, and there’s a lot of room for confusion.

If you’re now completely baffled by the intricacies of British mealtimes, Wikipedia has a good explanation! But this does highlight a problem with language. Sometimes the same word can have different meanings, and the same meaning can be expressed with different words.

We can deal with this problem by defining a ‘controlled vocabulary’ where each word has a single, specific meaning. We can also define the relationships between the words, creating a hierarchy of terms that become increasingly specific as we move down the chain. In our simple example, we might say that every time we eat during the day, we’ll call this sustenance. We can categorise sustenance as either a meal or a snack. We can then classify the meals even further, by saying that the possible options are called breakfast, lunch or dinner.
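For readers who like to see things written down, here is a minimal Python sketch of the mealtime hierarchy. It is only an illustration: the names `PARENT`, `lineage` and `is_a` are made up for this example, and each term is assumed to have exactly one parent.

```python
# Each controlled term points to its single, more general parent term.
# The root of the hierarchy ("sustenance") has no parent.
PARENT = {
    "sustenance": None,
    "meal": "sustenance",
    "snack": "sustenance",
    "breakfast": "meal",
    "lunch": "meal",
    "dinner": "meal",
}


def lineage(term):
    """Return the chain from a term up to the root, most specific first."""
    if term not in PARENT:
        # Words outside the controlled vocabulary (such as "tea") are rejected.
        raise KeyError(f"{term!r} is not in the controlled vocabulary")
    chain = []
    while term is not None:
        chain.append(term)
        term = PARENT[term]
    return chain


def is_a(term, ancestor):
    """True if `term` is `ancestor` itself or sits somewhere beneath it."""
    return ancestor in lineage(term)


print(lineage("dinner"))      # ['dinner', 'meal', 'sustenance']
print(is_a("snack", "meal"))  # False
```

Because ‘tea’ is not one of the agreed terms, `lineage("tea")` raises an error rather than guessing which meaning was intended.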



An ontology is then the language made up of words from the controlled vocabulary. If we consistently speak the language of the ontology, there’s no longer any room for confusion.



There are already lots of different ontologies out there, so fortunately we don’t need to define our own. For example, the Gene Ontology (GO) has been designed to describe the functions of genes, while the Disease Ontology (DO) exists to describe human diseases.

DMDD data is recorded using the Mammalian Phenotype (MP) ontology, a language developed by the Jackson Laboratory (JAX) to describe developmental abnormalities (phenotypes) found in mammals. The abnormalities can be described by high-level categories, such as ‘nervous system phenotype’ or ‘respiratory system phenotype’, which are then sub-categorised to give increasing detail. Moving down several levels in the chain we can describe abnormalities as specific as individual nerves that are either missing, misplaced or unusually thin.
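To see why this kind of hierarchy makes data easy to search, here is a rough Python sketch. Only the two top-level category names come from MP as quoted above. The lower-level labels, the embryo names and the function names are invented for illustration and are not real MP terms or DMDD records. The sketch also assumes one parent per term, whereas a real ontology can give a term several parents.

```python
# Illustrative hierarchy: child term -> parent term (None marks a top level).
# Labels marked "(illustrative)" are NOT real MP terms.
PARENT = {
    "nervous system phenotype": None,
    "respiratory system phenotype": None,
    "abnormal nerve (illustrative)": "nervous system phenotype",
    "absent nerve (illustrative)": "abnormal nerve (illustrative)",
    "thin nerve (illustrative)": "abnormal nerve (illustrative)",
    "abnormal lung (illustrative)": "respiratory system phenotype",
}

# Hypothetical annotations: (embryo, most specific term recorded for it).
ANNOTATIONS = [
    ("embryo-A", "absent nerve (illustrative)"),
    ("embryo-B", "abnormal lung (illustrative)"),
    ("embryo-C", "thin nerve (illustrative)"),
]


def falls_under(term, category):
    """Walk up the hierarchy from `term` and report whether we meet `category`."""
    while term is not None:
        if term == category:
            return True
        term = PARENT[term]
    return False


def search(category):
    """Return every embryo annotated with `category` or any term beneath it."""
    return [embryo for embryo, term in ANNOTATIONS if falls_under(term, category)]


print(search("nervous system phenotype"))  # ['embryo-A', 'embryo-C']
```

A search on the broad category finds both embryos, even though each was recorded with a different, more specific term.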

Occasionally, we find phenotypes that aren’t described by MP in its current form. When this happens we work with JAX to incorporate the new terms into the ontology. That way the terms are there for everyone to use, and our data is still fully described by the MP ontology.

Without this approach, phenotypes identified by different DMDD members could be inconsistent, ambiguous or conflicting. By sticking to the language of MP we avoid these pitfalls, whilst also making it easy for others to search and analyse the data.