Assignment Type:
Research Report, 4 pages max.
Assignment:
Aligning ontologies is one of the most important tasks in practical ontology engineering. We cannot enforce one view of the world, even in restricted domains such as geographical information. Instead we can try to “align” ontologies: specifying a set of semantic relations between ontologies (see the lecture). Such semantic relations are typically incomplete. The objective of this assignment is to get some practical experience with ontology alignment.
Two geographical vocabularies:
TGN
The Thesaurus of Geographic Names (TGN) represents a large information repository about places on earth. It is constructed as a big part-of structure with World as the top node of the hierarchy.
Full technical information about the data model of TGN can be found at the download page of the three Getty thesauri. Sample data and the data model are available in XML and relation-table format. This information is not easy to read (but this is typical of most real-life situations). To make the assignment easier, a simplified annotated data model of TGN is included in the appendix below. This data model also shows the scope that the OWL specification should cover. More background information on the TGN can be found at the Getty website, but be aware that the scope of the documentation there is broader.
Geonames
Geonames is a large geographical vocabulary which is available in various formats, including RDF. The data model of Geonames can be derived from http://www.geonames.org/ontology/ontology_v2.0_Lite.rdf and example records such as:
- the river Danube http://sws.geonames.org/791630/about.rdf
- Delft (populated place) http://sws.geonames.org/2757345/about.rdf, with parent features:
- Gemeente Delft: http://sws.geonames.org/2757344/about.rdf
- Provincie Zuid Holland http://sws.geonames.org/2743698/about.rdf
- Kingdom of the Netherlands http://sws.geonames.org/2750405/about.rdf
Assignment tasks:
- Study the information about the two vocabularies and construct/assemble two OWL models of the following information:
- places, place names, place types
- geographical positions
- part-of hierarchy(ies) of places
- other inter-place relations, if any
NOTE1: Concentrate on the main geographical information in the vocabularies. You don’t have to model all the information in the vocabulary, only what is represented in Appendix A.
NOTE2: Focus on modeling the data model (i.e. the information represented in Appendix A). You can add a few instances (e.g. Paris is part of Something) for clarity.
- Describe possible alignments between the two vocabularies on the “class” level. Indicate which information is present in one vocabulary and not in the other. Construct an OWL specification of the alignments you identified. Provide adequate explanatory text for the major decisions.
- Describe possible alignments between the two vocabularies on the “property” level. Indicate which information is present in one vocabulary and not in the other. Construct an OWL specification of the alignments you identified. Provide adequate explanatory text for the major decisions.
- Literature Question:
Discuss alignment at the instance level: how can we establish that two particular places (e.g., two “Paris” instances) are the same? How precise do you think an automatic instance-alignment technique would be?
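As a starting point for the discussion, one simple automatic technique is to compare names and coordinates. The following is a minimal sketch of that heuristic in Python, not a recommended method: the record layout (a dict with `names`, `lat`, `lon`), the 10 km threshold, and the sample records are all assumptions made for illustration, and the coordinates are approximate.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def likely_same_place(a, b, max_km=10.0):
    """Heuristic: the records share at least one name AND lie close together.

    max_km is an arbitrary threshold chosen for this sketch.
    """
    names_a = {name.casefold() for name in a["names"]}
    names_b = {name.casefold() for name in b["names"]}
    shares_name = bool(names_a & names_b)
    distance = haversine_km(a["lat"], a["lon"], b["lat"], b["lon"])
    return shares_name and distance <= max_km


# Made-up records with approximate coordinates, for illustration only.
record_a = {"names": ["Paris"], "lat": 48.8667, "lon": 2.3333}    # Paris, France
record_b = {"names": ["paris"], "lat": 48.8534, "lon": 2.3488}    # Paris, France
record_c = {"names": ["Paris"], "lat": 33.6609, "lon": -95.5555}  # Paris, Texas

if __name__ == "__main__":
    print(likely_same_place(record_a, record_b))
    print(likely_same_place(record_a, record_c))
```

A heuristic like this already shows where precision can suffer: large features such as rivers or countries have no single representative point, and the two vocabularies may record different names or place types for the same place.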
Hints and tips:
- Think carefully about the nature of the part-whole hierarchies in the two vocabularies; consult the course information about such relationships.
- If you lack adequate information about the meaning of a particular construct, it is OK to make an “educated guess”; motivate why you took a particular view and document the decision in the accompanying text document.
- It is OK if you propose more appropriate names, but record this explicitly as a modeling decision.
Target Results:
A PDF file containing:
- OWL specification of TGN and Geonames.
- OWL specification of mappings between classes.
- OWL specification of mappings between properties.
- Text with explanation of the above three steps.
- Text with discussion of the literature question.
The OWL specifications can be sent as files or as links.
Submission deadline:
Monday 26 November, 23:59 CET
Appendix A: Simplified data model of TGN:
This appendix contains a simplified data model of TGN, covering the part of TGN that is within the scope of this assignment. The record structure is a simplified form of the original record structure. We use the following conventions in describing the record structure:
[] optional element
{}* list of 0-n elements
{}+ list of 1-n elements
-> points to another record
: datatype of the element
TGN consists of a large collection of Subject records, each representing information about a particular place:
- Each place has a unique ID and a textual description with background information about the place.
- TGN uses the “Parent” relationship to represent a part-whole hierarchy. In fact, there are two such hierarchies: a topological hierarchy between physical places and a nation/town-based hierarchy between administrative places. Some places occur in only one of the two hierarchies (e.g. North-Holland is administrative, Mont Blanc is physical); other places occur in both (e.g. Europe).
- Each place has a preferred term for referring to it (e.g. “Gorinchem”) and possibly alternative terms (e.g. “Gorkum”).
- Each place has at least one PlaceType, such as Nation or River.
- Places can also be associated to other places. TGN has a possibility to type this relation, but you can omit this for the assignment.
Subject
SubjectID : integer minInclusive 1000000 maxInclusive 699999999
DescriptiveNote : string
PreferredParent -> Subject
{NonPreferredParent}* -> Subject
RecordType : {Administrative Physical Both}
PreferredTerm -> Term
{NonPreferredTerm}* -> Term
PreferredPlaceType -> PlaceType
{NonPreferredPlaceType}* -> PlaceType
{AssociativeRelation}* -> Subject
StandardCoordinates -> StandardCoordinates
[BoundedCoordinates] -> BoundedCoordinates
For each place there can be multiple terms to refer to it. TGN keeps information about both the terms used nowadays (Current) and previously-used terms. For the latter a start/end date can be recorded. Both noun terms (“Europe”) and adjective terms (“European”) are provided by TGN. The attribute Vernacular indicates whether this is a term from the native language of the place (e.g. “Den Haag” is vernacular; “The Hague” is not).
Term
TermId : integer
TermText : string
TermType : {Noun Adjectival Both}
HistoricFlag : {Current Historical}
Language : ISO language code
[StartDate] : year
[EndDate] : year
Vernacular : yes/no
Place types are ordered in a taxonomy:
PlaceType
Parent -> PlaceType
For each place standard coordinates are indicated, such as latitude and longitude. Optionally (see the Subject record structure), information about the bounding coordinates is provided:
StandardCoordinates
StandardLatitude : decimal
StandardLongitude : decimal
[ElevationMeters] : decimal
BoundedCoordinates
BoundingLatitudeLeast : decimal
BoundingLatitudeMost : decimal
BoundingLongitudeLeast : decimal
BoundingLongitudeMost : decimal
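For readers who find the record notation easier to follow as code, the record structure above can be sketched as Python dataclasses. This is only an illustration of the data model, not the requested OWL specification. Assumptions: the `name` field on `PlaceType` is added for readability (the appendix only lists `Parent`), the parent of the top node World is modelled as `None`, and the subject IDs and place types in the example are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PlaceType:
    name: str                              # assumption: not listed in Appendix A
    parent: Optional["PlaceType"] = None   # Parent -> PlaceType


@dataclass
class Term:
    term_id: int
    term_text: str
    term_type: str = "Noun"          # Noun | Adjectival | Both
    historic_flag: str = "Current"   # Current | Historical
    language: str = "en"             # ISO language code
    start_date: Optional[int] = None
    end_date: Optional[int] = None
    vernacular: bool = False


@dataclass
class StandardCoordinates:
    standard_latitude: float
    standard_longitude: float
    elevation_meters: Optional[float] = None


@dataclass
class BoundedCoordinates:
    bounding_latitude_least: float
    bounding_latitude_most: float
    bounding_longitude_least: float
    bounding_longitude_most: float


@dataclass
class Subject:
    subject_id: int
    descriptive_note: str
    record_type: str                 # Administrative | Physical | Both
    preferred_term: Term
    preferred_place_type: PlaceType
    standard_coordinates: StandardCoordinates
    preferred_parent: Optional["Subject"] = None   # None only for World (assumption)
    non_preferred_parents: List["Subject"] = field(default_factory=list)
    non_preferred_terms: List[Term] = field(default_factory=list)
    non_preferred_place_types: List[PlaceType] = field(default_factory=list)
    associative_relations: List["Subject"] = field(default_factory=list)
    bounded_coordinates: Optional[BoundedCoordinates] = None

    def __post_init__(self):
        # Range taken from the SubjectID line of the record structure.
        if not 1000000 <= self.subject_id <= 699999999:
            raise ValueError("SubjectID out of range")


def ancestors(subject: Subject) -> List[str]:
    """Follow PreferredParent links up to the top node of the hierarchy."""
    chain = []
    node = subject.preferred_parent
    while node is not None:
        chain.append(node.preferred_term.term_text)
        node = node.preferred_parent
    return chain


# Made-up IDs, coordinates, and place types, for illustration only.
place = PlaceType("Place")
nation = PlaceType("Nation", parent=place)
world = Subject(1000001, "Top node", "Both", Term(1, "World"), place,
                StandardCoordinates(0.0, 0.0))
europe = Subject(1000002, "A continent", "Both", Term(2, "Europe"), place,
                 StandardCoordinates(50.0, 10.0), preferred_parent=world)
netherlands = Subject(1000003, "A nation", "Administrative",
                      Term(3, "Nederland", language="nl", vernacular=True),
                      nation, StandardCoordinates(52.5, 5.75),
                      preferred_parent=europe,
                      non_preferred_terms=[Term(4, "Netherlands")])

if __name__ == "__main__":
    print(ancestors(netherlands))
```

Walking the `preferred_parent` links is one way to see the part-of hierarchy at work; the non-preferred parents are where the second hierarchy would show up.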
Am I mad or is Appendix A missing?
Ok, I understand now. I thought that the appendix part which would tell us what to focus on in TGN was missing, but at a closer read it seems that we should focus on the place types only.
Can we use UML Class Diagrams instead of OWL Protégé specifications?
I believe this class is more about the “what/why” than the “how”, and trying to use OWL would force me to focus on the “how”, once more. Since my team lost most of its points on misuses of OWL and trouble with Protégé, I would like to use UML which I know far better than OWL… Is that allowed?
(I’ve heard that Willem told a group it was OK, but I didn’t hear it myself and I’m not sure if it’s also true for Assignment 3)
Yes that’s allowed.
This exact comment has been sent to our teachers as an email. We also asked how alignment could then be represented in UML.
Willem answered the following:
“I’m fine with UML.
Willem”
Oops, didn’t see Marieke’s answer before sending my comment. Sorry!
The link for the Download page of TGN seems to be broken.
When I click the following link:
http://www.getty.edu/research/conducting_research/vocabularies/download.html
I get redirected to this address:
http://www.getty.edu/research/tools/vocabularies/index.html
I found it here: http://www.getty.edu/research/tools/vocabularies/obtain/download.html
Whole lot of good that did me…
This link worked for us:
http://www.getty.edu/research/tools/vocabularies/tgn/tgn_xml_utf8_sample11.zip
Thanks!
Can we have a mini-tip on how to model the alignment with OWL/Protégé?
I feel like I finally understood how to use Protégé and create the two Ontologies (TGN and Geonames). Then, the part “Describe the alignments…” seems very doable, but I have no clue on how to tackle the part “Construct an OWL specification of the alignments…” ! 😦
Do we have to use a Protégé plugin?
Snorre (also in my group) found the NeON toolkit for ontology editing, which apparently has an Alignment plugin… is that a good way to go?
I feel more comfortable with UML than OWL for creating the model corresponding to each vocabulary… but I would then have the same issue: no idea how to “construct the alignments specification”.
I am a bit confused about where the hierarchies can be found! The zip file contains XML files. Are these what we should use as a starting point for creating the ontology in Protégé? And regarding GeoNames, is the RDF file highlighted in the description the one that we should use?
Here is what I did. Remember that I’m a student and might be completely wrong, but for the first question at least I feel like I finally got interesting results. I’m just explaining here how I did it to hopefully help you figure out something…
# For creating the OWL ontology corresponding to TGN:
I simply read the Appendix A in the assignment. The reading/understanding/selecting work has already been done for us there, so if you just apply the Appendix A explanation I believe that making the ontology will be quite straightforward.
The OWL classes (yellow in Protégé) are described: Subject, Term, PlaceType, StandardCoordinates and BoundedCoordinates; you are free to rename them or add one, if you explain your choice in the report. The “:” indicates the datatype of an element, that is, a data property (green) of the class. The “->” indicates a link to another record, that is, an object property (blue) of the class. The rest of the syntax simply tells you if a property might be used more than once, is mandatory, etc.
To create instances to try and clarify my ontology, I looked for terms in the Thesaurus (http://www.getty.edu/research/tools/vocabularies/tgn/index.html) and tried to represent the “subjects” I would find within my ontology.
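The reading of the notation described above can be sketched in a few lines of Python that turn Appendix A elements into Turtle declarations. This is only an illustration of the idea, not a required workflow: the `tgn:` namespace URI and the lower-camel-case property names are assumptions, the mapping of `year` to `xsd:gYear` is one possible choice, and cardinalities (`[]`, `{}*`) are deliberately left out.

```python
# Hypothetical namespace for the TGN model; choose your own in Protégé.
PREFIXES = "\n".join([
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
    "@prefix tgn: <http://example.org/tgn#> .",
])

# Appendix A datatypes mapped to XML Schema datatypes (one possible mapping).
XSD_TYPES = {
    "integer": "xsd:integer",
    "string": "xsd:string",
    "decimal": "xsd:decimal",
    "year": "xsd:gYear",
}


def element_to_turtle(record, element, separator, target):
    """Translate one element of a record into a property declaration.

    ":"  means a datatype, so the element becomes an owl:DatatypeProperty.
    "->" means a link to another record, so it becomes an owl:ObjectProperty.
    """
    prop = "tgn:" + element[0].lower() + element[1:]
    if separator == ":":
        kind, range_ = "owl:DatatypeProperty", XSD_TYPES[target]
    elif separator == "->":
        kind, range_ = "owl:ObjectProperty", "tgn:" + target
    else:
        raise ValueError("unknown separator: " + separator)
    return (f"{prop} a {kind} ;\n"
            f"    rdfs:domain tgn:{record} ;\n"
            f"    rdfs:range {range_} .")


def record_to_turtle(record, elements):
    """Declare the record as an owl:Class, followed by its properties."""
    blocks = [f"tgn:{record} a owl:Class ."]
    blocks += [element_to_turtle(record, name, sep, target)
               for name, sep, target in elements]
    return "\n\n".join(blocks)


# A few elements of the Subject record from Appendix A.
SUBJECT_ELEMENTS = [
    ("SubjectID", ":", "integer"),
    ("DescriptiveNote", ":", "string"),
    ("PreferredParent", "->", "Subject"),
    ("PreferredTerm", "->", "Term"),
]

if __name__ == "__main__":
    print(PREFIXES + "\n\n" + record_to_turtle("Subject", SUBJECT_ELEMENTS))
```

The printed Turtle can be saved to a file and opened in Protégé to check that the classes and properties show up as expected.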
# For creating the OWL ontology corresponding to Geonames:
I started from the existing ontology given in the assignment text: http://www.geonames.org/ontology/ontology_v2.0_Lite.rdf If you download this file and open it in Protégé (Hint: when you have an ontology Open and select File > Open, you are asked if you want to open in the current window. Say “No” to get two Protégé windows with two ontologies opened at the same time!), you will be able to see the basic structure of the ontology. I removed a few classes which did not correspond to anything in the previous TGN ontology (e.g. Wikipedia articles), and created my own version of this ontology.
I could have used the other RDF examples given in the assignment, but I preferred to look for my own examples at this address: http://www.geonames.org/ (because in both TGN and Geonames I wanted to represent the same examples…). Looking at the RDF files (in a text editor, Protégé did not help me with that) I could figure out some missing Data Properties and Object Properties in my Ontology.
Using the same RDF examples, I created all the instances I wanted in my ontology to test and clarify the model…
I hope this will help you (and others) continue on the assignment. I also hope that the teachers won’t consider that I gave away part of the answers, since I made a real effort on explaining only the “how”. This way we can all focus on the “why/what”, which is what Willem explained we should do!
About the alignments, my group is almost done writing the report part about these (explaining which classes (question 2), properties (question 3) or instances (question 4) could be aligned), but I still have no precise clue on how to construct the OWL ontologies for questions 2 and 3. I guess we are supposed to create the alignment classes with the right properties, but some hints & tips from Willem or Marieke would be more than helpful…
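One possible reading, not confirmed by the teachers: the “OWL specification of the alignments” could be a small third ontology that contains only mapping axioms between terms of the two models, using standard constructs such as owl:equivalentClass, rdfs:subClassOf, owl:equivalentProperty and rdfs:subPropertyOf. The sketch below writes such axioms out as Turtle. The `tgn:` namespace and its term names are hypothetical, and the particular mappings listed are illustrative guesses, not the expected answer.

```python
PREFIXES = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "gn": "http://www.geonames.org/ontology#",
    "tgn": "http://example.org/tgn#",  # hypothetical namespace for our TGN model
}

# Standard OWL/RDFS relations that can express a class or property mapping.
ALLOWED_RELATIONS = {
    "owl:equivalentClass",
    "rdfs:subClassOf",
    "owl:equivalentProperty",
    "rdfs:subPropertyOf",
}


def alignment_to_turtle(mappings):
    """Serialise (source, relation, target) triples as a Turtle document."""
    lines = [f"@prefix {prefix}: <{uri}> ." for prefix, uri in PREFIXES.items()]
    lines.append("")
    for source, relation, target in mappings:
        if relation not in ALLOWED_RELATIONS:
            raise ValueError("not a mapping relation: " + relation)
        lines.append(f"{source} {relation} {target} .")
    return "\n".join(lines)


# Illustrative guesses only; each one would need a motivation in the report.
EXAMPLE_MAPPINGS = [
    ("tgn:Subject", "owl:equivalentClass", "gn:Feature"),
    ("tgn:preferredParent", "rdfs:subPropertyOf", "gn:parentFeature"),
]

if __name__ == "__main__":
    print(alignment_to_turtle(EXAMPLE_MAPPINGS))
```

Choosing between an equivalence and a sub-class or sub-property relation is exactly the kind of decision the report is supposed to explain, so the relation is kept as an explicit field of each mapping.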
thanks julien for this explanation
Dear Marieke and Willem,
Will assignment 4 be available soon? I am planning to make a start on it this weekend.
This book might be helpful: http://centurion.dynalias.com/w/_media/programming/semantic_web/semantic_web_for_the_working_ontologist.pdf
When will the grades/feedback be out?
You will receive the feedback on Assignment 3 on Wednesday.
I’ve not received any feedback yet, did I miss this e-mail?
Same here, no feedback yet. Not in spam folder either.