Assignment 3: Alignment TGN-GeoNames

Assignment Type:

Research Report, 4 pages max.

Assignment:

Aligning ontologies is one of the most important tasks in practical ontology engineering. We cannot enforce one view of the world, even in restricted domains such as geographical information. Instead we can try to “align” ontologies: specifying a set of semantic relations between ontologies (see the lecture). Such semantic relations are typically incomplete. The objective of this assignment is to get some practical experience with ontology alignment.

Two geographical vocabularies:

TGN

The Thesaurus of Geographic Names (TGN) represents a large information repository about places on earth. It is constructed as a big part-of structure with World as the top node of the hierarchy.

Full technical information about the data model of TGN can be found at the download page of the three Getty thesauri. Sample data and the data model are available in XML and relation-table format. This information is not easy to read (but this is typical of most real-life situations). To make the assignment easier, a simplified annotated data model of TGN is included in the appendix below. This data model also shows the scope that the OWL specification should cover. More background information on the TGN can be found at the Getty website, but be aware that the scope of the documentation there is broader.

Geonames

Geonames is a large geographical vocabulary which is available in various formats, including RDF. The data model of Geonames can be derived from http://www.geonames.org/ontology/ontology_v2.0_Lite.rdf and example records such as:

Assignment tasks:

  1. Study the information about the two vocabularies and construct/assemble two OWL models of the following information:
    • places, place names, place types
    • geographical positions
    • part-of hierarchy(ies) of places
    • other inter-place relations, if any

    NOTE1: Concentrate on the main geographical information in the vocabularies. You don’t have to model all the information in the vocabulary, only what is represented in Appendix A.

    NOTE2: Focus on modeling the data model (i.e. the information represented in Appendix A). You can add a few instances (e.g. Paris is part of Something) for clarity.

  2. Describe possible alignments between the two vocabularies on the “class” level. Indicate which information is present in one vocabulary and not in the other. Construct an OWL specification of the alignments you identified. Provide adequate explanatory text for the major decisions.
  3. Describe possible alignments between the two vocabularies on the “property” level. Indicate which information is present in one vocabulary and not in the other. Construct an OWL specification of the alignments you identified. Provide adequate explanatory text for the major decisions.
  4. Literature Question:
    Discuss alignment at the instance level: how can we establish that two particular places (e.g., two “Paris” instances) are the same? How precise do you think an automatic instance-alignment technique would be?
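
As a starting point for thinking about the literature question, here is a minimal sketch of one naive instance-matching heuristic: two records are treated as the same place when they share a normalized name and their coordinates lie close together. The record layout (a dict with "names", "lat", "lon"), the function names and the 25 km threshold are all assumptions made for this illustration and are not part of TGN or Geonames. The coordinates are approximate.

```python
import math
import unicodedata


def normalize(name):
    """Strip accents and case so that 'Zürich' and 'zurich' compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_marks.casefold().strip()


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(min(1.0, math.sqrt(a)))


def same_place(a, b, max_km=25.0):
    """Naive test: at least one shared name AND coordinates within max_km."""
    names_a = {normalize(n) for n in a["names"]}
    names_b = {normalize(n) for n in b["names"]}
    if not names_a & names_b:
        return False
    return haversine_km(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_km


# Hypothetical records with approximate coordinates, for illustration only.
paris_a = {"names": ["Paris", "Lutetia"], "lat": 48.8667, "lon": 2.3333}
paris_b = {"names": ["Paris"], "lat": 48.8534, "lon": 2.3488}
paris_texas = {"names": ["Paris"], "lat": 33.6609, "lon": -95.5555}

print(same_place(paris_a, paris_b))      # True
print(same_place(paris_b, paris_texas))  # False
```

A heuristic like this fails in predictable ways (rivers and countries have no single meaningful point, names differ across languages, historical names overlap), which is the kind of observation the discussion of precision can build on.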

Hints and tips:

  • Think carefully about the nature of the part-whole hierarchies in the two vocabularies; consult the course information about such relationships.
  • If you lack adequate information about the meaning of a particular construct, it is OK to make an “educated guess”; motivate why you took a particular view and document the decision in the accompanying text document.
  • It is OK to propose more appropriate names, but record this explicitly as a modeling decision.

Target Results:

A PDF file containing:

  • OWL specification of TGN and Geonames.
  • OWL specification of mappings between classes.
  • OWL specification of mappings between properties.
  • Text with explanation of the above three steps.
  • Text with discussion of the literature question.

The OWL specifications can be sent as files or as links.
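
To give a rough idea of what a mapping specification file might contain, the sketch below writes a few candidate alignment axioms in Turtle using plain Python strings. Everything specific here is an assumption: the `tgn:` namespace is a placeholder, the `gn:` namespace and the names `gn:Feature` and `gn:parentFeature` should be checked against the Geonames ontology file, and the `tgn:` names depend on your own model. Whether `owl:equivalentClass` or `rdfs:subPropertyOf` is the right relation in each case is exactly the kind of decision the report should argue.

```python
# Namespaces: "tgn" is a placeholder; "gn" is assumed and should be verified.
PREFIXES = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "tgn": "http://example.org/tgn#",
    "gn": "http://www.geonames.org/ontology#",
}

# Candidate class-level mapping (hypothetical, to be justified in the report).
CLASS_MAPPINGS = [
    ("tgn:Subject", "owl:equivalentClass", "gn:Feature"),
]

# Candidate property-level mapping (hypothetical, to be justified in the report).
PROPERTY_MAPPINGS = [
    ("tgn:preferredParent", "rdfs:subPropertyOf", "gn:parentFeature"),
]


def to_turtle(prefixes, triples):
    """Serialize prefix declarations and (subject, predicate, object) triples."""
    lines = ["@prefix {}: <{}> .".format(p, iri) for p, iri in prefixes.items()]
    lines.append("")
    lines.extend("{} {} {} .".format(s, p, o) for s, p, o in triples)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_turtle(PREFIXES, CLASS_MAPPINGS + PROPERTY_MAPPINGS))
```

Keeping the mappings in a separate file that refers to both ontologies, rather than editing either ontology, makes it easy to see which statements are alignment decisions.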

Submission deadline:

Monday 26 November, 23:59 CET

Appendix A: Simplified data model of TGN:

This appendix contains a simplified data model of TGN, covering the part of TGN that is within the scope of this assignment. The record structure is a simplified form of the original record structure. We use the following conventions in describing the record structure:

    []     optional element
    {}*    list of 0-n elements
    {}+    list of 1-n elements
    ->     points to another record
    :      datatype of the element

TGN consists of a large collection of Subject records, each representing information about a particular place:

  • Each place has a unique ID and a textual description with background information about the place.
  • TGN uses the “Parent” relationship to represent a part-whole hierarchy. In fact, there are two such hierarchies: a topological hierarchy between physical places and a nation/town-based hierarchy between administrative places. Some places occur only in the administrative or only in the topological hierarchy (e.g. North-Holland is administrative, the Mont Blanc is physical); other places occur in both (e.g. Europe).
  • Each place has a preferred term for referring to it (e.g. “Gorinchem”) and possibly alternative terms (e.g. “Gorkum”).
  • Each place has at least one PlaceType, such as Nation or River.
  • Places can also be associated with other places. TGN allows this relation to be typed, but you can omit this for the assignment.
 Subject
    SubjectID : integer minInclusive 1000000 maxInclusive 699999999
    DescriptiveNote : string
    PreferredParent -> Subject
    {NonPreferredParent}* -> Subject
    RecordType : {Administrative Physical Both}
    PreferredTerm -> Term
    {NonPreferredTerm}* -> Term
    PreferredPlaceType -> PlaceType
    {NonPreferredPlaceType}* -> PlaceType
    {AssociativeRelation}* -> Subject
    StandardCoordinates -> StandardCoordinates
    [BoundedCoordinates] -> BoundedCoordinates
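
To make the notation concrete, the Subject record can be read as the following Python sketch. It is only an illustration of the cardinalities and the ID range, not an OWL model. To keep it short, links to Term, PlaceType and coordinate records are replaced by plain strings and tuples. The IDs and place types in the test data are made up. Allowing a missing preferred parent for the top node is an assumption, since the record structure lists PreferredParent as mandatory.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class RecordType(Enum):
    ADMINISTRATIVE = "Administrative"
    PHYSICAL = "Physical"
    BOTH = "Both"


@dataclass
class Subject:
    subject_id: int                            # SubjectID : integer
    descriptive_note: str                      # DescriptiveNote : string
    record_type: RecordType                    # RecordType
    preferred_term: str                        # stands in for a Term record
    preferred_place_type: str                  # stands in for a PlaceType record
    standard_coordinates: Tuple[float, float]  # (latitude, longitude)
    # Assumption: None is allowed only for the top node ("World").
    preferred_parent: Optional["Subject"] = None
    non_preferred_parents: List["Subject"] = field(default_factory=list)  # {}*
    non_preferred_terms: List[str] = field(default_factory=list)          # {}*
    non_preferred_place_types: List[str] = field(default_factory=list)    # {}*
    associative_relations: List["Subject"] = field(default_factory=list)  # {}*
    # [] optional: (lat least, lat most, long least, long most)
    bounded_coordinates: Optional[Tuple[float, float, float, float]] = None

    def __post_init__(self):
        # minInclusive 1000000, maxInclusive 699999999
        if not 1000000 <= self.subject_id <= 699999999:
            raise ValueError("SubjectID out of range: {}".format(self.subject_id))

    def ancestors(self):
        """Follow the preferred-parent chain up to the top node."""
        chain = []
        node = self.preferred_parent
        while node is not None:
            chain.append(node)
            node = node.preferred_parent
        return chain
```

In OWL the same constraints would typically become cardinality restrictions on object and datatype properties rather than constructor defaults.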

For each place there can be multiple terms to refer to it. TGN keeps information about both the terms used nowadays (Current) and previously-used terms. For the latter a start/end date can be recorded. Both noun terms (“Europe”) as well as adjective terms (“European”) are provided by TGN. The attribute Vernacular indicates whether this is a term from the native language of the place (e.g. “Den Haag” is vernacular; “The Hague” is not).

 Term
    TermId : integer
    TermText : string
    TermType : {Noun Adjectival Both}
    HistoricFlag : {Current Historical}
    Language : ISO language code
    [StartDate] : year
    [EndDate] : year
    Vernacular: yes/no

Place types are ordered in a taxonomy:

 PlaceType
    Parent -> PlaceType
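
A taxonomy defined by a single Parent link can be sketched as below. The example type names other than River are invented for this illustration, and treating the root as a place type without a parent is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlaceType:
    name: str
    parent: Optional["PlaceType"] = None  # assumption: None for the root type

    def is_a(self, other):
        """True if `other` is this type or one of its ancestors."""
        node = self
        while node is not None:
            if node == other:
                return True
            node = node.parent
        return False


# Hypothetical fragment of a place-type taxonomy.
physical_feature = PlaceType("physical feature")
body_of_water = PlaceType("body of water", physical_feature)
river = PlaceType("River", body_of_water)

print(river.is_a(physical_feature))  # True
print(physical_feature.is_a(river))  # False
```

In OWL this raises a modeling choice worth documenting: place types can be represented as classes in a subclass hierarchy or as individuals linked by a parent property.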

For each place standard coordinates are indicated, such as latitude and longitude. Optionally (see the Subject record structure), information about the bounding coordinates is provided:

 StandardCoordinates
    StandardLatitude : decimal
    StandardLongitude : decimal
    [ElevationMeters] : decimal

 BoundedCoordinates
    BoundingLatitudeLeast : decimal
    BoundingLatitudeMost : decimal
    BoundingLongitudeLeast : decimal
    BoundingLongitudeMost : decimal
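
The relation between the two coordinate records can be illustrated with a small sketch: a bounding box given by least/most latitude and longitude, and a check whether a standard coordinate falls inside it. The sample values are invented, and the check is deliberately simple: it assumes least <= most on both axes and so does not handle a box that crosses the 180° meridian.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class StandardCoordinates:
    standard_latitude: Decimal
    standard_longitude: Decimal
    elevation_meters: Optional[Decimal] = None  # [] optional element


@dataclass(frozen=True)
class BoundedCoordinates:
    bounding_latitude_least: Decimal
    bounding_latitude_most: Decimal
    bounding_longitude_least: Decimal
    bounding_longitude_most: Decimal

    def contains(self, point):
        """True if the point lies inside the box (boundaries included)."""
        lat_ok = (self.bounding_latitude_least
                  <= point.standard_latitude
                  <= self.bounding_latitude_most)
        lon_ok = (self.bounding_longitude_least
                  <= point.standard_longitude
                  <= self.bounding_longitude_most)
        return lat_ok and lon_ok


# Invented values, for illustration only.
box = BoundedCoordinates(Decimal("51.9"), Decimal("52.2"),
                         Decimal("4.1"), Decimal("4.5"))
inside = StandardCoordinates(Decimal("52.08"), Decimal("4.31"))
outside = StandardCoordinates(Decimal("48.86"), Decimal("2.35"))

print(box.contains(inside))   # True
print(box.contains(outside))  # False
```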

20 Responses to Assignment 3: Alignment TGN-GeoNames

  1. Astrid says:

    Am I mad or is Appendix A missing?

  2. Astrid says:

    Ok, I understand now. I thought that the appendix part which would tell us what to focus on in TGN was missing, but at a closer read it seems that we should focus on the place types only.

  3. Julien L. says:

    Can we use UML Class Diagrams instead of OWL Protégé specifications?

    I believe this class is more about the “what/why” than the “how”, and trying to use OWL would force me to focus on the “how”, once more. Since my team lost most of its points on misuses of OWL and trouble with Protégé, I would like to use UML which I know far better than OWL… Is that allowed?

    (I’ve heard that Willem told a group it was OK, but I didn’t hear it myself and I’m not sure if it’s also true for Assignment 3)

  4. Julien L. says:

    The link for the Download page of TGN seems to be broken.

    When I click the following link:
    http://www.getty.edu/research/conducting_research/vocabularies/download.html

    I get redirected to this address:
    http://www.getty.edu/research/tools/vocabularies/index.html

  5. Julien L. says:

    Can we have a mini-tip on how to model the alignment with OWL/Protégé?

    I feel like I finally understood how to use Protégé and create the two Ontologies (TGN and Geonames). Then, the part “Describe the alignments…” seems very doable, but I have no clue on how to tackle the part “Construct an OWL specification of the alignments…” ! 😦

    Do we have to use a Protégé plugin?
    Snorre (also in my group) found the NeON toolkit for ontology editing, which apparently has an Alignment plugin… is that a good way to go?

    I feel more comfortable with UML than OWL for creating the model corresponding to each vocabulary… but I would then have the same issue: no idea how to “construct the alignments specification”.

  6. Anthony Georgiadis says:

    I am a bit confused about where the hierarchies can be found! The zip file contains XML files. Are these what we are going to use as a start for creating the ontology in Protégé? And regarding GeoNames, is the RDF file highlighted in the description the one that we should use?

  7. Julien L. says:

    Here is what I did. Remember that I’m a student and might be completely wrong, but for the first question at least I feel like I finally got interesting results. I’m just explaining here how I did it to hopefully help you figure out something…

    # For creating the OWL ontology corresponding to TGN:

    I simply read the Appendix A in the assignment. The reading/understanding/selecting work has already been done for us there, so if you just apply the Appendix A explanation I believe that making the ontology will be quite straightforward.

    The OWL classes (yellow in Protégé) are described: Subject, Term, PlaceType, StandardCoordinates and BoundedCoordinates; you are free to rename them or add one, if you explain your choice in the report. The “:” marks the datatype of an element, that is, the data properties (green) of the classes. The “->” marks a link to another record, that is, the object properties (blue) of the classes. The rest of the syntax simply tells you if a property might be used more than once, is mandatory, etc.

    To create instances to try and clarify my ontology, I looked for terms in the Thesaurus (http://www.getty.edu/research/tools/vocabularies/tgn/index.html) and tried to represent the “subjects” I would find within my ontology.

    # For creating the OWL ontology corresponding to Geonames:

    I started from the existing ontology given in the assignment text: http://www.geonames.org/ontology/ontology_v2.0_Lite.rdf If you download this file and open it in Protégé (Hint: when you have an ontology Open and select File > Open, you are asked if you want to open in the current window. Say “No” to get two Protégé windows with two ontologies opened at the same time!), you will be able to see the basic structure of the ontology. I removed a few classes which did not correspond to anything in the previous TGN ontology (e.g. Wikipedia articles), and created my own version of this ontology.

    I could have used the other RDF examples given in the assignment, but I preferred to look for my own examples at this address: http://www.geonames.org/ (because I wanted to represent the same examples in both TGN and Geonames…). Looking at the RDF files (in a text editor, Protégé did not help me with that) I could figure out some missing Data Properties and Object Properties in my Ontology.

    Using the same RDF examples, I created all the instances I wanted in my ontology to test and clarify the model…

    I hope this will help you (and others) continue on the assignment. I also hope that the teachers won’t consider that I gave away part of the answers, since I made a real effort on explaining only the “how”. This way we can all focus on the “why/what”, which is what Willem explained we should do!

    About the alignments, my group is almost done writing the report part about these (explaining which classes (question 2), properties (question 3) or instances (question 4) could be aligned), but I still have no precise clue on how to construct the OWL ontologies for questions 2 and 3. I guess we are supposed to create the alignment classes with the right properties, but some hints & tips from Willem or Marieke would be more than helpful…

  8. Astrid says:

    Dear Marieke and Willem,
    Will assignment 4 be available soon? I am planning to make a start on it this weekend.

  9. Snorre Rubin says:

    When will the grades/feedback be out?
