Assignment 4: LOP

Assignment Type:

Research Report, 4 pages max.

Assignment

The goal of this assignment is to:

  • Get some hands-on experience with instance matching
  • Think about missing data, semantics and statistics

Tasks:

  1. Download, install and set up R version 2.15 (unfortunately this version of R is not yet available on the VU lab computers) and RStudio. We have made a download package available for Windows and OS X at https://www.dropbox.com/s/6e71pzpkgtjmtdp/Setup.zip
  2. Download the tutorial and data from: https://github.com/wrvhage/LinkedScienceTutorial/archive/master.zip and do the Linked Open Piracy (LOP) tutorial. You will find additional instructions in the README.md file.
  3. Load the LOP data and extra GeoNames data about continents into a triple store, for example, Jena Fuseki.
  4. Create a SPARQL query that connects SEM event types described with WordNet 3.0 synsets to the continent containing the SEM place of the event, described in GeoNames.
  5. Create a visualization (e.g., pie chart) of all piracy events (WordNet synset wn30:synset-piracy-noun-1), aggregated per GeoNames continent (http://www.geonames.org/ontology#L.CONT).
  6. Think of a Linked Science experiment: the data set you just used is linked to WordNet and GeoNames (amongst others). Can you think of other data sets that you could link to, and a corresponding research question that you could then answer? Go into detail about which URIs would have to be aligned with which other URIs and which alignment relations should be used. Explain what the SPARQL query that answers your research question would look like if this alignment were made.
  7. Not all piracy events have been reported to the International Chamber of Commerce, and unreported events do not appear in the LOP data set. The missing events could be very similar to or very different from the events currently in the data set. a) When and how would the schema of the LOP data set have to change if very different events were added? b) What kind of impact could such additions have on the conclusions drawn from the data set?
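
For task 5, the aggregation and plotting step could look roughly like the sketch below. It assumes the SPARQL result has already been loaded into a data frame with one row per piracy event and a continent column. The data frame `events` and its values are made up for illustration and are not taken from the LOP data.

```r
# Hypothetical stand-in for a SPARQL result: one row per piracy event.
events <- data.frame(
  event = c("e1", "e2", "e3", "e4", "e5"),
  continent = c("Asia", "Africa", "Asia", "Africa", "Asia"),
  stringsAsFactors = FALSE
)

# Count the events per continent.
counts <- table(events$continent)

# Draw a pie chart with the count shown in each label.
pie(counts,
    labels = paste0(names(counts), " (", counts, ")"),
    main = "Piracy events per continent (toy data)")
```

With real data, `events` would be replaced by the `$results` data frame returned by the SPARQL call, and the column name would follow the variable name used in the query.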

Submission deadline:

Monday 3 December, 23:59 CET

36 Responses to Assignment 4: LOP

  1. margreet says:

    Hi, I am currently installing R.
    So far so good.

    But I am not sure what to do with

    ### OS X (without internet):
    Start RStudio

    install.packages(Sys.glob("[path to tutorial]/Setup/R packages for OS X/*.tgz"), repos=NULL)

    Should I type this command in the terminal, which I am not used to? Or should I use the console of RStudio? Also, I find Sys.glob in packages. Maybe I am doing something totally wrong.

    Thanks in advance for any tips.
    Margreet

    • Pim says:

      If you’ve started RStudio (the program you installed before), you can type code into that program. Just copy and paste the command into RStudio, change [path to tutorial] to the path of the tutorial directory on your machine, and press Enter. If everything is done correctly, it should install the packages.

    • Snorre Rubin says:

      In RStudio, you have a menu called “Tools”, and under that “Install Packages”. Then you can choose to install from “Package Archive File (.tgz)”.

    • wrvhage says:

      You run that in the RStudio console.

  2. Mandy says:

    I started my journey of learning SPARQL and R, but I already ran into the following problem. I tried to load a library and this happened:

    > library(ggmap)
    Loading required package: ggplot2
    Error in loadNamespace(i, c(lib.loc, .libPaths())) :
    there is no package called ‘digest’
    In addition: Warning message:
    package ‘ggmap’ was built under R version 2.15.2
    Error: package ‘ggplot2’ could not be loaded

    I tried to look for this digest, but I have no idea what it means. Does anyone have any idea?

    • wrvhage says:

      What I think happened is that you installed the package using the package manager without installing the dependencies. Apparently the ggmap package needs the digest package, but this was not installed. What you can do is go to the Tools menu in RStudio, select Install Packages, and enter “digest”.
      By the way, I don’t know why it needs digest. Perhaps the ggmap people changed something in their package that requires digest.
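
      The same check can be done from the RStudio console. Packages installed from local files with repos=NULL do not pull in their dependencies, so it can help to list what is still missing. Below is a minimal sketch; the helper name `missing_packages` and the list in `needed` are only illustrative, and the install step assumes an internet connection and that the packages are available from your CRAN mirror.

      ```r
      # Hypothetical helper: which of the given packages are not installed yet?
      missing_packages <- function(needed) {
        setdiff(needed, rownames(installed.packages()))
      }

      # Packages mentioned in this thread; extend the vector as needed.
      needed <- c("digest", "ggplot2", "ggmap", "SPARQL")
      not_installed <- missing_packages(needed)
      print(not_installed)

      # Only install when run interactively (e.g. in the RStudio console).
      if (interactive() && length(not_installed) > 0) {
        install.packages(not_installed, dependencies = TRUE)
      }
      ```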

  3. Pim says:

    I also had a problem when I tried to run the following command:
    library(ggmap)
    It kept telling me that it could not find certain packages. So I downloaded and installed all the missing packages, but it was very annoying. I’m not sure whether I did something wrong or whether some packages were missing from the files we got. Anyway, for anyone who wants to download the packages all at once, I’ve put them online here: http://dl.dropbox.com/u/109741845/packages.zip
    The only problem I had left was that the map of the events did not have a legend, but that was also mentioned in the tutorial.

    • wrvhage says:

      Thanks Pim. I think the ggmap people failed to put some of the packages they need in the dependency field of their package. This must have happened last week, because I’ve never had problems with this before.

  4. Snorre Rubin says:

    I am getting a different error. When I try to load ggmap, I get the error message: “Package ‘ggplot2’ could not be loaded”.

    I have installed it, and reinstalled it several times, all with the same result.
    Does anyone know of any dependencies that ggplot2 might have?

    • Snorre Rubin says:

      When trying to load the package ggplot2 itself, I get the error message that “shared object ‘digest.so’ is not found”.
      Beyond that everything is blank. No mentions of this problem when googling…

      • Mandy says:

        You should download the packages Pim made available and then load them into RStudio. If you want to know which packages you need, just run library(ggmap) and library(SPARQL); R will then say which package it needs. Just keep installing the packages that ggmap asks for and you’ll be fine.

  5. wrvhage says:

    Does anybody else have problems with ggmap after installing the dependencies from Pim’s zip file? I can’t reproduce the error.
    Just in case, you can comment out the library(ggmap) line and the lines that use qmap.

  6. Denise says:

    A small note: in the tutorial “The Linked Open Piracy tutorial on semantic/statistical analysis of maritime piracy with the SPARQL Package for R”, there is a small typo. In the section “Generalization using rdfs:subClassOf reasoning”, the SPARQL call does not return the right results but rather “character(0)”. This is because the ‘&’ is missing in that command. The correct command is res <- SPARQL(endpoint,q,ns=prefix,extra=paste(options,"&entailment=rdfs"))$results.
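
    One detail that may be worth checking with this fix: paste joins its arguments with a space unless sep is given, so the resulting string contains a space before “&entailment=rdfs”. A small sketch of the difference, assuming `options` is a plain query-string fragment (the value "output=xml" is only a guess for illustration, not necessarily what the tutorial defines):

    ```r
    options <- "output=xml"  # hypothetical value, for illustration only

    # Default separator is a single space.
    with_space <- paste(options, "&entailment=rdfs")

    # An empty separator joins the two parts directly.
    no_space <- paste(options, "&entailment=rdfs", sep = "")

    print(with_space)
    print(no_space)
    ```

    Whether the space matters depends on how the SPARQL package and the server handle the URL, so the version with sep = "" is something to try if the request is rejected, not a confirmed fix.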

  7. Jonathan says:

    How can I import GeoNames into RStudio? I already tried to import the RDF dump (10 GB) downloaded from http://download.geonames.org/all-geonames-rdf.zip but without success.

  8. Snorre Rubin says:

    I am still having problems with the programs for assignment 4.
    I installed everything on another computer and got a bit further. But now I can’t run Fuseki… I have tried running “run.sh” in the terminal, but that gives me the error message

    “line 2: ./fuseki-server: No such file or directory”.

    Any ideas on what this means?

    In the tutorial I can get as far as defining q, but when I try to define res I get another error, which I don’t know how to interpret:

    res <- SPARQL(endpoint,q,ns=prefix,extra=options)$results
    Error in function (type, msg, asError = TRUE) : couldn't connect to host

  9. Willem van Hage says:

    It means R tries to connect to the Fuseki server, which is not running.
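
    A quick way to check this from the RStudio console before running the query is to test whether anything is listening on the server port. This is a minimal sketch: the helper name `port_is_open` is made up, and it assumes Fuseki runs on localhost with its default port 3030 (the tutorial's run script may use a different one).

    ```r
    # Hypothetical helper: TRUE if something accepts TCP connections on host:port.
    port_is_open <- function(host = "localhost", port = 3030, timeout = 2) {
      con <- tryCatch(
        suppressWarnings(
          socketConnection(host = host, port = port, open = "r",
                           blocking = TRUE, timeout = timeout)
        ),
        error = function(e) NULL  # connection refused or timed out
      )
      if (is.null(con)) return(FALSE)
      close(con)
      TRUE
    }

    print(port_is_open())
    ```

    If this prints FALSE, the server has not started (or uses another port), and the “couldn't connect to host” error from SPARQL() is to be expected.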

  10. Anthony Georgiadis says:

    I do have Windows!

    • k douvantzis says:

      Run the file run.bat in the folder ‘jena-fuseki’ from the tutorial you downloaded. It should start the Jena Fuseki server. If not, something is wrong with your Java installation.

  11. Pim says:

    A more specific question about the assignment: Question 5 says that you have to create a visualization of all piracy events aggregated per GeoNames continent. What exactly is meant by this question? Do we have to make a pie chart (or something else), which shows how many piracy events took place in certain continents, or do we have to show how many times the different types of piracy events took place in certain continents?

  12. Kirsten says:

    Hi guys,
    we installed and ran the packages and ran Jena (we ran the batch file). But during the LOP tutorial we get the famous 404 error message. Does anyone have an idea how we can fix this?

    res <- SPARQL(endpoint,q,ns=prefix,extra=paste(options,"&entailment=rdfs"))$results
    Error: XML content does not seem to be XML, nor to identify a file name 'Error 404: Not Found

  13. Anthony Georgiadis says:

    Hi!
    I get this error in RStudio: Error in as.graphicsAnnot(legend) : argument "legend" is missing, with no default

    Did anyone have the same problem?
