Assignment Type:
Research Report, 4 pages max.
Assignment
The goal of this assignment is to:
- Get some hands-on experience with instance matching
- Think about missing data, semantics and statistics
Tasks:
- Download, install and set up R version 2.15 (unfortunately this version of R is not yet available on the VU lab computers) and RStudio. We have made a download package available for Windows and OS X at https://www.dropbox.com/s/6e71pzpkgtjmtdp/Setup.zip
- Download the tutorial and data from: https://github.com/wrvhage/LinkedScienceTutorial/archive/master.zip and do the Linked Open Piracy (LOP) tutorial. You will find additional instructions in the README.md file.
- Load the LOP data and extra GeoNames data about continents into a triple store, for example, Jena Fuseki.
- Create a SPARQL query that connects SEM event types described with WordNet 3.0 synsets to the continent containing the SEM place of the event, described in GeoNames.
- Create a visualization (e.g., pie chart) of all piracy events (WordNet synset wn30:synset-piracy-noun-1), aggregated per GeoNames continent (http://www.geonames.org/ontology#L.CONT).
- Think of a Linked Science experiment: The data set you just used is linked to WordNet and GeoNames (amongst others). Can you think of other data sets that you could link to, and a corresponding research question that you could then answer? Go into detail about which URIs would have to be aligned with which other URIs and which alignment relations should be used. Explain what the SPARQL query that answers your research question would look like if this alignment were made.
- Not all piracy events have been reported to the International Chamber of Commerce, and therefore do not appear in the LOP data set. The missing events could be very similar or very different from the events currently in the data set. a) When and how would the schema of the LOP data set have to change should very different events be added? b) What kind of impact could such additions have on the conclusions drawn from the data set?
Submission deadline:
Monday 3 December, 23:59 CET
Hi, I am currently installing R.
So far so good.
But I am not sure what to do with
### OS X (without internet):
Start RStudio
install.packages(Sys.glob("[path to tutorial]/Setup/R packages for OS X/*.tgz"), repos=NULL)
Should I type this command in the terminal, which I am not used to using, or should I use the console of RStudio? Also, I find the Sys.glob in packages. Maybe I am doing something totally wrong.
Thanks in advance for any tips.
Margreet
Once you've started RStudio (the program you installed earlier), you can type code directly into it. Just copy and paste the command into RStudio, change [path to tutorial] to the path of the tutorial folder on your computer, and press Enter. If everything is done correctly, it should install the packages.
In RStudio, you have a menu called "Tools", and under that "Install Packages". There you can choose to install from "Package Archive File (.tgz)".
You run that in the RStudio console.
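For anyone unsure what the command looks like once the placeholder is filled in, here is a minimal sketch for the RStudio console. The folder `~/Downloads/LinkedScienceTutorial-master` is only an assumed, hypothetical location, so replace it with wherever you unzipped the tutorial. The check for an empty result is my own addition and not part of the tutorial instructions. Note that `Sys.glob` is a base R function, so it does not need to be installed as a package.

```r
# Assumed (hypothetical) location of the unzipped tutorial; change to your own path.
tutorial_dir <- path.expand("~/Downloads/LinkedScienceTutorial-master")

# Build the wildcard pattern for the bundled OS X package archives.
pattern <- file.path(tutorial_dir, "Setup", "R packages for OS X", "*.tgz")

# Sys.glob() is part of base R; it expands the wildcard into file names.
archives <- Sys.glob(pattern)

if (length(archives) > 0) {
  # repos = NULL tells R to install from the local files, not from the internet.
  install.packages(archives, repos = NULL)
} else {
  message("No .tgz files found; check the path: ", pattern)
}
```

If the message about missing files appears, the path is most likely wrong; printing `pattern` shows exactly where R was looking.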
I started my journey of learning SPARQL and R, but I already bumped into the following problem. I tried to load a library and this happened:
> library(ggmap)
Loading required package: ggplot2
Error in loadNamespace(i, c(lib.loc, .libPaths())) :
there is no package called ‘digest’
In addition: Warning message:
package ‘ggmap’ was built under R version 2.15.2
Error: package ‘ggplot2’ could not be loaded
I tried to look for this digest, but I have no idea what it means. Does anyone have an idea?
What I think happened is that you installed the package using the package manager without installing the dependencies. Apparently the ggmap package needs the digest package, but this was not installed. What you can do is go to the RStudio menu, select Install Packages under Tools, and install "digest".
By the way, I don’t know why it needs digest. Perhaps the ggmap people changed something in their package that requires digest.
I also ran into a problem when I tried to run the following command:
library(ggmap)
It kept telling me that it could not find certain packages. So I downloaded and installed all the packages that were missing, but it was very annoying. I'm not sure whether I did something wrong or whether some packages were missing from the files we got. Anyway, for anyone who wants to download all the packages at once, I've put them online here: http://dl.dropbox.com/u/109741845/packages.zip
The only problem that I got with it was that the map of the events did not have a legend, but that was also mentioned in the tutorial.
Thanks Pim. I think the ggmap people failed to put some of the packages they need in the dependency field of their package. This must have happened last week, because I’ve never had problems with this before.
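To avoid hunting for missing packages one error at a time, a small check like the one below can list them up front. This is only a sketch: the names in `needed` are the ones mentioned in this thread, `missing_packages` is a hypothetical helper (not part of the tutorial), and the commented-out install line assumes the packages are available from your configured repository and that you have an internet connection.

```r
# Packages mentioned in this thread (ggmap needs ggplot2, which needs digest).
needed <- c("ggmap", "ggplot2", "digest", "SPARQL")

# Hypothetical helper: return the names in `needed` that are not installed.
missing_packages <- function(needed, installed = rownames(installed.packages())) {
  setdiff(needed, installed)
}

to_install <- missing_packages(needed)
print(to_install)

# Uncomment to install them together with their own dependencies:
# install.packages(to_install, dependencies = TRUE)
```

Passing `dependencies = TRUE` asks R to pull in the packages that each package depends on, which is the step that was skipped when only the archive files were installed.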
I am getting a different error. When I try to load ggmap I get the error message: "Package 'ggplot2' could not be loaded".
I have installed it, and reinstalled it several times, all with the same result.
Does anyone know of any dependencies that ggplot2 might have?
When trying to load the package ggplot2 itself, I get the error message that "shared object 'digest.so' is not found".
Beyond that I have nothing to go on. Googling the problem turns up no mentions of it…
You should download the packages Pim made available and then load them into RStudio. If you want to know which packages you need, just run library(ggmap) and library(SPARQL); R will then tell you which package is missing. Just keep installing the packages that ggmap asks for and you'll be fine.
I did that already
Does anybody else have problems with ggmap after installing the dependencies from Pim’s zip file? I can’t reproduce the error.
Just in case, you can comment out the library(ggmap) line and the lines that use qmap.
A small note: the tutorial "The Linked Open Piracy tutorial on semantic/statistical analysis of maritime piracy with the SPARQL Package for R" contains a small typo. In the section "Generalization using rdfs:subClassOf reasoning", the SPARQL call does not return the right results but "character(0)" instead. This is because the '&' operator is missing from the command. The correct call is res <- SPARQL(endpoint,q,ns=prefix,extra=paste(options,"&entailment=rdfs"))$results.
Thanks!
How can I import GeoNames into RStudio? I already tried to import the RDF dump (10 GB) downloaded from http://download.geonames.org/all-geonames-rdf.zip, but without success.
We have the same problem. We have been trying for some time now with no luck.
Everything you need is already in the Fuseki store. There’s a selection of GeoNames in there that is sufficient for you to do what you need to do. No need to load all of GeoNames. This selection is also on http://semanticweb.cs.vu.nl/lop/ so you can browse and search through it too, if you like.
Thanks! Browsing through the dataset made it a whole lot easier for me!
I am still having problems with the programs for assignment 4.
I installed everything on another computer and got a bit further. But now I can't run Fuseki… I have tried running "run.sh" in the terminal, but that gives me the error message
“line 2: ./fuseki-server: No such file or directory”.
Any ideas on what this means?
In the tutorial I can get as far as defining q, but when I try to define res I get another error, which I don’t know how to interpret:
res <- SPARQL(endpoint,q,ns=prefix,extra=options)$results
Error in function (type, msg, asError = TRUE) : couldn't connect to host
It means R tries to connect to the Fuseki server, but the server is not running.
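If you want a clearer hint than the raw error, you could wrap the call as in the sketch below. `try_query` is a hypothetical helper of my own, not something from the tutorial or the SPARQL package; it just catches the error and reminds you to start the server. The last lines use a stand-in function that fails on purpose, so the sketch runs even without Fuseki.

```r
# Hypothetical helper: run a query function and turn a failure into a hint.
# `run` is any zero-argument function, for example
#   function() SPARQL(endpoint, q, ns = prefix, extra = options)$results
try_query <- function(run) {
  tryCatch(
    run(),
    error = function(e) {
      message("Query failed: ", conditionMessage(e))
      message("Is the Fuseki server running? Start it with run.sh or run.bat first.")
      NULL  # return NULL so the caller can test for failure
    }
  )
}

# Stand-in for a failing call, so this sketch runs without a server.
res <- try_query(function() stop("couldn't connect to host"))
is.null(res)
```

With the real call inside the function, a `NULL` result means the query never reached the server, so check the Fuseki window before debugging the query itself.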
In my case, while I am trying to get Fuseki run I get:
> cd .\Tutorials\jena-fuseki-0.2.4
Error: unexpected symbol in "cd ."
Any ideas?
If you run Mac or Linux you have to use / instead of \. This is a Windows command.
You need to put this in your command window, not RStudio or something else. Maybe that’s the problem?
I do have Windows!
Run the file run.bat in the folder 'jena-fuseki' from the Tutorials you downloaded. It should start the Jena server. If not, something is wrong with your Java installation.
Thanks Klearche, it was a problem with Java.
A more specific question about the assignment: Question 5 says that you have to create a visualization of all piracy events aggregated per GeoNames continent. What exactly is meant by this question? Do we have to make a pie chart (or something else), which shows how many piracy events took place in certain continents, or do we have to show how many times the different types of piracy events took place in certain continents?
We used a stacked bar chart (the second example in the LOP tutorial). The only other way is to make a pie chart for every continent…
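To make the two readings of the question concrete, here is a minimal sketch in base R. The data frame is made-up placeholder data standing in for SPARQL results (it is not real LOP data), and the column names `continent` and `event_type` are my own assumptions.

```r
# Placeholder rows standing in for SPARQL results; NOT real LOP data.
events <- data.frame(
  continent  = c("Africa", "Africa", "Asia", "Asia", "Asia", "South America"),
  event_type = c("hijacked", "attempted", "boarded", "boarded", "hijacked", "boarded"),
  stringsAsFactors = FALSE
)

# Reading 1: one slice per continent, all piracy events counted together.
per_continent <- table(events$continent)
pie(per_continent, main = "Piracy events per continent (placeholder data)")

# Reading 2: one stacked bar per continent, split by event type.
per_type <- table(events$event_type, events$continent)
barplot(per_type, legend.text = rownames(per_type),
        main = "Event types per continent (placeholder data)")
```

The first chart answers "how many events per continent", the second "which types of events per continent"; both start from the same aggregation with `table()`.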
Hi guys,
We installed the packages and ran Jena (we ran the batch file). But during the LOP tutorial we get the famous 404 error message. Does anyone have an idea how we can fix this?
res <- SPARQL(endpoint,q,ns=prefix,extra=paste(options,"&entailment=rdfs"))$results
Error: XML content does not seem to be XML, nor to identify a file name 'Error 404: Not Found
We seem to have fixed it by removing the uppercase letters. Thank you, Chun.
I have the same problem… How exactly did you fix this?
I wouldn't advise using Jena Fuseki, because it can be slower than http://semanticweb.cs.vu.nl/lop/sparql/
In the localhost link, LOP is spelled with uppercase letters. When we changed it to lowercase it worked, I think. We tried loads and loads of things before we tried that. I'm on Facebook now if you want to ask more questions.
Hi!
I get this error in RStudio: Error in as.graphicsAnnot(legend) : argument "legend" is missing, with no default
Did anyone have the same problem?
I get it when I am trying to create a bar chart.