OLiA

From NLP2RDF-Wiki

OLiA transformations will also be available as part of the Infrastructure for NIF.

This is the current OLiA homepage: http://purl.org/olia


Transforming Tags into URIs

  1. OLiA Annotation Models provide properties such as hasTag and hasTagStartingWith that can be used to map tags to URIs. For a full list of such properties, see http://purl.org/olia/system.owl
  2. Download the annotation model that matches your tag set.
  3. An overview can be found at http://purl.org/olia
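
A minimal sketch of the lookup these properties enable, assuming you have already collected the hasTag values into one map and the hasTagStartingWith values into another. The class TagResolver and its methods are hypothetical, and we assume (from the property name) that hasTagStartingWith applies when a tag begins with the given string. An exact hasTag match is tried first, then the longest matching prefix.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: resolves a tag string to an OLiA individual URI.
final class TagResolver {
    private final Map<String, String> exact = new HashMap<>();    // hasTag value -> URI
    private final Map<String, String> prefixes = new HashMap<>(); // hasTagStartingWith value -> URI

    void addTag(String tag, String uri) { exact.put(tag, uri); }

    void addTagPrefix(String prefix, String uri) { prefixes.put(prefix, uri); }

    // Returns the URI for the tag, or null if nothing matches.
    String resolve(String tag) {
        String uri = exact.get(tag);
        if (uri != null) {
            return uri;
        }
        // Fall back to the longest prefix the tag starts with (assumed semantics).
        String best = null;
        int bestLength = -1;
        for (Map.Entry<String, String> e : prefixes.entrySet()) {
            String p = e.getKey();
            if (tag.startsWith(p) && p.length() > bestLength) {
                best = e.getValue();
                bestLength = p.length();
            }
        }
        return best;
    }
}
```

Preferring the longest prefix keeps the result deterministic when several prefixes match the same tag.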

Example: OpenNLP, Penn Tag Set and linking to OLiA individuals

Given:

If you look at the Penn-OLiA Linking Model, you will see that it imports both other models:


  ...
  ...
  
  

Your tool will assign tags like this:

My/PRP$ dog/NN also/RB likes/VBZ eating/VBG sausage/NN ./.

For NIF, you will have to resolve the tags PRP$, NN, RB, VBZ, VBG and "." to the OLiA individuals found in the Annotation Model of the Penn Tag Set: http://purl.org/olia/penn.owl
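
Before the lookup, the tags have to be separated from the words. A minimal sketch for the word/TAG format shown above, assuming whitespace-separated tokens. Each token is split at its last slash, so "./." yields the tag "." and a word that itself contains a slash stays intact. The class name TaggedSentence is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for "word/TAG" tagger output as in the example sentence.
final class TaggedSentence {
    // Returns the tags in token order.
    static List<String> tags(String taggedSentence) {
        List<String> result = new ArrayList<>();
        for (String token : taggedSentence.trim().split("\\s+")) {
            int slash = token.lastIndexOf('/');
            if (slash < 0) {
                continue; // token without a tag: skip it
            }
            result.add(token.substring(slash + 1));
        }
        return result;
    }
}
```

For the example sentence this yields PRP$, NN, RB, VBZ, VBG, NN and ".".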

Option: Using the Unix command line

This solution is fast and easy, but it relies on the textual layout of the N-Triples output and may therefore miss or mangle some entries. You can download a script from the OLiA repository.

rapper -i rdfxml http://purl.org/olia/penn.owl | \
grep '<http://purl.org/olia/system.owl#hasTag>' | \
sed 's/\^\^<[^>]*>//' | \
cut -f1,3 -d '>' | \
sed 's/<//;s/> "/\t/;s/" .//' | \
awk 'BEGIN {FS=OFS="\t"}{t=$1;$1=$2;$2=t;print}'

Output (tab-separated):

CC      http://purl.org/olia/penn.owl#CC
CD      http://purl.org/olia/penn.owl#CD
DT      http://purl.org/olia/penn.owl#DT
EX      http://purl.org/olia/penn.owl#EX
PP$     http://purl.org/olia/penn.owl#PPpossessive
PRP     http://purl.org/olia/penn.owl#PRP
PRP$    http://purl.org/olia/penn.owl#PRPpossessive
...
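
If you save this output to a file, it can be loaded into a map in a few lines. A minimal sketch, assuming one TAG<TAB>URI row per line as printed by the pipeline above. The class name TagTable is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: loads "TAG<TAB>URI" rows, as printed by the pipeline above.
final class TagTable {
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> tagToUri = new HashMap<>();
        for (String line : lines) {
            int tab = line.indexOf('\t');
            if (tab <= 0) {
                continue; // skip empty or malformed rows
            }
            // Later rows overwrite earlier ones if a tag occurs twice.
            tagToUri.put(line.substring(0, tab), line.substring(tab + 1).trim());
        }
        return tagToUri;
    }
}
```

Usage, with a hypothetical file name: redirect the pipeline into penn-tags.tsv, then call TagTable.parse(Files.readAllLines(Paths.get("penn-tags.tsv"))).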

Option: Apache Jena

  1. Load the annotation model into Apache Jena (http://jena.apache.org/).
  2. Iterate over all triples and select those with the property http://purl.org/olia/system.owl#hasTag.
  3. Put each object (the tag) as key and its subject (the OLiA individual) as value into a HashMap:
import java.util.HashMap;
import java.util.Map;
Map<String, String> penn = new HashMap<String, String>();
penn.put("PRP$", "http://purl.org/olia/penn.owl#PRPpossessive");
Note: You can also use the pre-generated Java class: http://olia.nlp2rdf.org/owl/Penn.java
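
If you want to avoid the Jena dependency, the same steps can be sketched over N-Triples text, for example the output of the rapper command shown above. This is a simplified line-based parser, not a full N-Triples implementation: it assumes one triple per line, an IRI subject and a tag literal without escaped quotes. The class name HasTagExtractor is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical Jena-free variant of steps 2 and 3.
final class HasTagExtractor {
    static final String HAS_TAG = "http://purl.org/olia/system.owl#hasTag";

    // <subject> <predicate> "literal", optionally ^^<datatype> or @lang, then "."
    private static final Pattern TRIPLE = Pattern.compile(
        "^<([^>]*)>\\s+<([^>]*)>\\s+\"([^\"]*)\"(?:\\^\\^<[^>]*>|@[A-Za-z0-9-]+)?\\s*\\.\\s*$");

    static Map<String, String> extract(List<String> ntriplesLines) {
        Map<String, String> tagToUri = new HashMap<>();
        for (String line : ntriplesLines) {
            Matcher m = TRIPLE.matcher(line);
            // Step 2: keep only triples whose property is hasTag.
            if (m.matches() && m.group(2).equals(HAS_TAG)) {
                // Step 3: the object (tag) is the key, the subject the value.
                tagToUri.put(m.group(3), m.group(1));
            }
        }
        return tagToUri;
    }
}
```

Unlike a grep for any literal, this checks the predicate explicitly, so labels and comments in the annotation model are ignored.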

Option: via SPARQL

In the future, we will provide a SPARQL endpoint for this. See Infrastructure.

 SELECT ?s FROM <http://purl.org/olia/penn.owl> WHERE { ?s <http://purl.org/olia/system.owl#hasTag> "PRP$" }
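
When the query is generated for arbitrary tags, the tag has to be escaped inside the double-quoted SPARQL literal. A minimal sketch that reproduces the query above for any tag, escaping only backslashes and double quotes (other special characters such as line breaks are not handled). The class name TagQuery is hypothetical.

```java
// Hypothetical helper: builds the hasTag query for a given model and tag.
final class TagQuery {
    static String forTag(String modelUri, String tag) {
        // Escape backslashes first, then double quotes.
        String escaped = tag.replace("\\", "\\\\").replace("\"", "\\\"");
        return "SELECT ?s FROM <" + modelUri + "> WHERE { ?s "
            + "<http://purl.org/olia/system.owl#hasTag> \"" + escaped + "\" }";
    }
}
```

TagQuery.forTag("http://purl.org/olia/penn.owl", "PRP$") returns exactly the query shown above.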

Output

# Sentence was: My/PRP$ dog/NN also/RB likes/VBZ eating/VBG sausage/NN ./.

 sso:posTag "PRP$" .
 sso:olialink <http://purl.org/olia/penn.owl#PRPpossessive> .
 sso:posTag "NN" .
 sso:olialink <http://purl.org/olia/penn.owl#NN> .
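
Each tagged token yields two statements: the raw tag and the link to the OLiA individual. A minimal sketch that renders them for one token, assuming the sso: prefix is declared elsewhere in the Turtle output and that the caller supplies the token's NIF subject URI. The class name NifPosTriples is hypothetical.

```java
// Hypothetical helper: renders the posTag/olialink statements for one token.
// Assumes the sso: prefix is declared elsewhere in the Turtle document.
final class NifPosTriples {
    static String render(String subjectUri, String tag, String oliaUri) {
        String literal = tag.replace("\\", "\\\\").replace("\"", "\\\"");
        return "<" + subjectUri + "> sso:posTag \"" + literal + "\" .\n"
             + "<" + subjectUri + "> sso:olialink <" + oliaUri + "> .\n";
    }
}
```

For example, with the hypothetical subject http://example.org/doc#char=0,2, the tag PRP$ and http://purl.org/olia/penn.owl#PRPpossessive, it produces the first two statements of the output above with that subject filled in.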

If your tool now assigns the tag "CC" to an "and" in a sentence (in the Penn Tag Set, CC marks a coordinating conjunction), you can load the Linking Model and its imports (libraries such as Jena can fetch the transitive closure of the imports for you).

TODO add Jena code

You will get http://purl.org/olia/penn.owl#CC and can attach it to the subject:

Retrieving types for OLiA Annotation Individuals

The previous step showed how to get URIs for your tags; the next step is to retrieve all types of these URIs.

General approach:

  1. Variant a) Load the OLiA Linking Model into a reasoner and query it.
  2. Variant b) Use a reasoner to extract all types once and then index them in a hash map.
Note: we will focus on variant b) here, as it is much more efficient: the reasoner runs only once, and every later lookup is a plain hash map access.
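
Variant b) boils down to two lookups: tag to individual (from the previous step) and individual to its precomputed types. A minimal in-memory sketch, assuming the (individual, class) pairs have already been extracted, e.g. by the Unix script below. The class TypeIndex and its methods are hypothetical; types are kept in a sorted set so repeated assertions collapse and the order is deterministic.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical index: OLiA individual URI -> set of inferred class URIs.
final class TypeIndex {
    private final Map<String, Set<String>> typesByIndividual = new HashMap<>();

    // Record one inferred class assertion: individual rdf:type oliaClass.
    void add(String individualUri, String oliaClassUri) {
        typesByIndividual.computeIfAbsent(individualUri, k -> new TreeSet<>()).add(oliaClassUri);
    }

    // All precomputed types of an individual (empty if unknown); no reasoner call needed.
    Set<String> typesOf(String individualUri) {
        return Collections.unmodifiableSet(
            typesByIndividual.getOrDefault(individualUri, Collections.emptySet()));
    }
}
```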

Unix script

You will need to download and install Pellet-CLI.

You can extract all kinds of inferences, but you only need "ClassAssertion".

#!/bin/bash
# URI of the Penn linking model
LINKINGMODEL=http://purl.org/olia/penn-link.rdf
# keep only statements that mention classes from http://purl.org/olia/olia.owl
FILTER=http://purl.org/olia/olia.owl
pellet extract -v -s "ClassAssertion" "$LINKINGMODEL" > extracted.owl
rapper -i rdfxml extracted.owl | grep "$FILTER" > filtered.nt
# print "individual<TAB>class" pairs
cut -f1,3 -d '>' filtered.nt | sed 's/> </\t/;s/<//'

Or, if you prefer a single line:

 pellet extract -v -s "ClassAssertion" http://purl.org/olia/penn-link.rdf | rapper -i rdfxml -I - - file | grep 'http://purl.org/olia/olia.owl' | cut -f1,3 -d '>' | sed 's/> </\t/;s/<//'
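
The grep/cut/sed stage can also be sketched in Java, for example to fill the type index directly instead of going through a text file. This is a simplified line-based parser that assumes one triple per line with IRIs in subject, predicate and object position. Unlike the grep, it checks that the predicate is rdf:type and that the object itself lies in the olia.owl namespace. The class name OliaTypeFilter is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical Java counterpart of the grep/cut/sed stage above.
final class OliaTypeFilter {
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String OLIA_NS = "http://purl.org/olia/olia.owl#";

    // <subject> <predicate> <object> .
    private static final Pattern IRI_TRIPLE =
        Pattern.compile("^<([^>]*)>\\s+<([^>]*)>\\s+<([^>]*)>\\s*\\.\\s*$");

    // Returns "individual<TAB>class" rows, like the shell pipeline's output.
    static List<String> filter(List<String> ntriplesLines) {
        List<String> rows = new ArrayList<>();
        for (String line : ntriplesLines) {
            Matcher m = IRI_TRIPLE.matcher(line);
            if (m.matches() && m.group(2).equals(RDF_TYPE) && m.group(3).startsWith(OLIA_NS)) {
                rows.add(m.group(1) + "\t" + m.group(3));
            }
        }
        return rows;
    }
}
```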

How to create a new OLiA Annotation Model

  1. Have a look at the code repository at http://sourceforge.net/projects/olia/
  2. Write an email to nlp2rdf -@- lists.informatik.uni-leipzig.de so that Christian Chiarcos can help you.
  3. Find documentation of the tag set you wish to transform, e.g. for Penn:
  4. Create an Annotation Model
  5. Create a Linking Model
  6. Write an email to Christian and nlp2rdf -@- lists.informatik.uni-leipzig.de and we will host it for you.