Example 04: Using the Parser to Extract SVO Triples

In this example, we will use the parser to obtain a syntactic analysis of each sentece, and then process the obtained trees to detect some relevant patterns, such as Subject-verb-object triples describing who did what to who.

We will use code from Example 3 as a starting point, and add code to create a parser instance.

// Create dependency parser & SRL
freeling::dep_treeler parser(lpath+L"dep_treeler/dependences.dat");

Here we use a statistical dependency parser, though FreeLing has other parsers (one rule-based dependency parser and one rule-based constituency parser) that you can use.

We also must add a call to the parser after the other modules.

// parse sentences
parser.analyze(ls);

After calling the parser, sentence objects are enriched with a dependency tree we can traverse. Next step is modifying our ProcessSentence function search for SVO triples and output them:

void ProcessSentences(const list<freeling::sentence> &ls) {

  // for each sentence in list
  for (list<freeling::sentence>::const_iterator s=ls.begin(); s!=ls.end(); ++s) {
    for  (freeling::dep_tree::const_iterator p=s->get_dep_tree().begin(); p!=s->get_dep_tree().end(); p++) {
      // if it is a verb, check dependants
      if  (p->get_word().get_tag()[0]==L'V') {
        wstring ssubj, lsubj;
        wstring sdobj, ldobj;
        for (freeling::dep_tree::const_sibling_iterator ch = p.sibling_begin(); ch!=p.sibling_end(); ch++) { 
          if (ch->get_label()==L"SBJ") 
            extract_lemma_and_sense(ch->get_word(), lsubj, ssubj);

          else if (ch->get_label()==L"OBJ") 
            extract_lemma_and_sense(ch->get_word(), ldobj, sdobj);
        }
        // if we found a SVO triple, output it
        if (lsubj!=L"" and ldobj!=L"") {
          wstring spred, lpred;
          extract_lemma_and_sense(p->get_word(), lpred, spred);

          wcout << L"SVO : (pred:    " << lpred << L" [" << spred << L"]" << endl;
          wcout << L"       subject: " << lsubj << L" [" << ssubj << L"]" << endl; 
          wcout << L"       dobject: " << ldobj << L" [" << sdobj << L"]" << endl;
          wcout << L"      )" << endl;
        }
      }
    }
  }
}

The function extract_lemma_and_sense is just to make code nicer. and looks like this

//---------------------------------------------
// Extract lemma and sense of word 'w' and store them
// in 'lem' and 'sens' respectively
//---------------------------------------------

void extract_lemma_and_sense(const freeling::word &w, wstring &lem, wstring &sens) {
   lem = w.get_lemma();
   if (not w.get_senses().empty()) sens = w.get_senses().begin()->first;
   else sens=L"";
}

Code

Find here the whole code:

Example

Assuming the input file contains the following sentences:

The big cat eats fresh fish. My neighbour's dog chased the cat.
The dog is eating meat. Some mice were hunted by the cat.

The output would be:

SVO : (pred:    eat [01168468-v]
       subject: cat [02121620-n]
       dobject: fish [02512053-n]
      )
SVO : (pred:    chase [02001858-v]
       subject: dog [02084071-n]
       dobject: cat [02121620-n]
      )

Remarks