Power-Law Distributions for Paraphrases Extracted from Bilingual Corpora

Publication Type  Conference Paper
Author  Martzoukos S., Monz C.
Year of Publication  2012
Conference Name  European Chapter of the Association for Computational Linguistics
Month Published  April
Abstract  

We describe a novel method that extracts paraphrases from a bitext, for both the source and target languages.
In order to reduce the search space, we decompose the phrase-table into sub-phrase-tables and construct separate clusters for source and target phrases.
We convert the clusters into graphs, add smoothing/syntactic-information-carrier vertices, and compute the similarity between phrases with a random-walk-based measure, the commute time. The resulting phrase-paraphrase probabilities are obtained by converting the commute times into artificial co-occurrence counts with a novel technique.
The co-occurrence count distribution belongs to the power-law family.
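
For readers unfamiliar with the commute time mentioned in the abstract: on an undirected weighted graph, it is the expected number of steps a random walk takes to go from one vertex to another and back again. The following is a minimal sketch of that standard definition, not the authors' implementation. The function names, the matrix representation of the graph, and the toy path graph are assumptions made for illustration; the abstract does not say how the phrase graphs are weighted.

```python
from fractions import Fraction


def hitting_times(weights, target):
    """Expected number of random-walk steps to first reach `target` from each vertex.

    `weights` is a symmetric matrix of non-negative edge weights. The graph is
    assumed to be connected (an assumption of this sketch).
    """
    n = len(weights)
    others = [v for v in range(n) if v != target]

    # Linear system for v != target:
    #   h(v) - sum_u P[v][u] * h(u) = 1,  with h(target) = 0,
    # where P[v][u] = weights[v][u] / degree(v) is the transition probability.
    rows = []
    for v in others:
        degree = Fraction(sum(weights[v]))
        row = [Fraction(int(v == u)) - Fraction(weights[v][u]) / degree for u in others]
        rows.append(row + [Fraction(1)])

    # Gauss-Jordan elimination with exact rational arithmetic.
    m = len(rows)
    for col in range(m):
        pivot = next(r for r in range(col, m) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        p = rows[col][col]
        rows[col] = [x / p for x in rows[col]]
        for r in range(m):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [x - f * y for x, y in zip(rows[r], rows[col])]

    h = [Fraction(0)] * n
    for idx, v in enumerate(others):
        h[v] = rows[idx][-1]
    return h


def commute_time(weights, i, j):
    """Commute time = hitting time from i to j plus hitting time from j to i."""
    return hitting_times(weights, j)[i] + hitting_times(weights, i)[j]


# Toy example (hypothetical data): a path graph 0 - 1 - 2 with unit weights.
path = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
print(commute_time(path, 0, 1))  # adjacent vertices
print(commute_time(path, 0, 2))  # vertices two edges apart
```

On this toy graph the two end vertices have a larger commute time than adjacent vertices, which matches the intuition that a smaller commute time indicates more closely related vertices.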

Full paper  sm_eacl2012_camera_ready.pdf (PDF, 1.22 MB)