| Abstract | | We describe a novel method that extracts paraphrases from a bitext, for both the source and target languages.
In order to reduce the search space, we decompose the phrase-table into sub-phrase-tables and construct separate clusters for source and target phrases.
We convert the clusters into graphs, add smoothing/syntactic-information-carrier vertices, and compute the similarity between phrases with a random-walk-based measure, the commute time. We then convert the commute times into artificial co-occurrence counts with a novel technique and derive the phrase-paraphrase probabilities from those counts.
The co-occurrence count distribution belongs to the power-law family.
|
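
The commute time used above as the similarity measure is the expected number of steps a random walk needs to go from one vertex to another and back. As a minimal sketch, assuming a small connected undirected graph given as a symmetric weight matrix, it can be computed from the standard identity: commute time = graph volume × effective resistance. The function names `solve` and `commute_time` and the toy graph are illustrative assumptions, not the paper's implementation, and the conversion of commute times into co-occurrence counts is not shown.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: the graph must be connected")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def commute_time(w, i, j):
    """Commute time between vertices i and j.

    w is a symmetric adjacency matrix (list of lists) of non-negative
    edge weights for a connected graph without self-loops (assumption).
    """
    if i == j:
        return 0.0
    n = len(w)
    degree = [sum(row) for row in w]
    volume = sum(degree)  # sum of all vertex degrees
    # Ground vertex j: drop its row and column from the Laplacian L = D - W.
    keep = [k for k in range(n) if k != j]
    lap = [[(degree[r] if r == c else 0.0) - w[r][c] for c in keep]
           for r in keep]
    # Inject one unit of current at i; the potential at i is then the
    # effective resistance between i and j.
    rhs = [1.0 if k == i else 0.0 for k in keep]
    potential = solve(lap, rhs)
    resistance = potential[keep.index(i)]
    return volume * resistance


# Toy path graph 0 - 1 - 2 with unit weights (hypothetical example).
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
print(commute_time(path, 0, 2))
```

A smaller commute time indicates that two vertices are more tightly connected, which is why the measure can serve as a phrase similarity. For graphs of realistic size, a sparse linear solver would be a more practical choice than the dense elimination used in this sketch.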