TREC Entity Track // Searching for entities and properties of entities

Plans for Entity 2011

We plan to run two main tasks in 2011: related entity finding (REF) and entity list completion (ELC). In addition, there will be a Linked Open Data (LOD) variant of the REF task. Groups may choose to participate in any of these tasks.

Related entity finding (REF)

  • Task: return a ranked list of entities of a specified type that engage in a given relationship with a given source entity.
  • Collection: ClueWeb09 English
  • Entity identification: Homepage
  • Topics: 50 new topics for 2011
  • Changes compared to 2010
    • Only primary homepages are accepted, i.e., relevance is binary
    • For each answer, a (single) supporting document is required; answers without supporting evidence will be judged incorrect
    • Wikipedia pages are (still) not accepted as entity homepages, but they can be supporting documents
    • Instead of the four high-level entity types (person, organization, location, product), a more fine-grained target type will be defined using the DBpedia Ontology; i.e., possible types are classes within this ontology
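
The 2011 judging rules listed above can be read as a small decision procedure. The following is only a minimal sketch under our own assumptions: the `Answer` class, its field names, and the `judge` function are hypothetical and are not part of any official assessment tool, and the set of primary homepages would in practice come from the assessors.

```python
from dataclasses import dataclass
from typing import Optional, Set
from urllib.parse import urlparse


@dataclass
class Answer:
    """One returned entity for a topic (hypothetical field names)."""
    homepage_url: str               # candidate entity homepage
    supporting_doc: Optional[str]   # the single supporting document, if any


def is_wikipedia(url: str) -> bool:
    """True if the URL's host is wikipedia.org or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return host == "wikipedia.org" or host.endswith(".wikipedia.org")


def judge(answer: Answer, primary_homepages: Set[str]) -> int:
    """Binary relevance following the rules above (illustrative only)."""
    if not answer.supporting_doc:
        return 0   # no supporting evidence: judged incorrect
    if is_wikipedia(answer.homepage_url):
        return 0   # Wikipedia pages are not accepted as entity homepages
    # Only primary homepages count; the supporting document itself
    # may be a Wikipedia page, so it is not checked here.
    return 1 if answer.homepage_url in primary_homepages else 0
```

Note that the check on Wikipedia applies to the homepage only, which mirrors the rule that Wikipedia pages remain acceptable as supporting documents.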

REF LOD variant

The task, collection, and topics are the very same as for the REF task, but there is one important difference:

  • Entity identification: LOD URI (given a LOD crawl, which is the same as used for the ELC task)

Entity list completion (ELC)

The ELC task will be the same as in 2010, only with more topics and possibly on a larger Linked Open Data crawl.

  • Task: same as REF, but the topic definition includes a set of example entities (identified by their LOD URI)
  • Collection: Linked Open Data crawl (BTC-2009 or BTC-2010 or other)
  • Entity identification: LOD URI
  • Topics: based on REF 2010 topics (and possibly some more)

Questions, comments, and feedback are welcome!

Report from TREC 2010

November 29, 2010 | Conference | 2 Comments

In 2010, the Entity track ran for the second time. Given that the track featured only a pilot related entity finding (REF) task in 2009, this year was its first “complete” edition. With 15 participating groups, Entity was one of the most popular tracks at TREC 2010.
The main task investigated was, again, REF, this time with 50 topics and on the English portion of the ClueWeb09 collection. Additionally, we introduced an entity list completion (ELC) task which ran as a pilot. ELC addresses the same task as REF, but the collection is a sample of Linked Open Data and entities are identified by their URI.

Slides from the track’s overview talk are available here.

Four groups presented their work during the Entity session:

  • Marc Bron (University of Amsterdam, ILPS) proposed to build an entity repository based on Freebase. This repository contains 12 million entities with name variants and type information for each entity. For 1 million entities, the associated homepage is also available. Based on an initial investigation of the 2010 qrels, Freebase seems to cover about two-thirds of the relevant entities. (slides)
  • Ludovic Bonnefoy (University of Avignon) presented a QA approach to the REF task. The overall goal is to build a system that is able to handle any type of named entity (both broad: “person” and specific: “teammate”), as opposed to QA systems that deal with only a handful of types. Therefore, fine-grained type extraction is performed. Language models of named entities and entity types are compared. Various methods are considered for determining the final ranking of entities, based on compacity, KL-divergence, and combinations of the two. (slides)
  • Michael Leben (Hasso Plattner Institute/SAP Research) introduced an online approach to REF: a web search engine is used to retrieve snippets/documents, from which candidate entities are extracted, which are then put through a de-duplication step. Ranking considers both local (distance between source and target entities in the document) and global (web search engine position of documents) contexts. Finally, homepage finding is based on 17 different features; their weights are estimated using a genetic algorithm. (slides)
  • Olga Vechtomova (University of Waterloo) proposed an unsupervised approach to the REF task, using NLP tools. An initial candidate list of entities is extracted from top ranked documents retrieved using a web search engine. This list is then refined based on the similarity of candidate entities and so-called seed entities, which are instances of the target category; entities are represented by feature vectors created out of grammatical relationships. Category names are extracted from the narrative, using POS tagging and NP chunking. (slides)
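
To make the idea of comparing language models of entities and entity types more concrete, here is a minimal sketch of ranking candidates by KL-divergence. It is not the system presented by any of the groups above: whitespace tokenization, additive smoothing, and the function names are all our own assumptions, chosen only to illustrate the technique.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def unigram_lm(text: str, vocab: Set[str], alpha: float = 1.0) -> Dict[str, float]:
    """Additively smoothed unigram model over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """KL(p || q) in nats; both models must share the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def rank_by_type_similarity(type_text: str, candidate_texts: Dict[str, str]) -> List[str]:
    """Order candidate names by increasing KL(type model || candidate model)."""
    vocab = set(type_text.lower().split())
    for text in candidate_texts.values():
        vocab.update(text.lower().split())
    type_lm = unigram_lm(type_text, vocab)
    scores = {
        name: kl_divergence(type_lm, unigram_lm(text, vocab))
        for name, text in candidate_texts.items()
    }
    # Smaller divergence means the candidate's context looks more like the type.
    return sorted(scores, key=scores.get)
```

Smoothing matters here: without it, any word that occurs in the type model but not in a candidate's text would make the divergence undefined.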

Seven groups presented their work during the poster session:

  • Design Choices in Related Entity Finding
    University of Amsterdam, ILPS
    (poster)
  • ECIR - a Lightweight approach for Entity-centric Information Retrieval
    SAP Research / Hasso Plattner Institute
    (poster)
  • Entity List Completion Using Set Expansion Techniques
    Carnegie Mellon University
    (poster)
  • NiCT at TREC 2010: Related Entity Finding
    National Institute of Information and Communications Technology
    (poster)
  • Related Entity Finding Task
    Beijing Institute of Technology
    (poster)
  • Related Entity Finding: University of Waterloo at TREC 2010
    University of Waterloo
    (poster)
  • Searching for Entities: When Retrieval Meets Extraction
    University of Pittsburgh
    (poster)

Slides from the planning session are available here. The plans for Entity 2011 are summarized in a separate post.

TREC 2010 Conference

November 17, 2010 | Conference | No Comments

Below is an overview of the Entity track related sessions during the 2010 TREC conference.

Date         Time         Event                         Location
Wed, Nov 17  12:30-13:00  Entity Overview talk          Green Auditorium
Wed, Nov 17  16:00-17:30  Poster session                Green Auditorium area
Thu, Nov 18  16:45-17:30  Entity 2011 Planning Session  Portrait Room
Fri, Nov 19  09:00-10:30  Entity track Paper Session    Portrait Room

Conference summary and plans for 2010

In this post we summarize the Entity track related events from the TREC 2009 conference, and outline our plans for the 2010 edition of the track.

Conference summary

The 1st Entity Workshop was opened with a talk by Ian Soboroff, who shared “the NIST experience” on topic development as well as some modest changes proposed for 2010 (slides). Ian’s talk was followed by four boaster talks:

  • Richard McCreadie, University of Glasgow (slides)
  • Yi Fang, Purdue University (slides)
  • Vinod Vydiswaran, University of Illinois at Urbana-Champaign (slides)
  • Rianne Kaptein, University of Amsterdam (Amst. group) (slides)

The workshop continued with a discussion of changes and possible pilot tasks planned for 2010 (details follow below).

The main conference featured an Entity track Overview talk, given by Krisztian Balog (slides), and presentations from four groups:

  • Pavel Serdyukov, Delft University of Technology (slides)
  • Youzheng Wu, National Institute of Information and Communications Technology (NiCT)
  • Wei Zheng, University of Delaware/Singapore Management University
  • Rianne Kaptein, University of Amsterdam (Amst. group) (slides)

Finally, the 2nd Entity Workshop was devoted to planning for 2010: finalizing changes to be made to the related entity finding task (see a list of these below), and discussing alternatives for a second (sub)task, running as a pilot in 2010. Possible new tasks included:

  • Summary text generation: “counterpart of snippets for document search” (any piece of text, does not need to come from a single document)
  • Support document generation: given two entities and their relation, return a document that provides evidence
  • Aspect identification: offer additional aspects for entities (“see also” or query suggestion feature)

Of these tasks, support document generation seemed the most feasible, as evaluating the other two is problematic. It is, however, not an especially interesting task, so we suggest a new option below.

Plans for 2010

The track will run related entity finding as the main task:

Given an input entity (identified by its name and homepage), the type of the target entity, and the nature of their relation (described in free text), find the homepages of related entities that are of the target type and stand in the required relation to the input entity.

The following changes are planned (compared to the 2009 edition):

  • More topics (ideally 50, created by NIST)
  • Use CategoryA of ClueWeb
  • Single-record submission format (topic docno rank score runtag name)
  • No special treatment of Wikipedia pages
  • Definition of “relevant” in this context: a page that identifies a correct entity but is not itself an entity homepage
  • Judge names for primary pages only
  • Tweaking of evaluation measures (NDCG should reward primary pages more, use P@R instead of P@10)
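
As a rough illustration of the single-record format and the proposed measures, the sketch below parses one submission line and computes P@R and NDCG for a single topic. Several details are assumptions of ours rather than track decisions: fields are taken to be whitespace-separated with the entity name last, qrel grades are taken to be 0 (non-relevant), 1 (relevant), and 2 (primary), and the gain values passed to `ndcg` are purely illustrative. The docno in the usage example is a placeholder.

```python
import math
from typing import Dict, List, NamedTuple, Optional


class RunRecord(NamedTuple):
    topic: str
    docno: str
    rank: int
    score: float
    runtag: str
    name: str


def parse_record(line: str) -> RunRecord:
    """Parse one 'topic docno rank score runtag name' line.

    The name is taken as the remainder of the line, so it may contain spaces.
    """
    topic, docno, rank, score, runtag, name = line.split(None, 5)
    return RunRecord(topic, docno, int(rank), float(score), runtag, name.strip())


def precision_at_r(ranked: List[str], qrels: Dict[str, int], primary: int = 2) -> float:
    """P@R, where R is the number of primary pages judged for the topic."""
    r = sum(1 for grade in qrels.values() if grade >= primary)
    if r == 0:
        return 0.0
    hits = sum(1 for docno in ranked[:r] if qrels.get(docno, 0) >= primary)
    return hits / r


def ndcg(ranked: List[str], qrels: Dict[str, int],
         gains: Dict[int, float], k: Optional[int] = None) -> float:
    """NDCG with a configurable gain per grade (gain values are an assumption)."""
    k = len(ranked) if k is None else k
    dcg = sum(gains.get(qrels.get(d, 0), 0.0) / math.log2(i + 2)
              for i, d in enumerate(ranked[:k]))
    ideal = sorted((gains.get(g, 0.0) for g in qrels.values()), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```

A possible usage, with made-up judgments:

```python
rec = parse_record("21 clueweb09-en0000-00-00000 1 0.93 myrun Example Airline")
qrels = {"a": 2, "b": 1, "c": 2, "d": 0}
print(rec.name)
print(precision_at_r(["a", "b", "c", "d"], qrels))
print(ndcg(["a", "b", "c", "d"], qrels, gains={2: 3.0, 1: 1.0}))
```

Unlike P@10, P@R adapts the cutoff to the number of primary pages per topic, and raising the gain for grade 2 relative to grade 1 is one way to make NDCG reward primary pages more.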

We propose a semantic entity search subtask for 2010: return URIs of related entities instead of their homepages. We are planning to enrich topics with URIs of the input entities. URIs need to come from a predefined set of semantic data sources (which will include at least DBpedia and Freebase).
Since this subtask was not discussed at TREC, we are putting it up for discussion now (simply comment on this post).
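
To illustrate what restricting answers to a predefined set of sources could look like, here is a small sketch that filters returned URIs by namespace prefix. The two prefixes are our assumption about how DBpedia and Freebase resources would be identified; the actual set of accepted sources and namespaces is not fixed by this proposal, and the function names are hypothetical.

```python
from typing import Iterable, List

# Assumed namespaces for the two sources named above; the real list
# of accepted sources would be defined by the track.
ALLOWED_PREFIXES = (
    "http://dbpedia.org/resource/",
    "http://rdf.freebase.com/ns/",
)


def from_allowed_source(uri: str) -> bool:
    """True if the URI starts with one of the accepted namespace prefixes."""
    return uri.startswith(ALLOWED_PREFIXES)


def filter_answers(uris: Iterable[str]) -> List[str]:
    """Keep rank order, drop duplicates and URIs from other sources."""
    seen = set()
    kept = []
    for uri in uris:
        if from_allowed_source(uri) and uri not in seen:
            seen.add(uri)
            kept.append(uri)
    return kept
```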

Feedback/comments/suggestions are not only very welcome, but highly encouraged!

Conference agenda

November 10, 2009 | Conference | No Comments

Below is an overview of Entity track related events during the 2009 TREC conference.

Date         Time         Event                     Location
Tue, Nov 17  13:30-17:30  Entity Track Workshop I   Lecture Room D
Wed, Nov 18  12:30-13:00  Entity Track Overview     Main Auditorium
Thu, Nov 19  14:10-15:30  Entity Track Session      Green Auditorium
Fri, Nov 20  09:00-10:30  Entity Track Workshop II  Lecture Room D
