The timeline for the 2011 edition of the track has been set.
The guidelines are available at this address: http://bit.ly/entity2011-guidelines. Please follow the mailing list for the related discussion.
We plan to run two main tasks in 2011: related entity finding (REF) and entity list completion (ELC). In addition, there will be a Linked Open Data (LOD) variant of the REF task. Groups may choose to participate in any of these tasks.
Related entity finding (REF)
- Task: return a ranked list of entities of a specified type that engage in a given relationship with a given source entity.
- Collection: ClueWeb09 English
- Entity identification: Homepage
- Topics: 50 new topics for 2011
- Changes compared to 2010:
  - Only primary homepages are accepted, i.e., relevance is binary
  - For each answer, a (single) supporting document is required; answers without supporting evidence will be judged incorrect
  - Wikipedia pages are (still) not accepted as entity homepages, but they can be supporting documents
  - Instead of the four high-level entity types (person, organization, location, product), a more fine-grained target type will be defined using the DBpedia Ontology, i.e., possible types are classes within this ontology
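Since target types are now ontology classes rather than four flat categories, a participating system needs some way to decide whether a candidate entity fits the requested class. A minimal Python sketch follows, assuming the system walks up the class hierarchy and accepts subclasses of the target type. The hierarchy below is a small hand-written toy fragment, not an extract of the DBpedia Ontology, and `matches_target` is a hypothetical helper, not part of the track guidelines.

```python
# Toy class hierarchy (child -> parent), hand-written for illustration only;
# it is NOT an extract of the actual DBpedia Ontology.
PARENT = {
    "University": "EducationalInstitution",
    "EducationalInstitution": "Organisation",
    "Organisation": "Agent",
    "Athlete": "Person",
    "Person": "Agent",
    "Agent": "Thing",
}


def ancestors(cls: str) -> list[str]:
    """Return cls followed by all of its superclasses, nearest first."""
    chain = [cls]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def matches_target(candidate_type: str, target_type: str) -> bool:
    """Hypothetical filter: accept a candidate whose type is the target
    class itself or any subclass of it."""
    return target_type in ancestors(candidate_type)


if __name__ == "__main__":
    print(matches_target("University", "Organisation"))  # True
    print(matches_target("Athlete", "Organisation"))     # False
```

Whether subclass matches should count is a design assumption of this sketch; a stricter system could require the exact target class instead.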
REF LOD variant
The task, collection, and topics are the same as for the REF task, but there is one important difference:
- Entity identification: LOD URI (drawn from the same LOD crawl as used for the ELC task)
Entity list completion (ELC)
The ELC task will be the same as in 2010, only with more topics and possibly on a larger Linked Open Data crawl.
- Task: same as REF, but the topic definition includes a set of example entities (identified by their LOD URI)
- Collection: Linked Open Data crawl (BTC-2009 or BTC-2010 or other)
- Entity identification: LOD URI
- Topics: based on REF 2010 topics (and possibly some more)
Questions, comments, feedback are welcome!
In 2010, the Entity track ran for the second time. Given that the track featured only a pilot related entity finding (REF) task in 2009, this year was its first “complete” edition. With 15 participating groups, Entity was one of the most popular tracks at TREC 2010.
The main task investigated was, again, REF, this time with 50 topics and on the English portion of the ClueWeb09 collection. Additionally, we introduced an entity list completion (ELC) task which ran as a pilot. ELC addresses the same task as REF, but the collection is a sample of Linked Open Data and entities are identified by their URI.
Slides from the track’s overview talk are available here.
Four groups presented their work during the Entity session:
- Marc Bron (University of Amsterdam, ILPS) proposed building an entity repository based on Freebase. This repository contains 12 million entities, each with name variants and type information; for 1 million of them, the associated homepage is also available. Based on an initial investigation of the 2010 qrels, Freebase seems to cover about two-thirds of the relevant entities. (slides)
- Ludovic Bonnefoy (University of Avignon) presented a QA approach to the REF task. The overall goal is to build a system that can handle any type of named entity, both broad (e.g., “person”) and specific (e.g., “teammate”), as opposed to QA systems that deal with only a handful of types. To this end, fine-grained type extraction is performed, and language models of named entities and entity types are compared. Several methods are considered for determining the final ranking of entities, based on compacity, KL-divergence, and combinations of the two. (slides)
- Michael Leben (Hasso Plattner Institute/SAP Research) introduced an online approach to REF: a web search engine is used to retrieve snippets/documents, from which candidate entities are extracted and then de-duplicated. Ranking considers both local context (the distance between source and target entities in the document) and global context (the web search engine rank of the documents). Finally, homepage finding is based on 17 different features, whose weights are estimated using a genetic algorithm. (slides)
- Olga Vechtomova (University of Waterloo) proposed an unsupervised, NLP-based approach to the REF task. An initial candidate list of entities is extracted from the top-ranked documents retrieved by a web search engine. This list is then refined based on the similarity between candidate entities and so-called seed entities, which are instances of the target category; entities are represented by feature vectors built from grammatical relations. Category names are extracted from the topic narrative using POS tagging and NP chunking. (slides)
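The seed-based refinement described in the Waterloo talk can be illustrated with a short Python sketch. It assumes each entity has already been reduced to a sparse vector of grammatical-relation features (the feature names in the usage example are made up) and that candidates are ranked by their mean cosine similarity to the seed vectors. The talk summary does not state which similarity measure was used, so cosine similarity here is an assumption, and this is not the Waterloo system.

```python
import math
from collections import Counter


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_seed_similarity(candidates: dict[str, Counter],
                            seeds: list[Counter]) -> list[tuple[str, float]]:
    """Score each candidate by its mean cosine similarity to the seed
    vectors and return (name, score) pairs, highest score first."""
    scored = []
    for name, vec in candidates.items():
        score = sum(cosine(vec, s) for s in seeds) / len(seeds)
        scored.append((name, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical features of the form "relation:word".
    seeds = [Counter({"subj_of:score": 2, "obj_of:sign": 1}),
             Counter({"subj_of:score": 1, "mod:striker": 1})]
    candidates = {"cand_a": Counter({"subj_of:score": 3, "mod:striker": 1}),
                  "cand_b": Counter({"subj_of:acquire": 2, "mod:company": 1})}
    for name, score in rank_by_seed_similarity(candidates, seeds):
        print(name, round(score, 3))
```

Averaging over seeds keeps a candidate from ranking highly just because it resembles one atypical seed; a nearest-seed (maximum) score would be an equally plausible alternative.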
Seven groups presented their work during the poster session:
- Design Choices in Related Entity Finding
  University of Amsterdam, ILPS (poster)
- ECIR - a Lightweight approach for Entity-centric Information Retrieval
  SAP Research / Hasso Plattner Institute (poster)
- Entity List Completion Using Set Expansion Techniques
  Carnegie Mellon University (poster)
- NiCT at TREC 2010: Related Entity Finding
  National Institute of Information and Communications Technology (poster)
- Related Entity Finding Task
  Beijing Institute of Technology (poster)
- Related Entity Finding: University of Waterloo at TREC 2010
  University of Waterloo (poster)
- Searching for Entities When Retrieval Meets Extraction
  University of Pittsburgh (poster)
Slides from the planning session are available here. The plans for Entity 2011 are summarized in a separate post.
Below is an overview of the Entity track-related sessions at the 2010 TREC conference.
| Date | Time | Event | Location |
|---|---|---|---|
| Wed, Nov 17 | 12:30-13:00 | Entity Overview talk | Green Auditorium |
| Wed, Nov 17 | 16:00-17:30 | Poster session | Green Auditorium area |
| Thu, Nov 18 | 16:45-17:30 | Entity 2011 Planning Session | Portrait Room |
| Fri, Nov 19 | 09:00-10:30 | Entity track Paper Session | Portrait Room |
The submission site for the pilot (ELC) task is available.
Submissions are accepted until Sunday Oct 17, 23.59 PST.
The checker script is available from here.