SEA: Search Engines Amsterdam talks are usually held on the last Friday of the month, at Science Park 904. Two talks in a row, one industrial, the other academic, 25+5 minutes each. No marketing. Just algorithms. Followed by drinks.

SEA talks are brought to you by Amsterdam Data Science.

Sign up at Meetup


  • Please see Meetup
  • January 30, 2015 (16-17hrs, Room C1.112)
    • Michiel Hildebrand (Spinque) — Strategies for searching Linked Data
    • Maria-Hendrike Peetz (University of Amsterdam) – Active Learning for Filtering of Streaming Documents
  • December 19, 2014 (16-17hrs, Room A1.14)
    • Giovanni Lanzani (GoDataDriven) — Real time data driven applications
      For data to be the fuel of the 21st century, and for data science to live up to its promise as a driver of innovation, their application should not be confined to dashboards and static analyses. Instead they should be the driver of real applications that support the organisations that own or generate the data. Most of these applications require real-time access to the data. However, many Big Data analyses and tools are inherently batch-driven and not well suited for secure, real-time and performance-critical connections with applications. Trade-offs often become inevitable, especially when mixing multiple tools and data sources.
      In this talk we will describe our journey to build a data driven application at a large Dutch financial institution. We will dive into the issues we faced, our considerations and the technical choices we made in order to perform data analyses but also drive a web-based, real-time application. We considered and used Impala, HBase, and MongoDB, but also conventional SQL databases such as MySQL and PostgreSQL. Important aspects in our journey were, among others, the handling of geographical data, the access to hundreds of millions of records, and the real-time analysis of millions of data points.
    • Artem Grotov (University of Amsterdam) — User models: what are they good for?
      User models have attracted a lot of attention from the IR community in recent years. They have been used to understand the way users interact with search engine result pages, to predict clicks, to measure relevance from observed clicks, and to learn and evaluate ranking functions. Understanding users is important for making them more satisfied: click models provide a formal way to reason about user behaviour, and they make it possible to evaluate hypotheses about user behaviour against observed user behaviour. User models can be used to simulate user interactions in the absence of users: this can be useful to pre-evaluate learning algorithms before deploying them and to gain insight into how users would interact with new result pages. User models are attractive because they bridge user interaction data with document relevance. Raw click data is noisy and biased: the top results receive a lot of clicks just because they are on top, so a document that receives more clicks is not necessarily more relevant. User models help to get rid of this bias and infer relevance from user feedback. User models can be used to design IR metrics. For example, metrics like NDCG discount the relevance of documents depending on their rank; however, the precise way they do it is often based on the designer's intuition about user behaviour. User models provide a way to derive the discounting method from observed user behaviour. User models can be used to improve rankings, serving either as features or as targets for learning rankers. In this talk I will cover these applications of user models and discuss their successes, their shortcomings, and the future directions of user modelling.
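
      The point about rank discounting in metrics such as NDCG can be made concrete with a small sketch. This is not taken from the talk: the logarithmic discount is the standard NDCG one, while `make_examination_discount` and its `continue_prob` parameter are hypothetical names for a very simple user model in which a user scans results top-down and continues after each result with a fixed probability.

```python
import math
from typing import Callable, Sequence


def log_discount(rank: int) -> float:
    """Standard DCG discount for a 1-based rank: 1 / log2(rank + 1)."""
    return 1.0 / math.log2(rank + 1)


def make_examination_discount(continue_prob: float) -> Callable[[int], float]:
    """Hypothetical user-model discount (an assumption for illustration).

    Returns the probability that a user who scans top-down, and moves on
    to the next result with probability `continue_prob`, examines `rank`.
    """
    def discount(rank: int) -> float:
        return continue_prob ** (rank - 1)
    return discount


def dcg(relevances: Sequence[float],
        discount: Callable[[int], float] = log_discount) -> float:
    """Discounted cumulative gain: sum of relevance times rank discount."""
    return sum(rel * discount(rank)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances: Sequence[float],
         discount: Callable[[int], float] = log_discount) -> float:
    """DCG normalised by the DCG of the ideal (descending) ordering.

    Sorting by relevance is ideal only for discounts that do not
    increase with rank, which holds for both discounts defined here.
    """
    ideal = dcg(sorted(relevances, reverse=True), discount)
    return dcg(relevances, discount) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    ranking = [3, 1, 2]  # graded relevance labels in ranked order
    print(ndcg(ranking))                                   # intuition-based discount
    print(ndcg(ranking, make_examination_discount(0.7)))   # model-based discount
```

      In practice the continuation probability would be estimated from click logs rather than fixed by hand; the value 0.7 above is arbitrary.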


  • October 31, 2014 (16-17hrs, Room A1.10)
    • Edgar Meij (Yahoo Labs) — Web Scale Semantic Search
      Most web search engine users are increasingly expecting direct and contextually relevant answers to their information needs rather than mere links to documents. In order to arrive at such answers we need to tackle several issues, including (but not limited to) entity linking, entity retrieval, entity reconciliation, intent classification, and personalization, all without losing sight of efficiency. In this talk I will give some background on how such an end-to-end pipeline for semantic search is being implemented and improved at Yahoo.
    • Ridho Reinanda (UvA) — Semantic Search for the 1%
      What if our users need in-depth information about people and organizations within a specific domain? This might include needs that are not (directly) satisfied yet by a general-purpose knowledge base/search engine. Addressing this requires employing information extraction techniques, such as fine-grained entity recognition, entity relation extraction, temporal relation extraction; and combining them in a semantic search system. I will talk about components of such a system, and challenges that arise in each task in the context of searching and browsing the 1%.


  • June 27, 2014 (16-17hrs, Room D1.112)
    • Jan Willem Mathijssen ( — Struggling with precision in an increasingly ambiguous catalogue
    • David Graus (UvA) — Entity search


  • May 23, 2014 (16-17hrs, Room A1.04)
    • Nieko Maatjes (Yandex) — My News – Personalisation at Yandex.News
      I’ll be explaining how My News leverages results from the user’s social network accounts and Yandex.News’ own functionality to supply the user with a clustered, annotated and enriched view of the news shared and/or published by his friends and trusted news sources.
    • Manos Tsagkias (UvA) — Twitter-based music recommendation
      I will discuss algorithms for music recommendation based on self-reported music listening behavior on Twitter.


    • Apr 25, 2014 (16-17hrs, Room A1.04)
      • Erik van Oosten (Marktplaats) — Interleaving at Marktplaats
        This presentation shows the algorithms that Marktplaats uses to 1) present listings from several systems, and 2) prevent sellers from taking too much exposure in search results.
      • Ilya Markov (UvA) — Reducing Uncertainty in Resource Selection
        In federated search, vertical results such as products, news, and images are blended into the general web search results. Two problems arise in this process: (i) resource or vertical selection is concerned with identifying relevant sources of information for a given user’s query; (ii) results presentation is concerned with positioning vertical results on a page. This talk is focused on the former problem, i.e. resource selection. We will show that the process of identifying relevant sources is uncertain, i.e. non-deterministic, due to a number of factors. We will discuss how this uncertainty can be measured and present a general way of reducing it. Several implementations of this general idea will be discussed, showing their merits in reducing the uncertainty within the resource selection process.


    • Mar 28, 2014 (16-17hrs, Room B0.209)
      • Marco Hollenberg (Sanoma/Kieskeurig) — Kieskeurig and SOLR relevancy
        Kieskeurig is a Dutch product comparison website which uses SOLR as its search engine. I discuss the challenges we faced in using SOLR, focussing on relevancy sorting.
      • Daan Odijk (UvA) — Online Query Modeling: Generating Queries from Streams
        In this talk I will discuss ongoing work on algorithms for generating queries from a streaming textual source, such as television subtitles, to effectively retrieve related content. The consumption of television content may trigger queries in which users look for related information or content. We consider the task of online query modeling in a live television setting, where a user is interested in finding archived news videos related to a specific news broadcast item. So far, approaches to query modeling have been proposed to generate short queries from descriptive queries or long documents, but no existing approach considers the dynamics of streaming sources. Using live television subtitles, we learn to model queries with high retrieval effectiveness, optimized using an online learning approach on a combined query and retrieval model. Finally, our online query modeling approach is sufficiently efficient that it can be performed in real-time in a live TV setting.


    • Feb 28, 2014 (16-17hrs, Room B0.203)
      • Henning Rode (Textkernel) — Multi-Entity Auto-Suggestion
      • Shangsong Liang (UvA) — Fusion Helps Diversification


    • Jan 31, 2014 (16-17hrs, Room B0.209)
      • Boaz Leskes (Elastic Search) — Staying ahead of Time
      • Wouter Weerkamp (904Labs) — Streamwatchr: Watch the music. While it’s playing