IPM
Editorial Board/Publications information
Publication year: 2012
Source:Information Processing & Management, Volume 48, Issue 4
Categories: Journals we follow
Investigating effectiveness and user acceptance of semantic social tagging for knowledge sharing
Publication year: 2012
Source:Information Processing & Management, Volume 48, Issue 4
Shiu-Li Huang, Sheng-Cheng Lin, Yung-Chun Chan
Social tagging systems enable users to assign arbitrary tags to various digital resources. However, they face vague-meaning problems when users retrieve or present resources with the keyword-based tags. In order to solve these problems, this study takes advantage of Semantic Web technology and the topological characteristics of knowledge maps to develop a system that comprises a semantic tagging mechanism and triple-pattern and visual searching mechanisms. A field experiment was conducted to evaluate the effectiveness and user acceptance of these mechanisms in a knowledge sharing context. The results show that the semantic social tagging system is more effective than a keyword-based system. The visualized knowledge map helps users capture an overview of the knowledge domain, reduce cognitive effort for the search, and obtain more enjoyment. Traditional keyword tagging with a keyword search still has the advantage of ease of use and the users had higher intention to use it. This study also proposes directions for future development of semantic social tagging systems.
Categories: Journals we follow
Matching meaning for cross-language information retrieval
Publication year: 2012
Source:Information Processing & Management, Volume 48, Issue 4
Jianqiang Wang, Douglas W. Oard
This article describes a framework for cross-language information retrieval that efficiently leverages statistical estimation of translation probabilities. The framework provides a unified perspective into which some earlier work on techniques for cross-language information retrieval based on translation probabilities can be cast. Modeling synonymy and filtering translation probabilities using bidirectional evidence are shown to yield a balance between retrieval effectiveness and query-time (or indexing-time) efficiency that seems well suited to large-scale applications. Evaluations with six test collections show consistent improvements over strong baselines.
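As a rough illustration of filtering translation probabilities with bidirectional evidence, the sketch below keeps only translation pairs attested in both directions and renormalizes what remains. The function name, the two probability tables, and the combination rule (product followed by renormalization) are assumptions made for this example; the abstract does not specify the article's actual estimator.

```python
def bidirectional_filter(p_f_given_e, p_e_given_f, threshold=0.0):
    """Combine two hypothetical translation tables and drop one-directional pairs.

    p_f_given_e: {source_term: {target_term: probability}}
    p_e_given_f: {target_term: {source_term: probability}}
    """
    combined = {}
    for e, translations in p_f_given_e.items():
        scores = {}
        for f, p_fe in translations.items():
            # Probability of translating back; 0.0 if the reverse pair is missing.
            p_ef = p_e_given_f.get(f, {}).get(e, 0.0)
            score = p_fe * p_ef
            if score > threshold:
                scores[f] = score
        total = sum(scores.values())
        if total > 0:
            # Renormalize so the surviving translations sum to 1.
            combined[e] = {f: s / total for f, s in scores.items()}
    return combined


forward = {"bank": {"banque": 0.6, "rive": 0.3, "banc": 0.1}}
backward = {"banque": {"bank": 0.9}, "rive": {"bank": 0.5, "shore": 0.5}}
filtered = bidirectional_filter(forward, backward)
```

In this toy input, "banc" has no reverse evidence and is dropped, which shrinks the translation table that must be consulted at query or indexing time.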
Categories: Journals we follow
Egocentric analysis of co-authorship network structure, position and performance
Publication year: 2012
Source:Information Processing & Management, Volume 48, Issue 4
Alireza Abbasi, Kon Shing Kenneth Chung, Liaquat Hossain
In this study, we propose and validate a social-network-based theoretical model for exploring scholars’ collaboration (co-authorship) network properties associated with their citation-based research performance (i.e., g-index). Using structural holes theory, we focus on how a scholar’s egocentric network properties of density, efficiency and constraint within the network associate with their scholarly performance. For our analysis, we use publication data of high impact factor journals in the field of “Information Science & Library Science” between 2000 and 2009, extracted from Scopus. The resulting database contained 4837 publications reflecting the contributions of 8069 authors. Results from our data analysis suggest that scholars’ research performance is significantly correlated with their ego-network measures. In particular, scholars with more co-authors and those who exhibit higher levels of betweenness centrality (i.e., the extent to which a co-author is between another pair of co-authors) perform better in terms of research (i.e., higher g-index). Furthermore, scholars with efficient collaboration networks who maintain a strong co-authorship relationship with one primary co-author within a group of linked co-authors (i.e., co-authors that have joint publications) perform better than those researchers with many relationships to the same group of linked co-authors.
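For readers unfamiliar with the performance measure, the g-index is commonly defined as the largest g such that a scholar's g most-cited papers together have at least g² citations. The sketch below computes it from a list of per-paper citation counts; the function name and sample data are illustrative and not taken from the study.

```python
def g_index(citations):
    """Return the largest g such that the top g papers have >= g*g citations in total."""
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, count in enumerate(ranked, start=1):
        total += count
        if total >= rank * rank:
            g = rank
    return g


# Cumulative citations 10, 15, 18, 19, 19 against 1, 4, 9, 16, 25.
example = g_index([10, 5, 3, 1, 0])
```

This version caps g at the number of papers, which is the usual convention; some variants pad the list with zero-citation papers instead.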
Categories: Journals we follow
Cost-effective on-demand associative author name disambiguation
Publication year: 2012
Source:Information Processing & Management, Volume 48, Issue 4
Adriano Veloso, Anderson A. Ferreira, Marcos André Gonçalves, Alberto H.F. Laender, Wagner Meira
Authorship disambiguation is an urgent issue that affects the quality of digital library services and for which supervised solutions have been proposed, delivering state-of-the-art effectiveness. However, particular challenges such as the prohibitive cost of labeling vast amounts of examples (there are many ambiguous authors), the huge hypothesis space (there are several features and authors from which many different disambiguation functions may be derived), and the skewed author popularity distribution (few authors are very prolific, while most appear in only a few citations) may prevent such techniques from reaching their full potential. In this article, we introduce an associative author name disambiguation approach that identifies authorship by extracting, from training examples, rules associating citation features (e.g., coauthor names, work title, publication venue) to specific authors. As our main contribution we propose three associative author name disambiguators: (1) EAND (Eager Associative Name Disambiguation), our basic method that explores association rules for name disambiguation; (2) LAND (Lazy Associative Name Disambiguation), that extracts rules on a demand-driven basis at disambiguation time, reducing the hypothesis space by focusing on examples that are most suitable for the task; and (3) SLAND (Self-Training LAND), that extends LAND with self-training capabilities, thus drastically reducing the amount of examples required for building effective disambiguation functions, besides being able to detect novel/unseen authors in the test set. Experiments demonstrate that all our disambiguators are effective and that, in particular, SLAND is able to outperform state-of-the-art supervised disambiguators, providing gains that range from 12% to more than 400%, being extremely effective and practical.
Categories: Journals we follow
Integer linear programming for Constrained Multi-Aspect Committee Review Assignment
Publication year: 2012
Source:Information Processing & Management, Volume 48, Issue 4
Maryam Karimzadehgan, ChengXiang Zhai
Automatic review assignment can significantly improve the productivity of many people such as conference organizers, journal editors and grant administrators. A general setup of the review assignment problem involves assigning a set of reviewers on a committee to a set of documents to be reviewed under the constraint of review quota so that the reviewers assigned to a document can collectively cover multiple topic aspects of the document. No previous work has addressed such a setup of committee review assignments while also considering matching multiple aspects of topics and expertise. In this paper, we tackle the problem of committee review assignment with multi-aspect expertise matching by casting it as an integer linear programming problem. The proposed algorithm can naturally accommodate any probabilistic or deterministic method for modeling multiple aspects to automate committee review assignments. Evaluation using a multi-aspect review assignment test set constructed using ACM SIGIR publications shows that the proposed algorithm is effective and efficient for committee review assignments based on multi-aspect expertise matching.
Categories: Journals we follow
Assessing user-specific difficulty of documents
Publication year: 2012
Source:Information Processing & Management
Mari-Sanna Paukkeri, Marja Ollikainen, Timo Honkela
On the web, a huge variety of text collections contain knowledge in different expertise domains, such as technology or medicine. The texts are written for different uses and thus for people having different levels of expertise on the domain. Texts intended for professionals may not be understandable at all by a lay person, and texts for lay people may not contain all the detailed information needed by a professional. Many information retrieval applications, such as search engines, would offer better user experience if they were able to select the text sources that best fit the expertise level of the user. In this article, we propose a novel approach for assessing the difficulty level of a document: our method assesses difficulty for each user separately. The method enables, for instance, offering information in a personalised manner based on the user’s knowledge of different domains. The method is based on the comparison of terms appearing in a document and terms known by the user. We present two ways to collect information about the terminology the user knows: by directly asking the users the difficulty of terms or, as a novel automatic approach, indirectly by analysing texts written by the users. We examine the applicability of the methodology with text documents in the medical domain. The results show that the method is able to distinguish between documents written for lay people and documents written for experts.
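The comparison of document terms with terms known by the user can be pictured with a very small sketch. It assumes a deliberately simplified measure, the share of distinct document terms missing from the user's known vocabulary; the function name, tokenization and scoring are hypothetical and are not the article's actual method.

```python
import re


def document_difficulty(document, known_terms):
    """Fraction of distinct document terms that are absent from the user's vocabulary."""
    terms = set(re.findall(r"[a-z]+", document.lower()))
    if not terms:
        return 0.0
    known = {t.lower() for t in known_terms}
    unknown = terms - known
    return len(unknown) / len(terms)


# Hypothetical vocabularies for two users reading the same medical phrase.
lay_score = document_difficulty("Acute myocardial infarction", {"acute"})
expert_score = document_difficulty(
    "Acute myocardial infarction", {"acute", "myocardial", "infarction"}
)
```

The same document receives a different score for each user, which is the property the abstract emphasizes. The known-term set could come from either source the abstract names: direct user ratings or texts the user has written.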
Categories: Journals we follow
A split-list approach for relevance feedback in information retrieval
Publication year: 2012
Source:Information Processing & Management
H.C. Wu, R.W.P. Luk, K.F. Wong, J.Y. Nie
In this paper we present a new algorithm for relevance feedback (RF) in information retrieval. Unlike conventional RF algorithms which use the top ranked documents for feedback, our proposed algorithm is a kind of active feedback algorithm which actively chooses documents for the user to judge. The objectives are (a) to increase the number of judged relevant documents and (b) to increase the diversity of judged documents during the RF process. The algorithm uses document-contexts by splitting the retrieval list into sub-lists according to the query term patterns that exist in the top ranked documents. Query term patterns include a single query term, a pair of query terms that occur in a phrase and query terms that occur in proximity. The algorithm is an iterative algorithm which takes one document for feedback in each of the iterations. We experiment with the algorithm using the TREC-6, -7, -8, -2005 and GOV2 data collections and we simulate user feedback using the TREC relevance judgements. From the experimental results, we show that our proposed split-list algorithm is better than the conventional RF algorithm and that our algorithm is more reliable than a similar algorithm using maximal marginal relevance.
Categories: Journals we follow
Social relation extraction from texts using a support-vector-machine-based dependency trigram kernel
Publication year: 2012
Source:Information Processing & Management
Maengsik Choi, Harksoo Kim
We propose a social relation extraction system using dependency-kernel-based support vector machines (SVMs). The proposed system classifies input sentences containing two people’s names on the basis of whether they do or do not describe social relations between two people. The system then extracts relation names (i.e., social-related keywords) from sentences describing social relations. We propose new tree kernels called dependency trigram kernels for effectively implementing these processes using SVMs. Experiments showed that the proposed kernels delivered better performance than the existing dependency kernel. On the basis of the experimental evidence, we suggest that the proposed system can be used as a useful tool for automatically constructing social networks from unstructured texts.
Categories: Journals we follow
A novel term weighting scheme based on discrimination power obtained from past retrieval results
Publication year: 2012
Source:Information Processing & Management
Sa-kwang Song, Sung Hyon Myaeng
Term weighting for document ranking and retrieval has been an important research topic in information retrieval for decades. We propose a novel term weighting method based on a hypothesis that a term’s role in accumulated retrieval sessions in the past affects its general importance regardless. It utilizes the availability of past retrieval results consisting of the queries that contain a particular term, retrieved documents, and their relevance judgments. A term’s evidential weight, as we propose in this paper, depends on the degree to which the mean frequency values for the relevant and non-relevant document distributions in the past are different. More precisely, it takes into account the rankings and similarity values of the relevant and non-relevant documents. Our experimental result using standard test collections shows that the proposed term weighting scheme improves on conventional TF * IDF and language model based schemes. It indicates that evidential term weights bring in a new aspect of term importance and complement the collection statistics based on TF * IDF. We also show how the proposed term weighting scheme based on the notion of evidential weights is related to the well-known weighting schemes based on language modeling and probabilistic models.
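For reference, the conventional TF * IDF baseline mentioned above can be sketched as follows. This uses one common variant (raw term frequency times the natural log of N/df) with whitespace tokenization; the function name and sample documents are illustrative, and the sketch does not implement the proposed evidential weights.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document using tf * ln(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


docs = ["term weighting term", "document ranking", "term ranking"]
weights = tf_idf(docs)
```

These weights depend only on collection statistics, which is why the abstract describes evidence from past retrieval sessions as a complementary signal.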
Categories: Journals we follow
Assessing the quality of textual features in social media
Publication year: 2012
Source:Information Processing & Management
Flavio Figueiredo, Henrique Pinto, Fabiano Belém, Jussara Almeida, Marcos Gonçalves, David Fernandes, Edleno Moura
Social media is increasingly becoming a significant fraction of the content retrieved daily by Web users. However, the potential lack of quality of user generated content poses a challenge to information retrieval services, which rely mostly on textual features generated by users (particularly tags) commonly associated with the multimedia objects. This paper presents what, to the best of our knowledge, is currently the most comprehensive study of the relative quality of textual features in social media. We analyze four different features, namely, title, tags, description and comments posted by users, in four popular applications, namely, YouTube, Yahoo! Video, LastFM and CiteULike. Our study is based on an extensive characterization of data crawled from the four applications with respect to usage, amount and semantics of content, descriptive and discriminative power as well as content and information diversity across features. It also includes a series of object classification and tag recommendation experiments as case studies of two important information retrieval tasks, aiming at analyzing how these tasks are affected by the quality of the textual features. Classification and recommendation effectiveness is analyzed in light of our characterization results. Our findings provide valuable insights for future research and design of Web 2.0 applications and services.
Categories: Journals we follow
A hybrid approach to managing job offers and candidates
Publication year: 2012
Source:Information Processing & Management
Rémy Kessler, Nicolas Béchet, Mathieu Roche, Juan-Manuel Torres-Moreno, Marc El-Bèze
The evolution of the job market has resulted in traditional methods of recruitment becoming insufficient. As it is now necessary to handle volumes of information (mostly in the form of free text) that are impossible to process manually, an analysis and assisted categorization are essential to address this issue. In this paper, we present a combination of the E-Gen and Cortex systems. E-Gen aims to perform analysis and categorization of job offers together with the responses given by the candidates. The E-Gen system's strategy is based on vectorial and probabilistic models to solve the problem of profiling applications according to a specific job offer. Cortex is a statistical automatic summarization system. In this work, E-Gen uses Cortex as a powerful filter to eliminate irrelevant information contained in candidate answers. Our main objective is to develop a system to assist a recruitment consultant, and the results obtained by the proposed combination surpass those of E-Gen in standalone mode on this task.
Categories: Journals we follow
Modeling, encoding and querying multi-structured documents
Publication year: 2012
Source:Information Processing & Management
Pierre-Édouard Portier, Noureddine Chatti, Sylvie Calabretto, Elöd Egyed-Zsigmond, Jean-Marie Pinon
The issue of multi-structured documents became prominent with the emergence of the digital Humanities field of practices. Many distinct structures may be defined simultaneously on the same original content for matching different documentary tasks. For example, a document may have both a structure for the logical organization of content (logical structure), and a structure expressing a set of content formatting rules (physical structure). In this paper, we present MSDM, a generic model for multi-structured documents, in which several important features are established. We also address the problem of efficiently encoding multi-structured documents by introducing MultiX, a new XML formalism based on the MSDM model. Finally, we propose a library of XQuery functions for querying MultiX documents. We will illustrate all the contributions with a use case based on a fragment of an old manuscript.
Categories: Journals we follow
Editorial Board/Publications information
Publication year: 2012
Source:Information Processing & Management, Volume 48, Issue 3
Categories: Journals we follow
Soft approaches to information access on the Web: An introduction to the special issue
Publication year: 2012
Source:Information Processing & Management, Volume 48, Issue 3
Enrique Herrera-Viedma, Guy De Tré, Slawomir Zadrozny, Jose Angel Olivas
Soft Computing (SC) tools show great potential for real-life problems in engineering, industrial applications, medicine, finance, etc. In this special issue we present a set of seven papers that report original research on the use of SC techniques to solve problems in the field of information access on the Web.
Categories: Journals we follow
Bipolar queries in textual information retrieval: A new perspective
Publication year: 2012
Source:Information Processing & Management, Volume 48, Issue 3
Sławomir Zadrożny, Janusz Kacprzyk, Guy De Tré
A new concept of a bipolar query against collections of textual documents, i.e. in the context of information retrieval (IR), is introduced using recent developments in bipolar information modeling and bipolar database queries. Specifically, a particular approach to bipolar queries with an explicit “and possibly” type of aggregation operator is used. Effective and efficient processing of such bipolar queries using standard IR data structures is briefly discussed. The proposed bipolar queries combine the flexibility provided by fuzzy logic with a more sophisticated representation of user preferences and intentions. This combination can make the search of vast resources of textual documents, notably those available via the Internet, more intelligent.
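The abstract does not give the operator's definition, so the following is only a minimal sketch of one common fuzzy-logic reading of "required and possibly preferred", not necessarily the formulation used in the paper. It assumes that each document already has a matching degree in [0, 1] for the required condition and for the preferred condition; the function name `and_possibly` and the dictionary inputs are hypothetical. Under this reading, the preferred condition only matters to the extent that some document satisfies both conditions.

```python
def and_possibly(required, preferred):
    """Score documents for the bipolar query "required AND POSSIBLY preferred".

    required, preferred: dicts mapping a document id to a degree in [0, 1].
    Assumption (not from the abstract): min/max are used for conjunction and
    disjunction, and the preference is enforced only as far as it is feasible.
    """
    # Degree to which at least one document satisfies both conditions.
    exists_both = max(
        (min(c, preferred.get(doc, 0.0)) for doc, c in required.items()),
        default=0.0,
    )
    # A document must match the required condition, and must match the
    # preferred one unless no document can satisfy both conditions together.
    return {
        doc: min(c, max(preferred.get(doc, 0.0), 1.0 - exists_both))
        for doc, c in required.items()
    }


required = {"a": 1.0, "b": 0.8, "c": 0.0}
preferred = {"a": 0.0, "b": 0.5, "c": 1.0}
scores = and_possibly(required, preferred)
```

When no document satisfies the preferred condition at all, the scores in this sketch reduce to the required degrees, which is the behaviour that motivates the "possibly" part of the operator.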
Categories: Journals we follow
Hierarchical web resources retrieval by exploiting Fuzzy Formal Concept Analysis
Publication year: 2012
Source:Information Processing & Management, Volume 48, Issue 3
Carmen De Maio, Giuseppe Fenza, Vincenzo Loia, Sabrina Senatore
In recent years, knowledge structuring has assumed important roles in several real-world applications such as decision support, cooperative problem solving, e-commerce, the Semantic Web, and even planning systems. Ontologies play an important role in supporting automated processes to access information and are at the core of new strategies for the development of knowledge-based systems. Yet developing an ontology is a time-consuming task which often requires accurate domain expertise to tackle structural and logical difficulties in the definition of concepts as well as conceivable relationships. This work presents an ontology-based retrieval approach that supports data organization and visualization and provides a friendly navigation model. It exploits the fuzzy extension of Formal Concept Analysis theory to elicit conceptualizations from datasets and generate a hierarchy-based representation of extracted knowledge. An intuitive graphical interface provides a multi-faceted view of the built ontology. Through transparent query-based retrieval, end users navigate across concepts, relations and population.
Categories: Journals we follow
Disambiguated query suggestions and personalized content-similarity and novelty ranking of clustered results to optimize web searches
Publication year: 2012
Source:Information Processing & Management, Volume 48, Issue 3
Gloria Bordogna, Alessandro Campi, Giuseppe Psaila, Stefania Ronchi
In this paper, we address the so-called “ranked list problem” of Web searches, which occurs when users submit short requests to search engines. Generally, as a consequence of terms’ ambiguity and polysemy, users engage in long cycles of query reformulation in an attempt to capture relevant information in the top-ranked results. The overall objective of the proposal is to support the user in optimizing Web searches by reducing the need for long search iterations. Specifically, in this paper we describe an iterative query disambiguation mechanism that follows three main phases. (1) The results of a Web search performed by the user (by submitting a query to a search engine) are clustered. (2) Clusters are ranked, based on a personalized balance of their content-similarity to the query and their novelty. (3) From each cluster, a disambiguated query that highlights the main contents of the cluster is generated, in such a way that the new query is potentially capable of retrieving new documents not previously retrieved; the disambiguated queries are suggestions for possibly new and more focused searches. The paper describes the proposal, illustrating a sample application of the mechanism. Finally, the paper presents a user evaluation experiment of the proposed approach, comparing it with common practice based on the direct use of search engines.
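Phase (2) of the mechanism can be illustrated with a small sketch. The abstract does not say how the personalized balance is computed, so the weighted sum below is an assumption made purely for illustration; the `Cluster` class, the `rank_clusters` function and the `balance` parameter are hypothetical names, and the similarity and novelty values are taken as already computed in [0, 1].

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    label: str
    similarity: float  # content-similarity of the cluster to the query, in [0, 1]
    novelty: float     # degree to which the cluster holds unseen content, in [0, 1]


def rank_clusters(clusters, balance):
    """Order clusters by a user-chosen trade-off between similarity and novelty.

    balance = 1.0 ranks by similarity only, balance = 0.0 by novelty only.
    The linear combination is an assumed stand-in for the paper's criterion.
    """
    def score(cluster):
        return balance * cluster.similarity + (1.0 - balance) * cluster.novelty

    return sorted(clusters, key=score, reverse=True)


clusters = [
    Cluster("jaguar car", similarity=0.9, novelty=0.1),
    Cluster("jaguar animal", similarity=0.4, novelty=0.8),
]
by_similarity = [c.label for c in rank_clusters(clusters, balance=1.0)]
by_novelty = [c.label for c in rank_clusters(clusters, balance=0.0)]
```

With these made-up values, a user who favours similarity sees the "jaguar car" cluster first, while a user who favours novelty sees "jaguar animal" first.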
Categories: Journals we follow
Fuzzy ILP Classification of web reports after linguistic text mining
Publication year: 2012
Source:Information Processing & Management, Volume 48, Issue 3
Jan Dědek, Peter Vojtáš, Marta Vomlelová
In this paper we study the problem of classifying textual web reports. We focus specifically on situations in which structured information extracted from the reports is used for classification. We present an experimental classification system based on the use of third-party linguistic analyzers, our previous work on web information extraction, and fuzzy inductive logic programming (fuzzy ILP). A detailed study of the so-called ‘Fuzzy ILP Classifier’ is the main contribution of the paper. The study includes formal models, a prototype implementation, extensive evaluation experiments and a comparison of the classifier with alternatives such as decision trees, support vector machines, neural networks, etc.
Categories: Journals we follow
An approach to web-based Personal Health Records filtering using fuzzy prototypes and data quality criteria
Publication year: 2012
Source:Information Processing & Management, Volume 48, Issue 3
Francisco P. Romero, Ismael Caballero, Jesus Serrano-Guerrero, Jose A. Olivas
Nowadays, new ways of managing and accessing health-care information are continuously appearing. Web-based Personal Health Records (web PHRs) have the potential to make health-care data available to clinicians, researchers and students in different medical contexts and applications. Consequently, the number of web PHRs accessible through the Internet has grown enormously, and as a result health-care professionals are currently burdened with more and more data. Unfortunately, these data probably do not always have adequate levels of quality, so the professionals’ work cannot always be as successful as expected. To alleviate this problem, the present work focuses on improving document filtering results in the context of web PHR management. To achieve this goal, a new kind of document filtering model is proposed. This model is based on fuzzy prototypes which are defined by means of conceptual prototypes. These prototypes are obtained by using a data quality analysis of documents. This analysis guarantees that filtered information will be relevant enough for the information user. The complete model provides an efficient strategy of document filtering that can be very useful when it is necessary to deal with a constant flow of new information.
Categories: Journals we follow
