An efficient information retrieval system using evolutionary algorithms

Mhawi, Doaa N., Oleiwi, Haider W., Saeed, Nagham ORCID: https://orcid.org/0000-0002-5124-7973 and Al-Taie, Heba L. (2022) An efficient information retrieval system using evolutionary algorithms. Network, 2 (4). pp. 583-605.

network-02-00034.pdf - Published Version
Available under License Creative Commons Attribution.

Abstract

Information retrieval (IR) is a critical technique for web search as the number of web pages continues to grow. However, web users face major problems: retrieved documents that are unrelated to the user query (i.e., low precision), a failure to retrieve relevant documents (i.e., low recall), and the need for acceptable retrieval time and minimal storage space. This paper proposes a novel advanced document-indexing method (ADIM) with an integrated evolutionary algorithm. The proposed information retrieval system (IRS) includes three main stages. The first stage (i.e., the advanced document-indexing method) is preprocessing, which consists of two steps: reading the dataset documents and applying the ADIM, resulting in a set of two tables. The second stage is the query searching algorithm, which produces a set of words or keywords and retrieves the related documents. The third stage (i.e., the searching algorithm) consists of two steps. The modified genetic algorithm (MGA) introduces new fitness functions and uses a cross-point operator with dynamic-length chromosomes together with the adaptive function of the culture algorithm (CA). The proposed system ranks the documents most relevant to the user query by adding a simple parameter (∝) to the fitness function to guarantee convergence of the solution, and it integrates the MGA with the CA to retrieve the most relevant documents with the best accuracy. The system was simulated using a free dataset called WebKb, which contains World Wide Web pages from computer science departments at multiple universities. The dataset is composed of 8280 semi-structured HTML documents. Experimental results and evaluation measurements showed 100% average precision with 98.5236% average recall for 50 test queries, while the average response time was 00.46.74.78 milliseconds with 18.8 MB of memory space for document indexing. In comparison, the proposed work outperforms the existing literature, representing a remarkable leap in the studied field.
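
The abstract reports precision and recall averaged over 50 test queries. As a minimal illustration of how such figures are commonly computed, the Python sketch below scores one query and then averages over several. It is not the authors' evaluation code: the function names, the document identifiers, and the choice to average per-query scores with equal weight are all assumptions made for this example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: iterable of document ids returned by the system
    relevant:  iterable of document ids judged relevant to the query
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)  # relevant documents that were retrieved
    # Precision: share of retrieved documents that are relevant.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    # Recall: share of relevant documents that were retrieved.
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


def average_precision_recall(runs):
    """Average per-query precision and recall (assumed: equal weight per query).

    runs: list of (retrieved, relevant) pairs, one pair per test query
    """
    if not runs:
        raise ValueError("at least one query is required")
    scores = [precision_recall(ret, rel) for ret, rel in runs]
    avg_precision = sum(p for p, _ in scores) / len(scores)
    avg_recall = sum(r for _, r in scores) / len(scores)
    return avg_precision, avg_recall


# Hypothetical document ids, used only to exercise the functions.
example_runs = [
    (["d1"], ["d1"]),
    (["d1", "d2"], ["d1", "d2", "d3", "d4"]),
]
avg_p, avg_r = average_precision_recall(example_runs)
```

In this toy run every retrieved document is relevant, so precision stays at its maximum, while the second query misses two relevant documents and pulls the average recall down. That is the same pattern as a result with perfect precision and slightly lower recall.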

Item Type: Article
Identifier: 10.3390/network2040034
Keywords: culture algorithm; document indexing method; evolutionary algorithm; genetic algorithm; information retrieval systems
Subjects: Computing > Innovation and user experience > Usability
Depositing User: Nagham Saeed
Date Deposited: 28 Oct 2022 18:32
Last Modified: 19 Sep 2024 13:15
URI: https://repository.uwl.ac.uk/id/eprint/9576
