SIB: Sorted-Integers-Based Index for Compact and Fast Caching in Top-Down Logic Rule Mining Targeting KB Compression

Lookup NU author(s): Professor Raj Ranjan

Licence

This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).


Abstract

© 2025 The Author(s). Software: Practice and Experience published by John Wiley & Sons Ltd.

Background: Mining logic rules from structured knowledge bases is the basis of knowledge engineering. Because the rule mining problem is NP-hard, logic rules cannot be induced efficiently from knowledge bases, especially large-scale ones.

Idea: In this article, we propose a compact and efficient index structure for maintaining the intermediate data during top-down rule mining, so that memory consumption is reduced and mining efficiency is improved.

Developing Points: The index is based on a mapping from constant symbols to integers and on sorting the mapped integers. Index updates are decomposed into four basic operations. Moreover, the index itself acts as the cache during top-down mining.

Value: Most contributions in existing works employ algorithmic and architectural optimizations to improve efficiency. Data-oriented optimizations have also been explored to some extent, but their data efficiency is relatively low, and memory consumption is thus becoming a new challenge for state-of-the-art systems. We tackle this challenge in this article, and our technique has been proven more efficient than state-of-the-art systems. We evaluate our method on six datasets that contain up to 160 K records and are frequently used as benchmarks in tasks related to knowledge engineering. The experimental results show that the proposed technique speeds up the rule mining procedure by (Formula presented.) on average and reduces memory consumption by up to 70%. The space overhead of the data structure is about twice that of the indexed records, which is more than 80% lower than that of the state-of-the-art technique.
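As a rough illustration of the idea named in the abstract (mapping constant symbols to integers and sorting the mapped integers), the following Python sketch builds a toy index over a small set of records. It is not the SIB data structure from the article: the class name `SortedIntIndex`, its methods, and the per-column layout are hypothetical choices made only for this example, and the four update operations mentioned in the abstract are not modelled.

```python
from bisect import bisect_left, bisect_right


class SortedIntIndex:
    """Toy index (hypothetical, NOT the article's SIB implementation)."""

    def __init__(self, records):
        # Map each constant symbol to a small integer, in order of first appearance.
        self.sym2int = {}
        self.int2sym = []
        rows = [tuple(self._encode(c) for c in rec) for rec in records]
        arity = len(rows[0]) if rows else 0
        # For every argument position, keep the rows sorted by that column,
        # together with the sorted key list used for binary search.
        self.by_col = []
        for col in range(arity):
            ordered = sorted(rows, key=lambda r: r[col])
            keys = [r[col] for r in ordered]
            self.by_col.append((keys, ordered))

    def _encode(self, symbol):
        # Assign the next unused integer to a symbol seen for the first time.
        if symbol not in self.sym2int:
            self.sym2int[symbol] = len(self.int2sym)
            self.int2sym.append(symbol)
        return self.sym2int[symbol]

    def lookup(self, col, symbol):
        """Return all records whose argument at position `col` equals `symbol`."""
        code = self.sym2int.get(symbol)
        if code is None:
            return []
        keys, ordered = self.by_col[col]
        # Binary search for the contiguous run of rows with the requested key.
        lo = bisect_left(keys, code)
        hi = bisect_right(keys, code)
        return [tuple(self.int2sym[v] for v in row) for row in ordered[lo:hi]]


records = [("alice", "bob"), ("alice", "carol"), ("dave", "bob")]
index = SortedIntIndex(records)
by_first = index.lookup(0, "alice")
by_second = index.lookup(1, "bob")
```

In this sketch each lookup is a binary search over a sorted list of integers, so the symbols themselves are stored only once in the mapping tables. How the actual index is laid out and updated is described in the article and its repository, not here.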


Publication metadata

Author(s): Wang R, Wong R, Sun D, Ranjan R

Publication type: Article

Publication status: Published

Journal: Software - Practice and Experience

Year: 2025

Pages: epub ahead of print

Online publication date: 07/02/2025

Acceptance date: 21/01/2025

Date deposited: 17/02/2025

ISSN (print): 0038-0644

ISSN (electronic): 1097-024X

Publisher: John Wiley and Sons Ltd

URL: https://doi.org/10.1002/spe.3415

DOI: 10.1002/spe.3415

Data Access Statement: The data that support the findings of this study are openly available in the public GitHub repository of SIB at https://github.com/TramsWang/SIB/tree/main

