
Semantic-aware blocking for entity resolution

NU author(s): Dr Huizhi Liang

Downloads

Full text for this publication is not currently held within this repository. Alternative links are provided below where available.


Abstract

In this work we propose a semantic-aware blocking framework for entity resolution (ER). The framework uses locality-sensitive hashing (LSH) techniques to efficiently unify textual and semantic features in an ER blocking process. To understand how similarity metrics affect the effectiveness of ER blocking, we study the robustness of similarity metrics and their properties in terms of LSH families. We further discuss how the semantic similarity of records can be captured, measured, and integrated with LSH techniques over multiple similarity spaces. We evaluated the proposed framework on two real-world data sets and compared it with state-of-the-art blocking techniques. The experimental study shows that combining semantic and textual features can considerably improve the quality of blocking. Owing to the probabilistic nature of LSH, the framework also enables fast and reliable blocking for entity resolution tasks in large-scale data environments.
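
To make the idea of LSH-based blocking concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes textual similarity is approximated by MinHash over character shingles with standard banding, and it stands in for the semantic feature with a hypothetical `concept` label that must match exactly for two records to share a block. The function names, the record layout, and the parameter values are all assumptions made for this example; the paper's treatment of semantic similarity is richer than an exact label match.

```python
import hashlib
from collections import defaultdict


def shingles(text, k=3):
    """Character k-shingles of the whitespace-normalised, lower-cased text."""
    s = " ".join(text.lower().split())
    return {s[i:i + k] for i in range(max(1, len(s) - k + 1))}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    digest = hashlib.blake2b(f"{seed}:{token}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(tokens, num_hashes):
    """MinHash signature: the minimum hash value per seeded hash function."""
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(num_hashes))


def lsh_blocks(records, bands=10, rows=2):
    """Group record ids into blocks keyed by (concept, band index, band values).

    `records` maps id -> (text, concept). The concept label is a hypothetical
    stand-in for a semantic feature (assumption, not the paper's method).
    """
    blocks = defaultdict(set)
    for rid, (text, concept) in records.items():
        sig = minhash_signature(shingles(text), bands * rows)
        for b in range(bands):
            band = sig[b * rows:(b + 1) * rows]
            blocks[(concept, b, band)].add(rid)
    # Blocks with a single record produce no candidate pairs.
    return {key: ids for key, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """All unordered id pairs that co-occur in at least one block."""
    pairs = set()
    for ids in blocks.values():
        ordered = sorted(ids)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                pairs.add((a, b))
    return pairs


if __name__ == "__main__":
    example = {
        "r1": ("Semantic-aware blocking for entity resolution", "paper"),
        "r2": ("semantic-aware  blocking for Entity Resolution", "paper"),
        "r3": ("Semantic-aware blocking for entity resolution", "book"),
    }
    print(candidate_pairs(lsh_blocks(example)))
```

Under the banding scheme used in this sketch, two records with Jaccard similarity s over their shingle sets share at least one band with probability 1 - (1 - s^r)^b, where r is `rows` and b is `bands`. Increasing `rows` makes blocks stricter, while increasing `bands` makes them more permissive. Requiring the same `concept` in the block key acts as an additional AND condition on top of the textual hash.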


Publication metadata

Author(s): Wang Q, Cui M, Liang H

Publication type: Article

Publication status: Published

Journal: IEEE Transactions on Knowledge and Data Engineering

Year: 2016

Volume: 28

Issue: 1

Pages: 166-180

Print publication date: 01/01/2016

Online publication date: 14/08/2015

Acceptance date: 07/08/2015

ISSN (print): 1041-4347

ISSN (electronic): 1558-2191

Publisher: IEEE

URL: https://doi.org/10.1109/TKDE.2015.2468711

DOI: 10.1109/TKDE.2015.2468711

