Ferry: Toward Better Understanding of Input/Output Space for Data Wrangling Scripts

NU author(s): Dr Xinhuan Shu

Licence

This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).


Abstract

Understanding the input and output of data wrangling scripts is crucial for various tasks like debugging code and onboarding new data. However, existing research on script understanding primarily focuses on revealing the process of data transformations, lacking the ability to analyze the potential scope, i.e., the space of script inputs and outputs. Meanwhile, constructing input/output space during script analysis is challenging, as the wrangling scripts could be semantically complex and diverse, and the association between different data objects is intricate. To facilitate data workers in understanding the input and output space of wrangling scripts, we summarize ten types of constraints to express table space and build a mapping between data transformations and these constraints to guide the construction of the input/output for individual transformations. Then, we propose a constraint generation model for integrating table constraints across multiple transformations. Based on the model, we develop Ferry, an interactive system that extracts and visualizes the data constraints describing the input and output space of data wrangling scripts, thereby enabling users to grasp the high-level semantics of complex scripts and locate the origins of faulty data transformations. Besides, Ferry provides example input and output data to assist users in interpreting the extracted constraints and checking and resolving the conflicts between these constraints and any uploaded dataset. Ferry's effectiveness and usability are evaluated through two usage scenarios and two case studies, including understanding, debugging, and checking both single and multiple scripts, with and without executable data. Furthermore, an illustrative application is presented to demonstrate Ferry's flexibility.


Publication metadata

Author(s): Luo Z, Xiong K, Zhu J, Chen R, Shu X, Weng D, Wu Y

Publication type: Article

Publication status: Published

Journal: IEEE Transactions on Visualization and Computer Graphics

Year: 2025

Volume: 31

Issue: 1

Pages: 1202-1212

Print publication date: 01/01/2025

Online publication date: 10/09/2024

Acceptance date: 15/07/2024

Date deposited: 17/11/2024

ISSN (print): 1077-2626

ISSN (electronic): 1941-0506

Publisher: IEEE Computer Society

URL: https://doi.org/10.1109/TVCG.2024.3456328

DOI: 10.1109/TVCG.2024.3456328

ePrints DOI: 10.57711/ncw7-kp48


Funding

Funder reference    Funder name
2022YFE0137800      National Key R&D Program of China
U22A2032            NSFC
2023C01120          Key “Pioneer” R&D Projects of Zhejiang Province
                    Collaborative Innovation Center of Artificial Intelligence by MOE and Zhejiang Provincial Government (ZJU)
