DCAI Workshop

WWW’24 Workshop Proposal: DCAI (Data-centric Artificial Intelligence)



The emergence of Data-centric AI (DCAI) represents a pivotal shift in AI development, redirecting focus from model refinement to data quality. Past approaches centered on refining models and often overlooked data imperfections, which raises the question of whether reported gains in model performance reflect genuine improvement. DCAI advocates the systematic engineering of data, complementing existing model-centric efforts and playing a vital role in driving AI success. This transition has spurred innovation in machine learning and data mining algorithms and in their applications on the Web. We therefore propose the DCAI Workshop at WWW’24 as a platform for academic researchers and industry practitioners to showcase the latest advances in DCAI research and their real-world applications.


Data-centric AI (DCAI) is a burgeoning concept that shifts attention from advancing model design to pursuing data excellence, reflecting growing recognition of the crucial role data plays in AI. In the past, AI was predominantly viewed through a model-centric lens, with a primary focus on refining model designs to boost performance on fixed datasets. However, this approach tends to neglect data imperfections such as missing values, incorrect labels, and anomalies. This raises the critical question of whether the many reported improvements in model performance reflect a model’s true potential or are merely a consequence of overfitting to the dataset. DCAI is an emerging frontier that complements existing efforts by emphasizing the systematic engineering of data in AI development, and it plays an increasingly important role in propelling AI to success.

The transition to DCAI has ignited a wave of innovation in machine learning and data mining algorithms, spanning techniques such as graph learning, trustworthy machine learning, and Large Language Models (LLMs). These advances have found applications not only in computer science but also in diverse domains including finance, information systems, mechanical engineering, and robotics. In this workshop, our objective is to explore recent advances in both the theoretical underpinnings and the practical applications of DCAI.


We enthusiastically invite submissions that focus on recent breakthroughs in the research and development of DCAI, coupled with their real-world applications. Contributions in the form of theory, methodology, and application papers are encouraged from areas including but not limited to:

  • Data Augmentation Methods
  • Efficient Data Labeling Methods
  • Data Collection Strategies
  • Anomaly Detection and Label Correction Methods
  • Data Engineering Methods (e.g., Data Cleaning, Data Imputation, Feature Selection, Dimension Reduction)
  • Data-centric Methods for Graphs (e.g., Graph Structure Learning, Graph Augmentation, Graph Condensation)
  • Data-centric Methods for Machine Learning Robustness, Security, Interpretability, and Fairness
  • Data-centric Methods for Large Language Models (e.g., Training Data Curation, Evaluation Data Construction, and Prompt Engineering)
  • Data-centric Methods in Other Domains (e.g., Finance, Information Systems, Mechanical Engineering, Robotics)
  • Proposing New Datasets or Benchmarks

The relevant topics are not confined to those listed above. We welcome all contributions pertinent to the WWW community that focus on developing, iterating on, and maintaining data to help build better AI algorithms.

Submission Details

We welcome submissions of 4 to 8 pages of main content, with up to 2 additional pages containing references and an optional appendix. All submissions must be written in English, submitted in PDF format, and formatted according to the new ACM format published in the ACM guidelines (e.g., using the ACM LaTeX template on Overleaf), selecting the “sigconf” sample. Following the WWW’24 conference submission policy, reviews are double-blind, and author names and affiliations must NOT be listed. Submitted works will be assessed on their novelty, technical quality, potential impact, and clarity of writing. For papers that rely primarily on empirical evaluations, the experimental settings and results should be clearly presented and reproducible. We encourage authors to make data and code publicly available when possible.

Accepted papers will be posted on this workshop website. By default, accepted papers will not appear in the WWW’24 proceedings and are thus non-archival. This allows authors to submit works that are concurrently under review or already published elsewhere. However, authors may instead choose archival status, in which case the paper will be included in the official proceedings. Please email us in advance if you would like your accepted paper to be included in the official proceedings.

The best paper, selected based on the reviewers’ ratings and the organizing committee’s judgment, will be announced at the end of the workshop.

All submissions must be uploaded electronically via EasyChair.

At least one author of each accepted workshop paper must register for the workshop and be present on the day of the workshop.

For questions regarding submissions, please contact us at: liangliangz6v6@gmail.com and mohammad.hashemi@emory.edu

Important Dates

The workshop deadlines will be no later than the following dates:

  • Workshop paper submission: February 15, 2024 (extended from February 10)
  • Workshop paper notification: March 4, 2024
  • Workshop paper camera-ready: March 11, 2024 (FIRM)

All submission deadlines are end-of-day in the Anywhere on Earth (AoE) time zone.