The proposed DCAI Workshop is highly relevant to WWW’24, with its explicit focus on integrating DCAI techniques with the World Wide Web. By offering a dedicated platform for discussions and presentations, the workshop sheds light on how data-centric approaches enrich the Web’s technical infrastructure and deepen our comprehension of it. Additionally, our workshop’s aim of constructing better data aligns with the conference’s scope of democratizing access to Web content and technologies, thereby promoting inclusivity, fairness, and accountability.

The DCAI Workshop is anticipated to attract a diverse audience, including but not limited to researchers in the fields of AI, machine learning, and data science, and practitioners interested in integrating DCAI solutions into their applications. Given the recent surge in interest in DCAI, we anticipate substantial participation, with an estimated 50 submissions and an attendance of 100 individuals.


We have a strong record of organizing tutorials on Data-centric AI, most recently at KDD 2023. Together with our comprehensive surveys on Data-centric AI, these tutorials demonstrate our expertise in this rapidly evolving field and our ability to engage a wide audience. To extend our outreach, we plan to launch a dedicated Twitter account for the workshop to provide real-time updates and engage with potential attendees. We will also advertise on mainstream social media such as Facebook and LinkedIn, as well as on specialized platforms pertinent to our domain, and continue our regular engagement with the community through these channels. Together, these efforts aim to maximize the workshop’s visibility, attract high-quality submissions, and foster knowledge-sharing and collaboration in Data-centric AI.

Previous workshops have addressed data-centric approaches in AI. The DMLR workshop at ICML 2023 highlighted the importance of data quality, bringing together researchers to discuss data generation, labeling, and governance. The NeurIPS 2021 Data-Centric AI Workshop focused on practical challenges in data, including collection, preprocessing, and quality evaluation. The DataPerf workshop at ICML 2022 introduced benchmarks for evaluating ML datasets, emphasizing the role of data in AI research. Our workshop differs by exploring both the theoretical and practical aspects of Data-centric AI in greater depth. We invite contributions ranging from data augmentation techniques to data-centric methods for graphs, offering a more holistic examination of DCAI for the academic community.


We have connected with a diverse group of scholars interested in serving as program committee members for our workshop. These individuals come from both academia and industry, and our selection includes scholars from groups underrepresented in STEM, such as women and African Americans.

  • Jamell Dacon, Morgan State University
  • Mohammad Hashemi, Emory University
  • Jay Revolinsky, Michigan State University
  • Harry Shomer, Michigan State University
  • Geri Skenderi, University of Verona
  • Lu Lin, The Pennsylvania State University
  • Carl Yang, Emory University
  • Simon Liu, Emory University
  • Jie Ren, Michigan State University
  • Xiaorui Liu, North Carolina State University
  • Tyler Derr, Vanderbilt University
  • Hua Liu, Shandong University
  • Tong Zhao, Snap Inc.
  • Xianfeng Tang, Amazon
  • Haomining Jiang, Amazon
  • Zheng Li, Amazon
  • Neil Shah, Snap Research
  • William Shiao, University of California, Riverside
  • Yozen Liu, Snap Research

In addition, we will invite senior PhD students from Emory University, University of Illinois Urbana-Champaign, New York University Shanghai, Rensselaer Polytechnic Institute, University of Wisconsin-Madison, and University of Washington.


Team Diversity: Our organizing team combines academic scholars and industry practitioners, fostering diversity of thought and experience. We are proud to include two accomplished female scholars, further emphasizing our commitment to diversity and inclusivity.

We recognize that diverse participant backgrounds contribute to richer discussions at our workshop. We therefore especially encourage people from backgrounds that are under-represented in the WWW community to participate in our workshop. To attract diverse participants, we plan to implement the following strategies:

Reaching Out to Universities: Our team is composed of members from diverse universities who have experience collaborating with people from different regions. We plan to leverage this experience to partner with universities in various regions and promote the workshop to their students. We will also reach out to faculty members who teach data science and related courses to encourage their students to participate in the workshop.

Leveraging Social Media: Social media platforms such as Facebook, Twitter, and LinkedIn can be powerful tools for promoting the workshop. Notably, many of our team members have thousands of followers on these platforms. We will use these platforms to reach different regions with messaging tailored to their interests and motivations.

Collaborating with ACM’s Women in Computing (ACM-W): We plan to collaborate with ACM-W, a professional organization for women in computing, to encourage more women to participate in the workshop and to promote gender diversity and inclusivity. Moreover, we can invite ACM-W members to give talks at our workshop. This will help create a welcoming and supportive environment for women in computing and provide them with resources to succeed in the field.