[DL] Semantic Web Challenge CfP (@ISWC2020)

Ziqi Zhang ziqi.zhang at sheffield.ac.uk
Tue Mar 3 13:20:00 CET 2020

Apologies for cross-posting!

Call for Participation:

Mining the Web of HTML-embedded Product Data

(co-located with ISWC2020)

1. Overview

The Semantic Web Challenge on Mining the Web of HTML-embedded Product Data
is co-located with the 19th International Semantic Web Conference
(https://iswc2020.semanticweb.org/, 2-6 Nov 2020, Athens, Greece). The
challenge organises two shared tasks related to product data mining on the
Web: (1) product matching and (2) product classification. The event is
organised by The University of Sheffield, The University of Mannheim and
Amazon, and is open to anyone. Systems that beat the baseline of their
respective task will be invited to write a paper describing their method
and system, and to present it as a poster (and potentially also as a short
talk) at the ISWC2020 conference. Winners of each task will be awarded a
prize of 500 euro (partly sponsored by Peak Indicators).

2. Challenge website

For details of the challenge please visit

3. Important dates

02 March 2020: Google support group opens. Please join the group at
https://groups.google.com/forum/#!forum/mwpd2020 if you wish to take part
in this event.

16 March 2020: Release of the training and validation sets

01 June 2020: Release of the test set (without ground truth)

15 June 2020: Submission of system output

08 July 2020: Publication of system results and notification of acceptance
for presentation

4. Tasks and datasets in brief

The challenge organises two tasks: product matching and product
classification.

i) Product Matching deals with identifying product offers on different
websites that refer to the same real-world product (e.g., the same iPhone X
model offered under different names/offer titles and with different
descriptions on various websites). A corpus of 16 million product offers,
organised into product offer clusters, is released for generating training
data. A validation set of 1.1K offer pairs and a test set of 600 offer
pairs will also be released. The goal of this task is to classify whether
each offer pair in these datasets is a match (i.e., refers to the same
product) or a non-match.

ii) Product Classification deals with assigning predefined product category
labels (possibly at multiple levels) to product instances (e.g., iPhone X
is a ‘SmartPhone’, and also ‘Electronics’). A training set of 10K product
offers, a validation set of 3K product offers and a test set of 3K product
offers will be released. Each dataset contains product offers with their
metadata (e.g., name, description, URL) and three classification labels,
each corresponding to a level of the GS1 Global Product Classification
taxonomy. The goal is to classify these product offers into the predefined
category labels.

All datasets are built from structured data that was extracted from the
Common Crawl (https://commoncrawl.org/) by the Web Data Commons project.

5. Resources and tools

The challenge will also release utility code (in Python) for processing the
above datasets and scoring the system outputs, as well as the following
language resources for product-related data mining tasks:


   A text corpus of 150 million product offer descriptions

   Word embeddings trained on the above corpus

6. Organizing committee


   Dr Ziqi Zhang (Information School, The University of Sheffield)

   Prof. Christian Bizer (Institute of Computer Science and Business
   Informatics, The University of Mannheim)

   Dr Haiping Lu (Department of Computer Science, The University of
   Sheffield)

   Dr Jun Ma (Amazon Inc., Seattle, US)

   Prof. Paul Clough (Information School, The University of Sheffield &
   Peak Indicators)

   Ms Anna Primpeli (Institute of Computer Science and Business
   Informatics, The University of Mannheim)

   Mr Ralph Peeters (Institute of Computer Science and Business
   Informatics, The University of Mannheim)

   Mr Abdulkareem Alqusair (Information School, The University of
   Sheffield)

7. Contact
To contact the organising committee, please use the Google discussion group.

Kind regards
Dr Ziqi Zhang
Lecturer in Social Media, Exams Officer
Room 323a, Regent Court, 211 Portobello, Information School, University of
Sheffield
Tel: +44 (0)114 222 2657
Other information and forms of contact: my iSchool webpage
<https://www.sheffield.ac.uk/is/staff/zhang>, personal website
<https://ziqizhang.github.io/>, LinkedIn
<https://www.linkedin.com/in/ziqi-zhang-68109615/>, Twitter
<https://twitter.com/ziqizhang_zz>, ORCID
<https://orcid.org/0000-0002-8587-8618>, Google Scholar
