IfI Colloquium: Mining Large Cultural Heritage Corpora Using Deep Learning Methods, October 26, 2017

Speaker: Prof. Frédéric Kaplan, Ph.D.
EPFL, Switzerland

Date: Thursday, October 26, 2017, 17:15 h

Location: BIN 2.A.01


I will report on our ongoing investigations of three large-scale cultural heritage datasets: 4 million Swiss newspaper articles covering a two-hundred-year period, 1 million photographs of artworks currently being digitised at the Cini Foundation, and the continuously expanding corpora of the Venice Time Machine, which cover documents from a 1000-year period. The Swiss newspaper archive is sufficiently large to test word embedding methods like Word2Vec and to study how they perform in diachronic contexts, in which words progressively change meaning as language itself evolves. On the artwork databases, we are using convolutional neural networks to find similarities between paintings, engravings, drawings and sculptures, and we are designing architectures for efficiently spotting matching details. Finally, we combine these two approaches to try to crack one of the hardest problems of the Venice Time Machine: the direct projection of graphical forms into semantic spaces without passing through the currently impractical full textual transcription of the digitised documents.


Prof. Frédéric Kaplan, Ph.D., holds the Digital Humanities Chair at Ecole Polytechnique Fédérale de Lausanne (EPFL) and directs the EPFL Digital Humanities Laboratory (DHLAB). He conducts research projects combining archive digitisation, information modelling and museographic design. He is currently directing the "Venice Time Machine", an international project in collaboration with the Ca' Foscari University in Venice and the Venice State Archives, aiming to model the evolution and history of Venice over a 1000-year period.