In this practical course, we will look at current models that work at the intersection of natural language processing and computer vision, specifically those that turn images into words. We will do this in the context of a (somewhat) practical application: we will build an interactive system that can serve as the "eyes" of the user, through and with which the user can investigate a virtual environment. This takes existing language & vision models out of their usual laboratory environment, in which they are tested on automated metrics, and into the realm of actual use; we will see how well they do.

Students will gain familiarity with current state-of-the-art models in language and vision, as well as some insight into modelling dialogue. On the software engineering side, questions of how to actually deploy deep learning models will become relevant as well. The project will be group work, with the division of tasks to be determined in the first meetings.