Auditory image understanding for the visually impaired based on a modular computer vision sonification model

Banf, Michael

Citation link: https://nbn-resolving.org/urn:nbn:de:hbz:467-7716

Files in This Item:

File	Description	Size	Format
banf.pdf		44.57 MB	Adobe PDF

Document Type:	Doctoral Thesis
Title:	Auditory image understanding for the visually impaired based on a modular computer vision sonification model / Akustisches Bildverständnis für Sehbehinderte basierend auf einem modularen Computer Visions Sonifikations Modell
Authors:	Banf, Michael
Institute:	Institut für Bildinformatik
Free keywords:	Computer Vision, Sonification, Human-Computer Interaction, Assistive Systems
Dewey Decimal Classification:	004 Computer science
GHBS-Classes:	TVVC TWK
Issue Date:	2013
Publish Date:	2013
Abstract:	This thesis presents a system that aims to give visually impaired people direct perceptual access to images via an acoustic signal. The user actively explores the image on a touch screen or touch pad and receives auditory feedback about the image content at the current finger position. The design of such a system involves two major challenges: which image information is most useful and relevant, and how as much information as possible can be captured in an audio signal. We address these problems with a Modular Computer Vision Sonification Model, which we propose as a general framework for the acquisition, exploration, and sonification of visual information to support visually impaired people. General approaches are presented that combine low-level information, such as color, edges, and roughness, with mid- and high-level information obtained from Machine Learning algorithms. This includes object recognition and the classification of regions into the categories "man-made" versus "natural", based on a novel type of discriminative graphical model. We argue that this multi-level approach gives users direct access to the identity and location of objects and structures in the image while still exploiting the potential of recent developments in Computer Vision and Machine Learning. During exploration, the user can use detected man-made structures or specific natural regions as reference points to classify other natural regions by their individual location, color, and texture. We show that congenitally blind participants employ this strategy successfully to interpret and understand whole scenes.
URN:	urn:nbn:de:hbz:467-7716
URI:	https://dspace.ub.uni-siegen.de/handle/ubsi/771
License:	https://dspace.ub.uni-siegen.de/static/license.txt
Appears in Collections:	Hochschulschriften

This item is protected by original copyright

Page view(s):	829 (checked on Dec 3, 2024)
Download(s):	512 (checked on Dec 3, 2024)
