Multimodal Learning in AIoT Systems: Sensor Fusion and Vision-Based Intelligence
Abstract
This study evaluates the effectiveness of multimodal learning in Artificial Intelligence of Things (AIoT) systems, focusing on the integration of sensor fusion and computer vision for classification tasks. A systematic review and meta-analysis were conducted on studies published between 2020 and 2025. Thirteen studies met the inclusion criteria; however, only six provided comparable quantitative data, owing to inconsistent baseline reporting and evaluation practices. The results indicate that, where comparable evaluations are available, multimodal approaches generally improve accuracy over unimodal baselines, with an average increase of 8.88% (95% CI: 5.33%–12.44%, p < 0.001). High heterogeneity was observed, influenced by domain, sensor configuration, and model architecture. These findings suggest that multimodal effectiveness is conditional, depending on modality complementarity, fusion strategy, and system-level constraints.
Copyright (c) 2025 Agnes Prima Wulanjari, Ria Dymyati, Indar Bismoko, Nuryake Fajaryati, Pipit Utami

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
