Multimodal Learning in AIoT Systems: Sensor Fusion and Vision-Based Intelligence

  • Agnes Prima Wulanjari Universitas Negeri Yogyakarta
  • Ria Dymyati Universitas Negeri Yogyakarta
  • Indar Bismoko Universitas Negeri Yogyakarta
  • Nuryake Fajaryati Universitas Negeri Yogyakarta
  • Pipit Utami Universitas Negeri Yogyakarta
Keywords: Multimodal Learning, Artificial Intelligence of Things (AIoT), Sensor Fusion, Computer Vision, Meta-Analysis, Performance Evaluation

Abstract

This study evaluates the effectiveness of multimodal learning in Artificial Intelligence of Things (AIoT) systems, focusing on the integration of sensor fusion and computer vision for classification tasks. A systematic review and meta-analysis were conducted on studies published between 2020 and 2025. Thirteen studies met the inclusion criteria; however, only six provided comparable quantitative data, owing to inconsistent baseline reporting and evaluation practices. The results indicate that, where comparable evaluations are available, multimodal approaches generally improve accuracy over unimodal baselines, with an average increase of 8.88% (95% CI: 5.33%–12.44%, p < 0.001). High heterogeneity was observed, influenced by domain, sensor configuration, and model architecture. These findings suggest that multimodal effectiveness is conditional, depending on modality complementarity, fusion strategy, and system-level constraints.

Published
2025-07-31
How to Cite
Wulanjari, A., Dymyati, R., Indar Bismoko, I. B., Fajaryati, N., & Utami, P. (2025). Multimodal Learning in AIoT Systems: Sensor Fusion and Vision-Based Intelligence. Jurnal Media Computer Science, 4(2), 461-478. https://doi.org/10.37676/jmcs.v4i2.11040
Section
Articles
