The paper titled “Enabling Autonomy through Voice Control: AI-Assisted Mobile Platform” by Ivelina Balcheva and Yassen Gorbounov explores the development of an autonomous mobile platform that utilizes voice control and artificial intelligence (AI) techniques for effective object recognition and decision-making. The research is driven by the rapid advancement of technology and the growing need for automation in various human activities, particularly in environments that are dangerous or hard to reach. The integration of AI and voice control systems is seen as a way to significantly improve the functionality, efficiency, and safety of these platforms, making them more intuitive and adaptive.
One of the key advantages of voice control in robotics is that it simplifies communication between humans and machines. Voice commands allow robots to respond more quickly in dynamic environments, as they can adjust their actions based on real-time verbal cues from operators. This approach also increases accessibility, enabling people with physical disabilities to control robotic systems without specialized training or equipment. Additionally, voice control enhances safety by reducing the need for direct physical interaction with machines, allowing operators to stay clear of potentially hazardous environments while still efficiently directing the machine’s actions. The integration of AI further enhances these benefits by enabling autonomous decision-making: AI algorithms allow robots to analyze vast amounts of data, adapt to changing environments, and make decisions independently, reducing the need for human intervention and improving overall efficiency.
The paper compares three AI models—GPT, BlackBox AI, and Vertex AI—to determine their suitability for the proposed platform. GPT excels in natural language understanding and generation but lacks object detection capabilities. BlackBox AI specializes in image processing and object recognition but does not offer optimized voice control. Vertex AI, on the other hand, provides a combination of flexibility and power, allowing for the creation of custom models that integrate both voice control and object recognition, making it the most suitable choice for the platform. From a hardware perspective, the paper compares the ESP32-CAM and NVIDIA Jetson Nano. The ESP32-CAM is suitable for basic tasks but has limited processing power for complex voice control and object recognition. In contrast, the NVIDIA Jetson Nano offers superior processing capabilities, making it better suited for advanced AI tasks, though it comes at a higher cost.
The proposed platform operates using a continuous loop that listens for voice commands, which are transcribed and processed using Google Cloud SDK, PyAudio, gSTT, and gTTS. The object detection module processes video input from a camera, using a pre-trained model to detect and track objects in real-time. The decision-making module handles command responses, system monitoring, and safety protocols. Testing demonstrated high accuracy and robustness across various environments and conditions, with the platform showing strong performance in object recognition and voice command processing.
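The continuous listen-transcribe-decide loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the audio capture and transcription stage (PyAudio plus Google Cloud speech-to-text, with gTTS for spoken feedback) is stubbed out as a simple stream of text, and the command names and action labels are hypothetical.

```python
# Minimal sketch of the platform's control loop: voice input is transcribed
# (stubbed here as plain strings) and the decision-making module maps each
# command to an action. Command vocabulary and action names are illustrative,
# not taken from the paper.

def decide(command: str) -> str:
    """Map a transcribed voice command to a platform action (hypothetical set)."""
    actions = {
        "move forward": "drive:forward",
        "stop": "drive:stop",
        "find object": "vision:detect",
    }
    # Unknown commands fall through to a safe no-op, mirroring the paper's
    # emphasis on safety protocols in the decision-making module.
    return actions.get(command.strip().lower(), "noop")

def control_loop(transcripts) -> list[str]:
    """Process a stream of transcripts until a 'shutdown' command arrives."""
    executed = []
    for text in transcripts:
        if text.strip().lower() == "shutdown":
            break
        executed.append(decide(text))
    return executed
```

In the real system the `transcripts` iterable would be replaced by a blocking microphone loop, and each action string would dispatch to motor control or the object-detection module.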
In conclusion, the research successfully demonstrated a voice-controlled autonomous mobile platform with integrated AI capabilities. The platform’s high accuracy and speed in object recognition and voice command processing highlight its potential for future improvements, particularly in aiding people with visual impairments or limited mobility. The authors believe this work can serve as a foundation for further advancements in autonomous systems, with applications in home assistance, hazardous environments, and accessibility. The research was supported by a 2024 grant from The Central Fund for Strategic Development of the New Bulgarian University.
If you’re interested in reading the entire paper, send an email to Ivelina Balcheva at ivab1414@gmail.com.