This paper was published in the 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC).

This paper describes an ongoing project that aims to give people with physical disabilities a way to control a robot arm using voice commands. The goal is to let the user control the system in natural language, i.e., without learning a special robot control vocabulary. The work describes the design and evaluation of a real-time framework that combines speech recognition, camera-based object detection, and an inference module. The inference module matches the potentially ambiguous results of speech recognition against the object detection outputs to generate a unique control input for a computer-vision-based robot arm.
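
To make the role of the inference module concrete, the following is a minimal sketch of how ambiguous speech hypotheses might be reconciled with detected objects. It is not the method used in this work: the `Detection` type, the `resolve_target` function, the string-similarity scoring, and the 0.6 threshold are all assumptions introduced for illustration only.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


@dataclass
class Detection:
    """Hypothetical object-detector output (not from the paper)."""
    label: str         # class name reported by the detector
    confidence: float  # detector score in [0, 1]


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve_target(
    hypotheses: List[Tuple[str, float]],
    detections: List[Detection],
    threshold: float = 0.6,  # assumed cutoff, chosen only for this sketch
) -> Optional[Detection]:
    """Pick the detected object that best matches any word in any
    speech hypothesis. Ties in similarity are broken by the product
    of the recognizer and detector confidences. Returns None when no
    word is similar enough to any detected label."""
    best: Optional[Detection] = None
    best_key = (threshold, 0.0)
    for text, asr_conf in hypotheses:
        for word in text.split():
            for det in detections:
                key = (similarity(word, det.label), asr_conf * det.confidence)
                if key[0] >= threshold and key > best_key:
                    best, best_key = det, key
    return best
```

For example, given the hypotheses `("pick up the cop", 0.55)` and `("pick up the cup", 0.40)` and a scene containing a cup and a bottle, the sketch selects the cup, because only the visible objects are considered as candidates. A command with no word resembling a detected label, such as "move left", yields `None`, which a real system could treat as a request for clarification.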