Automation of the Timed-Up-and-Go test using a conventional video camera


The Timed-Up-and-Go (TUG) test is a simple clinical tool commonly used to quickly assess patient mobility. Researchers have sought to automate the test using sensors or motion tracking systems, both to improve its accuracy and to extract finer-grained information about its sub-phases. While some approaches have shown promise, they often require participants to wear sensors or depend on specialized hardware, such as the now-discontinued Microsoft Kinect, which combines color video with a depth sensor (RGBD). In this work, we leverage recent advances in computer vision to automate the TUG test using a regular RGB video camera, without the need for custom hardware or additional depth sensors. Thirty healthy participants were recorded using a Kinect V2 and a standard video feed while performing multiple trials of 3-meter and 1.5-meter versions of the TUG test. A Mask Region-based Convolutional Neural Network (Mask R-CNN) algorithm and a Deep Multitask Architecture for Human Sensing (DMHS) were then used together to extract global 3D poses of the participants. The timing of the transitions between the six key movement phases of the TUG test was then determined using heuristic features computed from the time series of these 3D poses. The proposed video-based vTUG system matched the error of the standard Kinect-based system at all six key transition points, with average errors of less than 0.15 seconds relative to a multi-observer, hand-labeled ground truth. This work describes a novel method of video-based automation of the TUG test using a single standard camera, removing the need for specialized equipment and facilitating the extraction of additional meaningful information for clinical use.
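To illustrate what a heuristic transition detector on a 3D pose time series might look like, the following is a minimal Python sketch. It is not the method used in this work: the abstract does not specify the features, so the choice of hip height as the signal, the midpoint threshold, the smoothing window, and the function names `moving_average` and `stand_and_sit_times` are all assumptions made purely for illustration. It covers only two transitions (leaving the chair and returning to it), not all six.

```python
from typing import List, Optional, Tuple


def moving_average(signal: List[float], window: int) -> List[float]:
    """Centered moving average; the window is truncated at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


def stand_and_sit_times(
    hip_height: List[float],
    fps: float,
    rise_fraction: float = 0.5,  # hypothetical threshold position between seated and standing height
    window: int = 5,             # hypothetical smoothing window, in frames
) -> Optional[Tuple[float, float]]:
    """Estimate when the participant stands up and sits back down.

    hip_height is a per-frame vertical hip coordinate taken from the 3D pose
    (an assumed feature). Returns (stand_time, sit_time) in seconds, or None
    if the signal never rises above the threshold.
    """
    if not hip_height:
        return None
    smoothed = moving_average(hip_height, window)
    low, high = min(smoothed), max(smoothed)
    threshold = low + rise_fraction * (high - low)
    above = [h > threshold for h in smoothed]
    if not any(above):
        return None
    first = above.index(True)                        # first frame above the threshold
    last = len(above) - 1 - above[::-1].index(True)  # last frame above the threshold
    return first / fps, (last + 1) / fps


# Synthetic trial at 10 frames per second: seated, standing/walking, seated again.
trial = [0.5] * 20 + [0.9] * 60 + [0.5] * 20
print(stand_and_sit_times(trial, fps=10.0))
```

A complete system would need one such rule per transition (for example, using trunk orientation for the turns), and any thresholds would have to be tuned against hand-labeled recordings.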