I am a recent graduate of MIT working on energy-efficient learning-based vision for robotics applications. My experience and interests lie in algorithm-architecture-hardware co-design for efficient deployment of deep learning models on edge devices. I am excited about opportunities to work on ML and AI system design at the hardware-software interface.

Resume / CV / LinkedIn / GitHub / Google Scholar


At MIT, I was a member of the Energy-Efficient Multimedia Systems Group led by Professor Vivienne Sze. My projects initially focused on communication interfaces between PCs, FPGAs, and embedded computing platforms. As part of the 2017-2018 SuperUROP cohort, I began working on low-latency learning-based depth estimation research. My work was recognized as one of the top three projects completed that year. I continued to expand on this work as a graduate student, culminating in my master's thesis on fast and energy-efficient monocular depth estimation on embedded systems.


Fast and Energy-Efficient Monocular Depth Estimation on Embedded Systems
Diana Wofk. MEng Thesis, advised by Professor Vivienne Sze.
Massachusetts Institute of Technology, May 2020.

Depth sensing is critical for many robotic tasks such as localization, mapping, and obstacle detection. There has been growing interest in performing depth estimation from monocular RGB images, due to the relatively low cost and small form factor of RGB cameras. However, state-of-the-art depth estimation algorithms are based on fairly large deep neural networks (DNNs) that have high computational complexity and energy consumption. This poses a significant challenge to performing real-time depth estimation on embedded platforms. Our work addresses this problem.

We first present FastDepth, an efficient low-latency encoder-decoder DNN composed of depthwise separable layers, with skip connections that sharpen the depth output. After deployment steps including hardware-specific compilation and network pruning, FastDepth runs at 27-178 fps on the Jetson TX2 CPU/GPU, with a total power consumption of 10-12 W. Compared with prior work, FastDepth achieves similar accuracy while running an order of magnitude faster.
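As a rough illustration of the savings from depthwise separable layers, the sketch below compares multiply-accumulate (MAC) counts for a standard convolution against its depthwise separable factorization. The layer shape used here is hypothetical, not one of FastDepth's actual layers.

```python
# Compare multiply-accumulate (MAC) counts for a standard convolution
# vs. its depthwise separable factorization. Shapes are illustrative.

def standard_conv_macs(h, w, c_in, c_out, k):
    # Each output pixel needs k*k*c_in MACs per output channel.
    return h * w * c_out * k * k * c_in

def depthwise_separable_macs(h, w, c_in, c_out, k):
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1x1 conv mixes channels
    return depthwise + pointwise

std = standard_conv_macs(56, 56, 64, 128, 3)
sep = depthwise_separable_macs(56, 56, 64, 128, 3)
print(f"standard: {std:,} MACs, separable: {sep:,} MACs ({std / sep:.1f}x fewer)")
```

For a k x k kernel the reduction factor works out to k²·c_out / (k² + c_out), which is roughly 8x for the 3x3 kernel and 128 output channels assumed above.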

We then aim to improve energy efficiency by deploying FastDepth onto a low-power embedded FPGA. Using an algorithm-hardware co-design approach, we develop an accelerator while modifying the FastDepth DNN to be more accelerator-friendly. Our accelerator natively runs depthwise separable layers using a reconfigurable compute core that exploits several types of compute parallelism and toggles between dataflows dedicated to depthwise and pointwise convolutions. We modify the FastDepth DNN by moving skip connections and decomposing larger convolutions in the decoder into smaller ones that map better onto our compute core. This enables a 21% reduction in data movement while ensuring high spatial utilization of the accelerator hardware. On the Ultra96 SoC, our accelerator runs FastDepth layers in 29 ms with a total system power consumption of 6.1 W. Compared to the TX2 CPU, the accelerator achieves a 1.5-2x improvement in energy efficiency.
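As a back-of-the-envelope illustration of the decomposition idea, the sketch below compares weight counts for a single 5x5 convolution versus two stacked 3x3 convolutions covering the same receptive field. The channel count is hypothetical, and this is not the exact decomposition used in the accelerator (nor the source of the 21% data-movement figure).

```python
# Weight counts: one 5x5 convolution vs. two stacked 3x3 convolutions
# with the same receptive field. The channel count (96) is illustrative.
c = 96  # input and output channels

w_5x5 = 5 * 5 * c * c           # weights in a single 5x5 conv
w_3x3_pair = 2 * 3 * 3 * c * c  # weights in two stacked 3x3 convs
print(w_3x3_pair / w_5x5)       # 18/25 = 0.72, i.e. 28% fewer weights
```

Smaller, uniform kernels also map more evenly onto a fixed-size compute core, which is what keeps spatial utilization high.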

@mastersthesis{wofk2020fastdepth,
  author = {{Wofk, Diana}},
  title = {{Fast and Energy-Efficient Monocular Depth Estimation on Embedded Systems}},
  school = {{Massachusetts Institute of Technology}},
  year = {{2020}}
}

FastDepth: Fast Monocular Depth Estimation on Embedded Systems
Diana Wofk*, Fangchang Ma*, Tien-Ju Yang, Sertac Karaman, Vivienne Sze.
IEEE International Conference on Robotics and Automation, May 2019.

Depth sensing is a critical function for robotic tasks such as localization, mapping and obstacle detection. There has been a significant and growing interest in depth estimation from a single RGB image, due to the relatively low cost and size of monocular cameras. However, state-of-the-art single-view depth estimation algorithms are based on fairly complex deep neural networks that are too slow for real-time inference on an embedded platform, for instance, mounted on a micro aerial vehicle. In this paper, we address the problem of fast depth estimation on embedded systems. We propose an efficient and lightweight encoder-decoder network architecture and apply network pruning to further reduce computational complexity and latency. In particular, we focus on the design of a low-latency decoder. Our methodology demonstrates that it is possible to achieve similar accuracy as prior work on depth estimation, but at inference speeds that are an order of magnitude faster. Our proposed network, FastDepth, runs at 178 fps on an NVIDIA Jetson TX2 GPU and at 27 fps when using only the TX2 CPU, with active power consumption under 10 W. FastDepth achieves close to state-of-the-art accuracy on the NYU Depth v2 dataset. To the best of the authors' knowledge, this paper demonstrates real-time monocular depth estimation using a deep neural network with the highest throughput on an embedded platform that can be carried by a micro aerial vehicle.
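The core intuition behind the pruning step can be sketched as ranking channels by weight magnitude and dropping the weakest. FastDepth actually prunes with NetAdapt, which additionally adapts per-layer pruning ratios to an empirical latency budget; the toy `prune_channels` helper below shows only the magnitude-ranking step.

```python
# Magnitude-based channel ranking: drop the channels whose filter
# weights have the smallest L2 norm. This is only the ranking core of
# channel pruning; NetAdapt (used by FastDepth) adds latency-driven
# per-layer pruning ratios on top of an idea like this.
import math

def l2_norm(filt):
    return math.sqrt(sum(w * w for w in filt))

def prune_channels(filters, keep_ratio):
    # filters: list of per-channel weight lists
    ranked = sorted(filters, key=l2_norm, reverse=True)
    keep = max(1, int(len(filters) * keep_ratio))
    return ranked[:keep]

filters = [[0.9, -1.1], [0.01, 0.02], [0.5, 0.4], [-0.03, 0.05]]
kept = prune_channels(filters, keep_ratio=0.5)
print(len(kept))  # the 2 strongest channels survive
```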

@inproceedings{wofk2019fastdepth,
  author = {{Wofk, Diana and Ma, Fangchang and Yang, Tien-Ju and Karaman, Sertac and Sze, Vivienne}},
  title = {{FastDepth: Fast Monocular Depth Estimation on Embedded Systems}},
  booktitle = {{IEEE International Conference on Robotics and Automation (ICRA)}},
  year = {{2019}}
}


"Fast and Energy-Efficient Monocular Depth Estimation on Embedded Systems." MIT.nano Webinar Series, July 2020.

"Energy-Efficient Deep Neural Network for Depth Prediction." MIT EECS SuperUROP Showcase, April 2018.

Additional Work Experience

Outside of research, I have completed internships in pre- and post-silicon chip validation. I have also gained teaching experience as a TA for undergraduate courses in digital logic design and computer systems.

Spring 2020

Fall 2019

Fall 2018

Summer 2017

Pre-Silicon Validation Intern at Intel Corporation in Santa Clara, CA

Summer 2016

System ASIC Bring-Up Intern at IBM Systems in Poughkeepsie, NY


Email me or connect with me on LinkedIn!