Fall 2017
NVIDIA
Applied Deep Learning Research
Deep Learning for Computer Vision and Realtime Graphics.
Fall 2014
Oak Ridge National Lab Internship
Continued Heterogeneous Framework Development for Exascale Computing and Heterogeneous Architecture integration with the CESM workflow.
Summer 2014
Oak Ridge National Lab Internship
Heterogeneous Framework Development for Exascale Computing and Heterogeneous Architecture.
2011-2018
Clemson University Computer Engineering Ph.D. student
Framework for Life-cycle Enrichment of HPC Applications towards Exascale Heterogeneous Architecture. Advisor: Melissa C. Smith.
2011
Bachelor's Degree
Major in Computer Engineering and Minor in Mathematical Science.
Fall 2009, Summer 2010, Spring 2010, Summer 2011
Adtran, Inc. Co-op Internship
Created new features and debugged existing features in the Adtran Operating System for network hardware equipment.
2008
Enrolled in Computer Engineering at Clemson University

I am a research scientist at NVIDIA in Santa Clara working on computer vision and real-time computer graphics. I completed my PhD at Clemson University, working with Melissa C. Smith. I received a B.Sc. in Computer Engineering with a minor in Mathematical Science from Clemson University in 2011.

Background:
I worked on P2P, cloud, networking and network security, and sensors during my undergraduate studies and the first and second years of my Ph.D. I also worked at Adtran, Inc. for 1.5 years as a software engineer. During my PhD, I worked on multiple projects: Mapping and Performance Prediction on High Performance Architecture; Global Gene Alignment, in collaboration with Dr. Alex F. Feltus; Enabling Remote Surgery using HPC; and Secure Computing using GPUs, in collaboration with Dr. Richard Brooks. I also collaborated with and worked at ORNL on the Functional Partitioning Framework for enabling the A2A framework and node-local post-processing.

Current Projects:
I work in the Applied Deep Learning Research team at NVIDIA, applying DL to graphics.


My Complete CV is available here

Publications

Partial Convolution based Padding

G. Liu, K. Shih, T. Wang, F. Reda, K. Sapra, Z. Yu, A. Tao, B. Catanzaro

https://arxiv.org/abs/1811.11718

Improving Semantic Segmentation via Video Propagation and Label Relaxation

Y. Zhu*, K. Sapra*, F. Reda, K. Shih, S. Newsam, A. Tao, B. Catanzaro

Accepted Talk, CVPR 2019

G3NA-V: GPU-enabled tool for mining and aligning complex gene interaction graphs

K. Sapra, F. Feltus, M. Smith and J. Levine

Accepted Talk, GPU Technology Conference (GTC), 2016, San Jose, California

Framework for Lifecycle Enrichment of HPC Applications on Exascale Heterogeneous Architecture

K. Sapra

Ph.D. Doctoral Showcase, Supercomputing (SC), 2015, Austin, Texas

HPC Enabled Real-Time Remote Processing of Laparoscopic Surgery

K. Sapra, Z. Ronaghi, R. Izard, E. Duffy, M. C. Smith, KC Wang, D. Kwartowitz

Accepted Poster, Supercomputing (SC), 2015, Austin, Texas

Enhancing Collusion Resilience in Reputation Systems

H. Shen, Y. Lin, K. Sapra, Z. Li

IEEE Transactions on Parallel and Distributed Systems (TPDS), 2015

G3NA: GPU-enabled Gene Network Alignment Tool

K. Sapra, F. Feltus, M. Smith

GPU Technology Conference (GTC), 2015, San Jose, California

RIAL: Resource Intensity Aware Load Balancing in Clouds

L. Chen, H. Shen, K. Sapra

International Conference on Computer Communication (INFOCOM), 2014

A Social Network Integrated Reputation System for Cooperative P2P File Sharing

K. Chen, H. Shen, K. Sapra, G. Liu

International Conference on Computer Communications and Networks (ICCCN), 2013, Nassau

Circumventing Keyloggers and Screendump

K. Sapra, B. Husain, R. Brooks, M. Smith

MALWARE 2013, Puerto Rico

Taxonomy Cube: A Multi-Dimension Application-to-Architecture Mapping

K. Sapra

Early Dissertation Showcase, Supercomputing Conference, Colorado (2013)

Collusion Detection in Reputation Systems for Peer-to-Peer Networks

K. Sapra, H. Shen, Z. Li

IEEE Transactions on Parallel and Distributed Systems (TPDS), Submitted

CEDAR: An Optimal and Distributed Strategy for Packet Recovery In Wireless Network

C. Qui, H. Shen, S. Soltani, K. Sapra, H. Jiang, J. Hallstorm

IEEE International Conference on Computer Communication (INFOCOM), April 2013

Leveraging Social Networks to Combat Collusion in Reputation Systems for Peer-to-Peer Networks

Z. Li, H. Shen, K. Sapra

IEEE Transactions on Computers (TC)

Cooperative End to End Traffic Redundancy Elimination for Reducing Cloud Bandwidth Cost

L. Yu, K. Sapra, H. Shen, Lin Ye

IEEE International Conference on Network Protocols (ICNP), Oct 2012

Collusion Detection in Reputation Systems for Peer-to-Peer Networks

Z. Li, H. Shen, K. Sapra

The 41st International Conference on Parallel Processing (ICPP), Sept 2012

Leveraging Social Networks to Combat Collusion in Reputation Systems for Peer-to-Peer Networks

Z. Li, H. Shen, K. Sapra

IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2011, Anchorage, Alaska

Fast Image Mosaicking using Optical Flow

K. Sapra, S. Birchfield

Proc. of IEEE SoutheastCon (SECON), 2011


Some of my previous projects

Improving Semantic Segmentation via Video Propagation and Label Relaxation
We propose an effective video prediction-based data synthesis method to scale up training sets in order to improve the accuracy of semantic segmentation networks. We also introduce a joint propagation strategy to alleviate misalignments in synthesized samples. Furthermore, we present a novel boundary relaxation technique to mitigate label noise. The label relaxation strategy can be used for human-annotated labels as well as synthesized labels. We achieve state-of-the-art performance on three benchmark datasets: Cityscapes, CamVid, and KITTI. A summary video demo can be watched below.
Yi Zhu* (Co-Author), Karan Sapra* (Co-Author), Fitsum A. Reda, Kevin J. Shih, Shawn Newsam, Andrew Tao, Bryan Catanzaro
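
The description above does not spell out the relaxed loss, so the following is only a minimal sketch of one way boundary label relaxation can be expressed. It assumes that a pixel near a boundary may be explained by any label found in its small neighborhood, so the loss is the negative log of the summed probability over those labels. The function name and arguments are hypothetical.

```python
import math


def relaxed_boundary_loss(class_probs, neighborhood_labels):
    """Negative log-likelihood over a *set* of acceptable labels.

    class_probs: softmax output for one pixel (list indexed by class id).
    neighborhood_labels: labels observed in the pixel's neighborhood
    (assumption: e.g. a 3x3 window around the pixel).
    """
    allowed = set(neighborhood_labels)
    # Sum the probability mass assigned to every acceptable class.
    mass = sum(class_probs[c] for c in allowed)
    return -math.log(mass)
```

With a single label in the neighborhood this reduces to the usual cross-entropy term for that pixel; with several labels, the network is not penalized for choosing any of them.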
Partial Convolution based Padding
We present a simple yet effective padding scheme that can be used as a drop-in module for existing convolutional neural networks. We call it partial convolution based padding, with the intuition that the padded region can be treated as holes and the original input as non-holes. Specifically, during the convolution operation, the convolution results are re-weighted near image borders based on the ratios between the padded area and the convolution sliding window area. Extensive experiments with various deep network models on ImageNet classification and semantic segmentation demonstrate that the proposed padding scheme consistently outperforms standard zero padding with better accuracy.
Guilin Liu, Kevin J. Shih, Ting-Chun Wang, Fitsum A. Reda, Karan Sapra, Zhiding Yu, Andrew Tao, Bryan Catanzaro
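
The re-weighting idea described above can be sketched for a single-channel image as follows. This is an illustrative sketch, not the released implementation: it assumes an odd-sized kernel, no bias term, and a ratio equal to the sliding window area divided by the number of window positions that fall on the original (non-padded) input. The function name is hypothetical.

```python
def partial_conv_pad_2d(image, kernel):
    """Same-size 2D cross-correlation with border re-weighting.

    image, kernel: lists of lists of floats; kernel dimensions are
    assumed to be odd so the output has the same size as the input.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    window_area = kh * kw
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            valid = 0  # window positions that land on real input pixels
            for di in range(kh):
                for dj in range(kw):
                    y, x = i + di - ph, j + dj - pw
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[di][dj]
                        valid += 1
                    # Positions outside the image are treated as holes
                    # (they contribute zero, exactly like zero padding).
            # Scale up results whose window overlapped the padded region.
            out[i][j] = acc * window_area / valid
    return out
```

For a constant image and an all-ones 3x3 kernel, zero padding alone would produce smaller responses at corners and edges than in the interior; with the re-weighting, every output position gets the same value.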
Application to Architecture ( A2A ) Mapping
One of the challenging tasks in HPC is finding an optimal architecture for a given application. Optimality can be defined in terms of performance, scaling, or energy per watt. My dissertation focuses on finding an optimal architecture for a given application.
Karan Sapra, Melissa C. Smith
Heterogeneous Functional Partitioning Framework
One of the challenges in computing facilities such as the Titan supercomputer is the under-utilization or non-utilization of cores and/or the accelerator on a node. We propose a framework that offloads I/O, statistics, and other in-program functionality to these cores and the accelerator, decreasing run time and increasing the utilization of the cores on a node and of the overall computing facility.
Karan Sapra, Saurabh Gupta (ORNL), Atcheley Scott (ORNL), Ross Miller (ORNL), Sudarshan Vazhkudai (ORNL), Melissa C. Smith
Global GPGPU-based Gene Net Alignment and Visualization (G3NA)
We utilize GPGPUs for aligning and visualizing large-scale gene interaction networks. We can perform multiple alignments between biological networks in less than a minute. Our Maize-Rice network alignment takes 35 seconds on an NVIDIA K40; each network has ~2000 nodes and ~40000 edges. We are currently extending this to Xeon Phi, FPGA, and multi-core architectures.
Interview with NVIDIA at SC15 (November 2015, Austin, TX)
Karan Sapra, Melissa C. Smith, Alex F. Feltus
HPC Enabled Real-Time Remote Processing of Laparoscopic Surgery
Laparoscopic surgery is a minimally invasive surgical technique where surgeons insert a small video camera into the patient’s body to visualize internal organs and use small tools to perform surgical procedures. However, the benefit of small incisions comes with the drawback of limited subsurface tissue visualization. Image-guided surgery (IGS) uses images to map subsurface structures and can reduce the limitations of laparoscopic surgery. One particular laparoscopic camera system is the vision system of the daVinci robotic surgical system. Processing this huge stream of data on a single- or dual-node setup is a challenging task, so we propose a High Performance Computing (HPC) enabled framework for laparoscopic surgery that is secure, reliable, and scalable.
Karan Sapra, Zahra Ronaghi, Ryan Izard, Edward Duffy, Melissa C. Smith, Kuang-Ching Wang, David M. Kwartowitz
Domain-Name Generation Algorithm
We implement two domain name generation algorithms (DGAs), using a Probabilistic Context Free Grammar (PCFG) and a Hidden Markov Model (HMM), to evade detection routines. We compare our DGAs against three popular detection techniques: Kullback-Leibler (KL) distance, Jaccard Index (JI), and Edit Distance (ED).
Karan Sapra, Benafsh Husain, Fu Yu, Richard R. Brooks
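
The three detection measures named above can be sketched as below. This is a generic illustration rather than the detectors used in the project: it assumes KL distance is computed over smoothed character distributions, the Jaccard Index over character-bigram sets, and Edit Distance as plain Levenshtein distance. All function names and the smoothing parameter are hypothetical.

```python
import math
from collections import Counter


def char_distribution(names, alphabet, smoothing=1.0):
    """Smoothed character distribution of a list of domain names."""
    counts = Counter(ch for name in names for ch in name if ch in alphabet)
    total = sum(counts.values()) + smoothing * len(alphabet)
    return {ch: (counts[ch] + smoothing) / total for ch in alphabet}


def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same alphabet."""
    return sum(p[ch] * math.log(p[ch] / q[ch]) for ch in p)


def bigrams(name):
    return {name[i:i + 2] for i in range(len(name) - 1)}


def jaccard_index(a, b):
    """Overlap of the bigram sets of two names, in [0, 1]."""
    sa, sb = bigrams(a), bigrams(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def edit_distance(a, b):
    """Levenshtein distance using a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]
```

A detector built on these measures would typically compare a batch of observed names against a reference set of benign names and flag batches whose scores look unlike the reference; where that threshold sits is outside the scope of this sketch.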
Keyloggers and Screendumps
Keyloggers (hardware or software) and screendumps of virtual keyboards can capture user input on the local machine. To counter these attacks, we use DirectX 9 libraries[3] on Windows or Linux[5] operating systems. Our approach uses a remote server that communicates securely with the local process. The DirectX mode that we use executes in the GPU while being directly displayed on the screen. There is no direct communication between the operating system and the GPU storage, which allows us to communicate with the user securely even if the local machine is compromised. We present a simple prototype application of this approach, which supports web browsing.
Karan Sapra, Benafsh Husain, Richard R. Brooks, Melissa Smith
Cloud Load Balancing
To provide robust infrastructure as a service (IaaS), clouds currently perform load balancing by migrating virtual machines (VMs) from heavily loaded physical machines (PMs) to lightly loaded PMs. The unique features of clouds pose formidable challenges to achieving effective and efficient load balancing. We propose a Resource Intensity Aware Load balancing method (RIAL). For each PM, RIAL dynamically assigns different weights to different resources according to their usage intensity in the PM, which significantly reduces the time and cost to achieve load balance and avoids future load imbalances. Extensive trace-driven simulation results and real-world experimental results show the superior performance of RIAL compared to other load balancing methods.
Helen Shen, Karan Sapra
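
The idea of weighting resources by usage intensity can be illustrated with a small sketch. The weight formula below (weights proportional to each resource's utilization on the PM) and the VM selection rule are assumptions made purely for illustration; the description above does not give RIAL's actual formulas, and the function names are hypothetical.

```python
def intensity_weights(pm_utilization):
    """Map each resource to a weight proportional to its utilization.

    pm_utilization: dict such as {"cpu": 0.9, "mem": 0.3}, values in [0, 1].
    Assumption: more intensively used resources get larger weights.
    """
    total = sum(pm_utilization.values())
    if total == 0:
        n = len(pm_utilization)
        return {r: 1.0 / n for r in pm_utilization}
    return {r: u / total for r, u in pm_utilization.items()}


def pick_vm_to_migrate(pm_utilization, vm_usage):
    """Pick the VM whose usage is concentrated on the PM's hot resources.

    vm_usage: dict of VM name -> per-resource usage dict.
    """
    weights = intensity_weights(pm_utilization)

    def score(vm):
        return sum(weights[r] * vm_usage[vm][r] for r in weights)

    return max(vm_usage, key=score)
```

On a PM whose CPU is far busier than its memory, this sketch prefers to migrate a CPU-heavy VM over a memory-heavy one, which is the intuition behind making the weights depend on the PM's own resource intensity.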
Cooperative end-to-end traffic redundancy elimination for reducing cloud bandwidth cost
The pay-as-you-go service model impels cloud customers to reduce the usage cost of bandwidth. Traffic Redundancy Elimination (TRE) has been shown to be an effective solution for reducing bandwidth costs, and has recently captured significant attention in the cloud environment. We propose a sender and receiver Cooperative end-to-end TRE solution (CoRE) for efficiently identifying and removing both short-term and long-term redundancy. Through a two-layer redundancy detection design and a single-pass algorithm for chunking and fingerprinting, CoRE efficiently carries out cooperative operations between the sender and the receiver. Through extensive evaluation with several real traces, we show that CoRE is able to identify both short-term and long-term redundancy with low additional cost, while ensuring TRE efficiency from data changes.
Lei Yu, Karan Sapra, Helen Shen
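
As a rough illustration of chunking and fingerprinting in a single pass, the sketch below implements a generic, single-layer chunk-level TRE scheme. It is not CoRE's two-layer design: the rolling-hash parameters, the SHA-1 fingerprints, and the token format are all assumptions chosen for the example, and every name is hypothetical.

```python
import hashlib

BASE = 257
MOD = (1 << 31) - 1


def content_defined_chunks(data, window=16, mask=0x3F, min_size=32):
    """Split bytes at positions chosen by a rolling hash of the content."""
    chunks, start, h = [], 0, 0
    top = pow(BASE, window - 1, MOD)  # weight of the byte leaving the window
    for i, byte in enumerate(data):
        if i >= window:
            h = (h - data[i - window] * top) % MOD
        h = (h * BASE + byte) % MOD
        # Cut when the hash of the last `window` bytes matches the mask.
        if i + 1 - start >= min_size and i + 1 >= window and (h & mask) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def encode(data, sender_cache):
    """Replace chunks already seen with their fingerprints."""
    tokens = []
    for chunk in content_defined_chunks(data):
        fp = hashlib.sha1(chunk).digest()
        if fp in sender_cache:
            tokens.append(("ref", fp))
        else:
            sender_cache[fp] = chunk
            tokens.append(("raw", chunk))
    return tokens


def decode(tokens, receiver_cache):
    """Rebuild the byte stream, learning new chunks along the way."""
    out = []
    for kind, value in tokens:
        if kind == "ref":
            out.append(receiver_cache[value])
        else:
            receiver_cache[hashlib.sha1(value).digest()] = value
            out.append(value)
    return b"".join(out)
```

Because chunk boundaries depend only on the bytes inside the rolling window, an insertion early in a stream shifts later boundaries along with the content, so unchanged regions can still map to fingerprints that are already cached.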
P2P Collusion Detection and Avoidance
In peer-to-peer networks (P2Ps), many autonomous nodes without preexisting trust relationships share resources (e.g., files) with each other. Due to their open environment, P2Ps usually employ reputation systems to provide guidance in selecting trustworthy resource providers for high system reliability and security. A reputation system computes and publishes a reputation score for each node based on a collection of opinions from others about the node. We analyze transaction ratings in the Amazon and Overstock online transaction platforms over one year. The analysis of the real traces confirms the existence of collusion as well as its important behavior characteristics and influence on reputation values in real reputation systems. We also propose a collusion detection method to specifically thwart collusion behaviors. We further optimize the method by reducing its computing cost.
Karan Sapra, Helen Shen
Fast image mosaicking using optical flow
A method for mosaicking a series of 2-D images with unidirectional motion in a single plane, independent of the type of motion (planar or circular), assuming a unidirectional displacement occurs with each frame. The resulting mosaic can be a color image even though most of the computation is done on grayscale images, which makes the mosaicking faster and the output more aesthetically pleasing.
Karan Sapra, Stan Birchfield

