Biography

Eric Hsiung

Graduate Research Assistant

Department of Computer Science, The University of Texas at Austin

I am a second-year PhD student at the University of Texas at Austin, co-advised by professors Swarat Chaudhuri and Joydeep Biswas. I currently focus on applying neurosymbolic methods to robotics problems, as well as making robot programming less difficult for people. I received my MS in Computer Science from Brown University, where I focused on reinforcement learning and methods for teaching intelligent machines, and received my BS in Engineering Physics from Cornell University, where I was part of the Violet Nanosatellite team and also spent a couple of summers at the Squishy Cell lab at the University of Chicago. Prior to joining UT Austin, I spent time at RightHand Robotics. Previously, I was a Technical Manager at LGS Innovations (formerly the Government Communications Lab of Bell Labs at Alcatel-Lucent, now part of CACI), where, as part of the Internet Research Lab, I led efforts in data engineering, analytics, infrastructure, network reconnaissance, and secure communications.

Education

PhD in Computer Science (in progress), The University of Texas at Austin
MS in Computer Science, Brown University
BS in Engineering Physics, Cornell University

Publications

Learning Reward Machines through Preference Queries over Sequences

Reward machines have shown great promise at capturing non-Markovian reward functions for learning tasks that involve complex action sequencing. However, no algorithm currently exists for learning reward machines with realistic weak feedback in the form of preferences. We contribute REMAP, a novel algorithm for learning reward machines from preferences, with correctness and termination guarantees. REMAP introduces preference queries in place of membership queries in the L* algorithm, and leverages a symbolic observation table along with unification and constraint solving to narrow the hypothesis reward machine search space. In addition to the proofs of correctness and termination for REMAP, we present empirical evidence measuring correctness: how frequently the resulting reward machine is isomorphic to the ground truth under a consistent yet inexact teacher, and the regret between the ground truth and learned reward machines.
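
A minimal sketch of the preference-query idea (toy labels and a toy teacher of my own construction, not the REMAP implementation): the learner asks the teacher to compare two label sequences and records a symbolic ordering constraint between their unknown cumulative rewards, which a constraint solver would later use to narrow the hypothesis space.

```python
# Minimal sketch, not the REMAP implementation: a preference query compares two
# label sequences and returns which one the teacher prefers; the learner records
# a symbolic ordering constraint between the unknown cumulative rewards.

def preference_query(teacher_return, seq_a, seq_b):
    """Ask the teacher to compare two label sequences; returns '>', '<', or '='."""
    ra, rb = teacher_return(seq_a), teacher_return(seq_b)
    if ra > rb:
        return ">"
    if ra < rb:
        return "<"
    return "="

# The learner keeps symbolic constraints over the hypothesis machine's outputs,
# which are solved with an SMT/LP solver in practice.
constraints = []

def record(seq_a, seq_b, relation):
    constraints.append((tuple(seq_a), relation, tuple(seq_b)))

# Hypothetical toy teacher: label 'b' is worth 1, label 'a' is worth 0.
teacher = lambda seq: sum(1 for s in seq if s == "b")
rel = preference_query(teacher, ["a", "b"], ["a", "a"])
record(["a", "b"], ["a", "a"], rel)   # -> (('a', 'b'), '>', ('a', 'a'))
```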

Learning Reward Functions from a Combination of Demonstration and Evaluative Feedback

As robots become more prevalent in society, they will need to learn to act appropriately under diverse human teaching styles. We present a human-centered approach for teaching robots reward functions by using a mixture of teaching strategies when communicating action appropriateness and goal success. Our method incorporates two teaching strategies for learning: explicit action instruction and evaluative, scalar-based feedback. We demonstrate that a robot instantiating our method can learn from humans who use both kinds of strategies to train the robot in a complex navigation task that includes norm-like constraints.

Norm Learning with Reward Models from Instructive and Evaluative Feedback

People are increasingly interacting with artificial agents in social settings, and as these agents become more sophisticated, people will have to teach them social norms. Two prominent teaching methods include instructing the learner how to act, and giving evaluative feedback on the learner’s actions. Our empirical findings indicate that people naturally adopt both methods when teaching norms to a simulated robot, and they use the methods selectively as a function of the robot’s perceived expertise and learning progress. In our algorithmic work, we conceptualize a set of context-specific norms as a reward function and integrate learning from the two teaching methods under a single likelihood-based algorithm, which estimates a reward function that induces policies maximally likely to satisfy the teacher’s intended norms. We compare robot learning under various teacher models and demonstrate that a robot responsive to both teaching methods can learn to reach its goal and minimize norm violations in a navigation task for a grid world. We improve the robot’s learning speed and performance by enabling teachers to give feedback at an abstract level (which rooms are acceptable to navigate) rather than at a low level (how to navigate any particular room).
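
A minimal sketch of the likelihood-based idea (a toy two-action world with assumed softmax and Gaussian observation models, not the paper's exact formulation): each candidate reward function is scored by how likely it makes both the teacher's instructed actions and their scalar feedback.

```python
import numpy as np

# Toy illustration: score candidate reward functions by the likelihood of the
# teacher's instructions and scalar feedback, keep the maximum-likelihood one.
actions = ["left", "right"]
candidate_rewards = {                       # hypothetical per-action rewards
    "norm_A": {"left": 1.0, "right": -1.0},
    "norm_B": {"left": -1.0, "right": 1.0},
}

def instruction_likelihood(reward, instructed_action, beta=2.0):
    # Teacher instructs actions roughly in proportion to their reward (softmax).
    prefs = np.array([reward[a] for a in actions])
    probs = np.exp(beta * prefs) / np.exp(beta * prefs).sum()
    return probs[actions.index(instructed_action)]

def feedback_likelihood(reward, action, scalar, sigma=0.5):
    # Scalar feedback modeled as a noisy observation of the action's reward.
    return np.exp(-(scalar - reward[action]) ** 2 / (2 * sigma ** 2))

data = [("instruct", "left"), ("evaluate", ("right", -0.8))]
scores = {}
for name, reward in candidate_rewards.items():
    ll = 0.0
    for kind, obs in data:
        if kind == "instruct":
            ll += np.log(instruction_likelihood(reward, obs))
        else:
            ll += np.log(feedback_likelihood(reward, *obs))
    scores[name] = ll

print(max(scores, key=scores.get))   # -> "norm_A" under this toy data
```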

Rainbow RBF-DQN

Deep reinforcement learning has been extensively studied, resulting in several extensions to DQN that improve its performance, such as replay buffer sampling strategies, distributional value representations, and double/dueling networks. Previous works have examined these extensions in the context of either discrete action spaces or in conjunction with actor-critic learning algorithms, but there has been no investigation of combining them for deep value-based continuous control. We adapted the methods discussed in Rainbow DQN to RBF-DQN, a deep value-based method for continuous control, showing improvements in baseline performance and sample efficiency. Rainbow RBF-DQN outperforms vanilla RBF-DQN on the most challenging tasks, even outperforming state-of-the-art policy gradient methods like SAC.
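
A minimal sketch of the RBF value function underlying RBF-DQN (fixed toy centroids and values in place of network outputs, and an assumed temperature beta): Q(s, a) is a softmax-weighted combination of per-centroid values, so the maximizing action is approximately one of the centroids.

```python
import numpy as np

# Toy sketch of the RBF-DQN value function: centroids c_i(s) and values v_i(s)
# would come from a network conditioned on the state; here they are fixed.
def rbf_q(action, centroids, values, beta=5.0):
    logits = -beta * np.linalg.norm(centroids - action, axis=1)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return float(weights @ values)

centroids = np.array([[-0.5], [0.0], [0.7]])   # candidate 1-D actions for this state
values = np.array([0.2, 1.0, 0.4])             # predicted value per centroid

# Maximizing over actions is cheap: the maximum is approximately attained at one
# of the centroids, which makes value-based continuous control tractable here.
best = max(centroids, key=lambda a: rbf_q(a, centroids, values))
print(best, rbf_q(best, centroids, values))
```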

Generalizing to New Domains by Mapping Natural Language to Lifted LTL

Recent work on using natural language to specify commands to robots has grounded that language to LTL. However, mapping natural language task specifications to LTL task specifications using language models requires probability distributions over a finite vocabulary. Existing state-of-the-art methods have extended this finite vocabulary to include unseen terms from the input sequence to improve output generalization. However, novel out-of-vocabulary atomic propositions cannot be generated using these methods. To overcome this, we introduce an intermediate contextual query representation which can be learned from single positive task specification examples, associating a contextual query with an LTL template. We demonstrate that this intermediate representation allows for generalization over unseen object references, assuming accurate groundings are available. We compare our method of mapping natural language task specifications to intermediate contextual queries against state-of-the-art CopyNet models capable of translating natural language to LTL, by evaluating whether correct LTL for manipulation and navigation task specifications can be output, and show that our method outperforms the CopyNet model on unseen object references. We demonstrate that the grounded LTL our method outputs can be used for planning in a simulated OO-MDP environment. Finally, we discuss some common failure modes encountered when translating natural language task specifications to grounded LTL.
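
A minimal sketch of the intermediate-representation idea (hypothetical regular-expression templates and a lookup-table grounding, not the learned model): a contextual query extracts the object reference and pairs it with a lifted LTL template, so unseen object names never need to appear in the model's output vocabulary.

```python
import re

# Hypothetical lifted templates: (pattern over the instruction, LTL with placeholder X).
templates = [
    (r"go to the (?P<X>[\w ]+)", "F(X)"),
    (r"avoid the (?P<X>[\w ]+)", "G(!X)"),
]

def to_lifted_ltl(instruction):
    for pattern, ltl in templates:
        m = re.match(pattern, instruction)
        if m:
            return ltl, m.group("X")
    return None, None

def ground(ltl, reference, grounding):
    # Grounding maps a natural-language reference to an atomic proposition;
    # a separate grounding model in practice, a lookup table here.
    return ltl.replace("X", grounding[reference])

grounding = {"red room": "in_red_room", "broken glass": "near_glass"}
ltl, ref = to_lifted_ltl("go to the red room")
print(ground(ltl, ref, grounding))   # -> "F(in_red_room)"
```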

Value-Based Reinforcement Learning for Continuous Control Robotic Manipulation in Multi-Task Sparse Reward Settings

Learning continuous control in high-dimensional sparse reward settings, such as robotic manipulation, is a challenging problem due to the number of samples often required to obtain accurate optimal value and policy estimates. While many deep reinforcement learning methods have aimed at improving sample efficiency through replay or improved exploration techniques, state-of-the-art actor-critic and policy gradient methods still suffer from the hard exploration problem in sparse reward settings. Motivated by recent successes of value-based methods for approximating state-action values, like RBF-DQN, we explore the potential of value-based reinforcement learning for learning continuous robotic manipulation tasks in multi-task sparse reward settings. On robotic manipulation tasks, we empirically show RBF-DQN converges faster than current state-of-the-art algorithms such as TD3, SAC, and PPO. We also perform ablation studies with RBF-DQN and show that some enhancement techniques for vanilla Deep Q learning, such as Hindsight Experience Replay (HER) and Prioritized Experience Replay (PER), can also be applied to RBF-DQN. Our experimental analysis suggests that value-based approaches may be more sensitive to data augmentation and replay buffer sampling techniques than policy-gradient methods, and that the benefits of these methods for robot manipulation are heavily dependent on the transition dynamics of generated subgoal states.
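
A minimal sketch of the HER relabeling used as an enhancement here (the transition format and sparse reward rule are my assumptions for illustration): failed episodes are relabeled with goals that were actually achieved later in the episode, converting sparse failures into useful value targets.

```python
import random

def her_relabel(episode, k=4):
    """episode: list of (state, action, achieved_goal, desired_goal) tuples.
    Returns relabeled transitions whose desired goal is a future achieved goal,
    turning sparse failures into successful (reward 0) experience."""
    relabeled = []
    for t, (s, a, achieved, _) in enumerate(episode):
        future = episode[t:]
        for _ in range(k):
            _, _, future_goal, _ = random.choice(future)
            reward = 0.0 if achieved == future_goal else -1.0
            relabeled.append((s, a, future_goal, reward))
    return relabeled

# Toy episode in a 1-D world where the achieved goal is the next state reached:
episode = [(0, +1, 1, 5), (1, +1, 2, 5), (2, +1, 3, 5)]
print(len(her_relabel(episode)))   # 3 transitions * k relabels each
```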

Chronology

Graduate Research Assistant

The University of Texas at Austin, Department of Computer Science

Aug 2022 – Present | Austin, TX
Neurosymbolic methods and robotics; co-advised by Swarat Chaudhuri and Joydeep Biswas.

Senior Software Engineer, Infrastructure

RightHand Robotics

May 2022 – Aug 2022 | Somerville, MA
Assisted in efforts to improve the infrastructure powering RightPick systems and to transition the codebase between languages; testing, validation, and verification.

Graduate Research Student

Brown University, Department of Computer Science

Sep 2020 – Feb 2022 | Providence, RI
Focus on reinforcement learning problems and ways to teach intelligent agents.

Technical Contributor and Technical Manager, Internet Research Lab

LGS Innovations / CACI

Jun 2013 – Aug 2020 | Florham Park, NJ

Led efforts in data engineering, analytics, infrastructure, network reconnaissance, and secure communications. Some examples:

Classification and Mapping of Encrypted Mobile Application Traffic onto User Actions

We compare and contrast the performance of various deep learning models at predicting user actions from captured encrypted traffic. Typical traffic sources include activities within various social media apps, content streaming apps, communications apps, and cloud storage apps. The scale-up process of obtaining massive training, test, and validation datasets leverages the LGS Innovations Device Farm software/hardware solution to programmatically control and navigate mobile applications on phones, generating classes of user actions with enough variance in their similarities and capturing the resulting encrypted traffic on multiple smartphones simultaneously. We can programmatically create app crawlers to drive an application, and we can record replayable user-action sequences for dataset generation. Scale is also achieved by allowing multiple people to contribute simultaneously to the training dataset. Models, model parameters, training sessions, and model performance are tracked to understand how any particular model’s performance evolves over time. A short-term goal is to identify user actions within action sequences in real time. Results could be used to train generative adversarial networks to generate network traffic similar to that of actual human users.
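
As a rough illustration only (my own toy model, not the Device Farm pipeline or the models studied): a sequence classifier over per-packet metadata such as size and direction, which remains observable even when payloads are encrypted.

```python
import torch
import torch.nn as nn

# Toy sketch: classify a user action from a sequence of per-packet features.
class ActionClassifier(nn.Module):
    def __init__(self, n_features=2, hidden=64, n_actions=10):
        super().__init__()
        self.rnn = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, packets):              # packets: (batch, seq_len, n_features)
        _, (h, _) = self.rnn(packets)        # h: (1, batch, hidden)
        return self.head(h[-1])              # per-class logits

model = ActionClassifier()
fake_flow = torch.randn(8, 50, 2)            # 8 flows, 50 packets, [size, direction]
print(model(fake_flow).shape)                # torch.Size([8, 10])
```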

Diversification of Network Traffic over Dynamic Spatial-Temporal Network Paths

We have created an extensible, flexible network diversification framework, which operates on both embedded and server-grade devices. Current plans are to extend support to virtualized cloud instances, then to a distributed architecture, and finally to an autonomously re-arranging distributed cloud-based system.

Scientist I

Arete Associates

Aug 2011 – Mar 2013 | Chantilly, VA

3D model reconstruction, tessellation, and mesh simplification. I worked on GPU shader programs in OpenGL and GLSL to support scene visualizations and simulations.

Arete Associates in 2011 (Internet Archive)

An occlusion shader computes dynamic shadows based on perspective. One implementation iteratively computes depth-map differences between renderings of the scene with and without each object present. By mapping the differences to a binary mask, any change in the binary value indicates an occlusion edge; thus dynamic shadow outlines can be computed for any specified perspective.
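
A minimal sketch of that depth-difference idea on toy depth maps (the threshold and array shapes are placeholders):

```python
import numpy as np

def occlusion_outline(depth_with, depth_without, eps=1e-3):
    # Binarize where the two depth renderings differ, then take the boundary of
    # that mask as the occlusion (shadow) outline for the chosen perspective.
    occluded = (np.abs(depth_with - depth_without) > eps).astype(np.uint8)
    dy = np.abs(np.diff(occluded, axis=0, prepend=occluded[:1]))
    dx = np.abs(np.diff(occluded, axis=1, prepend=occluded[:, :1]))
    return (dx | dy).astype(bool)

depth_without = np.full((6, 6), 10.0)        # empty scene: constant depth
depth_with = depth_without.copy()
depth_with[2:4, 2:4] = 5.0                   # an object closer to the camera
print(occlusion_outline(depth_with, depth_without).astype(int))
```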

GPUs can shade scenes dynamically and update the views based on the current camera and viewport perspective. If a scene needs to be exported and viewed in a program without GPU acceleration, how can we best choose to texture the outputs when the views can be dynamic? First, through perspective sampling, numerous textures can be generated from a variety of viewpoints. Second, scene objects can be tessellated to improve texturing quality, because each face in an exported model may only be painted with one texture. By painting each tessellated face with the optimal texture, export and “compression” of scene texture information can be achieved. Combining tessellated face normals with per-vertex occlusion information allows one to select the optimal texture for each face: either the texture obtained from the perspective vector whose inner product with the face normal is closest to unity, or a shadowed texture when the occlusion information indicates the face is in shadow. Hence, pre-rendered 360-degree scene views with approximately correct texturing can be achieved.
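
A minimal sketch of the texture-selection rule (assumed data layout: unit view directions, a per-face normal, and a per-view occlusion flag): pick the view whose direction is most aligned with the face normal, or fall back to a shadowed texture when the face is occluded from every view.

```python
import numpy as np

def best_texture(face_normal, view_dirs, occluded_from_view):
    face_normal = face_normal / np.linalg.norm(face_normal)
    scores = view_dirs @ face_normal                  # cosine alignment per view
    scores[occluded_from_view] = -np.inf              # ignore views that can't see the face
    if np.all(np.isinf(scores)):
        return "shadow"                               # no unoccluded view: use shadowed texture
    return int(np.argmax(scores))                     # index of the best-aligned view

views = np.array([[0, 0, 1.0], [0, 1.0, 0], [1.0, 0, 0]])   # unit view directions
print(best_texture(np.array([0.1, 0.1, 0.98]), views,
                   occluded_from_view=np.array([False, False, True])))   # -> 0
```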

While tessellation is a useful tool for maintaining model detail, mesh simplification allows for model compression by eliminating faces, edges, and vertices in a controllable manner.

Technical Summer Intern

MITRE

Jun 2010 – Jun 2011 | Tysons Corner, VA
Remote Sensing: Optical Remote Sensing Utilizing Lidar and Surface Chemistry

Simulation and Test Engineer

Cornell University Violet Nanosatellite, Attitude Control System Team

Aug 2009 – May 2010 | Ithaca, NY

Attitude Control System simulations, steering law switching algorithms, reaction wheel analysis.

Violet Nanosatellite in 2016 (Internet Archive)

Violet is a cube nanosatellite designed with an array of 8 control moment gyroscopes (CMGs). Its primary purpose was to demonstrate high-agility maneuvers through angular momentum exchange with the CMGs. Violet placed 2nd in the 6th University Nanosatellite Program.

The null space of the steering angular momentum envelope needs to be avoided to prevent overdriving the CMGs and, more importantly, to maintain the ability to steer the spacecraft. Part of this included ensuring the magnetic torque coils and the CMGs could be used effectively for momentum shedding or recovery. If the CMGs became too saturated with momentum, they could not be used for steering (hence the importance of avoiding the null space of the angular momentum steering envelope).
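
A generic sketch of the two checks involved (standard CMG-array quantities, not Violet's actual steering law or geometry): a singularity measure of the CMG Jacobian that approaches zero near singular gimbal configurations, and a saturation check that signals when momentum should be shed via the magnetic torque coils.

```python
import numpy as np

def singularity_measure(jacobian):
    # jacobian: 3 x n matrix mapping gimbal rates to output torque.
    return np.sqrt(np.linalg.det(jacobian @ jacobian.T))

def needs_momentum_dump(total_momentum, capacity, margin=0.8):
    # Flag saturation before steering authority is lost.
    return np.linalg.norm(total_momentum) > margin * capacity

A = np.array([[1.0, 0.0, -1.0, 0.0],
              [0.0, 1.0, 0.0, -1.0],
              [0.5, 0.5, 0.5, 0.5]])
print(singularity_measure(A))                       # near zero means near-singular
print(needs_momentum_dump(np.array([0.1, 0.0, 0.4]), capacity=0.5))
```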

Summer Research Assistant

Gardel Cellular Biophysics Lab, University of Chicago

Jun 2009 – Aug 2009 | Chicago, IL
Characterization of Zyxin Flashes in Stress Fibers

Academic Excellence Workshop Facilitator

Cornell University, Engineering Learning Initiatives

Aug 2008 – May 2010 | Ithaca, NY
Facilitate academic excellence workshops in multivariable calculus and differential equations; develop enrichment problems for workshop students to solve and collaborate on.

Summer Research Assistant

Gardel Cellular Biophysics Lab, University of Chicago

Jun 2008 – Aug 2008 | Chicago, IL

Cell Adhesion Strength Measurement Assay Utilizing a Rheometer

The adhesion strength assay mounted cell-plated coverslips on a rheometer, immersed them in a suitable fluid, and spun them up to a target angular velocity to exert fluid shear stress on the cells at the coverslip surface. By recording the remaining distribution of cells on the circular coverslip with fluorescence microscopy and image processing, the shear stress required to detach cells could be deduced at the 50% cell-density radius.

The motor torque measured by the rheometer can be used to derive the shear stress at a given radius, since integrating the shear stress over the surface of the circular cell-plated coverslip yields the torque required to maintain constant angular velocity. As long as laminar flow is maintained in the container, the spinning fluid’s viscosity can be combined with the fluid’s spatial velocity profile to compute the shear stress. The velocity profile is determined by boundary conditions: (1) the angular velocity of the spinning disk and (2) the distances to the container walls, where the fluid velocity is zero.
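
A worked example under the assumptions stated above (torque equals the integral of shear stress over the coverslip, and shear stress grows linearly with radius for a Newtonian fluid in this geometry); the numbers are placeholders, not measured values.

```python
import numpy as np

# With tau(r) = k * r, the torque is
#   T = int_0^R tau(r) * r * 2*pi*r dr = pi * k * R**4 / 2,
# so k = 2*T / (pi * R**4), and the stress at the 50% cell-density radius follows.
def shear_stress_at(r, torque, disk_radius):
    k = 2.0 * torque / (np.pi * disk_radius ** 4)
    return k * r

torque = 2.0e-6        # N*m, hypothetical rheometer reading
disk_radius = 0.0125   # m, hypothetical coverslip radius
r50 = 0.007            # m, hypothetical radius where half the cells remain
print(shear_stress_at(r50, torque, disk_radius))   # Pa
```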