About Me


I am a software engineer working on Oracle Cloud Infrastructure. In multi-cloud migration, my team helps customers migrate their compute resources and storage data from external cloud providers (AWS, Azure, GCP) to Oracle Cloud. In edge computing, we are building lightweight, resilient services to support real-time data processing, monitoring, storage, and device management across globally distributed nodes.

I received my Master’s degree in Computer Science from Carnegie Mellon University, advised by Prof. Teruko Mitamura. I have a diverse academic and industrial background spanning Multimodal AI, Model Compression & Efficient ML, Graph Neural Networks, Computer Vision, Natural Language Processing, AI for Healthcare, Data Augmentation, Large-Scale ML, and Speech & Audio, along with solid software engineering experience in algorithm design, data structures, problem-solving, and complexity analysis. I am also active in the research community and have served as a reviewer for many top-tier AI/ML conferences, including ICLR, ACL, and NeurIPS.

I have been contemplating how machines can transcend their computational limitations to comprehend human intelligence. My aim is to create computationally efficient machine learning and deep learning models and algorithms, establishing the computational foundations that will enable computers to analyze, recognize, and predict subtle human communicative behaviors in social interactions.

Research Interests


Multimodal Machine Learning: representation, alignment, translation, fusion, and co-learning of heterogeneous data

Natural Language Processing: text representation, syntactic parsing, semantic analysis, text generation

Multimodal Sentiment Analysis: text, audio, video, facial expressions, and physiological signals

Computer Vision: image processing, object detection, image segmentation, scene understanding

Services


Conference/Journal reviewer: ACL, ICLR, AAAI, IJCAI, KDD, CVPR, NAACL, NeurIPS, ACM MM, Elsevier, PeerJ, MDPI, etc.

Education


Experience


Projects & Publications


Project 1: Cellular Macromolecules and Image Classification by Advanced Neural Network Strategies


This project conducts an extensive investigation into advanced deep learning methods for challenging image classification problems and for the effective handling of imbalanced datasets. It encompasses research on convolutional neural networks (CNNs) optimized for various image recognition tasks, strategies for mitigating the impact of data imbalance, and comprehensive evaluations of diverse neural network architectures. A highlighted application is the classification of macromolecules in cellular electron tomography data, providing detailed insights into robust model architectures and methodologies aimed at improving classification accuracy across a wide spectrum of image analysis domains.
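As a rough illustration of the imbalance-mitigation side of this work, the sketch below shows two standard techniques in PyTorch: inverse-frequency class weights in the loss and a weighted sampler that oversamples rare classes. The toy dataset, class counts, and dimensions are illustrative assumptions, not the project's actual data or code.

```python
# Minimal sketch of two common imbalance-mitigation techniques;
# the toy dataset and all dimensions here are illustrative assumptions.
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Hypothetical imbalanced dataset: 1000 samples, 3 classes (800/150/50).
features = torch.randn(1000, 16)
labels = torch.cat([torch.zeros(800), torch.ones(150), torch.full((50,), 2.0)]).long()
dataset = TensorDataset(features, labels)

# Inverse-frequency class weights for a weighted cross-entropy loss.
class_counts = torch.bincount(labels).float()
class_weights = class_counts.sum() / (len(class_counts) * class_counts)
criterion = torch.nn.CrossEntropyLoss(weight=class_weights)

# Weighted sampler: minority-class examples are drawn more often per epoch.
sample_weights = class_weights[labels]
sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels), replacement=True)
loader = DataLoader(dataset, batch_size=32, sampler=sampler)
```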

#Imbalanced Data, #Cryo-Electron Tomography, #Image Classification, #Computational Biology


Project 2: Unified AI Framework for Multimodal Multimedia Analysis and Efficient Distributed Computing


This project aims to develop a comprehensive AI framework that enhances multimodal sentiment analysis, optimizes distributed file systems, and improves the deployment efficiency of advanced neural network models. By integrating techniques that fuse multiple features and modalities, it boosts sentiment analysis accuracy using both audio and text data. It also tackles distributed file system challenges, focusing on flexible, scalable, and resilient file storage solutions. Additionally, the project addresses the computational demands of Vision Transformers through optimized model compression techniques, balancing accuracy with efficiency for deployment in resource-constrained environments such as edge computing devices. Through these efforts, the project seeks to create a unified AI framework that advances the state of the art in sentiment analysis, distributed computing, and the practical deployment of neural network models across diverse technological domains.
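As one concrete example of the model-compression direction tagged below, here is a minimal sketch of a standard knowledge-distillation objective (Hinton-style soft targets) for training a small student under a large teacher such as a Vision Transformer. The temperature, loss weighting, and random logits are illustrative assumptions, not the project's actual configuration.

```python
# Minimal knowledge-distillation loss sketch; hyperparameters are assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      temperature=4.0, alpha=0.7):
    """Blend softened teacher targets with the hard-label loss."""
    # KL term between softened distributions, scaled by T^2 to keep
    # gradient magnitudes comparable across temperatures.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    hard_loss = F.cross_entropy(student_logits, targets)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Toy usage with random logits standing in for the teacher and student ViTs.
student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
targets = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, targets)
loss.backward()
```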

#Vision Transformers, #Model Compression, #Edge Computing, #Resource Optimization, #Knowledge Distillation


Project 3: Audio Sentiment Analysis by Deep Learning Models


This project explores pioneering research in multimodal audio-text sentiment analysis. Sentiment analysis has garnered widespread attention in both academia and industry in recent years, with most studies focusing on text. However, real-world information often arrives through multiple modalities, including audio and text. We therefore integrate audio and text for the task of multimodal sentiment analysis and propose a novel fusion strategy, combining multi-feature fusion and multi-modality fusion, to improve the accuracy of audio-text sentiment analysis. We introduce the DFF-ATMF (Deep Feature Fusion - Audio and Text Modality Fusion) model, consisting of two parallel branches: an audio modality-based branch and a text modality-based branch. Its core mechanisms are the fusion of multiple feature vectors and attention mechanisms across multiple modalities. In experiments on the CMU-MOSI dataset and the recently released CMU-MOSEI dataset, both sourced from YouTube, our DFF-ATMF model achieves highly competitive results. Using attention weight distribution heatmaps, we also illustrate the complementarity and robustness of the deep features learned by DFF-ATMF. Remarkably, our model achieves new state-of-the-art results on the IEMOCAP dataset as well, underscoring the generalization capability of the proposed fusion strategy for multimodal emotion recognition.
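The simplified sketch below conveys the two-branch structure described above: separate audio and text encoders whose utterance-level vectors are fused with a learned attention weighting before classification. It is not the published DFF-ATMF implementation; the single-layer LSTM encoders and all dimensions are assumptions.

```python
# Simplified two-branch fusion sketch; not the actual DFF-ATMF code.
import torch
import torch.nn as nn

class TwoBranchFusion(nn.Module):
    def __init__(self, audio_dim=74, text_dim=300, hidden=128, classes=2):
        super().__init__()
        # One LSTM branch per modality; dimensions are illustrative.
        self.audio_rnn = nn.LSTM(audio_dim, hidden, batch_first=True)
        self.text_rnn = nn.LSTM(text_dim, hidden, batch_first=True)
        # Attention scores over the two modality vectors.
        self.attn = nn.Linear(hidden, 1)
        self.classifier = nn.Linear(hidden, classes)

    def forward(self, audio_seq, text_seq):
        _, (a_h, _) = self.audio_rnn(audio_seq)             # (1, B, H)
        _, (t_h, _) = self.text_rnn(text_seq)
        stacked = torch.stack([a_h[-1], t_h[-1]], dim=1)    # (B, 2, H)
        weights = torch.softmax(self.attn(stacked), dim=1)  # (B, 2, 1)
        fused = (weights * stacked).sum(dim=1)              # (B, H)
        return self.classifier(fused)

# Toy batch: 4 utterances, 20 audio frames, 12 text tokens.
model = TwoBranchFusion()
logits = model(torch.randn(4, 20, 74), torch.randn(4, 12, 300))
```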

#Multimodal Sentiment Analysis, #Emotion Recognition, #Deep Fusion Models


Project 4: Innovative AI-Driven Summarization and Multimedia Generation Framework


This project integrates cutting-edge advancements in natural language processing and multimedia generation to create a comprehensive AI-driven framework. By leveraging contextualized pre-trained models such as BERT and BART, the project explores innovative methods to enhance aspect-based abstractive summarization through the injection of external knowledge. This includes utilizing knowledge graphs and human-defined sequence-level scores to improve summarization accuracy and relevance. Simultaneously, the project addresses the challenges in Music Anime Douga (MAD) production, a popular form of multimedia that combines animation with music. It introduces a novel framework for generating high-quality videos from text-image pairs, overcoming the limitations of existing text-to-video synthesis methods. This multi-modal system interprets narrative and visual inputs to produce seamless video outputs, enhancing artistic control and preserving the creator's intent. By combining these two advanced AI applications, the project aims to revolutionize both text summarization and multimedia content creation, democratizing the production process and encouraging broader artistic participation and innovation. Through rigorous experimentation and validation, this integrated framework sets the stage for future advancements in AI-assisted content generation and summarization technologies.
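For context, the summarization side of this work builds on contextualized pre-trained models such as BART. The snippet below shows only the off-the-shelf BART summarization backbone via Hugging Face Transformers; the project's knowledge-graph and sequence-level-score injection steps are project-specific and not reproduced here.

```python
# Off-the-shelf BART summarization backbone; the knowledge-injection
# components described above are not shown.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
article = (
    "Aspect-based summarization produces a summary focused on one facet "
    "of a document, such as a product's battery life in a review, rather "
    "than a generic overview of the whole text."
)
print(summarizer(article, max_length=40, min_length=10, do_sample=False)[0]["summary_text"])
```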

#Pre-trained Models, #Knowledge Graph, #Natural Language Processing


Project 5: Advancing Question Generation and Mathematical Reasoning: Integrating Visual and Textual Data


This project delves into the intersection of visual and textual data to enhance AI capabilities in question generation and mathematical reasoning. It encompasses two complementary studies: one focusing on advancing mathematical problem-solving through innovative data preprocessing and model enhancements, and the other exploring the generation of contextually relevant questions from images. By integrating advanced deep learning techniques, including VGG and LSTM networks, and refining mathematical reasoning models with sophisticated preprocessing and ensemble methods, this project aims to push the boundaries of AI’s ability to interpret and generate questions based on both visual and textual inputs. The research demonstrates significant improvements in model accuracy and performance, offering new avenues for integrating visual content into question generation and enhancing mathematical reasoning tasks. This work lays the groundwork for future advancements in cognitive AI by bridging the gap between different data modalities.
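The sketch below illustrates the VGG-encoder / LSTM-decoder pattern mentioned above for generating questions from images: image features initialize an LSTM that emits question tokens one step at a time. The vocabulary size, hidden sizes, and untrained backbone are illustrative assumptions.

```python
# Illustrative VGG + LSTM question-generation sketch; dimensions and
# vocabulary are assumptions, and the backbone is left untrained here.
import torch
import torch.nn as nn
from torchvision.models import vgg16

class QuestionGenerator(nn.Module):
    def __init__(self, vocab_size=5000, embed=256, hidden=512):
        super().__init__()
        backbone = vgg16(weights=None)  # pretrained weights in practice
        self.encoder = backbone.features
        self.project = nn.Linear(512 * 7 * 7, hidden)
        self.embedding = nn.Embedding(vocab_size, embed)
        self.decoder = nn.LSTM(embed, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, images, token_ids):
        feats = self.encoder(images).flatten(1)            # (B, 512*7*7)
        h0 = torch.tanh(self.project(feats)).unsqueeze(0)  # (1, B, H)
        c0 = torch.zeros_like(h0)
        emb = self.embedding(token_ids)                    # (B, T, E)
        hidden_states, _ = self.decoder(emb, (h0, c0))
        return self.out(hidden_states)                     # (B, T, vocab)

# Toy forward pass: 224x224 images yield 7x7 VGG feature maps.
model = QuestionGenerator()
logits = model(torch.randn(2, 3, 224, 224), torch.randint(0, 5000, (2, 8)))
```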

#Multimodal AI Integration, #Visual Question Generation, #Enhanced Mathematical Reasoning


Project 6: Transformative AI Innovations for Multimedia Creation, Security Enhancement, and Efficient Data Processing


This project explores advancements in three critical areas of AI: multimedia synthesis, speech system security, and natural language processing (NLP) data management. It integrates diverse approaches to enhance AI capabilities across different domains. The first area focuses on creating a multi-modal framework that combines text and image inputs to generate high-quality videos with improved artistic control, facilitating more efficient and creative multimedia production. The second area addresses the vulnerability of automatic speech recognition (ASR) systems to adversarial attacks, proposing a novel detector-reformer approach that leverages deep learning techniques to enhance system robustness and security. The third area introduces a general framework for streamlining data processing across various NLP tasks, offering a more efficient and modular approach to managing data and simplifying the integration of data processing with model training and prediction. Together, these studies contribute to the advancement of multimedia creation, speech system security, and NLP data handling, paving the way for more robust and creative AI applications.
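As a conceptual sketch of the detector-reformer idea for ASR security, the code below pairs an autoencoder "reformer" that nudges incoming audio features toward the clean-data manifold with a reconstruction-error detector that flags likely adversarial inputs. The architecture and threshold are assumptions, not the project's exact design.

```python
# Conceptual detector-reformer sketch; architecture and threshold are
# assumptions, not the project's exact defense design.
import torch
import torch.nn as nn

class Reformer(nn.Module):
    def __init__(self, dim=128, bottleneck=32):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(dim, bottleneck), nn.ReLU())
        self.decode = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return self.decode(self.encode(x))

def detect_and_reform(reformer, features, threshold=0.5):
    """Return (is_adversarial, reformed_features)."""
    reformed = reformer(features)
    # Per-example reconstruction error as the detection statistic:
    # inputs far from the clean manifold reconstruct poorly.
    error = ((features - reformed) ** 2).mean(dim=-1)
    return error > threshold, reformed

# Toy batch of 4 frame-level feature vectors.
reformer = Reformer()
flags, cleaned = detect_and_reform(reformer, torch.randn(4, 128))
```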

#Multimodal Video Synthesis, #Speech Security and Adversarial Defense, #NLP Data Processing Framework