About Me


I am a software engineer working on Oracle Cloud Infrastructure. In multi-cloud migration, my team helps customers migrate their compute resources and storage data from external cloud providers (AWS, Azure, GCP) to Oracle Cloud. In edge computing, we are building lightweight, resilient services to support real-time data processing, monitoring, storage, and device management across globally distributed nodes.

I received my Master’s degree in Computer Science from Carnegie Mellon University, advised by Prof. Teruko Mitamura. I have a diverse academic and industrial background spanning Multimodal AI, Model Compression & Efficient ML, Graph Neural Networks, Computer Vision, Natural Language Processing, AI for Healthcare, Data Augmentation, Large-Scale ML, and Speech & Audio, along with solid software engineering experience in algorithm design, data structures, problem-solving, and complexity analysis. I am also active in the research community and have served as a reviewer for many top-tier AI/ML conferences, including ICLR, ACL, and NeurIPS.

I have been contemplating how machines can transcend their computational limitations to comprehend human intelligence. My aim is to create computationally efficient machine learning and deep learning models and algorithms, establishing the computational foundations that will enable computers to analyze, recognize, and predict subtle human communicative behaviors in social interactions.

Research Interests


Multimodal Machine Learning: representation, alignment, translation, fusion, and co-learning of heterogeneous data

Natural Language Processing: text representation, syntactic parsing, semantic analysis, text generation

Multimodal Sentiment Analysis: text, audio, video, facial expressions, and physiological signals

Computer Vision: image processing, object detection, image segmentation, scene understanding

Services


Conference/Journal reviewer: ACL, ICLR, AAAI, IJCAI, KDD, CVPR, NAACL, NeurIPS, ACM MM, Elsevier, PeerJ, MDPI, etc.

Education


Experience


Projects & Publications


Project 1: Cellular Macromolecules and Image Classification by Advanced Neural Network Strategies


This project conducts an extensive investigation into advanced deep learning methods for challenging image classification problems and for the effective handling of imbalanced datasets. It encompasses research on convolutional neural networks (CNNs) optimized for various image recognition tasks, strategies for mitigating the impact of data imbalance, and comprehensive evaluations of diverse neural network architectures. A highlighted application is the classification of macromolecules in cellular electron tomography data, providing detailed insights into robust model architectures and methodologies aimed at improving classification accuracy across a wide spectrum of image analysis domains.
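As a rough illustration of the imbalance-mitigation side of this work, the sketch below shows two standard techniques in PyTorch: inverse-frequency class weights in the loss and a weighted sampler that oversamples rare classes. The toy dataset, class counts, and dimensions are illustrative assumptions, not the project's actual data or code.

```python
# Minimal sketch of two common imbalance-mitigation techniques;
# the toy dataset and all dimensions here are illustrative assumptions.
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Hypothetical imbalanced dataset: 1000 samples, 3 classes (800/150/50).
features = torch.randn(1000, 16)
labels = torch.cat([torch.zeros(800), torch.ones(150), torch.full((50,), 2.0)]).long()
dataset = TensorDataset(features, labels)

# Inverse-frequency class weights for a weighted cross-entropy loss.
class_counts = torch.bincount(labels).float()
class_weights = class_counts.sum() / (len(class_counts) * class_counts)
criterion = torch.nn.CrossEntropyLoss(weight=class_weights)

# Weighted sampler: minority-class examples are drawn more often per epoch.
sample_weights = class_weights[labels]
sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels), replacement=True)
loader = DataLoader(dataset, batch_size=32, sampler=sampler)
```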

#Imbalanced Data, #Cryo-Electron Tomography, #Image Classification, #Computational Biology


Project 2: Unified AI Framework for Multimodal Multimedia Analysis and Efficient Distributed Computing


This project aims to develop a comprehensive AI framework that enhances multimodal sentiment analysis, optimizes distributed file systems, and improves the deployment efficiency of advanced neural network models. By integrating techniques that fuse multiple features and modalities, it boosts sentiment analysis accuracy using both audio and text data. It also tackles distributed file system challenges, focusing on flexible, scalable, and resilient file storage solutions. Additionally, the project addresses the computational demands of Vision Transformers through optimized model compression techniques, balancing accuracy with efficiency for deployment in resource-constrained environments such as edge computing devices. Through these efforts, the project seeks to create a unified AI framework that advances the state of the art in sentiment analysis, distributed computing, and the practical deployment of neural network models across diverse technological domains.
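As one concrete example of the model-compression direction tagged below, here is a minimal sketch of a standard knowledge-distillation objective (Hinton-style soft targets) for training a small student under a large teacher such as a Vision Transformer. The temperature, loss weighting, and random logits are illustrative assumptions, not the project's actual configuration.

```python
# Minimal knowledge-distillation loss sketch; hyperparameters are assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      temperature=4.0, alpha=0.7):
    """Blend softened teacher targets with the hard-label loss."""
    # KL term between softened distributions, scaled by T^2 to keep
    # gradient magnitudes comparable across temperatures.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    hard_loss = F.cross_entropy(student_logits, targets)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Toy usage with random logits standing in for the teacher and student ViTs.
student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
targets = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, targets)
loss.backward()
```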

#Vision Transformers, #Model Compression, #Edge Computing, #Resource Optimization, #Knowledge Distillation


Project 3: Audio Sentiment Analysis by Deep Learning Models


This project explores pioneering research in multimodal audio-text sentiment analysis. Sentiment analysis has garnered widespread attention in both academia and industry in recent years, with most studies focusing on text. However, real-world information often arrives through multiple modalities, including audio and text. We therefore integrate audio and text for the task of multimodal sentiment analysis and propose a novel fusion strategy, combining multi-feature fusion and multi-modality fusion, to improve the accuracy of audio-text sentiment analysis. We introduce the DFF-ATMF (Deep Feature Fusion - Audio and Text Modality Fusion) model, consisting of two parallel branches: an audio modality-based branch and a text modality-based branch. Its core mechanisms are the fusion of multiple feature vectors and attention mechanisms across multiple modalities. In experiments on the CMU-MOSI dataset and the recently released CMU-MOSEI dataset, both sourced from YouTube, our DFF-ATMF model achieves highly competitive results. Using attention weight distribution heatmaps, we also illustrate the complementarity and robustness of the deep features learned by DFF-ATMF. Remarkably, our model achieves new state-of-the-art results on the IEMOCAP dataset as well, underscoring the generalization capability of the proposed fusion strategy for multimodal emotion recognition.
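The simplified sketch below conveys the two-branch structure described above: separate audio and text encoders whose utterance-level vectors are fused with a learned attention weighting before classification. It is not the published DFF-ATMF implementation; the single-layer LSTM encoders and all dimensions are assumptions.

```python
# Simplified two-branch fusion sketch; not the actual DFF-ATMF code.
import torch
import torch.nn as nn

class TwoBranchFusion(nn.Module):
    def __init__(self, audio_dim=74, text_dim=300, hidden=128, classes=2):
        super().__init__()
        # One LSTM branch per modality; dimensions are illustrative.
        self.audio_rnn = nn.LSTM(audio_dim, hidden, batch_first=True)
        self.text_rnn = nn.LSTM(text_dim, hidden, batch_first=True)
        # Attention scores over the two modality vectors.
        self.attn = nn.Linear(hidden, 1)
        self.classifier = nn.Linear(hidden, classes)

    def forward(self, audio_seq, text_seq):
        _, (a_h, _) = self.audio_rnn(audio_seq)             # (1, B, H)
        _, (t_h, _) = self.text_rnn(text_seq)
        stacked = torch.stack([a_h[-1], t_h[-1]], dim=1)    # (B, 2, H)
        weights = torch.softmax(self.attn(stacked), dim=1)  # (B, 2, 1)
        fused = (weights * stacked).sum(dim=1)              # (B, H)
        return self.classifier(fused)

# Toy batch: 4 utterances, 20 audio frames, 12 text tokens.
model = TwoBranchFusion()
logits = model(torch.randn(4, 20, 74), torch.randn(4, 12, 300))
```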

#Multimodal Sentiment Analysis, #Emotion Recognition, #Deep Fusion Models


Project 4: Innovative AI-Driven Summarization and Multimedia Generation Framework


This project integrates cutting-edge advancements in natural language processing and multimedia generation to create a comprehensive AI-driven framework. By leveraging contextualized pre-trained models such as BERT and BART, the project explores innovative methods to enhance aspect-based abstractive summarization through the injection of external knowledge. This includes utilizing knowledge graphs and human-defined sequence-level scores to improve summarization accuracy and relevance. Simultaneously, the project addresses the challenges in Music Anime Douga (MAD) production, a popular form of multimedia that combines animation with music. It introduces a novel framework for generating high-quality videos from text-image pairs, overcoming the limitations of existing text-to-video synthesis methods. This multi-modal system interprets narrative and visual inputs to produce seamless video outputs, enhancing artistic control and preserving the creator's intent. By combining these two advanced AI applications, the project aims to revolutionize both text summarization and multimedia content creation, democratizing the production process and encouraging broader artistic participation and innovation. Through rigorous experimentation and validation, this integrated framework sets the stage for future advancements in AI-assisted content generation and summarization technologies.
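For context, the summarization side of this work builds on contextualized pre-trained models such as BART. The snippet below shows only the off-the-shelf BART summarization backbone via Hugging Face Transformers; the project's knowledge-graph and sequence-level-score injection steps are project-specific and not reproduced here.

```python
# Off-the-shelf BART summarization backbone; the knowledge-injection
# components described above are not shown.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
article = (
    "Aspect-based summarization produces a summary focused on one facet "
    "of a document, such as a product's battery life in a review, rather "
    "than a generic overview of the whole text."
)
print(summarizer(article, max_length=40, min_length=10, do_sample=False)[0]["summary_text"])
```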

#Pre-trained Models, #Knowledge Graph, #Natural Language Processing


Project 5: Advancing Question Generation and Mathematical Reasoning: Integrating Visual and Textual Data


This project delves into the intersection of visual and textual data to enhance AI capabilities in question generation and mathematical reasoning. It encompasses two complementary studies: one focusing on advancing mathematical problem-solving through innovative data preprocessing and model enhancements, and the other exploring the generation of contextually relevant questions from images. By integrating advanced deep learning techniques, including VGG and LSTM networks, and refining mathematical reasoning models with sophisticated preprocessing and ensemble methods, this project aims to push the boundaries of AI’s ability to interpret and generate questions based on both visual and textual inputs. The research demonstrates significant improvements in model accuracy and performance, offering new avenues for integrating visual content into question generation and enhancing mathematical reasoning tasks. This work lays the groundwork for future advancements in cognitive AI by bridging the gap between different data modalities.
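The sketch below illustrates the VGG-encoder / LSTM-decoder pattern mentioned above for generating questions from images: image features initialize an LSTM that emits question tokens one step at a time. The vocabulary size, hidden sizes, and untrained backbone are illustrative assumptions.

```python
# Illustrative VGG + LSTM question-generation sketch; dimensions and
# vocabulary are assumptions, and the backbone is left untrained here.
import torch
import torch.nn as nn
from torchvision.models import vgg16

class QuestionGenerator(nn.Module):
    def __init__(self, vocab_size=5000, embed=256, hidden=512):
        super().__init__()
        backbone = vgg16(weights=None)  # pretrained weights in practice
        self.encoder = backbone.features
        self.project = nn.Linear(512 * 7 * 7, hidden)
        self.embedding = nn.Embedding(vocab_size, embed)
        self.decoder = nn.LSTM(embed, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, images, token_ids):
        feats = self.encoder(images).flatten(1)            # (B, 512*7*7)
        h0 = torch.tanh(self.project(feats)).unsqueeze(0)  # (1, B, H)
        c0 = torch.zeros_like(h0)
        emb = self.embedding(token_ids)                    # (B, T, E)
        hidden_states, _ = self.decoder(emb, (h0, c0))
        return self.out(hidden_states)                     # (B, T, vocab)

# Toy forward pass: 224x224 images yield 7x7 VGG feature maps.
model = QuestionGenerator()
logits = model(torch.randn(2, 3, 224, 224), torch.randint(0, 5000, (2, 8)))
```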

#Multimodal AI Integration, #Visual Question Generation, #Enhanced Mathematical Reasoning


Project 6: Transformative AI Innovations for Multimedia Creation, Security Enhancement, and Efficient Data Processing


This project explores advancements in three critical areas of AI: multimedia synthesis, speech system security, and natural language processing (NLP) data management. It integrates diverse approaches to enhance AI capabilities across different domains. The first area focuses on creating a multi-modal framework that combines text and image inputs to generate high-quality videos with improved artistic control, facilitating more efficient and creative multimedia production. The second area addresses the vulnerability of automatic speech recognition (ASR) systems to adversarial attacks, proposing a novel detector-reformer approach that leverages deep learning techniques to enhance system robustness and security. The third area introduces a general framework for streamlining data processing across various NLP tasks, offering a more efficient and modular approach to managing data and simplifying the integration of data processing with model training and prediction. Together, these studies contribute to the advancement of multimedia creation, speech system security, and NLP data handling, paving the way for more robust and creative AI applications.
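As a conceptual sketch of the detector-reformer idea for ASR security, the code below pairs an autoencoder "reformer" that nudges incoming audio features toward the clean-data manifold with a reconstruction-error detector that flags likely adversarial inputs. The architecture and threshold are assumptions, not the project's exact design.

```python
# Conceptual detector-reformer sketch; architecture and threshold are
# assumptions, not the project's exact defense design.
import torch
import torch.nn as nn

class Reformer(nn.Module):
    def __init__(self, dim=128, bottleneck=32):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(dim, bottleneck), nn.ReLU())
        self.decode = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return self.decode(self.encode(x))

def detect_and_reform(reformer, features, threshold=0.5):
    """Return (is_adversarial, reformed_features)."""
    reformed = reformer(features)
    # Per-example reconstruction error as the detection statistic:
    # inputs far from the clean manifold reconstruct poorly.
    error = ((features - reformed) ** 2).mean(dim=-1)
    return error > threshold, reformed

# Toy batch of 4 frame-level feature vectors.
reformer = Reformer()
flags, cleaned = detect_and_reform(reformer, torch.randn(4, 128))
```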

#Multimodal Video Synthesis, #Speech Security and Adversarial Defense, #NLP Data Processing Framework