Projects
Forte: Composing Diverse NLP Tools for Text Retrieval, Analysis and Generation
Forte is a flexible, composable system designed for text processing, providing integrated architecture support for a wide spectrum of tasks, from information retrieval to natural language processing (including text analysis and language generation). Built on principled abstractions and design principles, Forte provides a platform for assembling cutting-edge NLP and ML technologies in a composable manner.
Texar: A Modularized, Versatile, and Extensible Toolkit for Text Generation
Texar-PyTorch is an open-source toolkit based on PyTorch that supports a broad set of machine learning tasks, especially text generation tasks such as machine translation, dialog, summarization, content manipulation, and language modeling. Texar is designed for both researchers and practitioners, enabling fast prototyping and experimentation.
DyNet: The Dynamic Neural Network Toolkit
DyNet is a neural network library developed by Carnegie Mellon University, Petuum, and many others. It is written in C++ (with bindings in Python) and is designed to be efficient when run on either CPU or GPU, and to work well with networks that have dynamic structures that change for every training instance.
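To illustrate this define-by-run style, here is a minimal, hypothetical sketch using DyNet's Python bindings: a fresh computation graph is built for every training instance, so the network's structure can follow each input (here, its length). All sizes and data are illustrative, not from any real task.

```python
import dynet as dy

# Toy classifier over variable-length word-ID sequences; sizes are illustrative.
pc = dy.ParameterCollection()
E = pc.add_lookup_parameters((1000, 32))   # embedding table: 1000 words, dim 32
W = pc.add_parameters((2, 32))             # binary classifier weights
b = pc.add_parameters(2)
trainer = dy.SimpleSGDTrainer(pc)

data = [([3, 17, 5], 0), ([42, 7], 1)]     # (word IDs, label); lengths differ
for words, label in data:
    dy.renew_cg()                          # build a NEW graph for this instance
    h = dy.esum([E[w] for w in words])     # graph size follows sequence length
    loss = dy.pickneglogsoftmax(W * h + b, label)
    loss.value()                           # forward pass
    loss.backward()                        # backward through this instance's graph
    trainer.update()
```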
IBM Watson Build Challenge 2017
We developed a web application with IBM Watson APIs (Retrieve and Rank) on IBM Bluemix and implemented several missing-data recovery and event identification algorithms in Python for large-scale Phasor Measurement Unit (PMU) data analysis.
Online Algorithm for PMU Data Processing (OLAP)
We implemented OLAP in C# on top of Project Alpha for real-time applications. Project Alpha is an elite version of the openPDC: it provides a jump start for developing new products based on the Grid Solutions Framework's Time-Series Library (TSL), complete with all the components needed for a standalone time-series processing application. Code developed on Project Alpha can run on the openPDC as an action adapter.
Publication: Pengzhi Gao, Meng Wang, Scott G. Ghiocel, Joe H. Chow, Bruce Fardanesh, and George Stefopoulos. Missing Data Recovery by Exploiting Low-dimensionality in Power Systems Synchrophasor Measurements. IEEE Trans. Power Systems, 2016, 31 (2): 1006-1013.
Patent: Meng Wang, Pengzhi Gao, and Joe H. Chow. “A low-rank-based missing PMU data recovery method.” Application No.: 62/445305, Filed January 12, 2017.
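For context, the low-dimensionality idea behind the PMU recovery work above can be illustrated with a generic low-rank matrix-completion sketch in Python (iterated truncated SVD). This is an illustrative stand-in, not the algorithm of the publication or patent; the rank and iteration count are assumptions.

```python
import numpy as np

def recover_low_rank(M, mask, r=3, iters=100):
    """Generic low-rank completion sketch (NOT the published algorithm).
    M: measurement matrix (e.g., PMU channels x time) with missing entries;
    mask: 1 where observed, 0 where missing; r: assumed rank."""
    X = np.where(mask, M, 0.0)             # initialize missing entries to zero
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        L = (U[:, :r] * s[:r]) @ Vt[:r]    # best rank-r approximation of X
        X = np.where(mask, M, L)           # keep observed entries, fill the rest
    return X
```

PMU measurement matrices are approximately low-rank, which is what makes this style of recovery effective.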
Neural Style Transfer
We implemented the neural style transfer algorithm of [R1], which takes three images: a content image, a style image, and an input image to be styled. The key idea of [R1] is to define two loss functions: $L_{content}$ measures how different the content of two images is, and $L_{style}$ measures how different two images are in terms of style. The input image is then transformed by minimizing both losses with backpropagation, producing an image that matches the content of the content image and the style of the style image.
[R1] Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. A Neural Algorithm of Artistic Style. arXiv:1508.06576, 2015.
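A minimal TensorFlow sketch of the two losses, assuming feature maps have already been extracted from a pretrained CNN (e.g., VGG, as in [R1]); the function names, mean-squared normalization, and per-loss weights are our own choices, not the paper's exact formulation:

```python
import tensorflow as tf

def content_loss(content_feat, input_feat):
    # L_content: squared error between feature maps at a chosen layer
    return tf.reduce_mean(tf.square(content_feat - input_feat))

def gram_matrix(feat):
    # Channel-by-channel correlations of an (H, W, C) feature map
    c = feat.shape[-1]
    a = tf.reshape(feat, [-1, c])                  # (H*W, C)
    n = tf.cast(tf.shape(a)[0], tf.float32)
    return tf.matmul(a, a, transpose_a=True) / n   # (C, C)

def style_loss(style_feat, input_feat):
    # L_style: squared error between Gram matrices (texture statistics)
    return tf.reduce_mean(tf.square(gram_matrix(style_feat) - gram_matrix(input_feat)))

# Total objective, minimized w.r.t. the input image's pixels; alpha and beta
# trade content fidelity against style strength (values are assumptions):
# total = alpha * content_loss(cf, xf) + beta * style_loss(sf, xf)
```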
Mobile Eye Gaze Estimation with Deep Learning
We implemented a deep convolutional neural network in TensorFlow for eye gaze estimation. This project focuses on mobile gaze estimation: predicting the gaze position on a phone or tablet screen. The original dataset comes from the GazeCapture project; due to limited computing power, we trained our model on a much smaller subset with 48,000 training samples and 5,000 validation samples. Each sample contains five items: face (64 × 64 × 3), left eye (64 × 64 × 3), right eye (64 × 64 × 3), face mask (25 × 25 × 1), and labels (x, y). Our model follows the architecture introduced in [R1], with hyperparameters changed and tuned (listed below, and sketched in code after the table) to account for the different image size in our training dataset.
Layer Name | Type | Kernel Size | Stride | Padding | Output Size |
---|---|---|---|---|---|
conv1 | Convolutional | 5 × 5 | 2 | SAME | 64 @ 32 × 32 |
pool1 | Max Pooling | 2 × 2 | 2 | VALID | 64 @ 16 × 16 |
conv2 | Convolutional | 5 × 5 | 1 | SAME | 64 @ 16 × 16 |
pool2 | Max Pooling | 2 × 2 | 2 | VALID | 64 @ 8 × 8 |
conv3 | Convolutional | 3 × 3 | 1 | SAME | 128 @ 8 × 8 |
pool3 | Max Pooling | 2 × 2 | 2 | VALID | 128 @ 4 × 4 |
conv4 | Convolutional | 1 × 1 | 1 | SAME | 64 @ 4 × 4 |
pool4 | Max Pooling | 2 × 2 | 2 | VALID | 64 @ 2 × 2 |
eye_fc | Fully Connected | – | – | – | 128 |
face_fc1 | Fully Connected | – | – | – | 128 |
face_fc2 | Fully Connected | – | – | – | 64 |
facegrid_fc1 | Fully Connected | – | – | – | 256 |
facegrid_fc2 | Fully Connected | – | – | – | 128 |
fc1 | Fully Connected | – | – | – | 128 |
fc2 | Fully Connected | – | – | – | 2 |
[R1] K. Krafka, A. Khosla, P. Kellnhofer, H. Kannan, S. Bhandarkar, W. Matusik, and A. Torralba. Eye Tracking for Everyone. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
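The following is a hypothetical TensorFlow/Keras re-implementation sketch of the table above. Layer names and sizes follow the table; the ReLU activations, the weight sharing between the two eye streams (as in [R1]), and the training setup (Adam, MSE) are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_tower(name):
    # conv1..pool4 from the table, packaged as a submodel so the same
    # weights can be applied to both eye images (shared, as in [R1])
    inp = layers.Input((64, 64, 3))
    x = layers.Conv2D(64, 5, strides=2, padding="same", activation="relu")(inp)  # conv1
    x = layers.MaxPool2D(2, strides=2)(x)                                        # pool1
    x = layers.Conv2D(64, 5, strides=1, padding="same", activation="relu")(x)    # conv2
    x = layers.MaxPool2D(2, strides=2)(x)                                        # pool2
    x = layers.Conv2D(128, 3, strides=1, padding="same", activation="relu")(x)   # conv3
    x = layers.MaxPool2D(2, strides=2)(x)                                        # pool3
    x = layers.Conv2D(64, 1, strides=1, padding="same", activation="relu")(x)    # conv4
    x = layers.MaxPool2D(2, strides=2)(x)                                        # pool4
    return Model(inp, layers.Flatten()(x), name=name)

face_in = layers.Input((64, 64, 3), name="face")
left_in = layers.Input((64, 64, 3), name="left_eye")
right_in = layers.Input((64, 64, 3), name="right_eye")
grid_in = layers.Input((25, 25, 1), name="face_mask")

eye_tower = conv_tower("eye_tower")                     # one tower, both eyes
eyes = layers.Concatenate()([eye_tower(left_in), eye_tower(right_in)])
eyes = layers.Dense(128, activation="relu")(eyes)       # eye_fc

face = conv_tower("face_tower")(face_in)
face = layers.Dense(128, activation="relu")(face)       # face_fc1
face = layers.Dense(64, activation="relu")(face)        # face_fc2

grid = layers.Flatten()(grid_in)
grid = layers.Dense(256, activation="relu")(grid)       # facegrid_fc1
grid = layers.Dense(128, activation="relu")(grid)       # facegrid_fc2

x = layers.Concatenate()([eyes, face, grid])
x = layers.Dense(128, activation="relu")(x)             # fc1
out = layers.Dense(2, name="gaze_xy")(x)                # fc2: (x, y) on screen

model = Model([face_in, left_in, right_in, grid_in], out)
model.compile(optimizer="adam", loss="mse")             # training setup assumed
```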