Instruction tuning has gained increasing attention and emerged as a crucial technique for enhancing the capabilities of Large Language Models (LLMs), bridging the gap between the next-word prediction objective of LLMs and human preferences. To construct a high-quality instruction dataset, many instruction processing approaches have been proposed, aiming to achieve a delicate balance between data quantity and data quality. Nevertheless, due to inconsistencies that persist among various instruction processing methods, there is no standard implementation framework available to the community, which hinders practitioners from further developing and advancing the field. To facilitate instruction processing research, we present EasyInstruct, an easy-to-use instruction processing framework for LLMs, which modularizes instruction generation, selection, and prompting, while also considering their combination and interaction.
EasyInstruct is a Python package that serves as an easy-to-use instruction processing framework for Large Language Models (LLMs) such as GPT-4, LLaMA, and ChatGLM in research experiments. It modularizes instruction generation, selection, and prompting, while also considering their combination and interaction.
- APIs & Engines: standardizes the instruction execution process, enabling the execution of instruction prompts on specific LLM API services or locally deployed LLMs.
- Generators: streamlines the instruction generation process, enabling automated generation of instruction data based on chat data, corpus, or knowledge graphs.
- Selectors: standardizes the instruction selection process, enabling the extraction of high-quality instruction datasets from raw, unprocessed instruction data.
- Prompts: standardizes the instruction prompting process.
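As a low-code illustration of how these modules fit together, the sketch below chains a generator with a selector. The class names and arguments are assumptions based on the package's modular design and may differ from the released API.

```python
# Hedged sketch of low-code usage; class names and arguments are assumptions
# and may differ from the actual EasyInstruct API.
from easyinstruct import SelfInstructGenerator, GPTScoreSelector
from easyinstruct.utils.api import set_openai_key

# Configure the API key used by the underlying engine.
set_openai_key("YOUR-OPENAI-API-KEY")

# Generators: produce new instruction data from seed data.
generator = SelfInstructGenerator(num_instructions_to_generate=10)
generator.generate()

# Selectors: filter the raw instructions down to a high-quality subset.
selector = GPTScoreSelector(source_file_path="generations.jsonl")
selector.process()
```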
The instruction generation methods implemented in Generators are categorized into three groups based on their respective seed data sources: chat data, corpus, and knowledge graphs. The evaluation metrics in Selectors are divided into two categories based on the principle of their implementation: statistics-based and LM-based. We detail the components of the Generators and Selectors modules in the table below:
The framework is designed to cater to users with varying levels of expertise, providing a user-friendly experience ranging from code-free execution to low-code customization and advanced customization options.
We provide two ways for users to quickly get started with EasyInstruct. You can either use the shell script or the Gradio app based on your specific needs.
Step 1: Prepare a configuration file. Users can easily configure the parameters of EasyInstruct in a YAML-style file, or simply use the default parameters in the configuration files we provide. Following is an example of the configuration file for Self-Instruct:
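(The field names below are a minimal, illustrative sketch and are assumptions on our part; consult the default configuration files shipped with EasyInstruct for the exact schema.)

```yaml
# Illustrative Self-Instruct configuration; key names are assumptions and may
# differ from the defaults shipped with EasyInstruct.
generator:
  SelfInstructGenerator:
    seed_tasks_path: data/seed_tasks.jsonl   # seed instruction/chat data
    target_dir: data/generations/            # where generated data is written
    data_format: alpaca                      # output format of the dataset
    num_instructions_to_generate: 100        # size of the generated set
    engine: gpt-3.5-turbo                    # LLM API engine used for generation
```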
Step 2: Run the shell script. Users should first specify the configuration file and provide their own OpenAI API key, then run the following shell script to launch the instruction generation or selection process.
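A hypothetical invocation is sketched below; the actual script path and flags in the repository may differ.

```bash
# Hypothetical launch command; the script name and flags are assumptions and
# may differ from those shipped in the EasyInstruct repository.
export OPENAI_API_KEY="sk-..."               # your own OpenAI API key
config_file="configs/self_instruct.yaml"     # the YAML configuration prepared in Step 1
python demo/run.py --config "$config_file"
```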
We provide a Gradio app for users to quickly get started with EasyInstruct. Users can choose to launch the Gradio app locally on their own machines or alternatively, they can try the hosted Gradio app that we provide on HuggingFace Spaces.
In experiments, we mainly consider four instruction datasets as follows: (a) self_instruct_5k is constructed by employing the Self-Instruct method to distill instruction data from text-davinci-003; (b) alpaca_data_5k is randomly sampled from the Alpaca dataset; (c) evol_instruct_5k is constructed by employing the Evol-Instruct method; (d) easyinstruct_5k is collected by integrating the three instruction datasets above and applying multiple Selectors in EasyInstruct to extract a high-quality instruction dataset.
To conduct experiments on the effect of instruction datasets, we adopt a LLaMA2 (7B) model and fine-tune all our models with LoRA, using the data format proposed in Alpaca. The evaluation is conducted by comparing the generated results of the different fine-tuned models on the AlpacaFarm evaluation set. Following AlpacaFarm, for each comparison we employ ChatGPT as the evaluator to automatically compare two outputs from different models and label which one it prefers, reporting the win rate as the evaluation metric.
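A minimal sketch of this pairwise, ChatGPT-as-judge evaluation is shown below; it is not the authors' exact evaluation code, and the prompt wording and model name are assumptions.

```python
# Sketch of ChatGPT-based pairwise comparison for win-rate evaluation.
# Assumes OPENAI_API_KEY is set; prompt wording and model name are illustrative.
from openai import OpenAI

client = OpenAI()

def judge(instruction: str, output_a: str, output_b: str) -> str:
    """Ask ChatGPT which of two model outputs better follows the instruction."""
    prompt = (
        "Compare two responses to the same instruction.\n"
        f"Instruction: {instruction}\n\n"
        f"Response A: {output_a}\n\nResponse B: {output_b}\n\n"
        "Answer with a single letter, A or B, indicating the better response."
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()

def win_rate(pairs):
    """Fraction of comparisons in which model A's output is preferred.

    `pairs` is a list of (instruction, output_a, output_b) tuples.
    """
    wins = sum(judge(i, a, b).startswith("A") for i, a, b in pairs)
    return wins / len(pairs)
```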
Instruction Diversity. To study the diversity of the instruction datasets considered in our experiments, we identify the verb-noun structure in the generated instructions and plot the top 20 most prevalent root verbs and their top 4 direct nouns in the figure below. Overall, we see a wide range of intents and textual formats within these instructions.
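This kind of verb-noun analysis can be reproduced with a dependency parser; the sketch below uses spaCy and is not the authors' exact analysis script.

```python
# Sketch of the verb-noun diversity analysis: parse each instruction with spaCy
# and collect the root verb together with its direct-object noun.
from collections import Counter
import spacy

nlp = spacy.load("en_core_web_sm")

def root_verb_and_object(instruction):
    """Return (root verb lemma, direct-object lemma) of an instruction, if any."""
    doc = nlp(instruction)
    for token in doc:
        if token.dep_ == "ROOT" and token.pos_ == "VERB":
            dobj = next((c.lemma_ for c in token.children if c.dep_ == "dobj"), None)
            return token.lemma_, dobj
    return None, None

instructions = [
    "Write a short story about a robot learning to paint.",
    "Summarize the following article in two sentences.",
]  # in practice, the full instruction dataset

counter = Counter()
for inst in instructions:
    verb, noun = root_verb_and_object(inst)
    if verb and noun:
        counter[(verb, noun)] += 1

print(counter.most_common(20))  # most prevalent (root verb, direct noun) pairs
```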
Main Results. We compare the generated outputs from models fine-tuned separately on the four instruction datasets with the outputs from the base version of the LLaMA2 (7B) model on the AlpacaFarm evaluation set. As depicted in the figure below, there are improvements in the win rate metric for all the settings. Moreover, the model performs optimally under the easyinstruct_5k setting, indicating the importance of a rich instruction selection strategy.
Case Study. To conduct a qualitative evaluation of EasyInstruct, we sample several instruction examples selected by the Selectors module in easyinstruct_5k for the case study.
We also attach the corresponding evaluation scores for each of these instruction examples, as shown in the table below.
We observe that the selected instructions often possess fluent language and meticulous logic.
@article{ou2024easyinstruct,
title={EasyInstruct: An Easy-to-use Instruction Processing Framework for Large Language Models},
author={Ou, Yixin and Zhang, Ningyu and Gui, Honghao and Xu, Ziwen and Qiao, Shuofei and Bi, Zhen and Chen, Huajun},
journal={arXiv preprint arXiv:2402.03049},
year={2024}
}