# llmtcg-linux-kernel

**Repository Path**: liyifm/llmtcg-linux-kernel

## Basic Information

- **Project Name**: llmtcg-linux-kernel
- **Description**: linux kernel compatibility testing based on large language models
- **Primary Language**: C
- **License**: BSD-3-Clause
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 2
- **Forks**: 0
- **Created**: 2024-04-25
- **Last Updated**: 2024-09-23

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

LLM-Generated Linux-Kernel Testcases
====================================

The repository contains a set of test cases that are:

1. Aimed at consistency testing across Linux-compatible kernels, and
2. Automatically generated using large language models.

All test cases in this repository are currently generated by the tool
[llmtcg](https://gitee.com/liyifm/llmtcg).

Cost Statistics
----------------

| Tested Subsystems | Test Scenes | Test Cases | Generation Rate |
|-------------------|-------------|------------|-----------------|
| syscalls          |        4604 |       3897 |           84.6% |


Evaluation On Capabilities of LLMs
-----------------------------------

| Tested Subsystems | Test Scenes | `claude3-haiku` |  `llama-3-8b` | `deepseek-v2-chat` |
|-------------------|-------------|-----------------|---------------|--------------------|
| syscalls          |        4604 |   2227 (48.37%) | 2504 (54.39%) |      2835 (61.58%) |


History
--------


- *2024.4.26* Initial version: used the `claude3-haiku` model to extract test scenes from the Linux kernel man pages
    - 4604 test scenes generated; for 2227 of them we successfully generated runnable, passing test cases
    - 38,057,211 input tokens and 10,715,788 output tokens used
    - cost: $22.91
- *2024.4.29* used the `moonshot-v1-32k` model to retry previously failed test scenes
    - tried 457 test scenes and successfully generated test code for 170 of them
    - 3,203,882 input tokens and 736,716 output tokens used
    - cost: 94.57 RMB
- *2024.5.2* used a locally deployed `llama3-8b-instruct` model to regenerate test cases for all syscall test scenes
    - tried 4604 test scenes, 2504 of which succeeded
    - ran on a single NVIDIA RTX 4090
    - cost: 30 RMB (electricity only)
- *2024.5.7* used the `deepseek-v2` model to regenerate test cases for all syscall test scenes
    - tried 4604 test scenes, 2835 of which succeeded
    - cost: 55 RMB