# llmtcg-linux-kernel

**Repository Path**: liyifm/llmtcg-linux-kernel

## Basic Information

- **Project Name**: llmtcg-linux-kernel
- **Description**: Linux kernel compatibility testing based on large language models
- **Primary Language**: C
- **License**: BSD-3-Clause
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 2
- **Forks**: 0
- **Created**: 2024-04-25
- **Last Updated**: 2024-09-23

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

LLM-Generated Linux-Kernel Testcases
====================================

This repository contains a set of test cases that are:

1. aimed at consistency testing between Linux-compatible kernels,
2. automatically generated using large language models.

All test cases in this repository are currently generated by the tool [llmtcg](https://gitee.com/liyifm/llmtcg).

Cost Statistics
----------------

| Tested Subsystems | Test Scenes | Test Cases | Generation Rate |
|-------------------|-------------|------------|-----------------|
| syscalls          | 4604        | 3897       | 84.6%           |

Evaluation On Capabilities of LLMs
-----------------------------------

| Tested Subsystems | Test Scenes | `claude3-haiku` | `llama-3-8b`  | `deepseek-v2-chat` |
|-------------------|-------------|-----------------|---------------|--------------------|
| syscalls          | 4604        | 2227 (48.37%)   | 2504 (54.39%) | 2835 (61.58%)      |

History
--------

- *2024.4.26* Initial version. Used the `claude3-haiku` model to extract test scenes from the Linux kernel's manpages.
  - 4604 test scenes were generated; for 2227 of them we successfully generated runnable (and passing) test cases
  - 38,057,211 input tokens and 10,715,788 output tokens used
  - Cost: $22.91
- *2024.4.29* Used the `moonshot-v1-32k` model to retry previously failed test scenes.
  - Tried 457 test scenes; test code was successfully generated for 170 of them
  - 3,203,882 input tokens and 736,716 output tokens used
  - Cost: 94.57 RMB
- *2024.5.2* Used a locally deployed `llama3-8b-instruct` model to regenerate all test scenes on syscalls.
  - Tried 4604 test scenes; 2504 of them succeeded
  - Ran on a single Nvidia RTX 4090
  - Cost: 30 RMB (electricity cost ;)
- *2024.5.7* Used the `deepseek-v2` model to regenerate all test scenes on syscalls.
  - Tried 4604 test scenes; 2835 of them succeeded
  - Cost: 55 RMB