# Rex-Omni

**Repository Path**: lgsg/Rex-Omni

## Basic Information

- **Project Name**: Rex-Omni
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 0
- **Created**: 2025-11-25
- **Last Updated**: 2025-12-01

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

| | | [code](tutorials/detection_example/detection_example.py) | [notebook](tutorials/detection_example/_full_notebook.ipynb) |
| | `object referring` | [code](tutorials/detection_example/referring_example.py) | [notebook](tutorials/detection_example/_full_notebook.ipynb) |
| | `gui grounding` | [code](tutorials/detection_example/gui_grounding_example.py) | [notebook](tutorials/detection_example/_full_notebook.ipynb) |
| | `layout grounding` | [code](tutorials/detection_example/layout_grouding_examle.py) | [notebook](tutorials/detection_example/_full_notebook.ipynb) |
| Pointing | `object pointing` | [code](tutorials/pointing_example/object_pointing_example.py) | [notebook](tutorials/pointing_example/_full_notebook.ipynb) |
| | `gui pointing` | [code](tutorials/pointing_example/gui_pointing_example.py) | [notebook](tutorials/pointing_example/_full_notebook.ipynb) |
| | `affordance pointing` | [code](tutorials/pointing_example/affordance_pointing_example.py) | [notebook](tutorials/pointing_example/_full_notebook.ipynb) |
| Visual prompting | `visual prompting` | [code](tutorials/visual_prompting_example/visual_prompt_example.py) | [notebook](tutorials/visual_prompting_example/_full_tutorial.ipynb) |
| OCR | `ocr word box` | [code](tutorials/ocr_example/ocr_word_box_example.py) | [notebook](tutorials/ocr_example/_full_tutorial.ipynb) |
| | `ocr textline box` | [code](tutorials/ocr_example/ocr_textline_box_example.py) | [notebook](tutorials/ocr_example/_full_tutorial.ipynb) |
| | `ocr polygon` | [code](tutorials/ocr_example/ocr_polygon_example.py) | [notebook](tutorials/ocr_example/_full_tutorial.ipynb) |
| Keypointing | `person keypointing` | [code](tutorials/keypointing_example/person_keypointing_example.py) | [notebook](tutorials/keypointing_example/_full_tutorial.ipynb) |
| | `animal keypointing` | [code](tutorials/keypointing_example/animal_keypointing_example.py) | [notebook](tutorials/keypointing_example/_full_tutorial.ipynb) |
| Other | `batch inference` | [code](tutorials/other_example/batch_inference.py) | |
## 4. Applications of Rex-Omni

Rex-Omni's unified detection framework integrates cleanly with other vision models.
| Application | Description | Demo | Documentation |
|:------------|:------------|:----:|:-------------:|
| **Rex-Omni + SAM** | Combines language-driven detection with pixel-perfect segmentation: Rex-Omni detects objects → SAM generates precise masks. | | [README](applications/_1_rexomni_sam/README.md) |
| **Grounding Data Engine** | Automatically generates phrase grounding annotations from image captions using spaCy and Rex-Omni. | | [README](applications/_2_automatic_grounding_data_engine/README.md) |
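The Rex-Omni + SAM application follows a simple two-stage pattern: a language-driven detector returns boxes for a text prompt, and a box-promptable segmenter turns each box into a mask. The sketch below illustrates only the pattern; `detect` and `segment_box` are hypothetical stand-ins, not the actual Rex-Omni or SAM APIs.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixel coordinates

def detect(image, prompt: str) -> List[Box]:
    """Stand-in for a language-driven detector (e.g. Rex-Omni)."""
    # A real pipeline would run the model on `image` with `prompt`;
    # here we return fixed boxes so the sketch is self-contained.
    return [(10, 10, 50, 60), (70, 20, 120, 90)]

def segment_box(image, box: Box) -> List[List[bool]]:
    """Stand-in for a box-prompted segmenter (e.g. SAM).

    Returns a boolean mask the size of the image; a real segmenter
    would produce a tight object mask inside the box.
    """
    x0, y0, x1, y1 = box
    h, w = len(image), len(image[0])
    return [[x0 <= x < x1 and y0 <= y < y1 for x in range(w)] for y in range(h)]

def detect_then_segment(image, prompt: str) -> List[List[List[bool]]]:
    """Two-stage pipeline: each detected box prompts the segmenter."""
    return [segment_box(image, box) for box in detect(image, prompt)]

image = [[0] * 160 for _ in range(100)]  # dummy 160x100 image
masks = detect_then_segment(image, "person")
print(len(masks))  # one mask per detected box
```

The point of the pattern is that the two stages are decoupled: any detector that emits boxes can prompt any box-promptable segmenter.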
## 5. Gradio Demo

We provide an interactive Gradio demo that allows you to test all Rex-Omni capabilities through a web interface.
### Quick Start
```bash
# Launch the demo
CUDA_VISIBLE_DEVICES=0 python app.py --model_path IDEA-Research/Rex-Omni

# With custom settings
CUDA_VISIBLE_DEVICES=0 python app.py \
    --model_path IDEA-Research/Rex-Omni \
    --backend vllm \
    --server_name 0.0.0.0 \
    --server_port 7890
```
### Available Options
- `--model_path`: Model path or HuggingFace repo ID (default: "IDEA-Research/Rex-Omni")
- `--backend`: Backend to use - "transformers" or "vllm" (default: "transformers")
- `--server_name`: Server host address (default: "192.168.81.138")
- `--server_port`: Server port (default: 5211)
- `--temperature`: Sampling temperature (default: 0.0)
- `--top_p`: Nucleus sampling parameter (default: 0.05)
- `--max_tokens`: Maximum tokens to generate (default: 2048)
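The options above map directly onto a small command-line parser. A hypothetical `argparse` sketch mirroring the documented flags and defaults (the real `app.py` may be wired differently):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical sketch mirroring the documented options; not the actual app.py.
    p = argparse.ArgumentParser(description="Rex-Omni Gradio demo")
    p.add_argument("--model_path", default="IDEA-Research/Rex-Omni",
                   help="Model path or HuggingFace repo ID")
    p.add_argument("--backend", choices=["transformers", "vllm"],
                   default="transformers", help="Inference backend")
    p.add_argument("--server_name", default="0.0.0.0",
                   help="Server host address")
    p.add_argument("--server_port", type=int, default=5211, help="Server port")
    p.add_argument("--temperature", type=float, default=0.0,
                   help="Sampling temperature")
    p.add_argument("--top_p", type=float, default=0.05,
                   help="Nucleus sampling parameter")
    p.add_argument("--max_tokens", type=int, default=2048,
                   help="Maximum tokens to generate")
    return p

# Parse the same flags shown in the "custom settings" example above.
args = build_parser().parse_args(["--backend", "vllm", "--server_port", "7890"])
print(args.backend, args.server_port)  # vllm 7890
```

Unrecognized flags or an invalid `--backend` value would make `argparse` exit with an error, which is usually the behavior you want for a demo launcher.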
## 6. Evaluation
Please refer to [Evaluation](evaluation/README.md) for more details.
## 7. Fine-tuning Rex-Omni
Please refer to [Fine-tuning Rex-Omni](finetuning/README.md) for more details.
## 8. LICENSE
Rex-Omni is licensed under the [IDEA License 1.0](LICENSE), Copyright (c) IDEA. All Rights Reserved. This model is based on Qwen, which is licensed under the [Qwen RESEARCH LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct/blob/main/LICENSE), Copyright (c) Alibaba Cloud. All Rights Reserved.
## 9. Citation
Rex-Omni builds on a series of prior works; if you're interested, take a look:
- [RexThinker](https://arxiv.org/abs/2506.04034)
- [RexSeek](https://arxiv.org/abs/2503.08507)
- [ChatRex](https://arxiv.org/abs/2411.18363)
- [DINO-X](https://arxiv.org/abs/2411.14347)
- [Grounding DINO 1.5](https://arxiv.org/abs/2405.10300)
- [T-Rex2](https://link.springer.com/chapter/10.1007/978-3-031-73414-4_3)
- [T-Rex](https://arxiv.org/abs/2311.13596)
```text
@misc{jiang2025detectpointprediction,
      title={Detect Anything via Next Point Prediction},
      author={Qing Jiang and Junan Huo and Xingyu Chen and Yuda Xiong and Zhaoyang Zeng and Yihao Chen and Tianhe Ren and Junzhi Yu and Lei Zhang},
      year={2025},
      eprint={2510.12798},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.12798},
}
```