MMRotate v0.3.4 from Scratch: A Complete Walkthrough of Rotated Box Object Detection


张建站

2026/4/11 21:19:23

10 min read

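1. Why Choose MMRotate for Rotated Box Detection

Rotated (oriented) box detection is a very practical branch of computer vision, and it is especially common in remote sensing, text detection, and autonomous driving. A horizontal box drags in a lot of redundant area, whereas a rotated box encloses the target much more tightly. I first ran into this need while processing aerial imagery: for long, thin objects such as utility poles and wind turbines, a horizontal box ends up covering a large amount of useless background.

MMRotate is the rotated box detection toolbox of the OpenMMLab ecosystem, and its biggest advantage is that it works out of the box. It is built on PyTorch, ships classic algorithms such as R3Det and RoI Transformer, and supports mainstream dataset formats such as DOTA and HRSC. Compared with earlier releases, v0.3.4 trains faster and is friendlier to custom datasets. In my tests, training on 500 DOTA images with a single RTX 3090 and batch_size=4 took only about 12 minutes per epoch.

For beginners, MMRotate's modular design keeps the whole workflow clear. The config system separates data preprocessing, model structure, and training strategy, so changing a config feels like stacking building blocks. I particularly like the config inheritance mechanism: to change the learning rate schedule, for example, you do not rewrite the whole training config; you inherit the base config and override only the relevant parameters.

2. Environment Setup: Pitfalls to Avoid

2.1 Hardware and Driver Configuration

I recommend an NVIDIA GPU (GTX 1080Ti or better); in my experience AMD GPUs are less stable under PyTorch. The most painful problem I have hit is CUDA version conflicts, which users of 30/40 series cards should watch for in particular:

```bash
# Show the GPU driver version
nvidia-smi
# Show the CUDA compiler version
nvcc --version
```

For 30 series cards such as the RTX 3060, CUDA 11.3 is a fairly safe choice. In my tests on Ubuntu 20.04, the following combination was the most stable:

- Driver: 515.65.01
- CUDA Toolkit: 11.3.1
- cuDNN: 8.2.1

2.2 Python Environment

I strongly recommend creating a dedicated conda environment to avoid package conflicts. I usually use Python 3.8, which I have found to be the most compatible with the libraries involved:

```bash
conda create -n mmrotate python=3.8 -y
conda activate mmrotate
```

When installing PyTorch, make sure the build matches your CUDA version. For CUDA 11.3:

```bash
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
```

2.3 Installing MMRotate and Its Dependencies

I recommend OpenMMLab's mim tool, which resolves the dependencies automatically:

```bash
pip install -U openmim
mim install mmcv-full==1.7.1
mim install mmdet==2.28.2
```

A small trick for MMRotate itself: clone the repository and install it in development (editable) mode, so that code changes take effect without reinstalling:

```bash
git clone https://github.com/open-mmlab/mmrotate.git
cd mmrotate
pip install -v -e .  # do not forget the trailing dot
```

To verify the installation, run:

```bash
python demo/image_demo.py demo/demo.jpg \
    configs/oriented_rcnn/oriented_rcnn_r50_fpn_1x_dota_le90.py \
    https://download.openmmlab.com/mmrotate/v0.1.0/oriented_rcnn/oriented_rcnn_r50_fpn_1x_dota_le90/oriented_rcnn_r50_fpn_1x_dota_le90-6d2b2ce0.pth \
    --out-file result.jpg
```

If the generated result.jpg shows rotated box annotations, the environment is set up correctly.

3. Data Processing in Practice

3.1 Choosing an Annotation Tool

roLabelImg is an excellent tool for rotated box annotation and supports keyboard shortcuts:

- w: create a rotated rectangle
- Ctrl + mouse wheel: adjust the angle
- Space: confirm the annotation

Key bindings differ between builds (the upstream README binds rotated boxes to e and plain rectangles to w), so check the README of the version you install. When annotation is finished, the tool writes XML files in PASCAL VOC style. Annotating 100 images took me about 2 hours on average, and about 1.5 hours once I was fluent.

3.2 Data Format Conversion

MMRotate supports several formats, including DOTA and HRSC. To convert a custom dataset to DOTA format, the XML files have to be converted to TXT:

```python
# xml_to_dota.py
import xml.etree.ElementTree as ET


def convert(xml_path, txt_path):
    tree = ET.parse(xml_path)
    root = tree.getroot()
    with open(txt_path, "w") as f:
        for obj in root.findall("object"):
            robndbox = obj.find("robndbox")
            # Tag names as in the original text; they depend on the XML
            # layout your annotation tool actually writes.
            x1, y1 = robndbox.find("x1").text, robndbox.find("y1").text
            # Write the 8 corner coordinates, the category, and the difficulty flag.
            # The remaining coordinates are elided ("...") in the original.
            f.write(f"{x1} {y1} ... {obj.find('name').text} 0\n")
```

3.3 Dataset Splitting and Augmentation

I recommend a 7:2:1 split into training, validation, and test sets. sklearn's train_test_split can do this in two passes:

```python
from sklearn.model_selection import train_test_split

# file_list is the list of sample names in your dataset.
# First split: training set vs. a temporary set (30%)
X_train, X_temp = train_test_split(file_list, test_size=0.3, random_state=42)
# Second split: the temporary set into validation (about 20%) and test (about 10%)
X_val, X_test = train_test_split(X_temp, test_size=0.33, random_state=42)
```

For small datasets I recommend the built-in MultiScaleFlipAug. Note that it is a test-time augmentation wrapper, so it belongs in test_pipeline rather than train_pipeline; training-time augmentation is configured through transforms such as RResize and RRandomFlip in train_pipeline.

```python
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1024, 1024),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize',
                 mean=[123.675, 116.28, 103.53],
                 std=[58.395, 57.12, 57.375]),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
```

4. Model Training and Tuning

4.1 Config Files in Detail

MMRotate uses modular configs. Three files carry most of the changes.

Model config (for example configs/r3det/r3det_tiny_r50_fpn_1x_dota_oc.py):

```python
model = dict(
    type='R3Det',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,  # freeze the weights of the first stage
        norm_cfg=dict(type='BN', requires_grad=True)),
)
```

Data config (configs/_base_/datasets/dotav1.py):

```python
data = dict(
    samples_per_gpu=4,  # batch size per GPU
    workers_per_gpu=2,  # data-loading workers per GPU
    train=dict(
        type='DOTADataset',
        ann_file='data/dota/train/annfiles/',
        img_prefix='data/dota/train/images/'),
    val=dict(...),   # elided in the original
    test=dict(...),  # elided in the original
)
```

Training schedule (configs/_base_/schedules/schedule_1x.py):

```python
optimizer = dict(type='SGD', lr=0.005, momentum=0.9, weight_decay=0.0001)
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[8, 11])  # lower the learning rate at epochs 8 and 11
```

4.2 Launching Training

Single-GPU training:

```bash
python tools/train.py configs/r3det/r3det_tiny_r50_fpn_1x_dota_oc.py \
    --work-dir work_dirs/r3det_dota
```

Multi-GPU training (4 GPUs):

```bash
./tools/dist_train.sh configs/r3det/r3det_tiny_r50_fpn_1x_dota_oc.py 4 \
    --work-dir work_dirs/r3det_dota
```

4.3 Monitoring Tips

- Learning rate: if validation mAP fluctuates a lot, reduce lr and increase the number of warmup iterations.
- Gradient clipping: add grad_clip=dict(max_norm=35, norm_type=2) to optimizer_config.
- Keeping the best checkpoint: set evaluation = dict(interval=1, metric='mAP', save_best='auto'). This evaluates every epoch and saves the best-scoring checkpoint; it does not stop training early, so it is a substitute for early stopping rather than the real thing.

5. Model Testing and Deployment

5.1 Using the Test Scripts

Single-image test (the config must match the checkpoint, so the R3Det config is used with the R3Det weights trained above):

```bash
python demo/image_demo.py test.jpg \
    configs/r3det/r3det_tiny_r50_fpn_1x_dota_oc.py \
    work_dirs/r3det_dota/latest.pth \
    --out-file result.jpg
```

Batch testing with evaluation:

```bash
./tools/dist_test.sh configs/r3det/r3det_tiny_r50_fpn_1x_dota_oc.py \
    work_dirs/r3det_dota/latest.pth 4 \
    --eval mAP
```

5.2 Exporting the Model to ONNX

Note that export_model may not be available in mmdet.apis for mmdet 2.x; if the import fails, export through MMDeploy instead (see 5.3).

```python
# export_model is taken from the original text; verify that your mmdet version provides it.
from mmdet.apis import init_detector, export_model

config_file = 'configs/r3det/r3det_tiny_r50_fpn_1x_dota_oc.py'
checkpoint_file = 'work_dirs/r3det_dota/latest.pth'
export_model(config_file, checkpoint_file, output_file='r3det.onnx')
```

5.3 Performance Optimization Tips

- TensorRT acceleration: convert the model with the mmdeploy tool.
- Quantization: try QAT (Quantization Aware Training).
- Pruning: use mmrazor for channel pruning.

In a real project I compressed an R3Det model from 187 MB to 43 MB and cut inference time from 45 ms to 22 ms (RTX 3090). The key is to balance model size against accuracy; as a rule of thumb, make sure mAP reaches 0.8 first and only then start optimizing.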
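For the format conversion in section 3.2, the step that turns one rotated box into the eight DOTA coordinates can be sketched as follows. This is a minimal sketch under stated assumptions, not the author's converter: it assumes each `robndbox` element stores `cx`, `cy`, `w`, `h`, and `angle` (angle in radians), which should be checked against the XML your annotation tool writes. The helper names `rbox_to_corners` and `robndbox_to_dota_line` are hypothetical.

```python
import math
import xml.etree.ElementTree as ET


def rbox_to_corners(cx, cy, w, h, angle):
    """Return the four corners of a rotated box as (x, y) tuples.

    The corners of the axis-aligned box are rotated about the centre
    (cx, cy) with the standard 2D rotation matrix; angle is in radians.
    """
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    half_w, half_h = w / 2, h / 2
    offsets = ((-half_w, -half_h), (half_w, -half_h),
               (half_w, half_h), (-half_w, half_h))
    return [(cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
            for dx, dy in offsets]


def robndbox_to_dota_line(obj, difficulty=0):
    """Build one DOTA line: x1 y1 x2 y2 x3 y3 x4 y4 category difficulty.

    Assumes <object> holds <name> and a <robndbox> with cx, cy, w, h,
    angle children (an assumption about the XML layout).
    """
    box = obj.find("robndbox")
    values = [float(box.find(tag).text) for tag in ("cx", "cy", "w", "h", "angle")]
    coords = " ".join(f"{v:.1f}" for point in rbox_to_corners(*values) for v in point)
    return f"{coords} {obj.find('name').text} {difficulty}"
```

Calling `robndbox_to_dota_line(obj)` for every `object` element and writing one line per call gives a DOTA-style annotation file. The direction of rotation depends on the tool's angle convention and on the image y-axis pointing down, so it is worth drawing a few converted boxes on the image before training on them.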
