DBNet for Text Detection
Model Hub: shuzi-mewtant/dbnet_res18_text_detection_v0.1
This is a DBNet (Differentiable Binarization) text detection model, ported to Hugging Face Transformers. It uses a ResNet-18 backbone with a Feature Pyramid Network (FPN) for multi-scale feature fusion. The model was trained on the ICDAR 2015 dataset and detects text in natural scene images.
Usage
from transformers import pipeline

# Load pipeline
ocr_pipe = pipeline(
    "object-detection",
    model="shuzi-mewtant/dbnet_res18_text_detection_v0.1",
    trust_remote_code=True,
)

# Run inference
image_path = "path/to/image.jpg"
results = ocr_pipe(image_path)
for res in results:
    print(f"Box: {res['box']}, Score: {res['score']}")
Local usage
If you have downloaded the model locally, you can use it directly:
from transformers import AutoModel, AutoImageProcessor

# DBNetPipeline comes from a local pipeline.py, presumably the one shipped
# with the model files. Run this from inside the model directory (or add
# that directory to sys.path) so the import resolves.
from pipeline import DBNetPipeline

model_path = "/path/to/dbnet_res18_text_detection_v0.1"

# trust_remote_code=True is needed because the model uses custom code
model = AutoModel.from_pretrained(model_path, trust_remote_code=True)
processor = AutoImageProcessor.from_pretrained(model_path, trust_remote_code=True)
pipe = DBNetPipeline(model=model, image_processor=processor, task="object-detection")

results = pipe("path/to/image.jpg")
print(f"Found {len(results)} text regions")
for result in results:
    box = result['box']
    score = result['score']
    print(f"Box: {box}, Score: {score:.3f}")
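Each result carries a 'score', so low-confidence detections can be dropped before further processing. The helper below is a minimal sketch, not part of the model repository: the name filter_by_score and the 0.5 default are assumptions chosen for illustration, and it relies only on the 'score' key shown above, leaving 'box' untouched.

```python
from typing import Any, Dict, List


def filter_by_score(
    results: List[Dict[str, Any]], min_score: float = 0.5
) -> List[Dict[str, Any]]:
    """Keep detections scoring at least min_score, highest score first.

    Hypothetical helper: min_score=0.5 is an illustrative default,
    not a threshold recommended by the model authors.
    """
    kept = [r for r in results if r["score"] >= min_score]
    return sorted(kept, key=lambda r: r["score"], reverse=True)
```

For example, filter_by_score(results, min_score=0.7) would keep only detections with a score of 0.7 or higher. A suitable threshold depends on the images, so it is worth tuning on a few samples.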
Testing the model
You can test the model using the provided test script:
cd /path/to/dbnet_res18_text_detection_v0.1
python test_model.py
Model Details
- Architecture: DBNet with ResNet-18 backbone and FPN
- Input size: 1024x1024 pixels (automatically padded/resized)
- Output: 3-channel probability maps (shrink, threshold, binary)
- Training data: ICDAR 2015 dataset
- Normalization: RGB with mean [123.675, 116.28, 103.53] and std [58.395, 57.12, 57.375]
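The image processor applies the normalization and sizing listed above automatically. For readers who want to see what those numbers mean, here is a rough sketch in NumPy. It is an assumption-laden illustration, not the processor's actual code: the function name normalize_and_pad is hypothetical, and it assumes the image already fits within the target size and that padding is added at the bottom and right, whereas the real processor may resize first or pad differently.

```python
import numpy as np

# Per-channel statistics quoted in the list above (0-255 RGB scale).
MEAN = np.array([123.675, 116.28, 103.53], dtype=np.float32)
STD = np.array([58.395, 57.12, 57.375], dtype=np.float32)
TARGET = 1024  # input side length quoted in the list above


def normalize_and_pad(image_rgb: np.ndarray, target: int = TARGET) -> np.ndarray:
    """Normalize an HxWx3 uint8 RGB image and zero-pad it to target x target.

    Assumptions (not confirmed by the model card): the image is no larger
    than the target, and padding goes on the bottom and right edges.
    """
    height, width, channels = image_rgb.shape
    if channels != 3 or height > target or width > target:
        raise ValueError("expected an RGB image no larger than the target size")
    # Subtract the mean and divide by the std, broadcasting over channels.
    normalized = (image_rgb.astype(np.float32) - MEAN) / STD
    # Zeros in normalized space correspond to the mean color.
    padded = np.zeros((target, target, 3), dtype=np.float32)
    padded[:height, :width] = normalized
    # Channels-first layout (3, H, W), as PyTorch models usually expect.
    return padded.transpose(2, 0, 1)
```

In normal use there is no need to call anything like this by hand, since the pipeline and AutoImageProcessor handle preprocessing.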
Performance
No benchmark scores are reported for this checkpoint. Its key characteristics are:
- Trained on ICDAR 2015 dataset
- Supports detection of horizontal and oriented text
- Post-processing includes NMS and box expansion for better localization
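The two post-processing steps named above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the repository's implementation: it works on axis-aligned boxes only, the function names are hypothetical, the expansion uses the offset D = area × ratio / perimeter from the DBNet paper with its ratio of 1.5, and the IoU threshold of 0.5 is an arbitrary example value. Real implementations typically offset arbitrary polygons with a clipping library.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def unclip_distance(polygon: Sequence[Point], ratio: float = 1.5) -> float:
    """Expansion offset D = area * ratio / perimeter for a simple polygon."""
    twice_area = 0.0
    perimeter = 0.0
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        twice_area += x1 * y2 - x2 * y1  # shoelace term
        perimeter += ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
    if perimeter == 0:
        return 0.0
    return abs(twice_area) / 2.0 * ratio / perimeter


def expand_box(box: Box, ratio: float = 1.5) -> Box:
    """Grow an axis-aligned box outward by the unclip distance on every side."""
    xmin, ymin, xmax, ymax = box
    corners = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    d = unclip_distance(corners, ratio)
    return (xmin - d, ymin - d, xmax + d, ymax + d)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap a better box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

Expansion matters because DBNet predicts shrunken text kernels, so detected regions need to be grown back toward the full text extent. Long, thin regions receive a smaller offset than compact ones of the same area, since the offset is divided by the perimeter.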
Installation
pip install torch torchvision transformers opencv-python pillow safetensors
Or install from the requirements file:
pip install -r requirements.txt