DBNet for Text Detection

Model Hub: shuzi-mewtant/dbnet_res18_text_detection_v0.1

This is a DBNet model for text detection, ported to Hugging Face Transformers. It uses a ResNet-18 backbone with a Feature Pyramid Network (FPN) for multi-scale feature fusion. The model was trained on the ICDAR 2015 dataset and detects text in natural images.

Usage

from transformers import pipeline

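# Load the pipeline; trust_remote_code=True lets Transformers run the custom
# model and pipeline code shipped in the repository.
ocr_pipe = pipeline(
    "object-detection",
    model="shuzi-mewtant/dbnet_res18_text_detection_v0.1",
    trust_remote_code=True,
)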

# Run inference
image_path = "path/to/image.jpg"
results = ocr_pipe(image_path)

for res in results:
    print(f"Box: {res['box']}, Score: {res['score']}")
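The loop above prints every detection. A common next step is to drop low-confidence regions first. The helper below is a minimal sketch, not part of the model repository: `keep_confident` and its `min_score` default of 0.5 are hypothetical names and values chosen for illustration. It relies only on the `score` key used in the example above.

```python
from typing import Any, Dict, List


def keep_confident(results: List[Dict[str, Any]], min_score: float = 0.5) -> List[Dict[str, Any]]:
    """Return detections with score >= min_score, highest score first.

    Hypothetical helper: only the 'score' key shown in the usage example
    is assumed; the 'box' value is passed through untouched.
    """
    kept = [res for res in results if res["score"] >= min_score]
    return sorted(kept, key=lambda res: res["score"], reverse=True)


if __name__ == "__main__":
    # Made-up detections, for illustration only.
    sample = [
        {"box": "region-a", "score": 0.42},
        {"box": "region-b", "score": 0.91},
        {"box": "region-c", "score": 0.67},
    ]
    for res in keep_confident(sample, min_score=0.5):
        print(f"Box: {res['box']}, Score: {res['score']:.3f}")
```

With real output, pass the list returned by the pipeline call in place of `sample`.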

Local usage

If you have downloaded the model locally, you can use it directly:

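from transformers import AutoModel, AutoImageProcessor
# DBNetPipeline is imported from pipeline.py, which is expected to sit in the
# model directory; run this from that directory or add it to sys.path.
from pipeline import DBNetPipeline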

model_path = "/path/to/dbnet_res18_text_detection_v0.1"
model = AutoModel.from_pretrained(model_path, trust_remote_code=True)
processor = AutoImageProcessor.from_pretrained(model_path, trust_remote_code=True)
pipe = DBNetPipeline(model=model, image_processor=processor, task="object-detection")

results = pipe("path/to/image.jpg")
print(f"Found {len(results)} text regions")
for result in results:
    box = result['box']
    score = result['score']
    print(f"Box: {box}, Score: {score:.3f}")

Testing the model

You can test the model using the provided test script:

cd /path/to/dbnet_res18_text_detection_v0.1
python test_model.py

Model Details

Architecture: DBNet with ResNet-18 backbone and FPN
Input size: 1024x1024 pixels (automatically padded/resized)
Output: 3-channel probability maps (shrink, threshold, binary)
Training data: ICDAR 2015 dataset
Normalization: RGB with mean [123.675, 116.28, 103.53] and std [58.395, 57.12, 57.375]
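The input size and normalization constants listed above can be combined into a preprocessing sketch. The bundled image processor already performs this step, and its exact resize and padding behavior is not documented here. The code below assumes an aspect-preserving nearest-neighbor resize, zero padding on the bottom and right, and a channels-first output. The function name `preprocess` is hypothetical.

```python
import numpy as np

# Constants taken from the Model Details list.
MEAN = np.array([123.675, 116.28, 103.53], dtype=np.float32)
STD = np.array([58.395, 57.12, 57.375], dtype=np.float32)
TARGET = 1024


def preprocess(image_rgb: np.ndarray, target: int = TARGET) -> np.ndarray:
    """Turn an HxWx3 uint8 RGB image into a 1x3xTxT float32 array.

    Assumed behavior (not confirmed by the model card): the longer side
    is scaled to `target`, and the rest is zero-padded bottom/right.
    """
    h, w = image_rgb.shape[:2]
    scale = target / max(h, w)
    new_h = max(1, round(h * scale))
    new_w = max(1, round(w * scale))

    # Nearest-neighbor resize by index lookup, to avoid extra dependencies.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = image_rgb[rows][:, cols].astype(np.float32)

    # Per-channel normalization on the 0-255 scale.
    normalized = (resized - MEAN) / STD

    # Zero padding after normalization corresponds to the mean color.
    padded = np.zeros((target, target, 3), dtype=np.float32)
    padded[:new_h, :new_w] = normalized

    # HWC -> CHW, then add a batch dimension.
    return padded.transpose(2, 0, 1)[None]


if __name__ == "__main__":
    dummy = np.full((500, 1000, 3), 255, dtype=np.uint8)
    print(preprocess(dummy).shape)
```

Note that the mean and standard deviation are expressed on the 0-255 scale, so the image must not be divided by 255 before normalization.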

Performance

The model is described as competitive on text detection benchmarks; no metrics are listed here. Key points:

Trained on the ICDAR 2015 dataset
Detects both horizontal and oriented text
Post-processing includes NMS and box expansion for better localization
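The box expansion step can be illustrated with the rule from the DBNet paper, where a detected shrunk region is grown by an offset D = A * r / L (A is the polygon area, L its perimeter, r the unclip ratio). This is a sketch under assumptions: the ratio of 1.5 is the paper's default and is not confirmed for this checkpoint, the function names are hypothetical, and the expansion is simplified to axis-aligned rectangles rather than general polygon offsetting.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(points: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    n = len(points)
    total = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points: List[Point]) -> float:
    """Sum of edge lengths, closing the polygon back to the first point."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))


def unclip_distance(points: List[Point], unclip_ratio: float = 1.5) -> float:
    """Offset D = A * r / L; 1.5 is the DBNet paper default (assumed here)."""
    return polygon_area(points) * unclip_ratio / polygon_perimeter(points)


def expand_box(xmin: float, ymin: float, xmax: float, ymax: float,
               unclip_ratio: float = 1.5) -> Tuple[float, float, float, float]:
    """Grow an axis-aligned box outward by the unclip distance on every side."""
    corners = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    d = unclip_distance(corners, unclip_ratio)
    return (xmin - d, ymin - d, xmax + d, ymax + d)


if __name__ == "__main__":
    # 100x20 box: area 2000, perimeter 240, so D = 2000 * 1.5 / 240 = 12.5
    print(expand_box(0, 0, 100, 20))  # (-12.5, -12.5, 112.5, 32.5)
```

Expanded coordinates can fall outside the image, so in practice they would be clipped to the image bounds before use.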

Installation

pip install torch torchvision transformers opencv-python pillow safetensors

Or install from the requirements file:

pip install -r requirements.txt
