---
license: apache-2.0
library_name: onnx
tags:
  - depth-estimation
  - panoramic
  - 360-degree
  - webgpu
  - onnx
pipeline_tag: depth-estimation
---

# DA-2: Depth Anything in Any Direction (ONNX WebGPU Version)

This repository contains the ONNX weights for DA-2: Depth Anything in Any Direction, optimized for WebGPU inference in the browser.

## Model Details

- **Original Model:** [haodongli/DA-2](https://huggingface.co/haodongli/DA-2)
- **Framework:** ONNX (opset 17)
- **Precision:** FP32 (full precision)
- **Input Resolution:** 1092×546 (width × height)
- **Size:** ~1.4 GB

## Conversion Details

This model was converted from the original PyTorch weights to ONNX to enable client-side inference with `onnxruntime-web`.

- **Optimization:** Constant folding applied.
- **Compatibility:** Verified with the WebGPU backend; see the feature-detection sketch below.
- **Modifications:**
  - Replaced `Clip` (clamp) operators with `Max`/`Min` combinations to ensure WebGPU kernel compatibility.
  - Removed the internal normalization layers so the browser can pass raw 0-1 inputs directly.
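WebGPU is not yet available in every browser, so it can be worth feature-detecting before creating a session. A minimal sketch; the WASM fallback is an assumption of this example, not something this card guarantees:

```js
// Detect WebGPU and build an execution-provider list for onnxruntime-web.
// Falling back to 'wasm' (CPU) works but is very slow for a ~1.4 GB FP32 model.
const executionProviders = [];
if (navigator.gpu && (await navigator.gpu.requestAdapter()) !== null) {
    executionProviders.push('webgpu');
}
executionProviders.push('wasm');
// Pass this list as `executionProviders` to ort.InferenceSession.create(...).
```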

## Usage (Transformers.js)

You can run this model with [Transformers.js](https://github.com/xenova/transformers.js):

```js
import { pipeline } from '@xenova/transformers';

// Initialize the pipeline
const depth_estimator = await pipeline('depth-estimation', 'phiph/DA-2-WebGPU', {
    device: 'webgpu',
    dtype: 'fp32', // Use FP32 as exported
});

// Run inference
const url = 'path/to/your/panorama.jpg';
const output = await depth_estimator(url);
// output.predicted_depth is the raw depth tensor
// output.depth is a RawImage visualization of the depth map
```
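To render the raw depth values yourself (for example, with a custom colormap), you can min-max normalize the tensor onto a canvas. A minimal sketch, assuming a single-channel output whose last two dims are height and width; check `output.predicted_depth.dims` for the actual layout:

```js
// Normalize the depth tensor to 0-255 grayscale and draw it on a canvas.
const depth = output.predicted_depth;
const [height, width] = depth.dims.slice(-2);
const data = depth.data;

// Min-max normalize for display
let min = Infinity, max = -Infinity;
for (const v of data) { if (v < min) min = v; if (v > max) max = v; }
const range = max - min || 1;

const canvas = document.createElement('canvas');
canvas.width = width;
canvas.height = height;
const ctx = canvas.getContext('2d');
const imageData = ctx.createImageData(width, height);
for (let i = 0; i < width * height; ++i) {
    const v = Math.round(255 * (data[i] - min) / range);
    imageData.data[4 * i]     = v;   // R
    imageData.data[4 * i + 1] = v;   // G
    imageData.data[4 * i + 2] = v;   // B
    imageData.data[4 * i + 3] = 255; // A (opaque)
}
ctx.putImageData(imageData, 0, 0);
document.body.appendChild(canvas);
```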

## Usage (ONNX Runtime Web)

You can also run this model directly in the browser with [`onnxruntime-web`](https://www.npmjs.com/package/onnxruntime-web):

```js
import * as ort from 'onnxruntime-web/webgpu';

// 1. Initialize Session
// Note: the model file lives in the 'onnx' subdirectory of this repo
const session = await ort.InferenceSession.create(
    'https://huggingface.co/phiph/DA-2-WebGPU/resolve/main/onnx/model.onnx',
    {
        executionProviders: ['webgpu'],
        // Keep the output on the GPU to avoid an extra host copy.
        // The key must match the model's output name ('depth').
        preferredOutputLocation: { depth: 'gpu-buffer' },
    }
);

// 2. Prepare Input (Float32, 0-1 range, NCHW: [1, 3, 546, 1092])
// Note: do NOT apply ImageNet mean/std normalization; the model expects raw 0-1 floats.
// See below for a sketch of how to produce `float32Data` from an image.
const tensor = new ort.Tensor('float32', float32Data, [1, 3, 546, 1092]);

// 3. Run Inference
const results = await session.run({ images: tensor });
const depthMap = results.depth;              // GPU-resident output tensor
const depthData = await depthMap.getData();  // Download the values to a Float32Array
```
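The snippet above assumes `float32Data` already contains the preprocessed pixels. Below is a minimal sketch of producing it from an `<img>` element via a canvas; the `imageToNCHW` helper is illustrative, not part of this repository:

```js
// Resize the panorama to the model's 1092x546 input (2:1, matching
// equirectangular panoramas) and convert RGBA bytes to NCHW float32 in [0, 1].
function imageToNCHW(img, width = 1092, height = 546) {
    const canvas = document.createElement('canvas');
    canvas.width = width;
    canvas.height = height;
    const ctx = canvas.getContext('2d');
    ctx.drawImage(img, 0, 0, width, height);

    const { data } = ctx.getImageData(0, 0, width, height); // RGBA, Uint8ClampedArray
    const plane = width * height;
    const float32Data = new Float32Array(3 * plane);
    for (let i = 0; i < plane; ++i) {
        float32Data[i]             = data[4 * i]     / 255; // R plane
        float32Data[plane + i]     = data[4 * i + 1] / 255; // G plane
        float32Data[2 * plane + i] = data[4 * i + 2] / 255; // B plane
        // No ImageNet mean/std here: the exported model expects raw 0-1 floats.
    }
    return float32Data;
}

const float32Data = imageToNCHW(document.querySelector('img#panorama'));
```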

## License

This model is a derivative work of DA-2 and is distributed under the Apache License 2.0.

Please cite the original authors if you use this model:

```bibtex
@article{li2025depth,
  title={DA$^{2}$: Depth Anything in Any Direction},
  author={Li, Haodong and Zheng, Wangguangdong and He, Jing and Liu, Yuhao and Lin, Xin and Yang, Xin and Chen, Ying-Cong and Guo, Chunchao},
  journal={arXiv preprint arXiv:2509.26618},
  year={2025}
}
```