---
license: apache-2.0
library_name: onnx
tags:
- depth-estimation
- panoramic
- 360-degree
- webgpu
- onnx
pipeline_tag: depth-estimation
---

# DA-2: Depth Anything in Any Direction (ONNX WebGPU Version)

This repository contains the **ONNX** weights for [DA-2: Depth Anything in Any Direction](https://github.com/EnVision-Research/DA-2), optimized for **WebGPU** inference in the browser.

## Model Details

- **Original Model:** [haodongli/DA-2](https://huggingface.co/haodongli/DA-2)
- **Framework:** ONNX (opset 17)
- **Precision:** FP32 (full precision)
- **Input Resolution:** 1092x546
- **Size:** ~1.4 GB

## Conversion Details

This model was converted from the original PyTorch weights to ONNX to enable client-side inference with `onnxruntime-web`.

- **Optimization:** Constant folding applied.
- **Compatibility:** Verified with the WebGPU backend.
- **Modifications:**
  - Replaced `clamp` operators with `Max`/`Min` combinations to ensure WebGPU kernel compatibility.
  - Removed the internal normalization layers so the model accepts raw 0-1 input directly from the browser.

## Usage (Transformers.js)

You can run this model with [Transformers.js](https://huggingface.co/docs/transformers.js). Note that WebGPU support requires Transformers.js v3, published as `@huggingface/transformers`.

```javascript
import { pipeline } from '@huggingface/transformers';

// Initialize the pipeline
const depth_estimator = await pipeline('depth-estimation', 'phiph/DA-2-WebGPU', {
  device: 'webgpu',
  dtype: 'fp32', // use FP32, matching the export
});

// Run inference
const url = 'path/to/your/panorama.jpg';
const output = await depth_estimator(url);
// output.predicted_depth is the raw depth tensor
// output.depth is the visualized depth map (a RawImage)
```

## Usage (ONNX Runtime Web)

Alternatively, you can run this model in the browser using `onnxruntime-web` directly.

```javascript
import * as ort from 'onnxruntime-web/webgpu';

// 1. Initialize the session
// Note: the model file lives in the 'onnx' subdirectory
const session = await ort.InferenceSession.create(
  'https://huggingface.co/phiph/DA-2-WebGPU/resolve/main/onnx/model.onnx',
  {
    executionProviders: ['webgpu'],
    preferredOutputLocation: { depth: 'gpu-buffer' },
  }
);

// 2. Prepare the input (Float32, 0-1 range, NCHW)
// Note: do NOT apply ImageNet mean/std normalization; the model expects raw 0-1 floats.
// float32Data is a Float32Array of length 3 * 546 * 1092
// (see the preprocessing sketch at the end of this card).
const tensor = new ort.Tensor('float32', float32Data, [1, 3, 546, 1092]);

// 3. Run inference
const results = await session.run({ images: tensor });
const depthMap = results.depth; // kept on the GPU; read back with await depthMap.getData()
```

## License

This model is a derivative work of [DA-2](https://github.com/EnVision-Research/DA-2) and is distributed under the **Apache License 2.0**. Please cite the original authors if you use this model:

```bibtex
@article{li2025depth,
  title={DA$^{2}$: Depth Anything in Any Direction},
  author={Li, Haodong and Zheng, Wangguangdong and He, Jing and Liu, Yuhao and Lin, Xin and Yang, Xin and Chen, Ying-Cong and Guo, Chunchao},
  journal={arXiv preprint arXiv:2509.26618},
  year={2025}
}
```
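
## Appendix: Preprocessing Sketch

The `float32Data` buffer in the ONNX Runtime Web example must hold the panorama as planar RGB (NCHW) floats in the 0-1 range at the model's 1092x546 resolution. Below is a minimal sketch using standard browser APIs (`fetch`, `createImageBitmap`, `OffscreenCanvas`); the helper name `imageToTensorData` is illustrative, not part of any library.

```javascript
// Convert an image URL into a Float32Array in NCHW layout, 0-1 range.
// Illustrative helper; resizes the panorama to the model's 1092x546 input.
async function imageToTensorData(url, width = 1092, height = 546) {
  const blob = await (await fetch(url)).blob();
  const bitmap = await createImageBitmap(blob, { resizeWidth: width, resizeHeight: height });

  const canvas = new OffscreenCanvas(width, height);
  const ctx = canvas.getContext('2d');
  ctx.drawImage(bitmap, 0, 0, width, height);
  const { data } = ctx.getImageData(0, 0, width, height); // interleaved RGBA bytes

  // Repack interleaved RGBA (HWC) into planar RGB (CHW) and scale to 0-1.
  // No ImageNet mean/std normalization: the export expects raw floats.
  const plane = width * height;
  const float32Data = new Float32Array(3 * plane);
  for (let i = 0; i < plane; i++) {
    float32Data[i] = data[i * 4] / 255;                 // R plane
    float32Data[plane + i] = data[i * 4 + 1] / 255;     // G plane
    float32Data[2 * plane + i] = data[i * 4 + 2] / 255; // B plane
  }
  return float32Data;
}

const float32Data = await imageToTensorData('path/to/your/panorama.jpg');
```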
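
## Appendix: Visualizing the Depth Output

With `preferredOutputLocation` set to `gpu-buffer`, the output tensor's data stays on the GPU and must be read back with `getData()`. The sketch below assumes the `depth` output holds one value per pixel at the input resolution (546x1092); if the exported model emits a different shape, adjust accordingly. The min-max normalization is for display only.

```javascript
// Read the depth output back from the GPU and draw it as grayscale.
async function drawDepth(depthTensor, canvas, width = 1092, height = 546) {
  const depth = await depthTensor.getData(); // copies the GPU buffer into a Float32Array

  // Find the value range (display-only; raw values keep the model's scale).
  let min = Infinity, max = -Infinity;
  for (const v of depth) {
    if (v < min) min = v;
    if (v > max) max = v;
  }

  const ctx = canvas.getContext('2d');
  const image = ctx.createImageData(width, height);
  for (let i = 0; i < width * height; i++) {
    const g = Math.round(((depth[i] - min) / (max - min || 1)) * 255);
    image.data[i * 4] = g;       // R
    image.data[i * 4 + 1] = g;   // G
    image.data[i * 4 + 2] = g;   // B
    image.data[i * 4 + 3] = 255; // opaque alpha
  }
  ctx.putImageData(image, 0, 0);
}

await drawDepth(results.depth, document.querySelector('canvas'));
```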