Improve Model Card: Add Metadata, Paper Link, and Code Link (#1)
Commit 602f66b0b6434e63a8676fe6541bebe13fbd1a67
Co-authored-by: Niels Rogge <[email protected]>
README.md
CHANGED
```diff
@@ -1,8 +1,9 @@
 ---
 license: mit
+pipeline_tag: image-to-video
+library_name: CogVideoX
 ---
 
-
 <div align="center">
 
 # Aether: Geometric-Aware Unified World Modeling
@@ -20,6 +21,8 @@ license: mit
 <a href='https://huggingface.co/spaces/AmberHeart/AetherV1'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Demo-blue'></a>
 </div>
 
+This repository contains the model used in the paper [Aether: Geometric-Aware Unified World Modeling](https://arxiv.org/abs/2503.18945).
+
 Aether addresses a fundamental challenge in AI: integrating geometric reconstruction with generative modeling
 for human-like spatial reasoning. Our framework unifies three core capabilities: (1) **4D dynamic reconstruction**,
 (2) **action-conditioned video prediction**, and (3) **goal-conditioned visual planning**. Trained entirely on
@@ -29,6 +32,7 @@ synthetic data, Aether achieves strong zero-shot generalization to real-world sc
 <img src="assets/teaser.png" alt="Teaser" width="800"/>
 </div>
 
+Find the code at https://github.com/OpenRobotLab/Aether.
 
 ## 📝 Citation
 If you find this work useful in your research, please consider citing:
@@ -60,4 +64,4 @@ Our work is primarily built upon
 [DroidCalib](https://github.com/boschresearch/DroidCalib),
 [Grounded-SAM-2](https://github.com/IDEA-Research/Grounded-SAM-2),
 [ceres-solver](https://github.com/ceres-solver/ceres-solver), etc.
-We extend our gratitude to all these authors for their generously open-sourced code and their significant contributions to the community.
+We extend our gratitude to all these authors for their generously open-sourced code and their significant contributions to the community.
```