NeoCodes-dev posted an update 8 days ago

Hi all,

I am working on a project for the PyTorch/HF OpenEnv challenge, and part of the challenge is that each participant/team needs to write a blog post (an Article) on Hugging Face about their submission. However, when I try to create an Article, it says I need a "pro" account... but I already am an HF Pro member and have been for almost a year!

Is anyone else having this issue? I have a bunch of ideas for blog posts/articles so I'd really like to be able to access this feature, even outside of the OpenEnv Challenge. Can someone let me know if there's a way to fix this or something I'm missing?

Thanks,
Neo

sagar007 posted an update about 1 month ago

🚀 I built a Multimodal Vision-Language Model using Gemma-270M + CLIP!

Just finished training my multimodal model on the full LLaVA-Instruct-150K dataset (157K samples) and wanted to share the results!

🔧 What I Built:
A vision-language model that can understand images and answer questions about them, combining:
- Google Gemma-3-270M (language)
- OpenAI CLIP ViT-Large/14 (vision)
- LoRA fine-tuning for efficiency
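
Roughly, the pieces wire together like this; a simplified sketch rather than my exact training code (the class name, module layout, and image-token splice below are illustrative):

```python
# Simplified sketch of the architecture (illustrative, not the exact repo code).
import torch
import torch.nn as nn
from transformers import CLIPVisionModel, AutoModelForCausalLM

class MultiGemma(nn.Module):  # hypothetical name for illustration
    def __init__(self):
        super().__init__()
        # Frozen CLIP ViT-Large/14 vision tower
        self.vision = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14")
        self.vision.requires_grad_(False)
        # Gemma-3-270M language backbone
        self.lm = AutoModelForCausalLM.from_pretrained("google/gemma-3-270m")
        # Project CLIP patch embeddings into the LM's embedding space
        self.proj = nn.Linear(self.vision.config.hidden_size,
                              self.lm.config.hidden_size)

    def forward(self, pixel_values, input_ids, labels=None):
        # One embedding per image patch (plus the CLS token)
        patches = self.vision(pixel_values=pixel_values).last_hidden_state
        img = self.proj(patches)                         # (B, 257, d_lm)
        txt = self.lm.get_input_embeddings()(input_ids)  # (B, T, d_lm)
        inputs_embeds = torch.cat([img, txt], dim=1)     # prepend image tokens
        if labels is not None:
            # Mask the image positions out of the language-modeling loss
            pad = torch.full(img.shape[:2], -100,
                             dtype=labels.dtype, device=labels.device)
            labels = torch.cat([pad, labels], dim=1)
        return self.lm(inputs_embeds=inputs_embeds, labels=labels)
```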

📊 Training Stats:
- 157,712 training samples (full LLaVA dataset)
- 3 epochs on A100 40GB
- ~9 hours training time
- Final loss: 1.333 training / 1.430 validation
- Only 18.6M trainable params (3.4% of 539M total)
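
The small trainable footprint comes from LoRA; with peft the setup looks roughly like this (rank, alpha, and target modules here are placeholders, not my actual config):

```python
# Rough LoRA setup with peft (hyperparameters are placeholders, not the real config).
from peft import LoraConfig, get_peft_model

model = MultiGemma()  # the sketch class from above
lora_cfg = LoraConfig(
    r=16,                      # adapter rank (assumed)
    lora_alpha=32,             # scaling factor (assumed)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model.lm = get_peft_model(model.lm, lora_cfg)
model.lm.print_trainable_parameters()  # reports trainable vs. total parameter counts
```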

📈 Benchmark Results:
- VQA Accuracy: 53.8%
- Works great for: animal detection, room identification, scene understanding



🔗 **Try it yourself:**
- 🤗 Model: sagar007/multigemma
- 🎮 Demo: https://huggingface.co/spaces/sagar007/Multimodal-Gemma
- 💻 GitHub: https://github.com/sagar431/multimodal-gemma-270m
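
If you'd rather hit the demo from code, the Space can be queried with gradio_client along these lines; the endpoint name and input order are guesses, so check the Space's "Use via API" page for the real signature:

```python
# Query the demo Space programmatically (api_name and inputs are assumptions;
# see the Space's "Use via API" page for the real signature).
from gradio_client import Client, handle_file

client = Client("sagar007/Multimodal-Gemma")
answer = client.predict(
    handle_file("cat.jpg"),            # a local image file
    "What animal is in the photo?",    # the question
    api_name="/predict",               # assumed endpoint name
)
print(answer)
```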

Built with PyTorch Lightning + MLflow for experiment tracking. Full MLOps pipeline with CI/CD!
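
For anyone reproducing the tracking setup, the Lightning + MLflow hookup is roughly this (the experiment name, precision, and module names are illustrative):

```python
# Rough sketch of the Lightning + MLflow hookup (settings are illustrative).
import lightning as L
from lightning.pytorch.loggers import MLFlowLogger

mlf_logger = MLFlowLogger(
    experiment_name="multimodal-gemma",   # assumed experiment name
    tracking_uri="file:./mlruns",         # local MLflow store (assumed)
)
trainer = L.Trainer(
    max_epochs=3,                         # matches the 3 epochs above
    precision="bf16-mixed",               # assumed; typical on A100
    logger=mlf_logger,
)
# trainer.fit(lit_module, datamodule=dm)  # `lit_module` / `dm` are hypothetical
```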

Would love to hear your feedback! 🙏

#multimodal #gemma #clip #llava #vision-language #pytorch