All HF Hub posts

eaddario 
posted an update 2 days ago
Experimental global target bits-per-weight quantization of Qwen/Qwen3.6-27B and Qwen/Qwen3.6-35B-A3B.

Unlike standard llama.cpp quantizations, which rely on fixed type heuristics (e.g., Q4_K_M), the Target BPW approach optimizes per-tensor precision where it matters most and produces high-quality models that meet a precise global file-size target.

Key Advantages:
- VRAM Maximization: Generates high-quality models sized exactly to fit hardware constraints (e.g., fitting the model into exactly 24 GB of VRAM).
- Data-Driven Precision: The quantization mix is determined by measured weight-error sensitivity rather than hardcoded rules, often yielding better PPL/KLD-versus-size trade-offs.
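To make the idea of a global bpw budget concrete, here is a minimal sketch of one way to assign per-tensor types under it. It is not the author's implementation: it assumes a greedy marginal-gain rule, approximate bits-per-weight values for four llama.cpp types, and per-tensor error numbers measured beforehand. The `Tensor`, `allocate`, and `QUANT_BPW` names are hypothetical.

```python
from dataclasses import dataclass

# Approximate bits per weight of some llama.cpp quant types (block scales included).
QUANT_BPW = {"Q4_K": 4.5, "Q5_K": 5.5, "Q6_K": 6.5625, "Q8_0": 8.5}


@dataclass
class Tensor:
    name: str
    n_params: int
    errors: dict[str, float]  # hypothetical: quant type -> measured error for this tensor


def allocate(tensors, target_bpw, bpw=QUANT_BPW):
    """Greedy per-tensor type assignment under a global bits-per-weight budget.

    Every tensor starts at the cheapest type; then the one-step upgrade with the
    largest error reduction per extra bit that still fits the budget is applied,
    until no upgrade fits. Returns (type per tensor, achieved bpw).
    """
    order = sorted(bpw, key=bpw.get)  # cheapest type first
    total_params = sum(t.n_params for t in tensors)
    budget = target_bpw * total_params
    choice = {t.name: order[0] for t in tensors}
    used = sum(bpw[order[0]] * t.n_params for t in tensors)
    if used > budget:
        raise ValueError("target bpw is below the cheapest available type")
    while True:
        best = None
        for t in tensors:
            cur = choice[t.name]
            idx = order.index(cur)
            if idx + 1 == len(order):
                continue  # already at the most precise type
            nxt = order[idx + 1]
            extra_bits = (bpw[nxt] - bpw[cur]) * t.n_params
            gain = t.errors[cur] - t.errors[nxt]
            if gain <= 0 or used + extra_bits > budget:
                continue
            score = gain / extra_bits
            if best is None or score > best[0]:
                best = (score, t.name, nxt, extra_bits)
        if best is None:
            return choice, used / total_params
        _, name, nxt, extra_bits = best
        choice[name] = nxt
        used += extra_bits
```

Greedy upgrades are cheap but not guaranteed optimal (the underlying problem is a knapsack), and a real file-size target also has to account for non-quantized tensors and GGUF metadata.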

Full benchmarks (PPL, KLD, ARC, GPQA, MMLU, etc.) and methodology are in the model cards.

eaddario/Qwen3.6-27B-GGUF
eaddario/Qwen3.6-35B-A3B-GGUF
cihatyldz 
posted an update 2 days ago
Şifahane, a dual-inference medical classification demo, is now live on Spaces. It runs Turkish BERT and Qwen2.5 side by side within a single space for real-time evaluation of the "Classifier vs. LLM" trade-offs. The system uses a fine-tuned Turkish BERT for fast, cost-effective inference and the Qwen2.5-7B model for flexible multi-task reasoning. It supports department classification, condition analysis, urgency assessment, and rationale generation across 12 medical departments.


🧠 BERT model: https://lnkd.in/dCUUASqq
📊 Dataset: https://lnkd.in/dGK9y24w
🤗 Demo: https://lnkd.in/dtWjCCPF
salma-remyx 
posted an update 2 days ago
SciCrafter measured something AI practitioners have intuited: frontier agents are improving at executing inside well-framed problems, but lag at framing the problem in the first place.

GPT-5.2, Gemini-3-Pro, and Claude Opus 4.5 all plateaued near 26% on a new Minecraft benchmark for probing AI capabilities in the discovery-to-application loop.

So the authors ran targeted interventions:
* Hints about what to investigate doubled performance.
* A structured experimentation template added another 7 to 14 points.
* Structured consolidation beat free-form summaries by 6 points.
* Curriculum context beat independent task-solving.

These interventions helped the agent frame what’s worth investigating, and structure what gets learned so it compounds. The bottleneck for AI in scientific workflows is upstream of execution.

Their findings are congruent with the design patterns we've adopted at Remyx AI to help AI teams close the development loop scientifically.

Agents work well inside structured loops, but they perform poorly when tasked with creating the structure. Instrumenting your scientific workflows offers greater leverage than scaling compute with a less informed search.

In the work of building production AI systems, teams are flying through execution. The bigger challenge is identifying which experiments moved which production outcome, or what to try next.

One of the more interesting results I found this week by tracking work in AI for scientific workflows using Remyx: https://engine.remyx.ai/papers/d8f23b9b-b14b-4ada-b44e-ccfc221c06b4
Crownelius 
posted an update 2 days ago
Day 3 - 05/02/2026
Scamp ships, hits the wall. New plan...

Scamp came back from training today... Didn't go so well, I'm still unsure...

Fast benchmark, temperature 0.7, top_p 0.9:
- "Capital of France is" produced "covered by the Crown" (grammatical, factually wrong)
- "23 + 19 = ?" produced "23. Answer: 23. Answer: 23..." (loops, math broken)
- "def fibonacci(n):" produced a list of letters

It speaks English. It can't reason. At 8K vocab and 50M params, it was never going to.

Next build: 412M MoE-3E. Three experts (math, language, code), top-1 routing, random init; let specialization emerge from the gradient signal alone. I tried seeded Branch-Train-MiX first, then dropped it: it adds compute for no clear win when the router will find its own attractors anyway.
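For readers unfamiliar with top-1 routing, here is a minimal pure-Python sketch over three experts. The router weights, expert functions, and helper names are hypothetical. Scaling each output by its gate probability follows the Switch Transformer convention so the router stays in the gradient path, and `load_fractions` is one simple way to check whether routing balances. Nothing here assigns experts to math, language, or code; in the plan described above, any such specialization has to emerge during training.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top1_route(tokens, router):
    """Pick the single most probable expert per token.

    router: one weight vector per expert (E x d).
    Returns [(expert_index, gate_prob), ...].
    """
    routes = []
    for x in tokens:
        probs = softmax([sum(a * b for a, b in zip(x, w)) for w in router])
        best = max(range(len(probs)), key=probs.__getitem__)
        routes.append((best, probs[best]))
    return routes


def moe_layer(tokens, router, experts):
    """Top-1 MoE: each token goes through one expert, scaled by its gate probability."""
    return [[p * v for v in experts[e](x)]
            for x, (e, p) in zip(tokens, top1_route(tokens, router))]


def load_fractions(routes, n_experts):
    """Share of tokens sent to each expert; a quick check of routing balance."""
    counts = [0] * n_experts
    for e, _ in routes:
        counts[e] += 1
    return [c / len(routes) for c in counts]
```

With random init the router starts near-uniform, so watching `load_fractions` over training is a cheap way to see whether it collapses onto one expert.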

Big lesson today came from limit testing on an A100 80GB. Surprise: every planned phase ran out of memory, even on 80GB. Root cause: at vocab 262144 (the Gemma 3 standard), the output logits dominate memory during the forward and backward passes. Fix: Liger Kernel's fused cross-entropy. It streams the loss computation instead of materialising the full B by T by vocab tensor. Without it, the build would not run.
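The memory pressure described above is easy to estimate. Assuming a hypothetical batch of 8 sequences of 2,048 tokens and fp32 logits, a single B by T by vocab tensor at vocab 262144 is already 16 GiB (about 17.2 GB), before any gradients. The sketch below shows the chunking idea in plain Python: logits are computed for a slice of tokens at a time, so only `chunk_size` x V of them exist at once. It is forward-only and is not Liger Kernel's code; Liger's fused kernel also computes gradients chunk by chunk in the same pass, which this sketch omits. The function names are hypothetical.

```python
import math


def logits_bytes(batch, seq_len, vocab, bytes_per_elem=4):
    """Memory for one fully materialised B x T x V logits tensor (fp32 by default)."""
    return batch * seq_len * vocab * bytes_per_elem


def chunked_cross_entropy(hidden, unembed, targets, chunk_size=1024):
    """Mean token cross-entropy of softmax(h @ unembed^T), one token chunk at a time.

    hidden:  N hidden states (N = B*T flattened), each of length d
    unembed: V output-projection rows, each of length d
    targets: N target token ids
    At most chunk_size x V logits exist at once, instead of N x V.
    """
    total = 0.0
    for start in range(0, len(hidden), chunk_size):
        rows = hidden[start:start + chunk_size]
        ys = targets[start:start + chunk_size]
        # Logits for this chunk only.
        chunk_logits = [[sum(a * b for a, b in zip(h, w)) for w in unembed] for h in rows]
        for logits, y in zip(chunk_logits, ys):
            m = max(logits)  # stable log-sum-exp
            lse = m + math.log(sum(math.exp(z - m) for z in logits))
            total += lse - logits[y]
        # chunk_logits is discarded before the next chunk is computed.
    return total / len(hidden)
```

For example, `logits_bytes(8, 2048, 262144)` returns 17179869184, and the chunked loss is identical whatever `chunk_size` is chosen; only the peak memory changes.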

Scamp proved the pipeline runs end-to-end on real hardware. The 412M run starts tomorrow. If routing balances naturally and math finally crystallises, it ships as Crowfeather-412M-3E with GGUF in F16, Q8, Q5, and Q4.

So... the training may have produced a poet if I had done it better. But I didn't, so instead... we get a malformed robot named Scamp... This is progress.

-Shane

P.S. Join the Discord for discussion: https://discord.gg/8ZscHNmJYE, and I post my finished stuff here: CompactAI-O
AbstractPhil 
posted an update 3 days ago
By trying to disprove the Omega H2 battery, I have discovered the following:
* Each topology formed by the H2 battery is deviant; none share a uniform substrate of behavior. Each is uniquely independent per training set, and all reconstruct perfectly.
* Image reconstruction can be tracked and mapped, yielding a consistently mapped and responsive vocabulary potential of 16.77M entries. Current spectrum testing is at around 5 million Unicode bytes.
* Scaling the model shows that patch size relates to how much data you want the model to represent internally, and no capacity limit has appeared so far. The MSE reconstruction keeps yielding, and the more data is fed, the more it yields.
* The scaling principle shows that the model scales upward indefinitely, and each level of the model can be iteratively captured upward to form deviant and uniformly consistent, repeatable pathways of implicit codewise response, not just arbitrary bitwise recall: meaningful, implicitly learned utility.
* Image reconstruction patch size should match the slice of the image you want to represent, since the model applies patch smoothing internally, per patch, from identity.
* Byte trigrams are channel-agnostic: they need no channel count, just a recall formula, reaching 99.6% n-gram recall for byte-by-byte representations. With them comes an adjacently capable codebook.
* Preliminary SentencePiece tests show validity and reconstruction just like the byte trigrams; with the new byte trigrams, reconstructing a codebook for the structure would be arbitrarily convenient.
* Binary trees learn a uniformly powerful gating mechanism that warrants further exploration; each produces direct, responsive, independent capacity, and the responses are controllable.
* Ternary experiments show the models respond directly to -1, 0, +1 values, so ternary quantization is a viable option.
* Preliminary tests with the H2O1 series of batteries show the models responding similarly to natural universal elements in the universe itself.
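The 16.77M figure matches 256^3 = 16,777,216, the number of possible byte trigrams, which would also explain why such a vocabulary needs no channel count: any byte stream, text or raw pixels, maps onto it. Assuming that is the intended reading, a minimal lossless trigram codec looks like this. The non-overlapping windows, zero-byte tail padding, and function names are illustrative choices, not the author's code.

```python
VOCAB_SIZE = 256 ** 3  # 16,777,216 distinct byte trigrams


def encode_trigrams(data: bytes) -> tuple[list[int], int]:
    """Split bytes into non-overlapping 3-byte windows; each window's integer value is its id.

    Returns (ids, pad), where pad is the number of zero bytes appended to the tail.
    """
    pad = (-len(data)) % 3
    padded = data + bytes(pad)
    ids = [int.from_bytes(padded[i:i + 3], "big") for i in range(0, len(padded), 3)]
    return ids, pad


def decode_trigrams(ids: list[int], pad: int) -> bytes:
    """Invert encode_trigrams, dropping the tail padding."""
    out = b"".join(i.to_bytes(3, "big") for i in ids)
    return out[:len(out) - pad]
```

An exact lookup like this is lossless by construction; the 99.6% figure in the post refers to a model's learned recall, which this sketch does not reproduce.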
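For context on the ternary result, a common way to map weights onto {-1, 0, +1} is absmean quantization as used in BitNet b1.58: divide by the mean absolute weight, round, and clip. This sketch shows that scheme only; it is an assumption that it resembles what the H2 battery experiments used, and the function names are hypothetical.

```python
def ternary_quantize(weights: list[float], eps: float = 1e-8) -> tuple[list[int], float]:
    """Absmean ternary quantization (BitNet b1.58 style).

    gamma = mean(|w|); code = clip(round(w / (gamma + eps)), -1, 1).
    Returns the ternary codes and gamma, the shared per-tensor scale.
    """
    gamma = sum(abs(w) for w in weights) / len(weights)
    # Python's round() rounds halves to even; that only matters exactly at +/-0.5.
    codes = [max(-1, min(1, round(w / (gamma + eps)))) for w in weights]
    return codes, gamma


def dequantize(codes: list[int], gamma: float) -> list[float]:
    """Approximate reconstruction: each code times the shared scale."""
    return [c * gamma for c in codes]
```

For example, `[0.9, -0.05, -1.2, 0.3]` has a mean absolute value of 0.6125 and quantizes to `[1, 0, -1, 0]`.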
kanaria007 
posted an update 1 day ago
✅ Article highlight: *Runtime Admissibility and Barrier Objects* (art-60-226, v0.1)

TL;DR:
This article turns runtime admissibility into a first-class object family.

A governed runtime should not rely on scattered booleans, warning banners, or hidden branches to decide whether an effect may proceed. It should evaluate the requested effect under an explicit *barrier object*, emit a normalized verdict, record the resulting runtime posture, and preserve the full lineage if the path later degrades, reopens, or reenters.

Read:
kanaria007/agi-structural-intelligence-protocols

Why it matters:
• turns “was this allowed?” into a replayable governance question
• makes runtime gating portable and auditable instead of implementation-specific branching
• distinguishes degraded postures that are operationally different even when they normalize to the same exported verdict
• prevents history laundering by requiring explicit reopen and reentry lineage

What’s inside:
• the core idea that a *barrier* is an effect-admissibility object
• a minimal artifact family: *BarrierObject*, *BarrierInputSet*, *AdmissibilityVerdict*, and *RuntimePostureRecord*
• explicit runtime postures such as *REVIEW_ONLY*, *LOCAL_ONLY*, *RECEIPT_ONLY*, *SANDBOX_ONLY*, *BLOCKED*, and *REENTERED*
• the rule that DEGRADE alone is not enough; the posture must also be explicit
• append-only lineage across barrier creation, verdict emission, degraded posture, reopen trigger, reentry, and closure
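A minimal sketch of how this artifact family might look in code, assuming Python dataclasses. Only DEGRADE and the posture names come from the article summary; `ALLOW`, `DENY`, `NORMAL`, the check-name inputs, and the `evaluate` and `Lineage` helpers are hypothetical. `BarrierInputSet` is modeled as a plain dict, and `RuntimePostureRecord` is folded into the verdict for brevity.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    ALLOW = "ALLOW"      # hypothetical name
    DEGRADE = "DEGRADE"
    DENY = "DENY"        # hypothetical name


class Posture(Enum):
    NORMAL = "NORMAL"    # hypothetical "unrestricted" posture
    REVIEW_ONLY = "REVIEW_ONLY"
    LOCAL_ONLY = "LOCAL_ONLY"
    RECEIPT_ONLY = "RECEIPT_ONLY"
    SANDBOX_ONLY = "SANDBOX_ONLY"
    BLOCKED = "BLOCKED"
    REENTERED = "REENTERED"


DEGRADED_POSTURES = {Posture.REVIEW_ONLY, Posture.LOCAL_ONLY,
                     Posture.RECEIPT_ONLY, Posture.SANDBOX_ONLY}


@dataclass(frozen=True)
class BarrierObject:
    barrier_id: str
    effect: str                       # the requested effect being gated
    required_checks: tuple[str, ...]  # inputs that must be present and pass
    degraded_posture: Posture         # posture to adopt if some checks fail


@dataclass(frozen=True)
class AdmissibilityVerdict:
    barrier_id: str
    verdict: Verdict
    posture: Posture
    reasons: tuple[str, ...]

    def __post_init__(self):
        # DEGRADE alone is not enough: it must carry an explicit degraded posture.
        if self.verdict is Verdict.DEGRADE and self.posture not in DEGRADED_POSTURES:
            raise ValueError("DEGRADE requires an explicit degraded posture")


@dataclass
class Lineage:
    """Append-only event log: entries can be added but never edited or removed."""
    _events: list = field(default_factory=list)

    def append(self, kind: str, payload) -> None:
        self._events.append((len(self._events), kind, payload))

    @property
    def events(self) -> tuple:
        return tuple(self._events)


def evaluate(barrier: BarrierObject, inputs: dict, lineage: Lineage) -> AdmissibilityVerdict:
    """Evaluate a requested effect under a barrier; inputs maps check name -> passed."""
    missing = [c for c in barrier.required_checks if c not in inputs]
    failed = [c for c in barrier.required_checks if c in inputs and not inputs[c]]
    if missing:
        verdict = AdmissibilityVerdict(barrier.barrier_id, Verdict.DENY, Posture.BLOCKED,
                                       tuple(f"missing:{c}" for c in missing))
    elif failed:
        verdict = AdmissibilityVerdict(barrier.barrier_id, Verdict.DEGRADE,
                                       barrier.degraded_posture,
                                       tuple(f"failed:{c}" for c in failed))
    else:
        verdict = AdmissibilityVerdict(barrier.barrier_id, Verdict.ALLOW, Posture.NORMAL, ())
    lineage.append("verdict_emitted", verdict)
    return verdict
```

The `__post_init__` check makes an unqualified DEGRADE unrepresentable, and `Lineage` exposes only an append path, so replaying its events reproduces the order in which verdicts were emitted.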

Key idea:
A governed runtime should not merely say:

*“this action was allowed.”*

It should be able to say:

*“this requested effect was evaluated under this barrier, against this input set, with this verdict, in this runtime posture, for these reasons, and along this replayable lineage.”*
Crownelius 
posted an update 3 days ago
[DAY TWO] PROJECT CROWFEATHER - 5/1/2026
Que sera, what will he be?

Step 47,500 of 100,000. Loss hovering around 2.76 on 6.2B tokens. Throughput steady at 87k tokens per second on the A100. Not a GH200, but she gets it done.

Still haven't named him. Scamp has a rascally charm. Quentin sounds like he'd wear a bow tie and think hard before speaking. Taking votes.

Phase two is what's keeping me up. Datasets everywhere and I can't pick. I'm fusing Google's and DeepSeek's ideas: Gemma 4's alternating sliding and global attention, DeepSeek V4's Muon optimizer and WSD scheduler, Gemma 2's logit soft cap, and PaLM's z-loss. Sounds like peanut butter on a hamburger, but the loss curve says it works.
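Three of the pieces named above fit in a few lines each, sketched here in pure Python. The soft-cap value of 30 matches Gemma 2's final-logit cap and the 1e-4 coefficient matches the z-loss weight reported for PaLM; the WSD warmup and decay fractions and the linear decay shape are assumptions (WSD variants also use other decay curves). This is not the Crowfeather training code, and the function names are hypothetical.

```python
import math


def soft_cap(logits, cap=30.0):
    """Logit soft-capping (Gemma 2 style): smoothly squashes values into [-cap, cap]."""
    return [cap * math.tanh(z / cap) for z in logits]


def z_loss(logits, coeff=1e-4):
    """PaLM-style auxiliary z-loss: coeff * log(Z)^2 with Z = sum(exp(logits)).

    Penalizing log Z keeps the softmax normalizer near 1 and discourages logit drift.
    """
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return coeff * log_z ** 2


def wsd_lr(step, total_steps, peak_lr, warmup_frac=0.01, decay_frac=0.2, min_lr=0.0):
    """Warmup-Stable-Decay schedule: linear warmup, flat plateau, linear decay at the end."""
    warmup = max(1, int(warmup_frac * total_steps))
    decay_start = int((1 - decay_frac) * total_steps)
    if step < warmup:
        return peak_lr * (step + 1) / warmup
    if step < decay_start:
        return peak_lr
    progress = (step - decay_start) / max(1, total_steps - decay_start)
    return peak_lr + (min_lr - peak_lr) * min(1.0, progress)
```

The appeal of WSD for a run like this is that the plateau can be extended or branched from without committing to a total step count up front; only the final decay phase depends on it.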

Tribe_v2 has real potential but needs more scaffolding than a barn raising before I throw it in. One thing's certain though. This model's gonna be a thinker. Not a Wikipedia parrot. Something that chews before it answers.

Finally got a use for my less popular datasets too. Some Opus-4.5-Writing-Style for polish. A few rows of Human-Archtypes-25k to see what personality bubbles up. Could be a poet, could be a grump. Either beats a flimsy fine-tune.

The bank's after my credit card. Until then, full steam.

Next model gets graphs. I swear.

-Shane
prithivMLmods 
posted an update 3 days ago
Multimodal-Edge Demo, a node-based inference canvas, is now live on Spaces. It runs fast Transformers inference through connected nodes across 10+ edge-device multimodal models from the Hub, all within a single space. The lineup includes models from the Qwen3.5, Qwen3-VL, Gemma 4, and LFM 2.5 VL series, with support for reasoning and grounding tasks.

🤗 Demo: prithivMLmods/Multimodal-Edge-Node
🔗 GitHub: https://github.com/PRITHIVSAKTHIUR/Multimodal-Edge-Node
✅ Multimodal Apps Collections: https://huggingface.co/collections/prithivMLmods/hall-of-multimodal-apps

🤗 > To learn more, visit the app page or the respective model pages.
salma-remyx 
posted an update about 1 hour ago
VQASynth is the open-source implementation of the SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities (2401.12168) paper, bringing together the data synthesis pipeline behind remyxai/SpaceQwen2.5-VL-3B-Instruct, remyxai/SpaceThinker-Qwen2.5VL-3B, and several other spatial reasoning models we've shared here on HF.

Here's how we use Remyx AI to build and improve VQASynth from the original concept forward.

Stage 1: When you connect a repo to Remyx, we extract development milestones from the commit history. For VQASynth, that surfaces the moments we changed how scenes get parsed, how captions get generated, how spatial relations get encoded. Those milestones power personalized recommendations for methods semantically relevant to improving your system.

Stage 2: When the model is serving in production, that same commit history delineates changes so you can learn from quasi-experiments through observational outcomes. This generates causal evidence about which changes drove which outcomes, sharpens recommendations, and supports inference on questions you haven't directly tested.

Stage 3: Once teams are running controlled experiments, the intervention outcomes tighten those estimates further.

Stage 4: When A/B testing becomes the operational bottleneck, we instrument decision points in the production system to explore via counterfactual perturbations, first in shadow mode and, after passing audits, on live traffic.
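Stage 2's idea of learning from commit history can be illustrated with the simplest possible quasi-experiment: compare a production metric in a window before and after a change lands. This is a naive pre/post estimate with a Welch-style standard error, chosen for brevity. It is not Remyx's method, it ignores trends and confounders that a real causal analysis must handle, and the function name and window parameter are hypothetical.

```python
import statistics
from datetime import datetime, timedelta


def pre_post_effect(observations: list[tuple[datetime, float]],
                    change_time: datetime,
                    window: timedelta) -> tuple[float, float, int, int]:
    """Naive before/after estimate of a change's effect on a production metric.

    observations: (timestamp, metric value) pairs
    Returns (effect, standard_error, n_before, n_after), where effect is
    mean(after) - mean(before) within +/- window of change_time.
    """
    before = [v for t, v in observations if change_time - window <= t < change_time]
    after = [v for t, v in observations if change_time <= t < change_time + window]
    if len(before) < 2 or len(after) < 2:
        raise ValueError("need at least two observations on each side of the change")
    effect = statistics.mean(after) - statistics.mean(before)
    # Welch-style standard error: no equal-variance assumption between the windows.
    se = (statistics.variance(before) / len(before)
          + statistics.variance(after) / len(after)) ** 0.5
    return effect, se, len(before), len(after)
```

The later stages in the post tighten exactly the weakness of this estimate: controlled experiments and counterfactual perturbations remove the need to assume nothing else changed inside the window.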

If you want recommendations tuned to your own project context, you can set up a feed here: https://docs.remyx.ai/platform/discover/feed
MikeDoes 
posted an update about 14 hours ago
AI4Privacy datasets are being used to decide what data should never leave the device.

A new paper on privacy-preserving cloud computing uses the AI4Privacy PII-Masking-65K dataset to train models that classify text as private or public before it’s ever sent to the cloud.

This is a subtle but important shift.

Instead of encrypting everything or trusting the cloud by default, the authors ask a simpler question:

Can we detect sensitive text early enough to keep it local?

Using DistilBERT, trained partly on AI4Privacy PII data, the system learns to:
- route private text to local processing
- send non-sensitive text to the cloud
- train collaboratively using federated learning, without sharing raw data

The result:
- 99.9% accuracy in private vs. public text detection
- Near-centralized performance in downstream tasks like SMS spam detection
- Privacy protection enforced by design, not policy

What stands out here is not just the model performance, but the architectural idea:
privacy as a routing decision, backed by large-scale PII annotations.
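A minimal sketch of privacy as a routing decision, plus the federated piece, under loud assumptions: `toy_p_private` is a hypothetical regex stand-in for the paper's DistilBERT classifier, the 0.5 threshold is arbitrary, and `fed_avg` implements the standard FedAvg weighted average, which may differ from the aggregation the paper actually uses.

```python
import re
from typing import Callable

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
LONG_NUMBER = re.compile(r"\d{6,}")


def toy_p_private(text: str) -> float:
    """Hypothetical stand-in for a trained private/public text classifier."""
    return 0.99 if EMAIL.search(text) or LONG_NUMBER.search(text) else 0.01


def route(texts: list[str], p_private: Callable[[str], float], threshold: float = 0.5):
    """Keep texts scored as private on the device; send the rest to the cloud."""
    local, cloud = [], []
    for text in texts:
        (local if p_private(text) >= threshold else cloud).append(text)
    return local, cloud


def fed_avg(client_weights: list[list[float]], client_sizes: list[int]) -> list[float]:
    """FedAvg: average client parameter vectors, weighted by each client's sample count.

    Only parameters leave the clients; the raw text never does.
    """
    total = sum(client_sizes)
    return [sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(len(client_weights[0]))]
```

The routing threshold is the main design lever: lowering it keeps more borderline text local at the cost of more on-device compute.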

This work reinforces a pattern we keep seeing: scalable privacy systems don’t start with encryption, they start with good PII data.

📄 Full Paper here: https://dl.acm.org/doi/full/10.1145/3773276.3774872

#Ai4Privacy #DataPrivacy #PIIMasking #FederatedLearning #PrivacyEngineering #OpenSourceAI #ResponsibleAI #AcademicResearch #LLMSecurity