AI & ML interests

Computer Vision Technology and Data Collection for Anime Waifu

Recent Activity

prithivMLmods 
posted an update 4 days ago
HY-World-2.0 — A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds is now available on Spaces, and it works both as native Gradio components and in Gradio server mode.

> HY-World-2.0-Demo: prithivMLmods/HY-World-2.0-Demo
> HY-World-2.0 [Server Mode]: prithivMLmods/HY-World-2.0-Demo
> Featuring 3D reconstruction and Gaussian splats with the Rerun viewer, along with camera poses, depth maps, and surface normals.
> In Server Mode, Gradio is served via FastAPI, with FastAPI remaining the top-level server.
> Model: tencent/HY-World-2.0
> GitHub: https://github.com/PRITHIVSAKTHIUR/HY-World-2.0-Demo

🤗To learn more, visit the app page or the respective model pages.
AbstractPhil 
posted an update 8 days ago
The geolip-svd-transformer is almost ready.

I've spent multiple days preparing the substructure, scaling, testing, and expanding the system. The conduit is meant to reorganize data. Just like the SVAE prototypes, it is meant to sort and organize, not compress and compact.

The organization is almost prepared and almost ready. The resulting structure will produce projection-capable, geometrically aligned memory, compacted and transformed into a utilizable token set. The remaining structural components are specifically SVD-related utilities, and each of those accounts for how difficult and how dispersed each component is as it is learned over time.

The SVAE components were perfect for testing this playground. They appear larger when analyzed, but those representations are meant to encode huge vocabularies. A 16x16 patch expanded upward to 768 dimensions is meant to encapsulate the behavior of near-pi upscaling, condensed into a considerably simpler, smaller form.

This model is behaving perfectly. It does not encode in the traditional sense; it analyzes and produces geometric opinions throughout its structure. Each of them proved, one after the other, that the model can not only learn but perfectly reconstruct, and with that directly produce utility-driven expansion capacity.

Fresnel -> effective image analysis battery
Johanna -> effective noise analysis
Grandmaster -> Johanna finetuned with sigma restoration using Fresnel's opinions
Freckles -> massive analysis array for noise (4096 to 16k tokens)

Geometric batteries.

Cayley rotation is meant to encapsulate that potential and expand it, allowing further differentiation down the chain of model structural behavioral events.
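For context, the Cayley transform is a standard way to turn a skew-symmetric matrix into a rotation matrix, which makes rotations learnable through unconstrained parameters. A minimal NumPy sketch (illustrative only; not the geolip implementation):

```python
import numpy as np

def cayley_rotation(a: np.ndarray) -> np.ndarray:
    """Map any square matrix to a rotation via the Cayley transform.

    Skew-symmetrize first (A = (a - a.T) / 2), then form
    Q = (I - A) @ inv(I + A), which is orthogonal with det +1.
    """
    A = (a - a.T) / 2.0
    I = np.eye(a.shape[0])
    return (I - A) @ np.linalg.inv(I + A)

rng = np.random.default_rng(0)
Q = cayley_rotation(rng.normal(size=(4, 4)))

# Q is a proper rotation: orthogonal, determinant +1.
assert np.allclose(Q @ Q.T, np.eye(4), atol=1e-10)
assert abs(np.linalg.det(Q) - 1.0) < 1e-10
```

The appeal for training is that the unconstrained matrix `a` can be optimized freely while the output stays exactly orthogonal.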

Suffice it to say, this is the geometric transformer's evolved state. These will exist as conduits throughout the models, the expanded behavioral attenuation units meant to provide geometric analysis internally within models for data-oriented CV alignment.
prithivMLmods 
posted an update 9 days ago
A new comparator on Spaces showcases Standard FLUX.2 Decoder vs. FLUX.2 Small Decoder. The Small Decoder is ~1.4× faster, uses ~1.4× less VRAM, and maintains near-identical image quality. It has ~28M parameters with narrower channels [96, 192, 384, 384] vs. [128, 256, 512, 512], and the demo supports sequence generation by running both decoders simultaneously and comparing the results side by side.
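As a rough sanity check of the channel claim (counting only hypothetical 3x3 conv weights between consecutive widths; the real decoder's layer layout is more involved):

```python
def conv_params(channels, k=3):
    # Weight count of hypothetical k x k convs between consecutive widths.
    return sum(ci * co * k * k for ci, co in zip(channels, channels[1:]))

small = conv_params([96, 192, 384, 384])       # FLUX.2 Small Decoder widths
standard = conv_params([128, 256, 512, 512])   # Standard decoder widths

# Every width scales by 4/3, so the weight ratio is (4/3)**2 = 16/9 ~ 1.78.
ratio = standard / small
assert abs(ratio - 16 / 9) < 1e-9
```

Measured speed and VRAM savings (~1.4x) need not track raw weight counts one-to-one; activation memory and the actual layer layout matter too.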

🤗 Comparator: prithivMLmods/Flux.2-4B-Decoder-Comparator
🔗 FLUX.2-small-decoder: black-forest-labs/FLUX.2-small-decoder
🔗 GitHub: https://github.com/PRITHIVSAKTHIUR/Flux.2-4B-Encoder-Comparator
🚁 Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection

🤗 > App built on the Gradio SDK. To learn more, visit the app page or the respective model pages.
prithivMLmods 
posted an update 10 days ago
Now a collection of various compression schemes for Gemma 4, plus version 1 of the abliterated dense models, is available on the Hub. Check it out via the links below. 👇

🔗 Gemma 4 Compression(s) - https://huggingface.co/collections/prithivMLmods/gemma-4-compressions
🔗 Gemma 4 Uncensored [MAX] + Compression(s) [β] - https://huggingface.co/collections/prithivMLmods/gemma-4-uncensored-max-compressions
🔗 Gemma 4 Compression(s) - MoE - https://huggingface.co/collections/prithivMLmods/gemma-4-compressions-moe
🔗 Gemma-4 F32 GGUF - https://huggingface.co/collections/prithivMLmods/gemma-4-f32-gguf

🤗 > To learn more, visit the app page or the respective model pages.
prithivMLmods 
posted an update 13 days ago
Now the demo for image detection based on SAM3 and Gemma-4 (*Filter) is available on Spaces, using full-fledged Transformers inference with multimodal reasoning for processed images. It also supports video segmentation (mask), video segmentation (annotation), and image click segmentation.

🤗 Demo Space: prithivMLmods/SAM3-Gemma4-CUDA
🥽 SAM3: facebook/sam3
🔗 gemma-4-E2B-it: google/gemma-4-E2B-it

To learn more, visit the app page or the respective model pages.
AbstractPhil 
posted an update 15 days ago
Say hello to surge resonance training. From random init, the 128x128 ImageNet SVAE reached over 99%-accurate test reconstruction within 1 epoch and 99.9% by epoch 5.
AbstractPhil/geolip-SVAE

Epoch 1: test recon error 0.0064
Epoch 2: 0.0022
Epoch 8: 0.000294
Epoch 12: 0.000206
Epoch 14: 0.000190
Epoch 18: 0.000187
Epoch 24: 0.000117
Epoch 30 (landmark): 0.000099

There are NO EXPERTS HERE. This is pure self-learning. The model learns the entire behavioral set within 1 epoch, reconstructing ImageNet's test set to a useful state. By epoch 12 a test reconstruction error of 0.000202 is measured, roughly 99.98% accuracy at RECONSTRUCTING the test set through the bottleneck, while simultaneously leaving a trail of centerwise extraction as rich or richer.
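To ground the "reconstruction through the bottleneck" idea with a toy example (a linear bottleneck, not the SVAE itself): the optimal rank-k linear autoencoder has a closed form via SVD, and data whose intrinsic rank fits the bottleneck reconstructs essentially exactly.

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic data that genuinely lives in an 8-dim subspace of R^64.
X = rng.normal(size=(500, 8)) @ rng.normal(size=(8, 64))

# Closed-form optimal rank-k linear autoencoder (Eckart-Young):
# encode by projecting onto the top-k right singular vectors, then decode.
k = 8
_, _, Vt = np.linalg.svd(X, full_matrices=False)
X_hat = X @ Vt[:k].T @ Vt[:k]

mse = np.mean((X - X_hat) ** 2)
# Intrinsic rank fits the bottleneck, so reconstruction is essentially exact.
assert mse < 1e-12
```

A nonlinear SVAE on natural images is of course much harder than this, but the same principle governs what a bottleneck can and cannot preserve.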

ONE epoch. Just one.
Took about 10 minutes to train an already converged epoch, and I set it up for 200 epochs. This model will not need 200 epochs. I'd be surprised if it needs 3.
What you're looking at here is the emergence of surge resonance: the power of a single epoch when the geometric CV alignment hits the tuning fork of absolute resonant perfection, counterpointed with the concerto's dissonant harmonic response.

I give you, surge resonance.


The metrics will be ready by morning and I'll begin building utilities to figure out what went right and what went wrong.

This model is rewarded when it exists within the geometric spectrum while simultaneously dual punished when leaving. There is no benefit to stray, and the benefit to exist within prevents the model from leaving the validated CV band.

This allows the model to exist perfectly within the tuning fork resonance structure.
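My reading of the reward/punish scheme, sketched as a penalty that is zero inside a validated band and grows on both sides (the actual geolip loss is not public, so this is an assumption):

```python
import numpy as np

def band_penalty(x: np.ndarray, lo: float, hi: float) -> np.ndarray:
    """Zero inside [lo, hi]; quadratic penalty on either side of the band."""
    below = np.clip(lo - x, 0.0, None)
    above = np.clip(x - hi, 0.0, None)
    return below ** 2 + above ** 2

x = np.array([-1.0, 0.2, 0.5, 1.3])
p = band_penalty(x, lo=0.0, hi=1.0)

assert p[1] == 0.0 and p[2] == 0.0   # inside the band: no penalty
assert p[0] == 1.0                   # 1.0 below the band: (0 - (-1))**2
assert abs(p[3] - 0.09) < 1e-12      # 0.3 above the band: 0.3**2
```

A loss shaped like this gives the model no gradient incentive to stray and a two-sided push back whenever it does, which matches the "dual punished when leaving" description.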

The model CONTINUES to refine, even after the CV metric has begun to drift away from home. The model has left home and is now seeking new proximity.

Upcoming training will be the 256x256, 512x512, 1024x1024, and larger if the model holds. Each will be named.
prithivMLmods 
posted an update 16 days ago
The demo for Image Detection (*Filter) based on SAM3 and Qwen-3.5 is now available on Hugging Face Spaces using Transformers inference, with multimodal reasoning for processed images, and it also supports video segmentation (mask), video segmentation (annotation), and image click segmentation.

🤗 Demo Space: prithivMLmods/SAM3-Plus-Qwen3.5
🥽 SAM3: facebook/sam3
🔗 Qwen-3.5: Qwen/Qwen3.5-2B

To learn more, visit the app page or the respective model pages.
AbstractPhil 
posted an update 18 days ago
The geolip-transformer-v8 requires a fundamental rethinking of how a core structure is trained.

I'll make this brief and to the point.

GEOLIP is an observer system at its core. It watches, triangulates, and assists with correct answers.

Many experiments worked very well; many fell down and turned into a pile of broken circuits. The recent geometric-transformer, one of my biggest fumbles, still taught me many things about what I'm TRULY trying to accomplish here.

**Save money and lives**. Less hardware needed at inference. Train more calculations into a more reusable and accurate structure for near-instant zero-shot or sequential inference.

In the process, v8 unlocked a missing puzzle piece: EMA trajectory alignment compensation. I'm doing my best to build something that works.
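A plain exponential moving average is the usual building block for this kind of trajectory tracking; whether geolip's "EMA trajectory alignment compensation" works exactly this way is my assumption:

```python
import numpy as np

def ema_update(ema: np.ndarray, value: np.ndarray, decay: float = 0.99) -> np.ndarray:
    # One exponential-moving-average step toward the current value.
    return decay * ema + (1.0 - decay) * value

# Track a constant target: the EMA converges geometrically at rate `decay`.
ema = np.zeros(3)
target = np.ones(3)
for _ in range(500):
    ema = ema_update(ema, target)

# After 500 steps the remaining gap is 0.99**500, roughly 0.0066.
assert np.all(np.abs(ema - target) < 0.01)
```

Applied to parameters or activations during training, the EMA gives a smoothed trajectory that later steps can be aligned against or compensated toward.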

The geolip distillation system is very powerful but requires much experimentation still.
* Genetic experiments were successful.
* Data transfer experiments were successful.
* Analysis experiments were successful, and expand large-model accuracy.
* Many distillation experiments were successful.
* The largest successes were the kernels, the distillation tools, and the geometric analysis systems.

With the good comes the bad: the faulty ViTs, the simultaneous training runs that fault, the internalized confusion that happens occasionally.
*** The observer NEEDS something to OBSERVE. If the observer observes the progressive development of point cloud structures, it learns how to observe THAT LEARNING PROCESS - drifting fault assessment.
*** In the process it DOES NOT learn how to improve the CE relations by embedding and compensating with anchored triangulation opinions.

BIGGEST CONCLUSION. Staged curriculum training.

These components must be DECOUPLED. One must be a compounding structural awareness beacon, the other must be an informationally aligned composition in a utilizable fashion.

This means stage-by-stage freeze/unfreeze processing. Independent task-oriented structural alignment.
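The staged freeze/unfreeze idea can be sketched as a schedule that maps each stage to the components left trainable; the component names here are hypothetical, not geolip's actual modules.

```python
# Hypothetical component names; the real geolip stages are not public.
SCHEDULE = {
    0: {"observer"},                  # stage 0: train the observer alone
    1: {"triangulator"},              # stage 1: freeze observer, train triangulator
    2: {"observer", "triangulator"},  # stage 2: joint fine-tune
}

def trainable(stage: int, component: str) -> bool:
    # A component is unfrozen only when the schedule lists it for the stage.
    return component in SCHEDULE.get(stage, set())

assert trainable(0, "observer") and not trainable(0, "triangulator")
assert trainable(1, "triangulator") and not trainable(1, "observer")
assert trainable(2, "observer") and trainable(2, "triangulator")
```

In a framework like PyTorch, `trainable(...)` would gate `requires_grad` on each component's parameters at the start of every stage.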
AbstractPhil 
posted an update 21 days ago
My heavily engineered repo, https://github.com/AbstractEyes/pytorch-parallel-compiler, has been directly integrated into the geofractal repo for v1.2. If you use the geofractal repo, be sure to pull for potential performance increases.

The WideRouter will enable multiple new core features; the predominant two for our next experiment are as follows.

1. Directly integrated multi-opinion constellation structures. This will enable dynamic compiled expansions internally within the structure for huge performance gains.
2. Controllable stage-by-stage compilation. Each stage can be compiled or not. SVD is notoriously non-compiler-friendly due to linalg.eig, and I will be addressing that particular function DIRECTLY soon. There will be no quarter for graph breaks.

If the WideRouter causes any major bugs or breaks with your code, bad calculations, incorrectly deviated gradients, twisted or contorted dtype outputs, or any major compilation errors, please don't hesitate to open a pull request. Claude and I will promptly solve any major issues.

Once everything is perfectly in-line and the graph matches, the transformer will have massive geometric performance boosts for huge structural basins with multiple layers of depth.

I will be addressing linalg.eig and eigh directly, in conjunction with multiple argsort functions that are causing huge performance dips, as well as every single use of .item() that can present itself in the compiler's path.

After this, the ensemble topological transformer will be a go. It will enable quaternion, FlowMagnitude, FlowAlignment, FlowVelocity, FlowVelocityQuaternion, FlowVelocityOrbital, FlowVelocityPentachoron, and multiple other flow-matching systems that will improve performance by dominating amounts, with minimal inline overhead cost due to the precomputed geometric structure.

The ensembles will feature multiple simultaneous batched and segmented forms of learning meant to train the oscillation omega predictor "Beatrix".
prithivMLmods 
posted an update 26 days ago
Flux-Klein-KV-Edit-Consistency demo is now available on Spaces. It preserves character identity and delivers high-quality, realistic results after edits. No need for any special prompts, just upload the image, type your prompt, and get the resulting image blazing fast.

🔥 Demo Space: prithivMLmods/flux-klein-kv-edit-consistency
🤗 Model: black-forest-labs/FLUX.2-klein-9b-kv
🤗 Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection
🔗 Gradio Server Mode: https://www.gradio.app/main/guides/server-mode

➔ Built with Headless Gradio, an alternative to using gr.Blocks for creating the frontend and triggering events, powered by FastAPI + Gradio. You can now design the frontend however you want, with continued support for APIs, MCP, and ZeroGPU.

➔ Gradio Server Mode is now available from gradio@v6.10.0.

To learn more, visit the app page or the respective model pages.
AbstractPhil 
posted an update 26 days ago
geolip-ryan-spearman is the first dedicated protein observation structure, meant to expand the tooling of the observer modeling system and to introduce additional introspective analysis to the equation for genetic mutation and abnormality.

AbstractPhil/geolip-esm2_t33_650M_UR50D

This model is based on Facebook's esm2_t33_650M, assessed with specific benchmarks to be around 50% accurate or so. I'll be improving those numbers via the self-distillation spectrum. The models will never see the validation data while unfrozen. The full spectrum of training tools is visible.

This is the first self-distillation observer prototype, and it works. Not as rapidly as I had hoped, but it most definitely works. The SVD was the missing piece of geometric solidity required to preserve full rotational behavioral control. The kernel made this possible for rapid iteration, and the first results are coming in.

This inherits much of the functionality from the CLIP_L and CLIP_G memory banks, while benefitting from the advanced research I performed while extracting CaptionBert 5x bert pooled captions for target points.

The primary driving point here is the sheer data size - and the important contributions of that data size to a full construct of geometric aligned data. There is a massive amount of very specific information, all curated, perfectly labeled, and organized in a way that can be... well not so easily accessed, but I did find a few ways in.

This data is highly accurate and forged through life for billions of years. This is what is there, this is what is expected, and I have the tooling - stage by stage, to not only develop a solution for the problem, but to fully contribute to an improved version with minimal hardware requirement for training.

This is real expectation, and the results are pouring in hourly; this can improve models beyond a reasonable baseline while preserving the baseline's correctness.
AbstractPhil 
posted an update 28 days ago
SVD + Scatterpoint2D is the official encoding structure of the geolip system as of the image encoding tests.

Both unattuned scatterpoint2d and triton-aligned SVD are a cut above the rest by a large margin.
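For reference, the usual sense in which SVD acts as a compact encoding is truncation to the top singular components; an illustrative NumPy sketch (not the triton-aligned kernel):

```python
import numpy as np

def svd_encode(x: np.ndarray, rank: int):
    """Truncated SVD: keep only the top-`rank` singular triplets."""
    U, s, Vt = np.linalg.svd(x, full_matrices=False)
    return U[:, :rank], s[:rank], Vt[:rank]

def svd_decode(U, s, Vt):
    # Reassemble: U @ diag(s) @ Vt, written without forming diag(s).
    return (U * s) @ Vt

rng = np.random.default_rng(1)
img = rng.normal(size=(32, 32))

# Full rank is a lossless round trip.
U, s, Vt = np.linalg.svd(img, full_matrices=False)
assert np.allclose(svd_decode(U, s, Vt), img)

# Truncation error equals the energy of the discarded spectrum (Eckart-Young).
U8, s8, Vt8 = svd_encode(img, rank=8)
err = np.linalg.norm(img - svd_decode(U8, s8, Vt8))
assert abs(err - np.sqrt(np.sum(s[8:] ** 2))) < 1e-8
```

The Eckart-Young property is what makes SVD a principled encoder: no other rank-8 factorization can beat that reconstruction error in the Frobenius norm.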

https://github.com/kymatio/kymatio
https://huggingface.co/blog/AbstractPhil/svd-triton-kernel-optimization
AbstractPhil/svd-triton
AbstractPhil/geolip-hypersphere-experiments

Most kymatio tests were done on standard PyTorch models that yielded higher accuracy than simple conv or transformer baselines before overfitting, though not in every instance. The most commonly tested low-sample CIFAR-10 and CIFAR-100 runs yielded more for less. Those are in the hypersphere-experiments notebooks and are viewable via Hugging Face TensorBoard metrics.

The accuracy, retention, agreement, disagreement, and sheer capacity of the refined SVD kernel shows that full Procrustes alignment is not just crucial to distillation, but also entirely representable within encoders themselves as students.
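The orthogonal form of the Procrustes alignment mentioned here has a closed-form solution via SVD; a minimal sketch of aligning two embedding sets:

```python
import numpy as np

def orthogonal_procrustes(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Orthogonal R minimizing ||A @ R - B||_F, via SVD of A.T @ B."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

rng = np.random.default_rng(7)
A = rng.normal(size=(100, 5))
Q_true, _ = np.linalg.qr(rng.normal(size=(5, 5)))  # a random orthogonal map
B = A @ Q_true  # B is an orthogonally transformed copy of A

R = orthogonal_procrustes(A, B)
assert np.allclose(A @ R, B, atol=1e-8)            # alignment recovered
assert np.allclose(R @ R.T, np.eye(5), atol=1e-8)  # R stays orthogonal
```

In a distillation setting, A and B would be student and teacher features, and R the rotation that best maps one space onto the other.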

This structure can representationally re-impose itself layer by layer, which is what I tested, and this capture system can behave as a global regularization system, a selector, a behavioral adjudication structure, an encoding solidification unit, a trajectory systemic accumulator, an anchored differentiation unit, and, as about 30 other tests show, all of the above simultaneously.

The preliminary rapid-iteration-capable kernel shows that not only can these behaviorally represent utility, but the noise drift can be directly accounted for, using systems like GELU, drop path, and dropout to learn to ignore the very noise that accumulates.

Attention is now officially deemed valid when utilized based on the tests and examples allowing preserved geometric structure after attention selection.

This encoding structure is substantially more durable than I can give it credit for.

Surge is coming, exactly as predicted. Late, I admit.
prithivMLmods 
posted an update about 1 month ago
Map-Anything v1 (Universal Feed-Forward Metric 3D Reconstruction) demo is now available on Hugging Face Spaces. Built with Gradio and integrated with Rerun, it performs multi-image and video-based 3D reconstruction, depth, normal map, and interactive measurements.

🤗 Demo: prithivMLmods/Map-Anything-v1
🤗 Model: facebook/map-anything-v1
🤗 Hf-Papers: MapAnything: Universal Feed-Forward Metric 3D Reconstruction (2509.13414)
AbstractPhil 
posted an update about 1 month ago
I built an actionable todo based on current research, former research, and compounded a full spectrum of potentials for image encoding into pure geometric structures, hybrid geometric structures, partial geometric structures, and full spectrum analysis relational structures. Claude built the manifest based on our research after forming a full research spectrum to head into actionable directions.

AbstractPhil/geolip-hypersphere-experiments

I have to say before I continue, Claude managed to keep a large running manifest of our research, and with that list this was possible. Without that list, this would have been entirely devoid of purpose, and Claude would likely not have extracted the information in a utilizable state for this solution set.

I'll be running the full series of tests in conjunction with the constellation architecture. Either it survives, or something entirely new will form. Based on the results from these tests, the directions will evolve.

Either way, the most optimal and fastest methodologies for this system will be benchmarked and utilized as the primary use-cases. The slower and more obviously higher-resolution variations will be optimized as much as possible and solutions provided.

Let's do this right.

With that, the first experiment will be geolip-anchor-scattering and the structure will be based on the first in the list.

I will be updating posts based on benchmarks, landmarks, and new insights while the Bert data cooks.
prithivMLmods 
posted an update about 1 month ago
Introducing QIE-Bbox-Studio! 🔥🤗

The QIE-Bbox-Studio demo is now live — more precise and packed with more options. Users can manipulate images with object removal, design addition, and even move objects from one place to another, all with fast 4-step inference.

🤗 Demo: prithivMLmods/QIE-Bbox-Studio
🔗 GitHub: https://github.com/PRITHIVSAKTHIUR/QIE-Bbox-Studio

🚀 Models [LoRA] :

● QIE-2511-Object-Mover-Bbox: prithivMLmods/QIE-2511-Object-Mover-Bbox
● QIE-2511-Object-Remover-Bbox-v3: prithivMLmods/QIE-2511-Object-Remover-Bbox-v3
● QIE-2511-Outfit-Design-Layout: prithivMLmods/QIE-2511-Outfit-Design-Layout
● QIE-2509-Object-Remover-Bbox-v3: prithivMLmods/QIE-2509-Object-Remover-Bbox-v3
● QIE-2509-Object-Mover-Bbox: prithivMLmods/QIE-2509-Object-Mover-Bbox

🚀 Collection:

● Qwen Image Edit [Layout Bbox]: https://huggingface.co/collections/prithivMLmods/qwen-image-edit-layout-bbox

To learn more, visit the app page or the respective model pages.