πŸ’œ Github   |   πŸ€— Hugging Face   |   πŸ“š Cookbooks  
πŸ–₯️ Demo  

# πŸ† sherif1313/Arabic-GLM-OCR-v2

A powerful Arabic OCR model (proficient learner)

## πŸ“Œ Overview

This model is an advanced Arabic OCR system designed to combine deep linguistic understanding with high accuracy in visual text extraction.

The model was trained using a unique strategy focused on:

- Reducing the model's active capacity during training
- Maintaining the stability of visual features
- Promoting genuine language understanding rather than rote memorization

## πŸš€ Key Features

- πŸ”Ή Model size: approximately 2 GB
- πŸ”Ή Performance: outperforms much larger models in most tasks
- πŸ”Ή Type: robust learning model (requires fine-tuning for inference)

- βœ… Deep understanding of Arabic language context
- βœ… Intelligent spelling correction
- βœ… High visual accuracy in text extraction
- βœ… Noise reduction
- βœ… Highly stable training behavior
- βœ… Strong generalization to unseen data

## πŸ§ͺ Evaluation Results

| Metric | Value |
|---|---|
| Evaluation loss | 0.1041 |
| Training-evaluation gap | 0%-2.5% (excellent stability) |

πŸ“Œ This indicates near-perfect training equilibrium with minimal overshoot.

## 🧠 Training Philosophy

### 1. Reduce Training Capacity

The model was trained using only half its capacity in order to:

- Preserve visual representations
- Prevent image deterioration
- Improve overall stability

### 2. From "Memorizing Shapes" to "Learning Rules"

Instead of memorizing word shapes, the model now learns grammar rules and image-text relationships.

### 3. Controlling Inference

The training included:

- Reducing excessive inference
- Limiting the linking of complex ideas
- Reverting processed information to its original size before output

🎯 Objective: force the model to copy text accurately instead of paraphrasing it.

### 4. Multilevel Reasoning Capability

The model was given internal inference capabilities during:

- Reading the page
- Analyzing the text
- Generating output

This leads to:

- Better understanding of unseen data
- Stronger real-world performance

## βš™οΈ Inference Settings (Very Important)

⚠️ This is a powerful learner and requires precise control during inference.

## 🎯 Use Cases

- πŸ“„ OCR for Arabic books
- πŸ“° Text extraction from images
- πŸ“š Manuscript digitization
- 🧾 Document processing
- πŸ” Text enhancement after OCR

## ⚠️ Important Notes

The model may attempt to autocorrect the text if not properly constrained. To copy text exactly, use a directive such as: "Extract the text exactly as it is, without correction or paraphrasing."
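The exact-copy directive can be baked into the prompt programmatically. Below is a minimal sketch; the helper name `build_ocr_messages` and the placeholder path `page.png` are assumptions, but the message structure follows the chat-template format used in the usage example further down this card.

```python
# Exact-copy directive recommended by this card. The helper and the
# placeholder image path are illustrative, not part of the model's API.
COPY_DIRECTIVE = "Extract the text exactly as it is, without correction or paraphrasing."

def build_ocr_messages(image_path: str, directive: str = COPY_DIRECTIVE) -> list:
    """Build a chat-template message list for one OCR request."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_path},
                {"type": "text", "text": directive},
            ],
        }
    ]
```

The resulting list can then be passed to `processor.apply_chat_template(...)` exactly as in the usage example below.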

## πŸ“¦ Why is the model small?

Despite its small size (approximately 2 GB), its outstanding performance is due to:

- Effective training methodology
- Minimized cognitive noise
- Focus on significant patterns
- Highly efficient representation learning

## 🏁 Conclusion

This model achieves a rare balance between:

- Visual accuracy πŸ‘οΈ
- Language comprehension 🧠
- Training stability βš–οΈ

πŸ’‘ It can be considered a sophisticated model for Arabic OCR, competing with larger systems.

| License | Model Size | Python |
|---|---|---|
| Apache-2.0 | 2.2 GB | 3.12 |


- ❌ Do not use high-temperature settings; they will cause hallucinations.
- βœ… Use "restricted" (low-temperature) settings for optimal accuracy.
- βœ… Best suited for OCR tasks, not creative writing.

## βš™οΈ Recommended Settings

```python
with torch.no_grad():
    generated_ids = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.4,
        top_p=0.9,
        repetition_penalty=1.1,  # discourages repeated output loops
    )
```

πŸ–ΌοΈ Visualizations

πŸ› οΈ How to use it

```shell
git clone https://github.com/zai-org/glm-ocr.git
cd glm-ocr
uv venv --python 3.12 --seed && source .venv/bin/activate
uv pip install -e .
```

```python
from transformers import AutoProcessor, AutoModelForImageTextToText
import torch

MODEL_PATH = "sherif1313/Arabic-GLM-OCR-v2"
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "url": "test_image.png"
            },
            {
                "type": "text",
                "text": "Text Recognition:"
            }
        ],
    }
]
processor = AutoProcessor.from_pretrained(MODEL_PATH)
model = AutoModelForImageTextToText.from_pretrained(
    pretrained_model_name_or_path=MODEL_PATH,
    torch_dtype="auto",
    device_map="auto",
)
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device)
inputs.pop("token_type_ids", None)
generated_ids = model.generate(**inputs, max_new_tokens=2018)
output_text = processor.decode(generated_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False)
print(output_text)
```
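Raw output from the model can contain stray HTML tags or back-to-back repeated phrases. Below is a standalone sketch of a cleanup pass; the function name `clean_ocr_text` is an assumption, but the two regexes are the same post-processing used in the web script further down.

```python
import re

def clean_ocr_text(text: str) -> str:
    """Strip stray HTML tags and collapse consecutive repeated phrases."""
    # Remove HTML tags such as <td> or <html> left over from table-like output
    text = re.sub(r'<[^>]+>', '', text)
    # Collapse any phrase of 10+ characters that immediately repeats itself
    # (the 10-character floor avoids mangling short legitimate repeats)
    text = re.sub(r'(\b.{10,}?)(\s+\1)+', r'\1', text)
    return text
```

Applying this to `output_text` before saving or displaying it keeps the extracted text free of markup noise.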

πŸ› οΈ How to use it web


```python
import gradio as gr
from transformers import AutoProcessor, AutoModelForImageTextToText
import torch
from PIL import Image
import re

# --- MODEL CONFIGURATION ---
MODEL_PATH = "sherif1313/Arabic-GLM-OCR-v2"

# Detect the device automatically
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
print(f"πŸš€ OCR engine starting: Device={device} | Dtype={dtype}")

# --- MODEL INITIALIZATION (with error checking) ---
try:
    print("⏳ Loading processor...")
    processor = AutoProcessor.from_pretrained(MODEL_PATH, trust_remote_code=True)

    print("⏳ Loading model (this may take a few minutes)...")
    model = AutoModelForImageTextToText.from_pretrained(
        MODEL_PATH,
        dtype=dtype,
        trust_remote_code=True,
        low_cpu_mem_usage=True,
        device_map="auto"
    )
    model.eval()
    print("βœ… Model ready!")
except Exception as e:
    print(f"❌ Failed to load the model: {e}")
    raise  # Stop execution if the model fails to load

# --- EXAMPLE IMAGES (make sure these files exist in the same folder as the script) ---
EXAMPLE_IMAGES = [

]

# --- OCR FUNCTION ---
def proses_intelijen(image):
    if image is None:
        return "⚠️ Please upload an image first."

    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": "Text Recognition:"}
            ],
        }
    ]

    try:
        # --- Image processing and text generation ---
        inputs = processor.apply_chat_template(
            messages,
            add_generation_prompt=True,
            tokenize=True,
            return_dict=True,
            return_tensors="pt"
        ).to(model.device)

        with torch.no_grad():
            generated_ids = model.generate(
                **inputs,
                max_new_tokens=512,
                do_sample=False
            )

        hasil = generated_ids[0][len(inputs["input_ids"][0]):]
        teks_final = processor.decode(hasil, skip_special_tokens=True)

        # --- Cleanup logic (remove repetition, HTML tags, and stray dots) ---

        # 1. Strip stray HTML tags (e.g. <html>, <td>, etc.)
        teks_final = re.sub(r'<[^>]+>', '', teks_final)

        # 2. Collapse consecutively repeated sentences. This finds any phrase
        #    that appears two or more times in a row and keeps one occurrence.
        #    (.{10,}?) captures a span of 10+ characters (avoids matching
        #    short repeated character runs); (\s+\1)+ matches whitespace
        #    followed by the same text repeated.
        teks_final = re.sub(r'(\b.{10,}?)(\s+\1)+', r'\1', teks_final)

        return teks_final

    except Exception as e:
        return f"🚨 An error occurred: {str(e)}"

# --- GRADIO INTERFACE ---
css_custom = """
.container { max-width: 1200px; margin: auto; padding-top: 20px; }
h1 { text-align: center; color: #3b82f6; }
"""

with gr.Blocks(css=css_custom, theme=gr.themes.Soft(), title="Arabic GLM-OCR") as app:
    with gr.Column(elem_classes="container"):
        gr.Markdown("# Arabic GLM-OCR")
        gr.Markdown("Arabic OCR powered by GLM-OCR.")

        with gr.Row():
            with gr.Column(scale=1):
                input_img = gr.Image(type="pil", label="Upload Image", height=450)
                scan_btn = gr.Button("πŸš€ START SCAN", variant="primary", size="lg")

            with gr.Column(scale=1):
                output_txt = gr.Textbox(label="Extracted Text", lines=24)

        # Clickable example images
        gr.Examples(
            examples=EXAMPLE_IMAGES,
            inputs=input_img,
            outputs=output_txt,
            fn=proses_intelijen,
            cache_examples=False,  # Set to True to speed this up (uses disk space)
            label="Example Images (click to load)"
        )

    # Wire the button to the OCR function
    scan_btn.click(fn=proses_intelijen, inputs=input_img, outputs=output_txt)

if __name__ == "__main__":
    app.queue().launch(allowed_paths=["examples"])
```
Safetensors: 1B params, F16 tensors.

## Model Tree

Base model: zai-org/GLM-OCR (this model is a fine-tune of it).