GitHub | Hugging Face | Cookbooks | Demo
# A powerful Arabic OCR model (proficient learner)
## Overview
This model is an advanced Arabic OCR system designed to combine deep linguistic understanding with high accuracy in visual text extraction.
The model was trained using a unique strategy focused on:
- Reducing the model's active capacity during training
- Maintaining the stability of visual features
- Promoting genuine language understanding rather than rote memorization
## Key Features
- Model size: approximately 2 GB
- Performance: outperforms much larger models on most tasks
- Type: robust learning model (requires fine-tuning for inference)
- Deep understanding of Arabic language context
- Intelligent spelling correction
- High visual accuracy in text extraction
- Noise reduction
- Highly stable training behavior
- Strong generalization on non-visual data

## Evaluation Results

| Metric | Value |
|---|---|
| Evaluation loss | 0.1041 |
| Training-evaluation gap | 0%-2.5% (excellent stability) |

This indicates near-perfect training equilibrium with minimal overshoot.
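As a hypothetical illustration of how such a gap can be computed (the card reports only the evaluation loss; the training loss below is an assumed value, and relative difference is just one common definition):

```python
# Hypothetical train/eval gap computation. The training loss is an
# assumed value; only the evaluation loss (0.1041) is reported above.
train_loss = 0.1020  # assumed for illustration
eval_loss = 0.1041   # reported evaluation loss
gap_pct = (eval_loss - train_loss) / train_loss * 100
print(f"train/eval gap: {gap_pct:.2f}%")  # falls inside the reported 0-2.5% band
```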
## Training Philosophy
### 1. Reduce Training Capacity

The model was trained using only half its capacity in order to:

- Preserve visual representations
- Prevent image-feature deterioration
- Improve overall stability

### 2. From "Memorizing Shapes" to "Learning Rules"
Instead of memorizing word shapes, the model now learns grammar rules and image-text relationships.
### 3. Controlling Inference
The training included:
- Reducing excessive inference
- Limiting the linking of complex ideas
- Reverting processed information to its original size before output
**Objective:** force the model to copy text accurately instead of paraphrasing it.
### 4. Multilevel Reasoning Capability
The model was given internal inference capabilities during:
- Reading the page
- Analyzing the text
- Generating output

This leads to:

- Better understanding of unseen data
- Stronger real-world performance

## Inference Settings (Very Important)
This is a powerful learner; it requires precise control during inference.
## Use Cases

- OCR for Arabic books
- Text extraction from images
- Manuscript digitization
- Document processing
- Text enhancement after OCR

## Important Notes

The model may attempt to autocorrect text if it is not properly constrained. To copy text exactly, use a directive such as: "Extract the text exactly as it is, without correction or paraphrasing."
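One way such a directive can be placed into the chat-template message format used elsewhere in this card (the image filename and exact prompt wording here are illustrative assumptions, not a fixed API):

```python
# Sketch: constrain the model to copy text verbatim rather than
# autocorrect it. "page.png" is a hypothetical input image.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "page.png"},
            {
                "type": "text",
                "text": "Extract the text exactly as it is, "
                        "without correction or paraphrasing.",
            },
        ],
    }
]
# Greedy decoding further discourages paraphrasing, e.g.:
# generated_ids = model.generate(**inputs, do_sample=False, max_new_tokens=512)
```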
## Why is the model small?
Despite its small size (approximately 2 GB), its outstanding performance is due to:
- An effective training methodology
- Minimized cognitive noise
- A focus on significant patterns
- Highly efficient representation learning

## Conclusion
This model achieves a rare balance between:
- Visual accuracy
- Language comprehension
- Training stability
It can be considered a sophisticated Arabic OCR model that competes with much larger systems.
| License | Model Size | Python |
|---|---|---|
| Apache-2.0 | 2.2GB | 3.12 |
## Usage Recommendations

- Do not use high temperature settings; they cause hallucinations.
- Use restricted (low-temperature, constrained) settings for optimal accuracy.
- The model is best suited for OCR tasks, not creative writing.
## Recommended Settings

```python
with torch.no_grad():
    generated_ids = model.generate(
        **inputs,
        max_new_tokens=512,   # increase for longer pages
        do_sample=True,
        temperature=0.4,      # low temperature to limit paraphrasing
        top_p=0.9,
        repetition_penalty=1.1,
    )
```
## Installation

```bash
git clone https://github.com/zai-org/glm-ocr.git
cd glm-ocr
uv venv --python 3.12 --seed && source .venv/bin/activate
uv pip install -e .
```
## Basic Usage

```python
from transformers import AutoProcessor, AutoModelForImageTextToText
import torch

MODEL_PATH = "sherif1313/Arabic-GLM-OCR-v2"

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "test_image.png"},
            {"type": "text", "text": "Text Recognition:"},
        ],
    }
]

processor = AutoProcessor.from_pretrained(MODEL_PATH)
model = AutoModelForImageTextToText.from_pretrained(
    pretrained_model_name_or_path=MODEL_PATH,
    torch_dtype="auto",
    device_map="auto",
)
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
inputs.pop("token_type_ids", None)
generated_ids = model.generate(**inputs, max_new_tokens=2018)
output_text = processor.decode(
    generated_ids[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=False,
)
print(output_text)
```
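Raw OCR output can contain stray HTML tags and back-to-back repeated phrases. The two regular expressions below are the same cleanup used in the web demo later in this card, wrapped in a helper function (the function name is mine, not part of the model's API):

```python
import re

def clean_ocr_output(text: str) -> str:
    """Strip HTML tags and collapse consecutively repeated phrases."""
    # Remove stray HTML tags such as <td> or <html>
    text = re.sub(r'<[^>]+>', '', text)
    # Collapse a phrase of 10+ characters that repeats back-to-back:
    # (\b.{10,}?) captures the phrase, (\s+\1)+ matches its repeats.
    text = re.sub(r'(\b.{10,}?)(\s+\1)+', r'\1', text)
    return text.strip()

print(clean_ocr_output("<td>hello wonderful world hello wonderful world</td>"))
# -> hello wonderful world
```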
## How to use it on the web (Gradio)
```python
import re

import gradio as gr
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

# --- MODEL CONFIGURATION ---
MODEL_PATH = "sherif1313/Arabic-GLM-OCR-v2"

# Detect the device automatically
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
print(f"OCR engine starting: Device={device} | Dtype={dtype}")

# --- MODEL INITIALIZATION (with error checking) ---
try:
    print("Loading processor...")
    processor = AutoProcessor.from_pretrained(MODEL_PATH, trust_remote_code=True)
    print("Loading model (this may take a few minutes)...")
    model = AutoModelForImageTextToText.from_pretrained(
        MODEL_PATH,
        torch_dtype=dtype,
        trust_remote_code=True,
        low_cpu_mem_usage=True,
        device_map="auto",
    )
    model.eval()
    print("Model ready!")
except Exception as e:
    print(f"Failed to load model: {e}")
    raise  # stop execution if the model could not be loaded

# --- EXAMPLE IMAGES (these files must sit in the same folder as the script) ---
EXAMPLE_IMAGES = [
]

# --- OCR FUNCTION ---
def proses_intelijen(image):
    if image is None:
        return "Please upload an image first."
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": "Text Recognition:"}
            ],
        }
    ]
    try:
        # --- Process the image and generate the text ---
        inputs = processor.apply_chat_template(
            messages,
            add_generation_prompt=True,
            tokenize=True,
            return_dict=True,
            return_tensors="pt"
        ).to(model.device)
        with torch.no_grad():
            generated_ids = model.generate(
                **inputs,
                max_new_tokens=512,
                do_sample=False
            )
        hasil = generated_ids[0][len(inputs["input_ids"][0]):]
        teks_final = processor.decode(hasil, skip_special_tokens=True)

        # --- Advanced cleanup (remove repetition, HTML tags, stray dots) ---
        # 1. Remove stray HTML tags (e.g. <html>, <td>, ...)
        teks_final = re.sub(r'<[^>]+>', '', teks_final)
        # 2. Collapse consecutive repetitions of the same phrase:
        #    (.{10,}?) captures a snippet of 10+ characters (avoids matching
        #    short letter runs); (\s+\1)+ matches that snippet repeated after
        #    whitespace; the whole run is replaced by a single occurrence.
        teks_final = re.sub(r'(\b.{10,}?)(\s+\1)+', r'\1', teks_final)
        return teks_final
    except Exception as e:
        return f"An error occurred: {str(e)}"

# --- GRADIO INTERFACE ---
css_custom = """
.container { max-width: 1200px; margin: auto; padding-top: 20px; }
h1 { text-align: center; color: #3b82f6; }
"""

with gr.Blocks(theme=gr.themes.Soft(), css=css_custom, title="Arabic GLM-OCR") as app:
    with gr.Column(elem_classes="container"):
        gr.Markdown("# Arabic GLM-OCR")
        gr.Markdown("Arabic OCR powered by GLM-OCR.")
        with gr.Row():
            with gr.Column(scale=1):
                input_img = gr.Image(type="pil", label="Upload image", height=450)
                scan_btn = gr.Button("START SCAN", variant="primary", size="lg")
            with gr.Column(scale=1):
                output_txt = gr.Textbox(label="Extracted text", lines=24)
        # Clickable example images
        gr.Examples(
            examples=EXAMPLE_IMAGES,
            inputs=input_img,
            outputs=output_txt,
            fn=proses_intelijen,
            cache_examples=False,  # set to True to speed things up (uses disk space)
            label="Example images (click to load)"
        )
    # Wire the button to the OCR function
    scan_btn.click(fn=proses_intelijen, inputs=input_img, outputs=output_txt)

if __name__ == "__main__":
    app.queue().launch(allowed_paths=["examples"])
```
Base model: zai-org/GLM-OCR