Hey everyone, I wanted to share my GitHub link with you. I made a tool that dramatically lowers LLM latency and speeds up Python as well. I'm going to share the benchmarks I got as proof; please let me know what you think!
It's a Python wrapper that significantly boosts the speed of your Python scripts.
Simply run: romapy filename.py
speed_test.py
Testing 10 Million operations…
RomaPy Execution Time: 0.8897 seconds
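The actual contents of speed_test.py aren't shown above, so here's a minimal sketch of what a comparable 10-million-operation benchmark could look like. The workload (`run_ops`) and its arithmetic are my own illustrative stand-ins, not RomaPy's real test:

```python
import time

def run_ops(n: int) -> int:
    """Hypothetical workload: n simple arithmetic operations."""
    total = 0
    for i in range(n):
        total += i * 2
    return total

# Time the workload with a monotonic high-resolution clock.
start = time.perf_counter()
run_ops(10_000_000)
elapsed = time.perf_counter() - start
print(f"Execution Time: {elapsed:.4f} seconds")
```

Running the same script with and without the wrapper (`python speed_test.py` vs `romapy speed_test.py`) would give a like-for-like comparison.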
========================================
ROMAPY TURBO CHAT (Dolphin 2.9)
Type ‘exit’ to quit. Use long prompts for speed tests!
You: Hey
AI:
[Latency: 12.42s] Hello there! How can I assist you today as Dolphin?
[Stats: 13 tokens | 3.41 tokens/sec]
You: I was wondering if you can write me 2 paragraphs on a topic
AI:
[Latency: 2.07s] Dolphin, as a language AI, can assist in generating text based on the given topic. Please provide the topic you’d like me to write about.
[Stats: 33 tokens | 2.54 tokens/sec]
You: write it on cheese please
AI:
[Latency: 1.48s] I’m not capable of physically writing on cheese as I am a text-based AI. However, I can generate the phrase you want written on cheese in text form: “write it on cheese please”
[Stats: 41 tokens | 3.05 tokens/sec]
You: Test
AI:
[Latency: 0.89s] Hello there! As Dolphin, I’m here to help you with your questions and tasks. Feel free to ask me anything you need assistance with.
[Stats: 30 tokens | 3.04 tokens/sec]
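For anyone curious how the `[Latency]` and `[Stats]` lines in a chat loop like this can be produced, here's a hedged sketch. The `generate` function below is a dummy stand-in (the real harness calls Dolphin 2.9), and the whitespace-based token count is only a rough approximation of real tokenizer counts:

```python
import time

def generate(prompt: str) -> str:
    """Stand-in for the model call; the real harness queries Dolphin 2.9."""
    return "Hello there! How can I assist you today?"

prompt = "Hey"
start = time.perf_counter()
reply = generate(prompt)
latency = time.perf_counter() - start

tokens = len(reply.split())  # rough whitespace token count, not a real tokenizer
print(f"[Latency: {latency:.2f}s] {reply}")
print(f"[Stats: {tokens} tokens | {tokens / max(latency, 1e-9):.2f} tokens/sec]")
```

Note that dividing total tokens by total latency folds time-to-first-token into the rate, which is one reason short replies can show lower tokens/sec than long ones.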
RomaPy is made to compete with Rust. It's not faster yet, but you can see where it stands.
ROMAPY VS RUST: DIRECT ENGINE SHOWDOWN
Rust Execution: 0.0166 seconds
RomaPy Execution: 0.0836 seconds
RomaPy is running at 19.86% of Rust native speed.
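The "% of Rust native speed" figure is just the ratio of the two execution times. A quick sketch of that calculation, using the numbers from the showdown above:

```python
import time

def time_fn(fn, *args):
    """Time a single call to fn with a high-resolution clock."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start

# Illustrative numbers copied from the showdown above.
rust_time = 0.0166    # seconds
romapy_time = 0.0836  # seconds
relative = rust_time / romapy_time * 100
print(f"RomaPy is running at {relative:.2f}% of Rust native speed.")
# → RomaPy is running at 19.86% of Rust native speed.
```

For a fair showdown, both engines should run the exact same workload, ideally averaged over several runs to smooth out timer noise.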
Enjoy!