So my buddy asked me to make him a script that applies suitable configurations for his poopy laptop so it can quantize and run a small model on 16 GB of RAM, CPU only. I can't seem to get it to work properly, and I don't really see another approach outside of what's below. Opinions, suggestions, and criticism are highly welcome.

```python
"""TinyLlama Local AI Assistant

Description: A local AI assistant using the TinyLlama-1.1B model. This script
initializes the TinyLlama model and provides a command-line interface for
interaction. It is designed to run CPU-only on a system with limited RAM.
"""
import os
import sys
from queue import Queue, Empty
from threading import Thread

import psutil
# The GGUF file and the gpu_layers/context_length/threads kwargs below are
# the ctransformers API, not Hugging Face transformers, so import from there.
from ctransformers import AutoModelForCausalLM

os.environ["GGML_CUBLAS"] = "0"  # Force CPU and disable GPU acceleration

# Buddz configurations
MODEL_PATH = "TinyLlama-1.1B-Chat-v1.0.Q4_K_M.gguf"
MAX_RAM_USAGE = 12000  # MB; leave ~4 GB of headroom
CONTEXT_LENGTH = 2048
TEMPERATURE = 0.7


def tune_system():
    """Pin the CPU governor to performance (requires sudo)."""
    os.system("sudo cpupower frequency-set -g performance")


def clear_caches():
    """Drop page caches and lower swappiness (requires sudo)."""
    os.system("echo 3 | sudo tee /proc/sys/vm/drop_caches")
    os.system("sudo sysctl vm.swappiness=10")


class SafeGenerator:
    """Runs generation on a worker thread so the prompt loop never freezes."""

    def __init__(self):
        self.response_queue = Queue()
        self._init_model()

    def _init_model(self):
        self.model = AutoModelForCausalLM.from_pretrained(
            MODEL_PATH,
            model_type="llama",
            gpu_layers=0,        # route everything to the CPU
            context_length=CONTEXT_LENGTH,
            threads=4,
            batch_size=1,
            max_new_tokens=256,  # was max_new_token; ctransformers expects the plural
            temperature=TEMPERATURE,
        )

    def _generate(self, prompt):
        try:
            response = self.model(
                prompt,
                stop=["User:", "<|endoftext|>"],
                stream=False,
            )
            self.response_queue.put(response)
        except (RuntimeError, ValueError) as e:
            self.response_queue.put(f"Error: {e}")

    def get_response(self, prompt):
        """Generate on a worker thread, printing a dot while we wait."""
        thread = Thread(target=self._generate, args=(prompt,), daemon=True)
        thread.start()
        while True:
            try:
                # Block briefly on the queue instead of spinning at 100% CPU.
                return self.response_queue.get(timeout=0.5)
            except Empty:
                sys.stdout.write(".")
                sys.stdout.flush()


def memory_safe():
    """Check that system memory usage is within the safe limit."""
    used_mem = psutil.virtual_memory().used / 1024 / 1024
    return used_mem < MAX_RAM_USAGE


def main():
    """Run the local AI assistant REPL."""
    tune_system()
    clear_caches()
    generator = SafeGenerator()
    print("Local AI Assistant (TinyLlama-1.1B). Type 'exit' to quit.\n")
    while True:
        try:
            user_input = input("You: ")
            if user_input.lower() in ["exit", "quit"]:
                break
            if not memory_safe():
                print("Warning: RAM usage above the safe limit; skipping this turn.\n")
                continue
            prompt = f"User: {user_input}\nAssistant:"
            print("\nAssistant", end=" ")
            response = generator.get_response(prompt)
            print(f"\n{response}\n")
        except KeyboardInterrupt:
            break


if __name__ == "__main__":
    main()
```
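One thing the script assumes but never does: the quantized GGUF has to already exist at MODEL_PATH. I'm not quantizing locally, I just grab a prequantized file from the Hugging Face Hub. A minimal sketch of that step, assuming TheBloke's TinyLlama GGUF repo (the repo id and filename are my guesses, double-check them on the Hub):

```python
# Minimal sketch: fetch a prequantized Q4_K_M GGUF instead of quantizing
# locally on a 16 GB machine. Repo id and filename are assumptions;
# verify them on the Hugging Face Hub before running.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",  # assumed repo id
    filename="tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",   # assumed filename
)
print(model_path)  # pass this path as MODEL_PATH in the script above
```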
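One variation I've been considering instead of the dot-printing thread: ctransformers can stream text chunks directly with `stream=True`, which should feel much more responsive on a slow CPU. A rough sketch (untested on my end, the prompt is just an example, so treat it as a starting point, not a drop-in replacement):

```python
# Rough sketch: stream chunks as they are generated instead of blocking
# until the full response is ready (ctransformers yields text pieces
# when called with stream=True).
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "TinyLlama-1.1B-Chat-v1.0.Q4_K_M.gguf",
    model_type="llama",
    gpu_layers=0,  # CPU only, same as the main script
    threads=4,
)

prompt = "User: Why is the sky blue?\nAssistant:"  # example prompt
print("Assistant:", end=" ")
for chunk in llm(prompt, stop=["User:"], stream=True):
    print(chunk, end="", flush=True)  # print each chunk as it arrives
print()
```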
Hello @-xvii-, welcome to our platform. I can't help you here, but I'm absolutely certain that @davidm8524 will be able to assist you.