Qwen2.5-14B-Instruct

Step 0. Preparation

To install AI models from Hugging Face, it is best to create a project dedicated to this. In that project, set up where the models will be stored and which model to deploy. In this post I use Qwen2.5 as the worked example, which also serves as a reference for later.
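One way to make the storage setting concrete is to create the dedicated folder and point the `HF_HOME` environment variable at a subfolder of it, so that Hugging Face's own download cache also lives inside the project. This is a minimal sketch under that assumption. The helper `prepare_model_store` and the `hf_cache` subfolder name are hypothetical. `HF_HOME` must be set before `transformers` or `huggingface_hub` is imported, because those libraries read it at import time.

```python
import os
from pathlib import Path


def prepare_model_store(base: Path) -> Path:
    """Create a dedicated model folder and point the Hugging Face cache into it.

    Hypothetical helper: call it before importing transformers/huggingface_hub,
    since those libraries read HF_HOME when they are imported.
    """
    base = base.expanduser().resolve()
    cache_dir = base / "hf_cache"  # hypothetical subfolder for the download cache
    cache_dir.mkdir(parents=True, exist_ok=True)
    os.environ["HF_HOME"] = str(cache_dir)
    return base


# Usage (run this first, then import transformers):
# prepare_model_store(Path("~/Documents/GitHub/ai_models"))
```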

Step 1. Install [Qwen2.5-14B-Instruct]

Reference: https://huggingface.co/Qwen/Qwen2.5-14B-Instruct

We need to download the model and store it on the machine. To keep the models easy to manage, set up a dedicated folder for them:

Model name: Qwen/Qwen2.5-32B-Instruct
Storage folder: Documents/GitHub/ai_models
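A Hugging Face repo id contains a slash, so saving to the storage folder plus the repo id creates a nested folder such as `ai_models/Qwen/Qwen2.5-32B-Instruct`. The code that loads the model later must point at exactly that folder. The sketch below shows the mapping. The helper name `local_model_dir` is hypothetical, and the printed path assumes macOS/Linux separators.

```python
import os


def local_model_dir(base: str, repo_id: str) -> str:
    """Map a repo id like 'Qwen/Qwen2.5-14B-Instruct' to <base>/<organization>/<model>."""
    org, name = repo_id.split("/", 1)  # e.g. 'Qwen', 'Qwen2.5-14B-Instruct'
    return os.path.join(base, org, name)


print(local_model_dir("/Users/taipm/Documents/GitHub/ai_models/", "Qwen/Qwen2.5-14B-Instruct"))
# on macOS/Linux prints: /Users/taipm/Documents/GitHub/ai_models/Qwen/Qwen2.5-14B-Instruct
```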

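Install the required libraries:

pip install transformers
pip install torch       # on Apple Silicon macOS, PyTorch's MPS backend provides GPU acceleration
pip install accelerate  # required if from_pretrained() is called with device_map="auto"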
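According to the Qwen2.5 model card, `transformers` versions older than 4.37.0 fail with `KeyError: 'qwen2'`. The following stdlib-only sketch checks which versions are installed. The helper names are hypothetical. The simple parser keeps only the leading numeric parts and ignores suffixes such as `.dev0`, so the `packaging` library remains the more robust choice.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_TRANSFORMERS = (4, 37, 0)  # minimum stated on the Qwen2.5 model card


def parse_version(text: str) -> tuple:
    """Keep only the leading numeric parts: '4.46.0.dev0' -> (4, 46, 0)."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def installed_versions(packages=("transformers", "torch", "accelerate")) -> dict:
    """Return {package: version string, or None if the package is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions()
print(versions)
tf = versions["transformers"]
if tf is None or parse_version(tf) < MIN_TRANSFORMERS:
    print("transformers >= 4.37.0 is required for Qwen2.5")
```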

Source code (install_models.py)

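import os

from transformers import AutoTokenizer, AutoModelForCausalLM

# Models to download and the local folder to store them in
path = "/Users/taipm/Documents/GitHub/ai_models/"
model_names = [
    "Qwen/Qwen2.5-32B-Instruct",
    # "Qwen/Qwen2.5-14B-Instruct",  # the size that worked on a 64 GB Mac Studio (see notes)
    # "Qwen/Qwen2.5-Math-7B-Answer",
]

for model_name in model_names:
    # e.g. .../ai_models/Qwen/Qwen2.5-32B-Instruct (the "/" in the repo id creates a subfolder)
    local_dir = os.path.join(path, model_name)

    # Download and save the tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    tokenizer.save_pretrained(local_dir)

    # Download and save the model weights.
    # Note: from_pretrained() loads the whole model into memory before it is written out.
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map=None,      # no automatic device mapping; we only want to save the files
        trust_remote_code=True,
        torch_dtype="auto",   # keep the dtype stored in the checkpoint (bfloat16 for Qwen2.5)
    )
    model.save_pretrained(local_dir)

    print(f"Model {model_name} has been saved to: {local_dir}")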

Run the script:

python install_models.py

The download and installation then run.

All that is left is to wait, then test.
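Before testing, confirm that the folder actually contains a complete model. The following rough stdlib-only check assumes the usual `save_pretrained` output: `config.json`, `tokenizer_config.json`, and one or more `*.safetensors` weight files. The helper name `missing_model_files` is hypothetical.

```python
from pathlib import Path


def missing_model_files(model_dir: str) -> list:
    """Return what looks missing from a folder written by save_pretrained().

    Assumes safetensors weights, the default format in recent transformers versions.
    """
    folder = Path(model_dir)
    if not folder.is_dir():
        return [f"folder not found: {folder}"]
    missing = [name for name in ("config.json", "tokenizer_config.json")
               if not (folder / name).is_file()]
    if not any(folder.glob("*.safetensors")):
        missing.append("*.safetensors weight files")
    return missing


# Usage: an empty list means the folder looks complete.
# print(missing_model_files("/Users/taipm/Documents/GitHub/ai_models/Qwen/Qwen2.5-14B-Instruct"))
```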

Test

Notes

On my machine, a Mac Studio (M1 Ultra, 64 GB RAM, 1 TB storage), only the 14B model could be installed.

RESULTS:

Qwen/Qwen2.5-32B-Instruct: could not run
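A back-of-the-envelope estimate may help explain this result. The weights alone take roughly (number of parameters) × (bytes per parameter), and at 16-bit precision each parameter takes 2 bytes. The sketch below uses the nominal sizes from the model names, although the actual parameter counts are slightly higher. It also ignores the KV cache, activations, and the memory macOS itself needs, so real usage is higher. The function name is hypothetical.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float = 2) -> float:
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes).

    bytes_per_param: 2 for float16/bfloat16, 4 for float32.
    """
    return num_params * bytes_per_param / 1e9


for name, params in [("Qwen2.5-14B", 14e9), ("Qwen2.5-32B", 32e9)]:
    print(f"{name}: ~{weight_memory_gb(params):.0f} GB in 16-bit")
# Qwen2.5-14B: ~28 GB in 16-bit
# Qwen2.5-32B: ~64 GB in 16-bit
```

On a 64 GB machine, the 32B weights alone would roughly fill all of memory. The 14B model leaves headroom. Because `install_models.py` loads the full model before saving it, the download step for 32B is already close to that limit.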

Testing the model (after installation)

After running install_models.py, the model is saved in the specified folder.
Next, we load it from that folder in a program to check that it works.

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

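# Local folder that install_models.py saved the model into
# (saving under the repo id "Qwen/Qwen2.5-14B-Instruct" creates the nested folder Qwen/Qwen2.5-14B-Instruct)
path = "/Users/taipm/Documents/GitHub/ai_models/"
model_name = f"{path}Qwen/Qwen2.5-14B-Instruct"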

# Pick the device: the Apple GPU (MPS) if supported, otherwise the CPU
device = "mps" if torch.backends.mps.is_available() else "cpu"

# Load the model from the local folder and move it to that device.
# device_map is left unset: the model is placed explicitly with .to(device),
# and combining device_map="auto" with a manual .to() can fail when layers get offloaded.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16 if device == "mps" else "auto",  # float16 on MPS; checkpoint dtype on CPU
).to(device)

tokenizer = AutoTokenizer.from_pretrained(model_name)

# Prompt and conversation setup
# (Vietnamese prompt: "Explain to me how to use the softmax function in a neural network.")
prompt = "Giải thích cho tôi cách thức sử dụng hàm softmax trong mạng neural network."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]

# Prepare the input: render the messages in the text format the model expects
try:
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
except AttributeError:
    # Older tokenizers without apply_chat_template: build Qwen's ChatML format by hand
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ) + "<|im_start|>assistant\n"

# Tokenize and move the inputs to the device
model_inputs = tokenizer([text], return_tensors="pt").to(device)

# Generate the output text
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)

# Strip the prompt tokens from each output, keeping only the newly generated tokens
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

# Decode the token IDs back into a string
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

# Print the result
print("Generated Response:")
print(response)
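For Qwen2.5, `apply_chat_template` renders the conversation in the ChatML format. Each turn is wrapped in `<|im_start|>role ... <|im_end|>`, and `add_generation_prompt=True` appends an opening assistant turn for the model to continue. The sketch below reproduces only that basic layout in plain Python. The real template, stored with the tokenizer, also handles tools and inserts a default system prompt when none is given, so use `apply_chat_template` whenever it is available. The function name `to_chatml` is hypothetical.

```python
def to_chatml(messages: list, add_generation_prompt: bool = True) -> str:
    """Render [{'role': ..., 'content': ...}, ...] as basic ChatML text."""
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        text += "<|im_start|>assistant\n"  # the model continues generating from here
    return text


demo = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi"},
]
print(to_chatml(demo))
```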

