1. GPU / CPU

 

GPU AMI:

CPU AMI: just a plain Ubuntu 20.04 server

 

Reference: https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-generation

sudo apt-get update
pip3 install accelerate
pip3 install transformers
pip3 install torch
git clone https://github.com/huggingface/transformers.git

cd transformers/examples/pytorch/text-generation

python3 run_generation.py --model_type gpt2 --model_name_or_path gpt2 --num_return_sequences 1 --prompt="Hello, I'm a language model," --temperature 0.7

** The --model_name_or_path gpt2 argument can be swapped for gpt2, gpt2-medium, gpt2-large, gpt2-xl, or openai-gpt; an example follows below.
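For example, the same command pointed at gpt2-medium (--model_type stays gpt2 for all GPT-2 sizes; only openai-gpt would need --model_type openai-gpt):

python3 run_generation.py --model_type gpt2 --model_name_or_path gpt2-medium --num_return_sequences 1 --prompt="Hello, I'm a language model," --temperature 0.7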

 

2. inf2.xlarge / trn1.2xlarge

 

AMI: any Neuron + PyTorch + Ubuntu 20.04 combination works

(e.g., Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 20.04) 20240102)

Reference: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/transformers-neuronx/transformers-neuronx-developer-guide.html

source /opt/aws_neuron_venv_pytorch/bin/activate
pip install transformers-neuronx --extra-index-url=https://pip.repos.neuron.amazonaws.com

from transformers_neuronx.gpt2.model import GPT2ForSampling
from transformers_neuronx.generation_utils import HuggingFaceGenerationModelAdapter
from transformers_neuronx.module import save_pretrained_split
from transformers import AutoModelForCausalLM, AutoTokenizer
import datetime

# Load and save the CPU model
model_cpu = AutoModelForCausalLM.from_pretrained('gpt2-xl')
save_pretrained_split(model_cpu, 'gpt2-split')

# Create and compile the Neuron model
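# tp_degree=2 shards the weights across the two NeuronCores available on
# inf2.xlarge / trn1.2xlarge; n_positions caps the sequence length the
# compiled graph can handle, and amp='f32' keeps full fp32 precision.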
model_neuron = GPT2ForSampling.from_pretrained('gpt2-split', batch_size=1, tp_degree=2, n_positions=256, amp='f32', unroll=None)
model_neuron.to_neuron()

# Use the `HuggingFaceGenerationModelAdapter` to access the generate API
model = HuggingFaceGenerationModelAdapter(model_cpu.config, model_neuron)

# Get a tokenizer and example input
tokenizer = AutoTokenizer.from_pretrained('gpt2-xl')
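# GPT-2 has no pad token, so reuse EOS; decoder-only models are padded on the
# left so that generated tokens continue directly from the prompt.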
tokenizer.pad_token_id = tokenizer.eos_token_id
tokenizer.padding_side = 'left'
text = "Hello, I'm a language model,"
encoded_input = tokenizer(text, return_tensors='pt', padding=True)

# Run inference using temperature
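# reset_generation() clears the adapter's cached generation state before a new
# generate() call; max_length must stay within the compiled n_positions (256 here).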
model.reset_generation()
start = datetime.datetime.now()
sample_output = model.generate(
    input_ids=encoded_input.input_ids,
    attention_mask=encoded_input.attention_mask,
    do_sample=True,
    max_length=20,
    temperature=0.7,
)
end = datetime.datetime.now()
total = end - start
print("Execution time: ", total)
print([tokenizer.decode(tok) for tok in sample_output])

Create a sample.py containing the code above.

** In this code, gpt2-xl can likewise be swapped for gpt2, gpt2-medium, or gpt2-large; a minimal sketch of the swap follows below.
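A minimal sketch of that swap, parameterizing the checkpoint name (the model_name variable and split-directory naming are illustrative, not part of the Neuron guide):

model_name = 'gpt2-medium'  # any of: gpt2, gpt2-medium, gpt2-large, gpt2-xl
model_cpu = AutoModelForCausalLM.from_pretrained(model_name)
save_pretrained_split(model_cpu, f'{model_name}-split')
model_neuron = GPT2ForSampling.from_pretrained(f'{model_name}-split', batch_size=1,
                                               tp_degree=2, n_positions=256, amp='f32', unroll=None)
tokenizer = AutoTokenizer.from_pretrained(model_name)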

python3 sample.py

 

 
