1. GPU / CPU
GPU AMI :
CPU AMI : just a plain Ubuntu 20.04 server
Reference: https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-generation
sudo apt-get update
pip3 install accelerate
pip3 install transformers
pip3 install torch
git clone https://github.com/huggingface/transformers.git
cd transformers/examples/pytorch/text-generation
python3 run_generation.py --model_type gpt2 --model_name_or_path gpt2 --num_return_sequences 1 --prompt="Hello, I'm a language model," --temperature 0.7
** The --model_name_or_path gpt2 part can be swapped for gpt2, gpt2-medium, gpt2-large, gpt2-xl, or openai-gpt.
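For reference, the same generation can also be done directly with the transformers API instead of the example script. Below is a minimal sketch (model name, prompt, and max_length mirror the command above; the rest is my own assumption) that also times the call, so the GPU/CPU latency can be compared against the Inf2 numbers in section 2:

import datetime
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'gpt2'  # can be gpt2-medium, gpt2-large, gpt2-xl, openai-gpt
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
device = 'cuda' if torch.cuda.is_available() else 'cpu'  # GPU if present, else CPU
model.to(device)

encoded = tokenizer("Hello, I'm a language model,", return_tensors='pt').to(device)
start = datetime.datetime.now()
output = model.generate(**encoded, do_sample=True, max_length=20, temperature=0.7)
print("Execution time:", datetime.datetime.now() - start)
print(tokenizer.decode(output[0]))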
2. Inf2.xlarge / Trn1.2xlarge
AMI : any Neuron + PyTorch + Ubuntu 20.04 combination
(e.g., Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 20.04) 20240102)
source /opt/aws_neuron_venv_pytorch/bin/activate
pip install transformers-neuronx --extra-index-url=https://pip.repos.neuron.amazonaws.com
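Optionally, before compiling anything, you can check that the Neuron devices are visible (neuron-ls ships with the Neuron tools preinstalled on this AMI):

neuron-ls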
from transformers_neuronx.gpt2.model import GPT2ForSampling
from transformers_neuronx.generation_utils import HuggingFaceGenerationModelAdapter
from transformers_neuronx.module import save_pretrained_split
from transformers import AutoModelForCausalLM, AutoTokenizer
import datetime
# Load and save the CPU model
model_cpu = AutoModelForCausalLM.from_pretrained('gpt2-xl')
save_pretrained_split(model_cpu, 'gpt2-split')
# Create and compile the Neuron model
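# tp_degree=2 shards the model across the two NeuronCores available on inf2.xlarge / trn1.2xlarge (tensor parallelism)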
model_neuron = GPT2ForSampling.from_pretrained('gpt2-split', batch_size=1, tp_degree=2, n_positions=256, amp='f32', unroll=None)
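# to_neuron() loads the weights onto the device and compiles the graph; the first run can take several minutes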
model_neuron.to_neuron()
# Use the `HuggingFaceGenerationModelAdapter` to access the generate API
model = HuggingFaceGenerationModelAdapter(model_cpu.config, model_neuron)
# Get a tokenizer and example input
tokenizer = AutoTokenizer.from_pretrained('gpt2-xl')
tokenizer.pad_token_id = tokenizer.eos_token_id
tokenizer.padding_side = 'left'
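# decoder-only models like GPT-2 need left padding so generated tokens are appended after the prompt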
text = "Hello, I'm a language model,"
encoded_input = tokenizer(text, return_tensors='pt', padding=True)
# Run inference using temperature
model.reset_generation()
start = datetime.datetime.now()
sample_output = model.generate(
    input_ids=encoded_input.input_ids,
    attention_mask=encoded_input.attention_mask,
    do_sample=True,
    max_length=20,
    temperature=0.7,
)
end = datetime.datetime.now()
total = end - start
print("Execution time: ", total)
print([tokenizer.decode(tok) for tok in sample_output])
Create a sample.py containing the code above.
** The gpt2-xl in this code can likewise be swapped for gpt2, gpt2-medium, gpt2-large, or gpt2-xl (a command-line version is sketched below).
python3 sample.py
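To try the other model sizes from the note above without editing the file each time, one option (a hypothetical tweak, not part of the original sample) is to read the model name from the command line:

# at the top of sample.py (hypothetical modification)
import sys
model_name = sys.argv[1] if len(sys.argv) > 1 else 'gpt2-xl'

# then use model_name in place of the hard-coded 'gpt2-xl':
model_cpu = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

Run it as, e.g., python3 sample.py gpt2-medium.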