월레스와 그로밋: 코딩의 날

Printing the Token Count

구운 감자 2025. 3. 25. 16:26
import tiktoken
  • tiktoken: the tokenizer library used by OpenAI's GPT-family models
# Load the token encoding rules
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
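  • tiktoken.encoding_for_model(): loads the token encoding rules that match the given LLM name, here "gpt-3.5-turbo"
  • Note: to load an encoding by its tokenizer name rather than by model name, use tiktoken.get_encoding(), for example tiktoken.get_encoding("cl100k_base")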
# Define the text
text = "GPT is a type of language model developed by OpenAI that uses deep learning to understand and generate human-like text. It stands for Generative Pre-trained Transformer. The model is trained on a large amount of text data from the internet and learns patterns, grammar, and context. Once trained, GPT can respond to prompts, answer questions, write essays, generate code, and perform many other language-related tasks. It works by predicting the next word in a sequence, using what it has learned during pretraining."
# Tokenize the text, then count the tokens
tokens = encoding.encode(text)
num_tokens = len(tokens)
  • encoding.encode(text): converts text into a list of tokens (integer token IDs)
  • len(tokens): the total number of tokens
print(f"Token count: {num_tokens}")

Full code

import tiktoken

# Load the token encoding rules
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

# Define the text
text = "GPT is a type of language model developed by OpenAI that uses deep learning to understand and generate human-like text. It stands for Generative Pre-trained Transformer. The model is trained on a large amount of text data from the internet and learns patterns, grammar, and context. Once trained, GPT can respond to prompts, answer questions, write essays, generate code, and perform many other language-related tasks. It works by predicting the next word in a sequence, using what it has learned during pretraining."

# Tokenize the text, then count the tokens
tokens = encoding.encode(text)
num_tokens = len(tokens)

print(f"Token count: {num_tokens}")