Printing the Token Count


월레스와 그로밋: 코딩의 날


구운 감자 2025. 3. 25. 16:26

import tiktoken

tiktoken: the tokenizer library used by OpenAI's GPT-family models

# Load the token encoding rules
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

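tiktoken.encoding_for_model(): takes an LLM model name (here "gpt-3.5-turbo") and loads the token encoding rules that match that model
Note) To load an encoding by its tokenizer name instead of a model name, use tiktoken.get_encoding()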
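A rough sketch of how the two loaders can be combined. encoding_for_model() raises a KeyError when it cannot map a model name to an encoding, so one option is to fall back to get_encoding() with an explicit tokenizer name. The helper name load_encoding and its fallback_name parameter are made up for this sketch; "cl100k_base" is the encoding tiktoken maps "gpt-3.5-turbo" to.

```python
def load_encoding(model_name, fallback_name="cl100k_base"):
    """Return a tiktoken encoding for model_name, or a fallback encoding.

    load_encoding and fallback_name are hypothetical names for this sketch,
    not part of tiktoken itself.
    """
    # Imported inside the function so tiktoken is only needed when it is called.
    import tiktoken

    try:
        # Look the encoding up from the model name, as in the post.
        return tiktoken.encoding_for_model(model_name)
    except KeyError:
        # tiktoken could not map the model name, so load by tokenizer name.
        return tiktoken.get_encoding(fallback_name)
```

Calling load_encoding("gpt-3.5-turbo") should behave like the encoding_for_model() line above, while an unrecognized model name ends up on the get_encoding() path.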

# Define the text
text = "GPT is a type of language model developed by OpenAI that uses deep learning to understand and generate human-like text. It stands for Generative Pre-trained Transformer. The model is trained on a large amount of text data from the internet and learns patterns, grammar, and context. Once trained, GPT can respond to prompts, answer questions, write essays, generate code, and perform many other language-related tasks. It works by predicting the next word in a sequence, using what it has learned during pretraining."

# Tokenize the text, then count the tokens
tokens = encoding.encode(text)
num_tokens = len(tokens)

encoding.encode(text): converts text into a list of tokens (integer token IDs)
len(tokens): the total number of tokens
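To make the encode-then-count step a bit more concrete, here is a minimal sketch of two small helpers. Both names, count_tokens and token_pieces, are made up for illustration. They only assume an object with encode() and decode() methods, such as the encoding loaded above.

```python
def count_tokens(text, encoding):
    """Return how many tokens `encoding` produces for `text`."""
    return len(encoding.encode(text))


def token_pieces(text, encoding):
    """Pair each token ID with the text it decodes to on its own.

    Decoding one ID at a time can show replacement characters when a token
    covers only part of a multi-byte character.
    """
    token_ids = encoding.encode(text)
    return [(token_id, encoding.decode([token_id])) for token_id in token_ids]
```

With the encoding object from this post, count_tokens(text, encoding) should give the same value as num_tokens, and token_pieces("GPT is a type of language model", encoding) lets you see where the text is split.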

print(f"Token count: {num_tokens}")

Full code

import tiktoken

# Load the token encoding rules
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

# Define the text
text = "GPT is a type of language model developed by OpenAI that uses deep learning to understand and generate human-like text. It stands for Generative Pre-trained Transformer. The model is trained on a large amount of text data from the internet and learns patterns, grammar, and context. Once trained, GPT can respond to prompts, answer questions, write essays, generate code, and perform many other language-related tasks. It works by predicting the next word in a sequence, using what it has learned during pretraining."

# Tokenize the text, then count the tokens
tokens = encoding.encode(text)
num_tokens = len(tokens)

print(f"Token count: {num_tokens}")
