๐Ÿฆ„ How to build a State-of-the-Art Conversational AI

How to build a State-of-the-Art Conversational AI with Transfer Learning

์ด ๊ธ€์€ ConvAI2 NeurIPS(2018) ๋Œ€ํšŒ์—์„œ SOTA(state-of-the-art)๋ฅผ ๊ธฐ๋กํ•œ Hugging Face ์˜ Conversation AI์— ๋Œ€ํ•œ ํŠœํ† ๋ฆฌ์–ผ๋ฅผ ๋ฒˆ์—ญํ•œ ํฌ์ŠคํŠธ์ž…๋‹ˆ๋‹ค. ์กธ์—… ํ”„๋กœ์ ํŠธ๋ฅผ ์ง„ํ–‰ํ•˜๋Š” ํ•™๋ถ€์ƒ ์ˆ˜์ค€์—์„œ ์ž‘์„ฑํ•œ ๊ธ€์ด๋‹ˆ ์ฐธ๊ณ ํ•˜๊ณ  ๋ด์ฃผ์‹œ๋ฉด ๊ฐ์‚ฌํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.๐Ÿ˜ƒ

๋‹ค์Œ ์‚ฌ์ดํŠธ์—์„œ Hugging Face๊ฐ€ ์ œ์ž‘ํ•œ ๊ฐ„๋‹จํ•œ ๋ฐ๋ชจ๋ฅผ ์ฒดํ—˜ ํ•ด ๋ณด์‹ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.๐ŸŽฎconvai.huggingface.co.


๊ธ€์˜ ์ฃผ์š” ๋ชฉํ‘œ

The full code for this tutorial can be found in the accompanying Git repository.

An AI with a Personality 🤠

๋ณธ ํ”„๋กœ์ ํŠธ์€ ๋ชฉ์ ์€ Persona ๋ฅผ ์ง€๋‹Œ Conversaional AI ๋ฅผ ์ œ์ž‘ํ•˜๋Š” ๊ฒƒ์„ ๋ชฉํ‘œ๋กœ ํ•ฉ๋‹ˆ๋‹ค.

๋ณธ ํ”„๋กœ์ ํŠธ์˜ ๋Œ€ํ™” ์—์ด์ „ํŠธ(Dialog Agent)๋Š” ์–ด๋–ค persona๊ฐ€ ์–ด๋–ค ๋Œ€ํ™” ๊ธฐ๋ก(History)์„ ์„ค๋ช…ํ•˜๋Š” ์ง€์— ๋Œ€ํ•œ Knowledge Base๋ฅผ ๊ฐ–๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ์‚ฌ์šฉ์ž๋กœ๋ถ€ํ„ฐ ์ƒˆ๋กœ์šด ๋Œ€ํ™”๊ฐ€ ์ž…๋ ฅ๋˜๋ฉด ๋Œ€ํ™” ์—์ด์ „ํŠธ๋Š” ๋Œ€ํ™”๋ฅผ ๋ถ„์„ํ•˜์—ฌ Persona๋ฅผ ์ง€๋‹Œ ๋Œ€ํ™”๋ฅผ ์ถœ๋ ฅํ•ฉ๋‹ˆ๋‹ค.

ํ”„๋กœ์ ํŠธ์˜ ๊ณ„ํš์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

(figure: overview of the project plan)

Problems when training a dialog agent with deep learning

Solution

์–ธ์–ด ๋ชจ๋ธ์„ ์‚ฌ์ „ ํ•™์Šต(Pretraining)์œผ๋กœ ๊ตฌ์ถ• ํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ์‹œ๊ฐ„ ๋น„์šฉ์ด ๋งŽ์ด ์†Œ์š”๋˜๊ธฐ์—, ์˜คํ”ˆ์†Œ์Šค๋ฅผ ํ™œ์šฉํ•œ๋‹ค. ๋˜ํ•œ ๋ฐ์ดํ„ฐ ์…‹์ด ํฌ๊ณ , ์ข‹์€ ๋ฐ์ดํ„ฐ(The bigger the better)๋ฅผ ์‚ฌ์šฉํ•˜๋ฉด ์ข‹์ง€๋งŒ ๋ฌธ์žฅ ์ฆ‰, ํ…์ŠคํŠธ๋ฅผ ์ƒ์„ฑํ•˜๋Š” ๊ฒŒ ๋ชฉ์ ์ด๋‹ค. ๊ทธ๋Ÿฌ๋ฏ€๋กœ, ์‚ฌ์ „ํ•™์Šต(pretraining)๋œ NPL ๋ชจ๋ธ๋กœ BERT๋ฅผ ๋งŽ์ด ์‚ฌ์šฉํ•˜์ง€๋งŒ, ์™„๋ฒฝ ๋ฌธ์žฅ(Masking์ด ์—†๋Š” ๋ฌธ์žฅ)์—์„œ๋Š” ํ•™์Šต์ด ๋˜์—ˆ์ง€๋งŒ, unfinished sentences(Masking์ด ์žˆ๋Š” ๋ฌธ์žฅ)์—์„œ๋Š” ํ•™์Šต์ด ๋˜์ง€ ์•Š์•˜๊ธฐ ๋•Œ๋ฌธ์— GPT & GPT-2๋ฅผ ์‚ฌ์šฉํ–ˆ๋‹ค.

๐Ÿฆ„ OpenAI GPT and GPT-2 models

In 2018 and 2019, Alec Radford, Jeffrey Wu, and their colleagues at OpenAI created GPT and GPT-2 (Generative Pre-trained Transformer), two language models trained on very large amounts of data.

GPT and GPT-2 work as follows: a model called a decoder, or causal, model uses the left context to predict the next word.

![Decoder/causal transformer predicting the next word](https://miro.medium.com/max/2077/1*YmND0Qj8O6b35J1yU_CPKQ.png)

A decoder/causal transformer predicts the next word from left to right, and it is based on the attention mechanism. For a detailed look at attention, see The Illustrated Transformer.

๋ณธ ํ”„๋กœ์ ํŠธ๋ฅผ ์ˆ˜ํ–‰ํ•˜๊ธฐ ์œ„ํ•ด์„œ, ์–ธ์–ด ๋ชจ๋ธ(Language Model)์€ ๋‹จ์ˆœํžˆ ์ž…๋ ฅ ์‹œํ€€์Šค(Input Sequence)๋ฅผ ์ž…๋ ฅ์œผ๋กœ ๋ฐ›์„ ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ, ์ž…๋ ฅ ์‹œํ€€์Šค ๋‹ค์Œ์— ์ด์–ด์ง€๋Š” ํ† ํฐ์— ๋Œ€ํ•œ ์–ดํœ˜์— ๋Œ€ํ•œ ํ™•๋ฅ  ๋ถ„ํฌ๋ฅผ ์ƒ์„ฑ ํ•ด์•ผํ–ˆ๋‹ค. ์–ธ์–ด ๋ชจ๋ธ์€ ๋Œ€๊ฐœ ์œ„์˜ ๊ทธ๋ฆผ์— ํ‘œ์‹œ๋œ ๊ฒƒ์ฒ˜๋Ÿผ ๊ธด ์ž…๋ ฅ ์‹œํ€€์Šค์—์„œ ๊ฐ ํ† ํฐ์„ ๋”ฐ๋ฅด๋Š” ํ† ํฐ์„ ์˜ˆ์ธกํ•˜์—ฌ ๋ณ‘๋ ฌ ๋ฐฉ์‹์œผ๋กœ ํ›ˆ๋ จ๋œ๋‹ค.

Pretraining such a model on a large corpus is an expensive operation, so we use a model and tokenizer pretrained by OpenAI. The tokenizer splits the input string into tokens (words/sub-words) and converts those tokens into ids in the model's vocabulary.

ํ”„๋กœ์ ํŠธ ๋ชฉ์ 

๋ง๋ญ‰์น˜๊ฐ€ ํฐ ๊ฒฝ์šฐ ํ† ํฐํ™”์— ๋งŽ์€ ๋น„์šฉ์ด ๋ฐœ์ƒํ•œ๋‹ค. ๊ทธ๋ž˜์„œ OpenAI์—์„œ ์‚ฌ์ „ํ•™์Šต๋œ tokenizer๋กœ ์šฐ๋ฆฌ ๋ชจ๋ธ์— ์ ์šฉํ•ด ๋ณด์•˜๋‹ค.

We load the OpenAI GPT model and its tokenizer from pytorch-pretrained-BERT.

```python
from pytorch_pretrained_bert import OpenAIGPTDoubleHeadsModel, OpenAIGPTTokenizer

model = OpenAIGPTDoubleHeadsModel.from_pretrained('openai-gpt')
tokenizer = OpenAIGPTTokenizer.from_pretrained('openai-gpt')
```

Loading the model called OpenAI GPT Double Heads Model.
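
As a quick sanity check, we can already ask the model for a next-token distribution. This is a sketch, not tutorial code: the double-heads model in pytorch-pretrained-BERT expects inputs of shape `(batch, n_candidates, seq_len)` plus `mc_token_ids` pointing at the last token of each candidate.

```python
import torch
import torch.nn.functional as F

# Tokenize a string into sub-word tokens, then into vocabulary ids
tokens = tokenizer.tokenize("my dog is cute and")
ids = tokenizer.convert_tokens_to_ids(tokens)

input_ids = torch.tensor([[ids]])              # shape (1, 1, seq_len)
mc_token_ids = torch.tensor([[len(ids) - 1]])  # last-token index for the classification head

lm_logits, mc_logits = model(input_ids, mc_token_ids)
next_token_probs = F.softmax(lm_logits[0, 0, -1], dim=-1)  # distribution over the vocabulary
```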

👻 Adapting the language model to a dialog task


์šฐ๋ฆฌ ๋ชจ๋ธ์€ ๋‹จ์ผ์ž…๋ ฅ(Single Input) ์— ํ›ˆ๋ จ๋˜์—ˆ๋‹ค: ์ผ๋ จ์˜ ๋‹จ์–ด๋“ค(a sequence of words)

To generate an output sequence, we use several types of context:

- one or several persona sentences,
- the history of the dialog, with at least the user's last utterance,
- the tokens of the reply that have already been generated.

How can we build an input that takes these different contexts into account?

The simple answer is to concatenate the context segments into a single sequence, putting the reply at the end. We can then generate the reply token by token by continuing the input sequence.

![Input sequence: concatenation of persona, history and reply](https://miro.medium.com/max/4711/1*RWEUB0ViLTdMjIQd61_WIg.png)

Input sequence: the merged persona (blue), the previous dialog history (pink), and the reply (green).

But this approach has two problems:

- the transformer has no way of knowing which segment (persona, history, or reply) a given token belongs to, and
- the transformer's self-attention has no built-in notion of token order, so position information must be supplied explicitly.

To solve this, we build three parallel input sequences, for words, positions, and segments, and fuse them into a single sequence. That is, we sum the three types of embeddings: word, position, and segment embeddings.

![Summing word, position and segment embeddings](https://miro.medium.com/max/4711/1*r7vi6tho6sfpVx-ZQLPDUA.png)
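
The fusion itself is nothing more than an element-wise sum of three embedding lookups. A minimal sketch with hypothetical sizes (in GPT, the segment ids can reuse the word-embedding table, since `<speaker1>`/`<speaker2>` are ordinary vocabulary tokens):

```python
import torch
import torch.nn as nn

vocab_size, n_positions, hidden = 40483, 512, 768    # GPT-like sizes, for illustration
tokens_embed = nn.Embedding(vocab_size, hidden)      # shared by words and segments
positions_embed = nn.Embedding(n_positions, hidden)

words    = torch.randint(0, vocab_size, (1, 20))     # word ids
segments = torch.randint(0, vocab_size, (1, 20))     # segment ids (<speaker1>/<speaker2>)
position = torch.arange(20).unsqueeze(0)             # 0 .. seq_len-1

# The transformer's actual input: the sum of the three embeddings
hidden_states = tokens_embed(words) + tokens_embed(segments) + positions_embed(position)
```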

Implementation


First, we add special tokens for the delimiters and segment markers. These special tokens are not part of the pretrained model, so we create and train new embeddings for them. This is easy to do with pytorch-pretrained-BERT.

# ๋‹ค์Œ๊ณผ ๊ฐ™์ด 5 ๊ฐ€์ง€ special tokens์„ ์‚ฌ์šฉํ•จ:
# - <bos> the sequence์˜ ์ฒ˜์Œ์„ ๊ฐ€๋ฅดํ‚ด
# - <eos> the sequence์˜ ๋์„ ๊ฐ€๋ฅดํ‚ด
# - <speaker1> ์œ ์ €์˜ ๋ฐœํ™”(utterance) ์ฒซ๋ถ€๋ถ„์„ ๊ฐ€๋ฅดํ‚ด
# - <speaker2> ์ฑ—๋ด‡์˜ ๋ฐœํ™”(utterance) ์ฒซ๋ถ€๋ถ„์„ ๊ฐ€๋ฅดํ‚ด
# - <pad> as a padding token to build batches of sequences
SPECIAL_TOKENS = ["<bos>", "<eos>", "<speaker1>", "<speaker2>", "<pad>"]

# We can add these special tokens to the vocabulary and the embeddings of the model:
tokenizer.set_special_tokens(SPECIAL_TOKENS)
model.set_num_special_tokens(len(SPECIAL_TOKENS))
```python
from itertools import chain

# Let's define our contexts and special tokens
persona = [["i", "like", "playing", "football", "."],
           ["i", "am", "from", "NYC", "."]]
history = [["hello", "how", "are", "you", "?"],
           ["i", "am", "fine", "thanks", "."]]
reply = ["great", "to", "hear"]
bos, eos, speaker1, speaker2 = "<bos>", "<eos>", "<speaker1>", "<speaker2>"

def build_inputs(persona, history, reply):
    # Build our sequence by adding delimiters and concatenating
    sequence = [[bos] + list(chain(*persona))] + history + [reply + [eos]]
    sequence = [sequence[0]] + [[speaker2 if (len(sequence) - i) % 2 else speaker1] + s
                                for i, s in enumerate(sequence[1:])]
    # Build our word, segment and position inputs from the sequence
    words = list(chain(*sequence))                          # word tokens
    segments = [speaker2 if i % 2 else speaker1             # segment tokens
                for i, s in enumerate(sequence) for _ in s]
    position = list(range(len(words)))                      # position tokens
    return words, segments, position, sequence

words, segments, position, sequence = build_inputs(persona, history, reply)

# >>> print(sequence)  # Our inputs look like this:
# [['<bos>', 'i', 'like', 'playing', 'football', '.', 'i', 'am', 'from', 'NYC', '.'],
#  ['<speaker1>', 'hello', 'how', 'are', 'you', '?'],
#  ['<speaker2>', 'i', 'am', 'fine', 'thanks', '.'],
#  ['<speaker1>', 'great', 'to', 'hear', '<eos>']]

# Tokenize words and segments embeddings:
words = tokenizer.convert_tokens_to_ids(words)
segments = tokenizer.convert_tokens_to_ids(segments)
```

👑 Multi-task losses


์šฐ๋ฆฌ๋Š” ์ด์ œ ์‚ฌ์ „ ํ›ˆ๋ จ๋œ ๋ชจ๋ธ์„ ์ดˆ๊ธฐํ™”ํ•˜๊ณ  ํ›ˆ๋ จ ์ž…๋ ฅ์„ ๊ตฌ์ถ•ํ–ˆ์œผ๋ฉฐ, ๋‚จ์€ ๊ฒƒ์€ ํŒŒ์ธ ํŠœ๋‹ ์ค‘์— ์ตœ์ ํ™”ํ•  ์†์‹ค์„ ์„ ํƒํ•˜๋Š” ๊ฒƒ๋ฟ์ด๋‹ค

We used a multi-task loss that combines language modeling with next-sentence prediction.

The next-sentence prediction objective is part of BERT pretraining: it consists of randomly sampling distractors (candidate replies other than the correct one) from the dataset and training the model to distinguish whether an input sequence ends with the gold reply or with a distractor. Beyond the local context, it trains the model to look at the global meaning of the segments.

![Multi-task training](https://miro.medium.com/max/5326/1*945IpgUS9MGLB6gchoQXlw.png)

Multi-task training objective: the model has two heads, a language-modeling head (orange) and a next-sentence classifier (blue).

The total loss is computed as the weighted sum of the language modeling loss and the next-sentence prediction loss, as sketched below.
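
A hedged sketch of one fine-tuning step with the two losses (the tensor shapes and the `lm_coef`/`mc_coef` weights are my assumptions, following the pytorch-pretrained-BERT double-heads API: when both label tensors are passed, the forward call returns the two losses):

```python
import torch

# Hypothetical mini-batch: 1 dialog, 2 candidate replies (gold + distractor), 40 tokens each
batch, n_candidates, seq_len, vocab = 1, 2, 40, 40478
input_ids      = torch.randint(0, vocab, (batch, n_candidates, seq_len))
token_type_ids = torch.randint(0, vocab, (batch, n_candidates, seq_len))  # segment ids
mc_token_ids   = torch.full((batch, n_candidates), seq_len - 1, dtype=torch.long)
lm_labels      = input_ids.clone()   # next-token targets (-1 would mask a position)
mc_labels      = torch.tensor([0])   # index of the gold reply among the candidates

lm_loss, mc_loss = model(input_ids, mc_token_ids,
                         lm_labels=lm_labels, mc_labels=mc_labels,
                         token_type_ids=token_type_ids)

lm_coef, mc_coef = 2.0, 1.0          # hypothetical loss weights
total_loss = lm_coef * lm_loss + mc_coef * mc_loss
total_loss.backward()
```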

👻 Decoder settings

์–ธ์–ด ์ƒ์„ฑ์„ ์œ„ํ•ด์„œ๋Š” greedy-decoding ๊ณผ beam-search ๋ฐฉ๋ฒ•์„ ์ฃผ๋กœ ์‚ฌ์šฉํ•œ๋‹ค.

Greedy-decoding์€ ๋ฌธ์žฅ์„ ๋งŒ๋“œ๋Š” ๊ฐ€์žฅ ๊ฐ„๋‹จํ•œ ๋ฐฉ๋ฒ•์ด๋‹ค. ๋งค ๋‹จ๊ณ„๋งˆ๋‹ค ์ˆœ์„œ์˜ ๋ ํ† ํฐ์— ๋„๋‹ฌํ•  ๋•Œ๊นŒ์ง€ ๋ชจ๋ธ์— ๋”ฐ๋ผ ๊ฐ€์žฅ ๊ฐ€๋Šฅ์„ฑ์ด ๋†’์€ ๋‹ค์Œ ํ† ํฐ์„ ์„ ํƒํ•œ๋‹ค. Greedy-decoding์˜ ํ•œ ๊ฐ€์ง€ ์œ„ํ—˜์€ ๊ฐ€๋Šฅ์„ฑ์ด ๋งค์šฐ ๋†’์€ ํ† ํฐ์ด ๋‚ฎ์€ ํ† ํฐ ๋’ค์— ์ˆจ์–ด ์žˆ๋‹ค๊ฐ€ ๋†“์น  ์ˆ˜ ์žˆ๋‹ค๋Š” ๊ฒƒ์ด๋‹ค.

Beam search tries to mitigate this problem by maintaining a beam of several possible sequences (the beam width is a parameter) that are built word by word. When the process ends, we pick the best sentence among the beams. Over the last few years, beam search has been the standard decoding algorithm for almost every language generation task, including dialog.

Problems with these approaches

According to a recent paper by Ari Holtzman et al., "the word distributions of text generated using beam search and greedy decoding are very different from the distributions of human-generated text." Clearly, beam search and greedy decoding fail to reproduce distributional aspects of human utterances, as also noted in [7, 8] in the context of dialog systems.

![Human vs. machine-generated text distributions](https://miro.medium.com/max/2830/1*yEX1poMDsiEBisrJcdpifA.png)

Left: probabilities assigned to tokens generated by a human versus by beam search with GPT-2 (note the strong variance in human text that beam search does not reproduce). Right: N-gram distributions of human- and machine-generated text (greedy/beam search).

Solution

The current best alternatives to beam search/greedy decoding are top-k and nucleus (or top-p) sampling. These two methods filter the next-token distribution and then sample from it, keeping either only the top k tokens (top-k) or the smallest set of top tokens whose cumulative probability exceeds a threshold (nucleus/top-p).

We therefore decided to use top-k and nucleus/top-p sampling in our decoder.
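
Below is a sketch of a combined top-k/top-p filter, adapted from the approach described above (a sketch, not the project's exact code); apply it to the next-token logits and then sample instead of taking the argmax:

```python
import torch
import torch.nn.functional as F

def top_filtering(logits, top_k=0, top_p=0.9, filter_value=-float('inf')):
    """Filter a 1D tensor of next-token logits with top-k and/or nucleus (top-p) filtering."""
    if top_k > 0:
        # Remove all tokens whose logit is below the k-th largest logit
        indices_to_remove = logits < torch.topk(logits, top_k)[0][..., -1, None]
        logits[indices_to_remove] = filter_value
    if top_p > 0.0:
        # Keep the smallest set of tokens whose cumulative probability exceeds top_p
        sorted_logits, sorted_indices = torch.sort(logits, descending=True)
        cumulative_probs = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)
        sorted_indices_to_remove = cumulative_probs > top_p
        # Shift right so the first token that crosses the threshold is kept
        sorted_indices_to_remove[..., 1:] = sorted_indices_to_remove[..., :-1].clone()
        sorted_indices_to_remove[..., 0] = 0
        logits[sorted_indices[sorted_indices_to_remove]] = filter_value
    return logits

# Sampling instead of the greedy argmax:
# probs = F.softmax(top_filtering(next_token_logits, top_k=0, top_p=0.9), dim=-1)
# next_id = int(torch.multinomial(probs, 1))
```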

Conclusion

๋ณธ ํฌ์ŠคํŒ…์—์„œ ์„ค๋ช… ํ•œ ๊ฒƒ์ฒ˜๋Ÿผ, Hugging Face์—์„œ๋Š” Conversational AI๋ฅผ ๊ตฌํ˜„ ํ•˜๊ธฐ ์œ„ํ•ด ๋Œ€์šฉ๋Ÿ‰ ์–ธ์–ด ๋ชจ๋ธ(large-scale language model)์ธ OpneAI์˜ GPT-2๋ฅผ ์‚ฌ์šฉํ–ˆ๋‹ค

๋ณธ ํ”„๋กœ์ ํŠธ์˜ ๋ฐ๋ชจ์™€ ์ž์„ธํ•œ ์ฝ”๋“œ๋Š” ๋‹ค์Œ ๋งํฌ์—์„œ ์ฐพ์•„ ๋ณผ ์ˆ˜ ์žˆ๋‹ค.

References