I first learned about this paper last month, when Google Brain released a library called TensorFlow Recommenders. Google runs YouTube, an enormous recommender system, so when it released recommender-system code I took a close look. The TensorFlow Blog covers the overall details more thoroughly, so I recommend reading it.

The goals of TFRS (TensorFlow Recommenders) are as follows:

  • Build recommendation candidates quickly and flexibly
  • A structure that can freely use item, user, and context information
  • A multi-task structure that learns several objectives at the same time
  • Efficient serving of trained models with TF Serving

The code itself did not cover a wide range of material. What impressed me most was the Two Tower Model, which the code introduces as its basic model. It encodes the user and the item in completely independent towers and predicts click / no-click using only a dot product at the final stage. The more I thought about this structure, the more I liked it. Because the user tower and the item tower cannot interact during training, it was unclear whether it would reach outstanding accuracy. However, the structure places no constraints on the input features, so any available feature can be fed in freely. At inference time, you can keep precomputed per-user and per-item embeddings and serve by computing similarity with a dot product alone. That also makes it look highly compatible with ANN (Approximate Nearest Neighbors) libraries.
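To make the serving path concrete, here is a minimal pure-Python sketch. It assumes each tower is a single random linear layer (real towers are deep networks trained jointly). All names (`Tower`, `retrieve_top_k`, the toy features) are hypothetical. Item embeddings are computed once offline, and retrieval is a top-k dot-product search. The brute-force sort stands in for what an ANN library would approximate.

```python
import random

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

class Tower:
    """Hypothetical tower: one linear layer mapping a feature vector to an embedding."""
    def __init__(self, in_dim, emb_dim, seed):
        rng = random.Random(seed)
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(emb_dim)]

    def __call__(self, features):
        return [dot(row, features) for row in self.weights]

# The two towers may take completely different inputs; only the output size must match.
user_tower = Tower(in_dim=4, emb_dim=3, seed=0)
item_tower = Tower(in_dim=5, emb_dim=3, seed=1)

# Offline: embed every item once and store the vectors (what an ANN index would hold).
item_features = {f"item_{i}": [random.Random(i).random() for _ in range(5)] for i in range(10)}
item_index = {item_id: item_tower(f) for item_id, f in item_features.items()}

def retrieve_top_k(user_features, k):
    """Online: embed the user, then rank the stored item embeddings by dot product.
    Exhaustive here; an ANN library would approximate this search at scale."""
    u = user_tower(user_features)
    ranked = sorted(item_index.items(), key=lambda kv: dot(u, kv[1]), reverse=True)
    return [item_id for item_id, _ in ranked[:k]]

print(retrieve_top_k([1.0, 0.0, 0.5, 0.2], k=3))
```

Because the user and item never meet before the dot product, the item side can be recomputed on its own schedule, independently of user traffic.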

Another advantage is that metadata can be used as input. A problem recommender systems run into frequently is the cold start problem. When an item or a user has no usage history yet, metadata is the only thing you can rely on, and modeling it well in a general way is difficult. With the two tower model, however, you can feed in the user's or item's metadata even without any usage history and fill the remaining features with something like their mean values, and the model can score it right away. So it seemed like it could handle the cold start problem naturally. This also means it can work well in settings where the item pool changes dynamically.
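A minimal sketch of the mean-filling idea. It assumes an item's tower input is a one-hot category (metadata that every item has) concatenated with two history features (a hypothetical click-through rate and log view count). For an item with no history, the history part falls back to the corpus mean, so the input keeps the same shape and can go straight into the item tower. The vocabulary, feature names, and values are all made up for illustration.

```python
from statistics import mean

CATEGORIES = ["music", "gaming", "news"]  # hypothetical metadata vocabulary

def one_hot(value, vocab):
    """Encode a categorical metadata value as a one-hot list."""
    return [1.0 if v == value else 0.0 for v in vocab]

# Hypothetical history features for items that already have logs: [CTR, log view count]
history = {
    "item_a": [0.12, 5.1],
    "item_b": [0.08, 3.9],
    "item_c": [0.20, 6.3],
}
# Column-wise means, used as defaults for items with no history yet.
history_mean = [mean(col) for col in zip(*history.values())]

def item_input(item_id, category):
    """Metadata is always present; history falls back to the corpus mean (cold start)."""
    return one_hot(category, CATEGORIES) + history.get(item_id, history_mean)

print(item_input("item_a", "music"))     # known item: real history
print(item_input("new_item", "gaming"))  # cold-start item: mean-filled history
```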

With these thoughts in mind, I read the Two Tower Model paper more carefully and summarized it. The paper is titled “Sampling-Bias-Corrected Neural Modeling for Large Corpus Item Recommendations”. It was published by Google Brain and YouTube, and it is the model currently used for candidate generation in the YouTube recommender system.

1. Concept

At YouTube, recommendation is split into two broad stages: candidate generation and the ranking model. When recommending videos to a user, 1) candidate generation narrows the full item corpus down to a few hundred items worth considering, and 2) the ranking model decides which few of those few hundred make it into the final recommendations. For candidate generation, the item the user actually wants must be present in the candidate set, so recall@k is what matters. For the ranking model, the desired item must be ranked at the very top, so metrics such as nDCG@k and HR@k matter.
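The difference between the two kinds of metric is easiest to see on a toy ranking. This sketch assumes binary relevance, a single user, and the common log2 discount for nDCG. Function names and data are illustrative only.

```python
import math

def recall_at_k(ranked, relevant, k):
    """Share of the relevant items that made it into the top-k list."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

def hit_rate_at_k(ranked, relevant, k):
    """1.0 if any relevant item is in the top-k, else 0.0 (averaged over users in practice)."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance nDCG@k: a hit at 0-based rank r is worth 1 / log2(r + 2)."""
    dcg = sum(1.0 / math.log2(r + 2) for r, item in enumerate(ranked[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(r + 2) for r in range(min(len(relevant), k)))
    return dcg / ideal

relevant = {"a", "b"}
low = ["c", "a", "x", "b", "y"]   # "a" in second place
high = ["a", "c", "x", "b", "y"]  # "a" in first place
for ranked in (low, high):
    print(recall_at_k(ranked, relevant, 3), hit_rate_at_k(ranked, relevant, 3), ndcg_at_k(ranked, relevant, 3))
```

Both lists have the same recall@3 and HR@3, because the same items are inside the cutoff. Only nDCG@3 rewards moving "a" to the top. That is why recall suits candidate generation, where only membership in the set matters, and position-aware metrics suit ranking.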

From the perspective of someone who has to build a real recommender system, the candidate generation model must narrow the entire corpus down to a few hundred items. Reasonable accuracy with fast inference therefore matters more than top accuracy. One way to infer quickly and pick a few hundred items out of millions to tens of millions is to learn good embedding vectors for items and then select, as candidates, items similar to the ones the user recently consumed.
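A small sketch of the "items similar to recently consumed items" approach. It assumes cosine similarity over a tiny hand-written embedding table. In a real system the table would hold millions of learned vectors behind an ANN index. `item_emb` and `candidates_from_history` are hypothetical names.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical precomputed item embeddings.
item_emb = {
    "v1": [1.0, 0.0],
    "v2": [0.9, 0.1],
    "v3": [0.0, 1.0],
    "v4": [0.1, 0.9],
    "v5": [0.7, 0.7],
}

def candidates_from_history(recent, per_item):
    """For each recently consumed item, take its `per_item` most similar items,
    skip anything the user already consumed, and merge by best similarity."""
    best = {}
    for seed in recent:
        sims = sorted(
            ((other, cosine(item_emb[seed], vec)) for other, vec in item_emb.items() if other != seed),
            key=lambda pair: pair[1],
            reverse=True,
        )
        for other, score in sims[:per_item]:
            if other in recent:
                continue
            best[other] = max(score, best.get(other, -1.0))
    return sorted(best, key=best.get, reverse=True)

print(candidates_from_history(["v1"], per_item=2))  # ['v2', 'v5']
```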

A simple way to get an item embedding is to combine word2vec with bag of words to build a text embedding of the item. For videos or images, a pretrained image model can extract low-dimensional features. The two tower model proposed in this paper takes metadata and various other features as input to build good user / item embeddings, which are then used to find nearest neighbors by dot product.
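The word2vec + bag-of-words idea can be sketched as a count-weighted average of word vectors. This assumes tiny hand-written vectors in place of a trained word2vec model and simple whitespace tokenization. Out-of-vocabulary words are ignored.

```python
from collections import Counter

# Hypothetical pretrained word vectors; in practice these come from a trained word2vec model.
word_vectors = {
    "guitar": [0.9, 0.1, 0.0],
    "lesson": [0.2, 0.8, 0.1],
    "beginner": [0.1, 0.7, 0.3],
    "live": [0.6, 0.0, 0.5],
}
DIM = 3

def text_embedding(text):
    """Bag of words + word vectors: count each token, weight its vector by the count,
    and average over the known tokens."""
    counts = Counter(text.lower().split())
    known = {word: c for word, c in counts.items() if word in word_vectors}
    total = sum(known.values())
    if total == 0:
        return [0.0] * DIM  # no known words: fall back to a zero vector
    return [sum(word_vectors[w][d] * c for w, c in known.items()) / total for d in range(DIM)]

print(text_embedding("Guitar lesson for beginner guitar"))
```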

2. Modeling Overview

Now let's look at the two tower structure at the core of this paper. We will also look at how the paper draws negative samples within each batch, given that its training data contains only positive pairs.

Two-tower Model

The two tower model first became popular in natural language processing under the name dual encoder. In tasks that predict the relationship between two sentences, each sentence is passed through an encoder, and the relationship is classified from the resulting sentence representations. Encoders used for this range from RNNs and Transformers to, more recently, pretrained BERT. The key idea of this structure applies when the number of labels is effectively unbounded. Instead of treating the problem as multi-label classification, it is treated as binary classification: the query and a label are both taken as input, and the model decides whether the pair is a match.
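The binary formulation can be sketched with a sigmoid over the dot product of the two encoder outputs and a per-pair binary cross-entropy. The embeddings below are hypothetical stand-ins for encoder outputs. Any new label can be scored as long as it can be encoded, so the label set does not have to be fixed in advance.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def pair_probability(query_emb, label_emb):
    """Probability that (query, label) is a match, from a dot product of the two encodings."""
    return sigmoid(sum(q * l for q, l in zip(query_emb, label_emb)))

def binary_cross_entropy(p, y, eps=1e-12):
    """Loss for one pair; y = 1 for a matching pair, 0 for a non-matching pair."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

query = [0.5, 1.0]                             # hypothetical query-encoder output
pairs = [([1.0, 0.8], 1), ([-0.7, -0.2], 0)]   # (label-encoder output, is_match)
losses = [binary_cross_entropy(pair_probability(query, lab), y) for lab, y in pairs]
print(losses)
```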

Batch Negative Sampling

Training a two-tower model requires two kinds of data: records that a user clicked an item, and records that a user saw an item but did not click it. Collecting the second kind, the "not clicked" data, raises several difficulties:

  • Availability: depending on the service environment, impression data may not be obtainable at all.
  • Data size: impressions usually vastly outnumber clicks. Storing them makes the data grow excessively.
  • Serving bias: which items go unclicked is determined entirely by whatever recommendation logic was in place at the time. As a result, the data distribution of the negatives varies greatly with that logic.
  • Hard negative: impressions are ultimately the failed results among the top-k items the existing logic recommended, and the fact that they were shown in that logic's top-k already makes them hard negatives.

To address this, batch negative sampling trains on click data alone and draws negatives at random from within the batch. When positive pairs arrive batch by batch, the items are shifted out of alignment so that each user is paired with a different item, and those mismatched pairs are treated as negatives. In the Two Tower model, batch negative sampling can be done right before the final dot product. This avoids redundant computation, because each user and item embedding is computed only once per batch and then reused for every pairing.
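The shifting idea can be sketched directly on (user, item) pairs. This assumes a cyclic shift by `s` rows. `shifted_negatives` is a hypothetical helper. Note that if the same item appears twice in a batch, a shifted "negative" can accidentally be a true positive. This sketch does not guard against that.

```python
def shifted_negatives(batch, num_shifts=1):
    """Pair each user with the item from another row (shifted by s rows, wrapping around).
    No impression logs are needed: every negative is built from other rows' positives."""
    n = len(batch)
    if not 1 <= num_shifts < n:
        raise ValueError("num_shifts must be between 1 and batch size - 1")
    users = [u for u, _ in batch]
    items = [i for _, i in batch]
    negatives = []
    for s in range(1, num_shifts + 1):
        for row in range(n):
            negatives.append((users[row], items[(row + s) % n]))
    return negatives

batch = [("u1", "a"), ("u2", "b"), ("u3", "c")]
print(shifted_negatives(batch))  # [('u1', 'b'), ('u2', 'c'), ('u3', 'a')]
```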

In the figure above, multiplying the batch of query embeddings by the batch of item embeddings (a matmul) yields a $B \times B$ score matrix, where $B$ is the batch size. Only the diagonal entries $(i,i)$ are positives, so the label matrix is the identity, and every other entry is a negative. Not every off-diagonal entry has to be used as a negative; there are several standard negative sampling strategies. The most common are:
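Here is a sketch of the matrix view, assuming an in-batch softmax loss in which row $i$ treats column $i$ as the positive and the other $B-1$ columns as negatives. A per-entry sigmoid loss over the same matrix would be another option. One matmul gives all $B$ positives and $B(B-1)$ negatives at once. The embeddings are toy values.

```python
import math

def matmul_transpose(queries, items):
    """scores[i][j] = <query_i, item_j>: one B x B matrix from two B x d embedding lists."""
    return [[sum(q * v for q, v in zip(qi, vj)) for vj in items] for qi in queries]

def in_batch_softmax_loss(queries, items):
    """Mean softmax cross-entropy per row, with the identity matrix as labels:
    the diagonal is the positive, every off-diagonal entry is a negative."""
    scores = matmul_transpose(queries, items)
    total = 0.0
    for i, row in enumerate(scores):
        m = max(row)  # subtract the row max for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_z - row[i]  # -log softmax(row)[i]
    return total / len(scores)

queries = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
items = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
print(in_batch_softmax_loss(queries, items))
```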

  • random negative: randomly sample $k$ of the $B-1$ negatives.
  • hard negative: among the $B-1$ negatives, sample only the pairs the model found hardest to judge (those with the highest dot products).
  • semi-hard negative: among the $B-1$ negatives, sample $k$ of the pairs the model found moderately hard to judge (those whose dot product falls within a certain range).
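The three strategies above can be sketched for a single row of the score matrix. This assumes a fixed score window `[low, high)` for semi-hard negatives, matching the "certain range" described here. Some implementations instead define the window relative to the positive's score. `select_negatives` and its parameters are hypothetical.

```python
import random

def select_negatives(scores, pos_index, k, strategy, low=None, high=None, rng=None):
    """Pick up to k negative column indices from one row of the B x B score matrix.
    scores: the row's dot products; pos_index: the diagonal (positive) column."""
    rng = rng or random.Random(0)
    candidates = [j for j in range(len(scores)) if j != pos_index]  # the B-1 negatives
    if strategy == "random":
        return rng.sample(candidates, min(k, len(candidates)))
    if strategy == "hard":
        # Highest-scoring negatives: the ones the model most confuses with the positive.
        return sorted(candidates, key=lambda j: scores[j], reverse=True)[:k]
    if strategy == "semi-hard":
        if low is None or high is None:
            raise ValueError("semi-hard sampling needs a [low, high) score window")
        window = [j for j in candidates if low <= scores[j] < high]
        return rng.sample(window, min(k, len(window)))
    raise ValueError(f"unknown strategy: {strategy}")

row = [0.9, 0.2, 0.8, 0.5, 0.1]  # scores for query 0; column 0 is its positive
print(select_negatives(row, 0, k=2, strategy="hard"))  # [2, 3]
print(select_negatives(row, 0, k=2, strategy="semi-hard", low=0.3, high=0.85))
```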

Running negative sampling at the last stage of the two-tower model does not guarantee better accuracy. However, because impressions no longer have to be stored, the data shrinks by roughly a factor of tens. The computation itself is not very different from before, so training can also be considered roughly tens of times faster.

It is not all upside, though. As mentioned earlier, batch negative sampling is widely used outside recommendation as well, but applying it to recommendation raises a specific problem: popular items. In recommendation, the probability of an item appearing is heavily skewed toward a small set of popular items. On the positive side this is fine, since it simply reflects that those items get more clicks. The problem is that popular items also make up far too large a share of the negative candidates, so they are penalized as negatives disproportionately often. This is called item frequency bias. To correct for it, the paper estimates each item's sampling probability $p_j$ and subtracts its logarithm from the matmul logits, using $s(x_i, y_j) - \log p_j$ in place of $s(x_i, y_j)$. This offsets the excess penalty that popular items would otherwise receive as negatives.
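The paper also describes a streaming way to estimate $p_j$ from the training stream itself. For each hash bucket, it tracks the last step at which an item was seen and a moving average of the gap between sightings, and takes $p_j \approx 1 / \text{gap}$. The sketch below follows that idea under simplifying assumptions: integer item ids with a modulo in place of a real hash function, and illustrative values for `alpha`, the bucket count, and the initial gap. All names are hypothetical.

```python
import math

class StreamingFrequencyEstimator:
    """Online estimate of each item's sampling probability p_j.
    last_step[b]: last step at which bucket b was seen; avg_gap[b]: moving average of the
    gap (in steps) between sightings; estimate p = 1 / avg_gap[b].
    Assumes integer item ids; id % num_buckets stands in for a hash function."""

    def __init__(self, num_buckets=1024, alpha=0.1, initial_gap=100.0):
        self.num_buckets = num_buckets
        self.alpha = alpha                           # moving-average rate (illustrative)
        self.last_step = [0] * num_buckets
        self.avg_gap = [initial_gap] * num_buckets   # assumed starting gap

    def _bucket(self, item_id):
        return item_id % self.num_buckets

    def update(self, item_id, step):
        b = self._bucket(item_id)
        gap = step - self.last_step[b]
        self.avg_gap[b] = (1 - self.alpha) * self.avg_gap[b] + self.alpha * gap
        self.last_step[b] = step

    def probability(self, item_id):
        return 1.0 / self.avg_gap[self._bucket(item_id)]

def corrected_logits(row_scores, item_ids, estimator):
    """logQ correction for one row of the in-batch score matrix: s(x_i, y_j) - log(p_j)."""
    return [s - math.log(estimator.probability(j)) for s, j in zip(row_scores, item_ids)]

est = StreamingFrequencyEstimator()
for step in range(1, 501):
    est.update(1, step)       # popular item: appears in every batch
    if step % 10 == 0:
        est.update(2, step)   # rarer item: appears in every 10th batch
print(est.probability(1), est.probability(2))
print(corrected_logits([2.0, 2.0], [1, 2], est))
```

After the correction, the popular item's logit ends up lower than the rare item's for the same raw score. When it shows up as an in-batch negative, it therefore contributes less to the softmax denominator and is pushed down less.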

This post contains only part of the original article. The full content is available on my previous blog.