Example #10 - Attention: what the model looks at

When a model processes a word, it "looks around" at the other words in the sentence and assigns each an attention weight: how important that word is to it. This idea, attention, is at the heart of today's models. Click a word and see where it looks.

What & how?

What it is: attention means that when the model processes a word, it "looks around" at the other words in the sentence and gives each a weight reflecting how important that word is to it.

What to try: type a sentence and click a word. You'll see how much attention it pays to every other word (shown as a percentage and as the depth of blue). Below is the full attention map. Attention is always split so that it adds up to 100%.

Why it matters: this is, in essence, what the attention layers of today's transformers do. Each word recomputes its meaning based on what it looks at in the sentence. That is how models grasp context: who "he" is, what "it" refers to.

1 query (what am I looking for?) → 2 keys of the other words → 3 match scores → 4 softmax = weights (100%) → 5 the word's new meaning

attention map: row = "from", column = "looks at"

How to read the map: each row is one word ("where I look from") and the columns are the words it looks at. The number in a cell is the percentage of attention, and each row adds up to 100%. Click a word at the top or a row to highlight it. A strong diagonal means words attend heavily to themselves.
Note: this is a simplified illustration of the mechanism (word similarity plus closeness in the sentence), not a trained model. A trained model would learn these relationships from a huge amount of text.
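To make the map concrete, here is a minimal sketch of how such a table could be built. It is not the code behind this page: the scoring rule in `toy_score` (letter overlap minus a distance penalty), its constants, and the example sentence are all assumptions chosen only to mimic "similarity plus closeness".

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def toy_score(words, i, j):
    """Hypothetical score: letter overlap (similarity) minus a distance penalty."""
    a, b = set(words[i].lower()), set(words[j].lower())
    similarity = len(a & b) / len(a | b)  # Jaccard overlap of letters, from 0 to 1
    closeness = -0.5 * abs(i - j)         # words further apart score lower
    return 2.0 * similarity + closeness

def attention_map(sentence):
    """Return the words and one row of percentages per word."""
    words = sentence.split()
    rows = []
    for i in range(len(words)):
        scores = [toy_score(words, i, j) for j in range(len(words))]
        rows.append([round(100 * w, 1) for w in softmax(scores)])
    return words, rows

words, rows = attention_map("the cat sat on the mat")
for word, row in zip(words, rows):
    print(f"{word:>4}", row)  # each row sums to roughly 100 (up to rounding)
```

With this particular rule every word is most similar and closest to itself, so the diagonal always holds the largest value in its row, which is the "strong diagonal" pattern described above.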

How attention works

Each word sends out a "query" (what am I looking for?) and each word offers a "key" (what do I offer?). The model computes how well a query matches each key: the better the match, the more attention. The weights are normalized to add up to 100%, and the word then assembles its new meaning as a weighted blend of the other words, dominated by those it paid the most attention to. (This is a simplified illustration of the mechanism, not a trained model.)

Query, key, value

Q (what I'm looking for), K (what I offer), V (what I pass on). Attention measures how well the query fits each key. That decides how much of each "value" a word takes.
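The roles of Q, K, and V can be sketched for a single word as follows. This is an illustration under stated assumptions, not a trained model: the function name `attend` and the 2-D vectors are made up, and the division by the square root of the vector length is the convention used in standard transformers rather than something stated on this page. In a real model, Q, K, and V come from learned projections of the word vectors.

```python
import math

def dot(a, b):
    """Dot product: a simple measure of how well two vectors match."""
    return sum(x * y for x, y in zip(a, b))

def attend(query, keys, values):
    """Return (weights, new_vector) for one word's query."""
    scale = math.sqrt(len(query))  # assumed scaling, as in standard transformers
    scores = [dot(query, k) / scale for k in keys]  # query-to-key match scores
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]     # softmax, step 1
    total = sum(exps)
    weights = [e / total for e in exps]             # softmax, step 2: sums to 1
    dim = len(values[0])
    # New meaning: the weighted sum of all value vectors.
    mixed = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, mixed

# Made-up vectors for three words; the query matches the first key best.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]

weights, new_meaning = attend(query, keys, values)
print([round(w, 3) for w in weights])
print([round(x, 2) for x in new_meaning])
```

The first word's key matches the query best, so it gets the largest weight and the resulting vector leans toward that word's value.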

Weights add up to 100%

Softmax turns the scores into proportions that together add up to one. In this way a word splits its attention among the others.
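As a small worked example, softmax exponentiates each score and divides by the sum, so every weight is positive and the weights total one. The three scores below are arbitrary numbers picked for illustration.

```python
import math

def softmax(scores):
    """Exponentiate each score, then divide by the total."""
    peak = max(scores)  # subtracting the max does not change the result
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.0]  # arbitrary example scores
weights = softmax(scores)
percent = [round(100 * w, 1) for w in weights]
print(percent)  # [66.5, 24.5, 9.0]
```

Note that a score of zero still receives some attention (about 9% here): softmax never assigns exactly zero, it only makes weak matches small.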

Context resolves ambiguity

The word "bank" means something different depending on its surroundings, for example a riverbank or a financial institution. By attending to its neighbors, the model works out which bank is meant. Language is about the relationships between words.

Why it's a breakthrough

Attention can link even words that are far apart, and it can process them all at once (in parallel). Transformers, the architecture behind ChatGPT and similar models, are built on this.
