5.3 - Collective Classification : Belief Propagation

Collective Classification Models

Relational classifiers
Iterative classification
Loopy belief propagation

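💬 So far, each node has either taken the weighted average of its neighbors' class probabilities as its own new probability, or used its neighbors' labels to compute that probability. In both cases each node uses information from its neighbors. We can describe this as each node receiving beliefs from its neighbors and forming its own belief from them. Put differently, the model holds a belief for every node and updates each node's belief using the beliefs of its neighbors. But why should a node receive beliefs only from its immediate neighbors? Shouldn't the beliefs of more distant nodes matter as well? In fact they already do: a neighbor's belief is itself a mixture of the beliefs of its own neighbors, so repeating the iteration means a node is effectively receiving the beliefs of its neighbors' neighbors' neighbors, and so on. Loopy belief propagation turns this idea around and builds the algorithm so that beliefs flow through the graph directly.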

Loopy Belief Propagation

Belief Propagation is a dynamic programming approach to answering probability queries in a graph (e.g. the probability of node $v$ belonging to class 1)
Iterative process in which neighbor nodes “talk” to each other, passing messages

📌 "I (node v) believe you (node u) belong to class 1 with likelihood ..."
When consensus is reached, calculate final belief

Message Passing: Basics

Task: Count the number of nodes in a graph*
Condition: Each node can only interact (pass message) with its neighbors
Example: path graph

* Potential issues arise when the graph contains cycles. We'll get back to this later!

Message Passing: Algorithm

Task: Count the number of nodes in a graph
Algorithm:
- Define an ordering of nodes (that results in a path)
- Edge directions are according to order of nodes
  - Edge direction defines the order of message passing
- For node $i$ from 1 to 6
  - Compute the message from node $i$ to $i+1$ (number of nodes counted so far)
  - Pass the message from node $i$ to $i+1$
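
The counting procedure above can be sketched in a few lines of Python. This is only an illustration: the function name `count_nodes_on_path` and the input format (a list of node ids in message-passing order) are assumptions made for this sketch, not part of the lecture.

```python
def count_nodes_on_path(path):
    """Count the nodes of a path graph by passing a running count forward.

    `path` lists node ids in message-passing order (an assumed input format).
    Returns the message sent along each edge and the final count.
    """
    if not path:
        return {}, 0
    messages = {}
    heard = 0  # the first node has heard nothing yet
    for sender, receiver in zip(path, path[1:]):
        # The sender adds itself to what it heard and passes the count on.
        heard = heard + 1
        messages[(sender, receiver)] = heard
    # The last node adds itself to the final message it received.
    return messages, heard + 1


messages, total = count_nodes_on_path([1, 2, 3, 4, 5, 6])
print(messages[(1, 2)], messages[(5, 6)], total)  # prints: 1 5 6
```

Each node only ever touches the message from its predecessor, which is exactly the "interact only with neighbors" condition.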

Message Passing: Basics

Task: Count the number of nodes in a graph
Condition: Each node can only interact (pass message) with its neighbors
Solution: Each node listens to the message from its neighbor, updates it, and passes it forward
$m$: the message

Generalizing to a Tree

We can perform message passing not only on a path graph, but also on a tree-structured graph
Define order of message passing from leaves to root
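
As a rough sketch of the leaves-to-root ordering, the node-counting task from the path graph can be repeated on a tree. The adjacency-dict input and the name `count_nodes_in_tree` are assumptions for illustration; the idea is only that a node sends its message to its parent after it has heard from all of its children.

```python
def count_nodes_in_tree(adj, root):
    """Count nodes of a tree by passing subtree sizes from leaves to root.

    `adj` maps every node to a list of its neighbors (undirected tree).
    Returns the message sent to each parent and the total count.
    """
    # Depth-first pass from the root: records each node's parent and an
    # order in which parents always appear before their children.
    parent = {root: None}
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        for nb in adj[node]:
            if nb != parent[node]:
                parent[nb] = node
                stack.append(nb)

    received = {node: 0 for node in adj}  # sum of messages heard from children
    messages = {}
    # Reversed order visits children before parents, i.e. leaves to root.
    for node in reversed(order):
        if parent[node] is not None:
            count = 1 + received[node]  # itself plus everything below it
            messages[(node, parent[node])] = count
            received[parent[node]] += count
    return messages, 1 + received[root]


tree = {1: [2, 3], 2: [1, 4, 5], 3: [1], 4: [2], 5: [2]}
messages, total = count_nodes_in_tree(tree, root=1)
print(messages[(4, 2)], messages[(2, 1)], total)  # prints: 1 3 5
```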

Message passing in a tree

Update beliefs in tree structure

Loopy BP Algorithm

What message will $i$ send to $j$ ?

It depends on what $i$ hears from its neighbors
Each neighbor passes a message to $i$ carrying its belief about the state of $i$

Notation

Label-label potential matrix $\psi$: Dependency between a node and its neighbor. $\boldsymbol{\psi}\left(Y_{i}, Y_{j}\right)$ is proportional to the probability of a node $j$ being in class $Y_{j}$ given that it has neighbor $i$ in class $Y_{i}$.
Prior belief $\phi$: $\phi\left(Y_{i}\right)$ is proportional to the probability of node $i$ being in class $Y_{i}$.
$m_{i \rightarrow j}\left(Y_{j}\right)$ is $i$'s message / estimate of $j$ being in class $Y_{j}$.
$\mathcal{L}$ is the set of all classes/labels.

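💬 **Notation**
- $\psi$ (label-label potential matrix): a matrix expressing how strongly the class of a node influences the class of its neighbor.
  - For example, $\psi\left(Y_{i}, Y_{j}\right)$ is proportional to the probability that node $j$ has label $Y_{j}$ given that its neighbor $i$ has label $Y_{i}$.
  - If there is homophily between $i$ and $j$ (they tend to share the same class), the diagonal entries will be large.
  - This matrix has to be learned.
- $\phi$ (prior belief): $\phi\left(Y_{i}\right)$ is proportional to the probability that node $i$ belongs to class $Y_{i}$.
- $m_{i \rightarrow j}\left(Y_{j}\right)$: the message from $i$ to $j$. Node $i$ combines the beliefs it received from its neighbors with its own information to express what it believes about the label of $j$.
  - It is the message passed from $i$ to $j$ so that the label of $j$ can be predicted.
- $\mathcal{L}$: the set of all labels (classes)
- $b_{i}\left(Y_{i}\right)$: the belief that node $i$ has class $Y_{i}$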

Loopy BP Algorithm

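Initialize all messages to 1
Repeat for each node:

$$m_{i \rightarrow j}\left(Y_{j}\right)=\sum_{Y_{i} \in \mathcal{L}} \psi\left(Y_{i}, Y_{j}\right) \phi\left(Y_{i}\right) \prod_{k \in N_{i} \backslash j} m_{k \rightarrow i}\left(Y_{i}\right), \quad \forall Y_{j} \in \mathcal{L}$$

where $N_{i}$ is the set of neighbors of $i$.

After convergence: $b_{i}\left(Y_{i}\right)=$ node $i$'s belief of being in class $Y_{i}$

$$b_{i}\left(Y_{i}\right)=\phi\left(Y_{i}\right) \prod_{j \in N_{i}} m_{j \rightarrow i}\left(Y_{i}\right), \quad \forall Y_{i} \in \mathcal{L}$$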

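1. At the very start, all messages are initialized to 1.
2. Then every node repeatedly passes messages to its neighbors using the message update. In that update, the sum over $Y_{i} \in \mathcal{L}$ runs over every possible label of the current node $i$. The label-label potential $\psi\left(Y_{i}, Y_{j}\right)$ gives, for each label of node $i$, how likely node $j$ is to have label $Y_{j}$. The prior $\phi\left(Y_{i}\right)$ gives how likely node $i$ is to have label $Y_{i}$. The remaining factor, the product of the messages $m_{k \rightarrow i}\left(Y_{i}\right)$, collects the beliefs that $i$ receives from its neighbors (other than $j$) about $i$ having label $Y_{i}$.
3. Once this process has been repeated enough to converge, the final belief $b_{i}\left(Y_{i}\right)$ is computed by multiplying the prior by all incoming messages.

💡 Q: What does convergence mean here? That is, what exactly is it that converges?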
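
The steps above can be sketched in plain Python. This is a minimal illustration under several assumptions that are not in the lecture: the function name `loopy_bp`, a single potential matrix `psi` shared by all edges, per-node priors given as lists, synchronous updates, and a normalization of each message (which only rescales it) to keep the numbers from growing. Stopping when the largest message change drops below `tol` is one possible reading of "convergence".

```python
def loopy_bp(adj, prior, psi, num_iters=50, tol=1e-6):
    """Sketch of loopy BP with synchronous message updates.

    adj:   dict node -> list of neighbors (undirected graph)
    prior: dict node -> list of phi values, one per label
    psi:   psi[y_i][y_j], label-label potential shared by all edges
    """
    labels = range(len(psi))
    # One message per directed edge, all initialized to 1.
    messages = {(i, j): [1.0] * len(psi) for i in adj for j in adj[i]}

    for _ in range(num_iters):
        new_messages = {}
        for (i, j) in messages:
            out = []
            for y_j in labels:
                total = 0.0
                for y_i in labels:
                    # Product of messages into i from all neighbors except j.
                    incoming = 1.0
                    for k in adj[i]:
                        if k != j:
                            incoming *= messages[(k, i)][y_i]
                    total += psi[y_i][y_j] * prior[i][y_i] * incoming
                out.append(total)
            norm = sum(out)
            new_messages[(i, j)] = [v / norm for v in out]
        # Largest change of any message entry in this sweep.
        delta = max(
            abs(a - b)
            for edge in messages
            for a, b in zip(new_messages[edge], messages[edge])
        )
        messages = new_messages
        if delta < tol:
            break

    # Final belief: prior times all incoming messages, normalized.
    beliefs = {}
    for i in adj:
        b = list(prior[i])
        for j in adj[i]:
            for y in labels:
                b[y] *= messages[(j, i)][y]
        z = sum(b)
        beliefs[i] = [v / z for v in b]
    return beliefs


# Tiny example on a path a - b - c (a tree, so BP is exact here).
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
prior = {"a": [0.9, 0.1], "b": [0.5, 0.5], "c": [0.5, 0.5]}
psi = [[0.8, 0.2], [0.2, 0.8]]  # homophily: large diagonal entries
beliefs = loopy_bp(adj, prior, psi)
print({node: [round(v, 3) for v in b] for node, b in beliefs.items()})
```

In this example only node `a` has an informative prior, and the belief in class 0 weakens with distance from `a` (0.9, then 0.74, then 0.644), since each hop through `psi` dilutes the evidence.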

Example: Loopy Belief Propagation

Now we consider a graph with cycles
There is no longer an ordering of nodes
We apply the same algorithm as in previous slides:
- Start from arbitrary nodes
- Follow the edges to update the neighboring nodes

What if our graph has cycles? Messages from different subgraphs are no longer independent! We can still run BP, but it will pass messages in loops.

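💬 The graphs discussed so far had no cycles, so there was no difficulty in choosing an order for message passing. When a graph contains cycles, we can no longer simply fix a node order and pass messages along it. Let's look at why. Consider a graph containing the cycle $i \rightarrow j \rightarrow k \rightarrow u \rightarrow i$, with messages passed in that order. Node $u$ appears to receive a message only from $k$, but in reality it is also receiving its own earlier message back. The nodes are no longer independent; dependencies have appeared. If instead we pass messages tree-style, with $j$ sending to $i$ and $k$, and then $i$ and $k$ sending to $u$, the message from $j$ reaches $u$ twice. This looks like it should seriously break the algorithm, but in practice it reportedly does not. Real graphs are very large and structurally complex, and cycles make up a relatively small part of them, so Loopy BP is said to work well.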

What Can Go Wrong?

Beliefs may not converge
Message $m_{u \rightarrow i}\left(Y_{i}\right)$ is based on the initial belief of $i$, not on separate evidence for $i$
The initial belief of $i$ (which could be incorrect) is reinforced by the cycle

$i \rightarrow j \rightarrow k \rightarrow u \rightarrow i$

However, in practice, Loopy BP is still a good heuristic for complex graphs which contain many branches.

Messages loop around and around: $2,4,8,16,32, \ldots$ More and more convinced that these variables are $T$!
BP incorrectly treats this message as separate evidence that the variable is $T$!
Multiplies these two messages as if they were independent.
- But they don’t actually come from independent parts of the graph.
- One influenced the other (via a cycle).

This is an extreme example. Often in practice, the cyclic influences are weak. (As cycles are long or include at least one weak correlation.)
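
To make the double-counting concrete, here is a small experiment on a single 4-node cycle. Everything in it is an assumption chosen for illustration: the homophily potential `PSI`, the priors (only node 0 has evidence, 0.6 versus 0.4), and the helper names. It compares the exact marginal of node 0, obtained by brute-force enumeration, with the loopy BP belief. Because the potential is symmetric, the exact marginal stays at the prior (0.6), while loopy BP should come out noticeably more confident (roughly 0.7 with these numbers), since node 0's own prior travels around the cycle and returns as if it were new evidence.

```python
from itertools import product

N = 4                                   # nodes 0..3 arranged in a single cycle
PSI = [[0.9, 0.1], [0.1, 0.9]]          # assumed homophily potential
PRIOR = [[0.6, 0.4]] + [[0.5, 0.5] for _ in range(N - 1)]  # evidence only at node 0


def exact_marginal(node):
    """Brute-force marginal: sum the joint weight over all 2^N labelings."""
    weight = [0.0, 0.0]
    for labels in product(range(2), repeat=N):
        w = 1.0
        for i in range(N):
            w *= PRIOR[i][labels[i]] * PSI[labels[i]][labels[(i + 1) % N]]
        weight[labels[node]] += w
    z = sum(weight)
    return [w / z for w in weight]


def loopy_bp_belief(node, num_iters=200):
    """Loopy BP with synchronous updates, specialized to the cycle."""
    msg = {(i, (i + d) % N): [1.0, 1.0] for i in range(N) for d in (1, -1)}
    for _ in range(num_iters):
        new_msg = {}
        for (i, j) in msg:
            k = (2 * i - j) % N  # the neighbor of i that is not j
            out = [
                sum(PSI[y_i][y_j] * PRIOR[i][y_i] * msg[(k, i)][y_i] for y_i in range(2))
                for y_j in range(2)
            ]
            total = sum(out)
            new_msg[(i, j)] = [v / total for v in out]  # normalize for stability
        msg = new_msg
    left, right = (node - 1) % N, (node + 1) % N
    belief = [PRIOR[node][y] * msg[(left, node)][y] * msg[(right, node)][y] for y in range(2)]
    z = sum(belief)
    return [b / z for b in belief]


exact = exact_marginal(0)
approx = loopy_bp_belief(0)
print("exact P(Y_0 = 0):", round(exact[0], 3))
print("loopy BP belief :", round(approx[0], 3))
```

Weakening the potential (for example moving `PSI` closer to uniform) or lengthening the cycle shrinks the gap, which matches the remark that cyclic influences are often weak in practice.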

Advantages of Belief Propagation

Advantages:
- Easy to program & parallelize
- General: can apply to any graph model with any form of potentials
  - Potential can be higher order: e.g. $\boldsymbol{\psi}\left(Y_{i}, Y_{j}, Y_{k}, Y_{v} \ldots\right)$
Challenges:
- Convergence is not guaranteed (when to stop), especially if there are many closed loops
Potential functions (parameters):
- Require training to estimate

Summary

We learned how to leverage correlation in graphs to make predictions on nodes
Key techniques:
- Relational classification
- Iterative classification
- Loopy belief propagation

CS224W: Machine Learning with Graphs 2021 Lecture 5.3 - Collective Classification