5.1 - Message passing and Node Classification
- Message Passing and Node Classification
- How do we leverage node correlations in networks?

Lecture 5

Lecture 5.1 - Message passing and Node Classification

Lecture 5.2 - Relational and Iterative Classification

Lecture 5.3 - Collective Classification : Belief Propagation

5.1 - Message passing and Node Classification

Message Passing and Node Classification

Today’s Lecture: outline

Main question today: Given a network with labels on some nodes, how do we assign labels to all other nodes in the network?
Example: In a network, some nodes are fraudsters and some other nodes are fully trusted. How do we find the other fraudsters and trustworthy nodes?
We already discussed node embeddings as one method to solve this in Lecture 3.

Example: Node Classification

Given the labels of some nodes, let's predict the labels of the unlabeled nodes.
This is called semi-supervised node classification.

Today’s Lecture: outline

Main question today: Given a network with labels on some nodes, how do we assign labels to all other nodes in the network?
Today we will discuss an alternative framework: Message passing
Intuition: Correlations (dependencies) exist in networks.
- In other words: Similar nodes are connected.
- Key concept is collective classification: Idea of assigning labels to all nodes in a network together.
We will look at three techniques today:
- Relational classification
- Iterative classification
- Correct & Smooth

Correlations Exist in Networks

Behaviors of nodes are correlated across the links of the network
Correlation: Nearby nodes have the same color (belonging to the same class)

Two explanations for why behaviors of nodes in networks are correlated:

Homophily: The tendency of individuals to associate and bond with similar others
“Birds of a feather flock together”
It has been observed in a vast array of network studies, based on a variety of attributes (e.g., age, gender, organizational role, etc.)
Example: Researchers who focus on the same research area are more likely to establish a connection (meeting at conferences, interacting in academic talks, etc.)

Homophily: Example

Example of homophily

Online social network
- Nodes = people
- Edges = friendship
- Node color = interests (sports, arts, etc.)
People with the same interest are more closely connected due to homophily

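💬 An individual's characteristics include age, gender, occupation, hobbies, place of residence, and more. Homophily is the idea that individuals connect with, and tend to act together with, others who share similar characteristics. For example, machine learning researchers attend the same conferences and read and discuss in similar communities, so they naturally become acquainted. Likewise, singers perform together and listen to each other's albums, and so become socially connected. The graph above represents the students of one school: each student is a node and each friendship is an edge. The node label, shown as a color, is the student's interest, such as sports or arts. The graph visibly splits into four small groups, and each group gathers students with similar interests. This is what homophily refers to.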

Influence: Social connections can influence the individual characteristics of a person.
- Example: I recommend my musical preferences to my friends until one of them grows to like the same genres I do!

How do we leverage node correlations in networks?

Classification with Network Data

How do we leverage this correlation observed in networks to help predict node labels?

How do we predict the labels for the nodes in grey?

Motivation

Similar nodes are typically close together or directly connected in the network:
- Guilt-by-association: If I am connected to a node with label 𝑋, then I am likely to have label 𝑋 as well.
- Example: Malicious/benign web page: Malicious web pages link to one another to increase visibility, look credible, and rank higher in search engines
Classification label of a node 𝑣 in network may depend on:
- Features of 𝑣
- Labels of the nodes in 𝑣’s neighborhood
- Features of the nodes in 𝑣’s neighborhood

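💬 Within a graph, similar nodes tend to lie close together or be directly connected. This is called guilt-by-association: if node b is not yet labeled and its neighbor x is labeled 1, then b is also likely to be labeled 1 because it is close to x. As a concrete example, spam sites cannot obtain links from safe sites, so they tend to link to one another to raise their exposure and credibility. Exploiting this, once one spam site is caught, the other spam sites linked to it can be found as well. The information used to classify node v is: 1. the features of node v, 2. the labels of v's neighbors, 3. the features of v's neighbors.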
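The guilt-by-association idea can be illustrated with a small sketch. This is not a method from the lecture: the function name `neighbor_label_fraction`, the 4-node graph, and the convention of marking unlabeled nodes with -1 are all assumptions made for illustration. It simply reports what fraction of a node's labeled neighbors carry label 1.

```python
import numpy as np

def neighbor_label_fraction(A, Y, v):
    """Fraction of v's labeled neighbors that have label 1.

    Returns None when v has no labeled neighbor.
    Assumed convention: Y[u] == -1 means node u is unlabeled.
    """
    neighbors = np.flatnonzero(A[v])
    known = [int(Y[u]) for u in neighbors if Y[u] >= 0]
    if not known:
        return None
    return sum(known) / len(known)

# Hypothetical graph: node 3 is unlabeled and linked to nodes 0, 1, 2.
A = np.array([
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
])
Y = np.array([1, 1, 0, -1])

score = neighbor_label_fraction(A, Y, 3)
```

Here two of node 3's three neighbors have label 1, so the score leans toward Class 1. A real classifier would also use the node and neighbor features listed above.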

Semi-supervised Learning

Formal setting:

Given:
- Graph
- Few labeled nodes

Find: Class (red/green) of remaining nodes

Main assumption: There is homophily in the network

Example task:

Let $A$ be an $n \times n$ adjacency matrix over $n$ nodes
Let $Y \in \{0,1\}^n$ be a vector of labels:
- $Y_v = 1$: node $v$ belongs to Class 1
- $Y_v = 0$: node $v$ belongs to Class 0
- Some nodes are unlabeled and need to be classified
Goal: Predict which unlabeled nodes are likely Class 1, and which are likely Class 0
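As a rough illustration of this setup, the sketch below builds an adjacency matrix and a label vector with NumPy. The 5-node path graph, the label values, and the use of -1 to mark unlabeled nodes are assumptions chosen for the example, not part of the lecture.

```python
import numpy as np

# Hypothetical undirected path graph 0-1-2-3-4 (illustrative only).
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
n = 5

# Build the symmetric n x n adjacency matrix A.
A = np.zeros((n, n), dtype=int)
for u, v in edges:
    A[u, v] = 1
    A[v, u] = 1

# Label vector Y: 1 = Class 1, 0 = Class 0, -1 = unlabeled (assumed convention).
Y = np.array([1, -1, -1, -1, 0])

# Indices of nodes with known labels and of nodes still to be classified.
labeled = np.flatnonzero(Y >= 0)
unlabeled = np.flatnonzero(Y < 0)
```

The goal stated above then amounts to filling in the entries of `Y` at the `unlabeled` indices.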

Problem Setting

How do we predict the labels $Y_v$ for the unlabeled nodes $v$ (in grey)?
- Each node $v$ has a feature vector $f_v$
- Labels for some nodes are given (1 for green, 0 for red)
- Task: Find $P(Y_v)$ given all features and the network

$P(Y_v) = ?$

Example applications:

Many applications under this setting:
- Document classification
- Part-of-speech tagging
- Link prediction
- Optical character recognition
- Image/3D data segmentation
- Entity resolution in sensor networks
- Spam and fraud detection

Collective Classification Overview (1)

An overall view of collective classification is as follows.

A first-order Markov assumption is used here: to predict the label $Y_v$ of node $v$, only its neighbors $N_v$ are needed. Under a second-order Markov assumption, the neighbors of the nodes in $N_v$ would be used as well. Under the first-order assumption, the expression becomes:

$$P(Y_v) = P(Y_v \mid N_v)$$
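To make the difference between the two neighborhoods concrete, here is a small sketch that extracts $N_v$ and its second-order extension from an adjacency matrix. The helper names `first_order_neighbors` and `second_order_neighbors` and the path graph are hypothetical, introduced only for this example.

```python
import numpy as np

def first_order_neighbors(A, v):
    """N_v: the nodes directly connected to v."""
    return set(np.flatnonzero(A[v]).tolist())

def second_order_neighbors(A, v):
    """N_v plus the neighbors of every node in N_v, excluding v itself."""
    first = first_order_neighbors(A, v)
    result = set(first)
    for u in first:
        result |= first_order_neighbors(A, u)
    result.discard(v)
    return result

# Hypothetical undirected path graph 0-1-2-3-4.
n = 5
A = np.zeros((n, n), dtype=int)
for u, v in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    A[u, v] = 1
    A[v, u] = 1

n1 = first_order_neighbors(A, 2)
n2 = second_order_neighbors(A, 2)
```

For the middle node 2, the first-order neighborhood contains only nodes 1 and 3, while the second-order neighborhood also reaches nodes 0 and 4.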

Collective classification does not rely on a single model or a single pass, as a conventional classifier does. It consists of three steps.

Collective Classification Overview (2)

Local Classifier

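The classifier used to assign initial labels. Because labels must first be produced for the unlabeled nodes in the graph, it is set up exactly like a standard classification problem. It uses only each node's own features: it is trained on the nodes that already have labels and then predicts for the nodes that do not. Note that no structural information from the graph is used.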

Relational Classifier

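A classifier that uses the labels and features of neighboring nodes to capture correlations between nodes. It predicts the current node's label from its neighbors' labels and features together with the current node's own features. Because neighbor information is used, the graph's structural information is used.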

Collective Inference

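The key point of collective classification is that it does not end after a single prediction. Each node is classified and its label updated until a stopping condition is met: either the labels no longer change or a fixed number of iterations is reached. Note that even nodes with identical features can end up with different final predictions depending on the graph structure.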
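The three steps can be sketched as a toy loop. This is only an illustration under strong assumptions: both classifiers are hand-written stand-ins (a threshold on a single scalar feature and an equal-weight average with the neighbors' current labels), not trained models, and the graph, features, and -1 marker for unlabeled nodes are hypothetical.

```python
import numpy as np

def local_classifier(features):
    """Step 1 stand-in: uses node features only and ignores the graph."""
    return (features > 0.5).astype(float)

def relational_classifier(A, labels, features, v):
    """Step 2 stand-in: combines v's feature with its neighbors' current labels."""
    neighbors = np.flatnonzero(A[v])
    if neighbors.size == 0:
        return float(features[v] > 0.5)
    score = 0.5 * features[v] + 0.5 * labels[neighbors].mean()
    return float(score > 0.5)

def collective_inference(A, Y, features, max_iters=10):
    """Step 3: re-classify unlabeled nodes until no label changes
    or max_iters is reached. Known labels (Y >= 0) stay fixed."""
    labels = np.where(Y >= 0, Y, local_classifier(features)).astype(float)
    unlabeled = np.flatnonzero(Y < 0)
    for _ in range(max_iters):
        previous = labels.copy()
        for v in unlabeled:
            labels[v] = relational_classifier(A, labels, features, v)
        if np.array_equal(labels, previous):
            break
    return labels.astype(int)

# Hypothetical graph: node 4 links to Class 1 nodes, node 5 to Class 0 nodes.
n = 6
A = np.zeros((n, n), dtype=int)
for u, v in [(0, 4), (1, 4), (2, 5), (3, 5)]:
    A[u, v] = 1
    A[v, u] = 1
Y = np.array([1, 1, 0, 0, -1, -1])
features = np.array([0.9, 0.8, 0.1, 0.2, 0.6, 0.6])

result = collective_inference(A, Y, features)
```

Nodes 4 and 5 have the same feature value, and the local stand-in gives both the same initial label, yet their final labels differ because their neighborhoods differ.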

Overview of What is Coming

We focus on semi-supervised binary node classification
We will introduce three approaches:
- Relational classification
- Iterative classification
- Correct & Smooth

CS224W: Machine Learning with Graphs 2021 Lecture 5.1 - Message passing and Node Classification
