SWT(Stroke Width Transform)

OpenCV 2021. 1. 4. 17:35
I referred to the material at the link below.

iskim3068.tistory.com/category/Computer%20Vision

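SWT (Stroke Width Transform) is an image-processing method for finding specific regions in an image, and it is used especially often for detecting text.

It assumes that the strokes of a character have a uniform width, and it fills the pixels of each stroke with the value of that stroke's width.

The theory is described in detail in the original paper below.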

Detecting Text in Natural Scenes with Stroke Width Transform. Boris Epshtein, Yonathan Wexler, and Eyal Ofek. IEEE International Conference on Computer Vision and Pattern Recognition, 2010.

SWT starts from the premise that the width of a single stroke of a character is always the same, or nearly constant.

3.1 The Stroke Width Transform

1. Initialize every element of the SWT output to infinity.

2. Compute the edges in the image with the Canny Edge Detector.

3. Compute the gradient direction dp of each edge pixel p. (Fig. 3b)

4. If p lies on a stroke boundary, dp must be roughly perpendicular to the orientation of the stroke.

5. Starting from p, follow the ray r = p + n*dp, n > 0, until another edge pixel q is found.

6. Let dq be the gradient direction at pixel q. If dq is roughly opposite to dp (dq = -dp ± π/6), each element s of the SWT output along the segment [p, q] is assigned the width ||p - q||, unless it already holds a lower value.

7. Otherwise, if no matching pixel q is found, or if dq is not opposite to dp, the ray is discarded.
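
The steps above can be sketched in plain Python as follows. This is only a minimal illustration, not the paper's implementation: the edge map and the unit gradients are assumed to be given already (for example from a Canny detector and a Sobel filter), and the names `cast_ray`, `swt_first_pass` and the `max_len` limit are hypothetical.

```python
import math


def cast_ray(edges, grads, p, max_len=50):
    """Follow r = p + n*dp from edge pixel p until another edge pixel q is hit.

    edges: 2D list of bools indexed [y][x], True marks an edge pixel.
    grads: dict mapping every edge pixel (x, y) to its unit gradient (gx, gy).
    Returns the pixels from p to q inclusive, or None if the ray is discarded.
    """
    h, w = len(edges), len(edges[0])
    px, py = p
    gx, gy = grads[p]
    ray = [p]
    for n in range(1, max_len + 1):
        x, y = round(px + n * gx), round(py + n * gy)
        if not (0 <= x < w and 0 <= y < h):
            return None  # left the image without meeting an edge
        if (x, y) == ray[-1]:
            continue  # rounding landed on the same pixel again
        ray.append((x, y))
        if edges[y][x]:
            qx, qy = grads[(x, y)]
            # Angle between dq and -dp; "roughly opposite" means within pi/6.
            cos_angle = max(-1.0, min(1.0, -(gx * qx + gy * qy)))
            return ray if math.acos(cos_angle) <= math.pi / 6 else None
    return None  # no matching q within max_len steps (assumed limit)


def swt_first_pass(edges, grads):
    """Return (swt, rays): the stroke width map and the rays that were kept."""
    h, w = len(edges), len(edges[0])
    swt = [[math.inf] * w for _ in range(h)]  # step 1: start at infinity
    rays = []
    for p in grads:
        ray = cast_ray(edges, grads, p)
        if ray is None:
            continue  # step 7: discard the ray
        (px, py), (qx, qy) = ray[0], ray[-1]
        width = math.hypot(qx - px, qy - py)  # ||p - q||
        for x, y in ray:
            swt[y][x] = min(swt[y][x], width)  # step 6: keep the lower value
        rays.append(ray)
    return swt, rays


# One image row containing a single stroke: the edges at x=1 and x=6
# have gradients that point toward each other.
edges = [[False, True, False, False, False, False, True, False]]
grads = {(1, 0): (1.0, 0.0), (6, 0): (-1.0, 0.0)}
swt, rays = swt_first_pass(edges, grads)
print(swt[0])
```

The rays are returned along with the map because a later pass over the same rays needs them.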

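(a): the SWT image. (b): the gradient direction at edge pixel p. (c): when the gradient direction at edge pixel q is roughly opposite to the gradient direction at edge pixel p, a ray is cast from p to q and each pixel along it receives the minimum value.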

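The core of SWT is as follows.

From an edge pixel p, draw a line in the gradient direction to find the edge pixel q that it meets. If the two edge pixels have gradients facing each other, the line (p, q) is treated as a line crossing the stroke and is collected.

Writing the length of each collected line into the pixels it passes through yields the SW (stroke width) image.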

Suppose the distance from p to q is 6 pixels. Then each of the 6 pixels along that line is filled with the value 6.

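In the left figure, the red pixel is filled with 5, because the smallest value assigned to it spans 5 cells. The pixels inside the red box in the right figure, however, are not filled with the minimum stroke width of the letter as a whole.

To fix this, call the procedure above the first pass. After the first pass, each remaining ray is visited again: the median SWT value of all pixels on the ray is computed, and every pixel on the ray whose value exceeds the median is set to the median.

In other words, if the SWT value of a pixel inside the red box is larger than the median of the pixels along its ray (its row or column in the figure), it is replaced by that median.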
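
The median correction can be sketched as below. This is a minimal illustration under stated assumptions: the SWT map is a list of lists indexed `[y][x]`, each ray is a list of `(x, y)` pixels kept from the first pass, and the name `swt_second_pass` is hypothetical.

```python
from statistics import median


def swt_second_pass(swt, rays):
    """Clamp each pixel on a ray to that ray's median SWT value (in place)."""
    for ray in rays:
        m = median(swt[y][x] for x, y in ray)
        for x, y in ray:
            if swt[y][x] > m:
                swt[y][x] = m  # values above the median are pulled down
    return swt


# A ray that runs into a corner: most pixels hold 3, two hold an inflated 7.
swt = [[3.0, 3.0, 3.0, 7.0, 7.0]]
rays = [[(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]]
swt_second_pass(swt, rays)
print(swt[0])
```

Values at or below the median are left alone, so the pass can only lower stroke widths, never raise them.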

3.2 Finding letter candidates

The next step of the algorithm groups these pixels into letter candidates.

Two neighboring pixels may be grouped together if they have similar stroke widths.

To do this, the classical Connected Component algorithm is applied in a modified form.

Neighboring pixels are placed in the same group if the ratio of their SWT values does not exceed 3.0. This allows a stroke whose width changes gradually to stay in a single group.
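
A minimal sketch of this modified connected-component labeling is shown below. It assumes a 4-neighborhood and a breadth-first flood fill, neither of which is specified in the text above, and the name `group_by_stroke_width` is hypothetical. Pixels still at infinity are treated as background.

```python
import math
from collections import deque


def group_by_stroke_width(swt, max_ratio=3.0):
    """Label pixels so that neighbors with an SWT ratio <= max_ratio share a label.

    swt: 2D list indexed [y][x]; math.inf marks pixels without a stroke width.
    Returns (labels, count), where label 0 means background.
    """
    h, w = len(swt), len(swt[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] or math.isinf(swt[sy][sx]):
                continue
            count += 1
            labels[sy][sx] = count
            queue = deque([(sx, sy)])
            while queue:
                x, y = queue.popleft()
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nx, ny = x + dx, y + dy
                    if not (0 <= nx < w and 0 <= ny < h):
                        continue
                    if labels[ny][nx] or math.isinf(swt[ny][nx]):
                        continue
                    a, b = swt[y][x], swt[ny][nx]
                    # Same test as max/min <= max_ratio, without dividing.
                    if max(a, b) <= max_ratio * min(a, b):
                        labels[ny][nx] = count
                        queue.append((nx, ny))
    return labels, count


swt = [[2.0, 3.0, math.inf, 10.0],
       [2.0, 4.0, math.inf, 12.0]]
labels, count = group_by_stroke_width(swt)
print(labels, count)
```

Because the ratio is checked only between adjacent pixels, widths may drift across a component as long as each single step stays within the ratio.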

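1. Compute the variance of the stroke width within each connected component, and reject components whose variance is too large.

This makes it possible to separate out things such as foliage, which in everyday scenes are hard to tell apart from text. (Fig. 1(c))

2. Reject components whose aspect ratio is too extreme.

3. Finally, ignore components that are too small or too large.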
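
The three filters can be sketched as a single predicate. This is an illustration only: the text above gives no numeric thresholds, so every default below is an assumed placeholder, the bounding-box height is an assumed stand-in for "size", and the name `is_letter_candidate` is hypothetical.

```python
from statistics import mean, pvariance


def is_letter_candidate(pixels, widths, max_var_ratio=0.5,
                        min_aspect=0.1, max_aspect=10.0,
                        min_height=10, max_height=300):
    """Decide whether one connected component looks like a letter.

    pixels: list of (x, y) coordinates in the component.
    widths: the SWT values of those pixels.
    All threshold defaults are illustrative assumptions.
    """
    # 1. Reject components whose stroke width varies too much.
    if pvariance(widths) > max_var_ratio * mean(widths):
        return False
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    box_w = max(xs) - min(xs) + 1
    box_h = max(ys) - min(ys) + 1
    # 2. Reject components with an extreme aspect ratio.
    if not (min_aspect <= box_w / box_h <= max_aspect):
        return False
    # 3. Ignore components that are too small or too large.
    return min_height <= box_h <= max_height


# A 5 x 20 block with a constant stroke width of 4 passes all three checks.
pixels = [(x, y) for x in range(5) for y in range(20)]
print(is_letter_candidate(pixels, [4.0] * len(pixels)))
```

In practice the thresholds would need tuning for the image resolution and the kind of text expected.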

https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/1509.pdf

3.3 Grouping letters into text lines (this part is not important to me right now, so I quote the original text)

Since single letters are not expected to appear in images, we will now attempt to group closely positioned letter candidates into regions of text. This filters out many falsely-identified letter candidates, and improves the reliability of the algorithm results.

When deciding to pair two letters together, we have 2 options: either both letters were not assigned a region yet, or one of them was already grouped with other letters. If both are unassigned, all we need to do is to declare a new region and assign them to it. Otherwise, we need to check if adding one letter to the region of the other is reasonable. In my implementation, a merge is reasonable if the pixel count of the letters in the region and the pixel count of the letter to add divided by the size of the bounding box of all the letters combined is not below some threshold. This will ensure the region of text will not have loose ends, and will form a "box" of text.
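
The merge test described above can be sketched as a fill-ratio check. This is a hedged illustration, not the quoted author's code: boxes are assumed to be `(x0, y0, x1, y1)` tuples with inclusive corners, the threshold value `min_fill=0.3` is an arbitrary placeholder for "some threshold", and all function names are hypothetical.

```python
def union_box(a, b):
    """Smallest box (x0, y0, x1, y1), inclusive corners, covering boxes a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def box_area(box):
    """Number of pixels inside a box with inclusive corners."""
    x0, y0, x1, y1 = box
    return (x1 - x0 + 1) * (y1 - y0 + 1)


def merge_is_reasonable(region_pixels, region_box, letter_pixels, letter_box,
                        min_fill=0.3):
    """True if the combined pixel count fills enough of the combined box.

    region_pixels / letter_pixels: pixel counts of the region and the new letter.
    min_fill: assumed placeholder for the unspecified threshold.
    """
    combined = union_box(region_box, letter_box)
    fill = (region_pixels + letter_pixels) / box_area(combined)
    return fill >= min_fill


# A letter right next to the region keeps the combined box well filled,
# while a distant letter stretches the box and fails the test.
print(merge_is_reasonable(60, (0, 0, 9, 9), 60, (10, 0, 19, 9)))
print(merge_is_reasonable(60, (0, 0, 9, 9), 60, (90, 0, 99, 9)))
```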

The implementation of the application contains several parts, as discussed in the previous section:
* The stroke width transform: edge detection and stroke width calculation.
* Removing stray lines from the SW map.
* Finding letter candidates: finding the connected components and detecting the components with the features of a letter.
* Grouping the letters into regions of text.

Code implemented in Python can be found at the link below.

github.com/mypetyak/StrokeWidthTransform/blob/master/swt.py