"Ollama with granite3.2-vision is excellent for OCR and for processing text afterwards"
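I recently came across a Reddit post about the granite3.2-vision model on Ollama and wondered how fast and how accurate it would be at OCR on ID cards, so I tested it.
The short version: speed, performance, and accuracy are all very good. Let's walk through it with a sample ID.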
granite3.2-vision
A compact and efficient vision-language model, specifically designed for visual document understanding, enabling automated content extraction from tables, charts, infographics, plots, diagrams, and more. The model was trained on a meticulously curated instruction-following dataset, comprising diverse public datasets and synthetic datasets tailored to support a wide range of document understanding and general image tasks. It was trained by fine-tuning a Granite large language model with both image and text modalities.
For reference, the model has 2B parameters and takes up 2.4 GB once downloaded, much smaller than llama3.2-vision at 7.9 GB.
% ollama list
NAME                        ID              SIZE      MODIFIED
granite3.2-vision:latest    3be41a661804    2.4 GB    About a minute ago
llama3.2-vision:latest      085a1fdae525    7.9 GB    5 days ago
Let's build a sample in a Jupyter notebook.
The sample photo is a Philippine driver's license sample image found through a Google search.

Test image
import ollama
import time
from IPython.display import Image, Markdown, display

def Talk_with_granite_3_2_vision(user_query, input_image):
    """Send a prompt and an image path to granite3.2-vision and time the call."""
    start_time = time.time()  # Start timer
    response = ollama.chat(
        model='granite3.2-vision:latest',
        messages=[{
            'role': 'user',
            'content': user_query,
            'images': [input_image]  # path to the image file
        }]
    )
    end_time = time.time()  # End timer
    # Calculate elapsed time
    execution_time = end_time - start_time
    # To print only the reply text instead of the whole response object:
    # print(f"Response: {response['message']['content']}")
    print(f"Response: {response}")
    print(f"Execution time: {execution_time:.2f} seconds")
    return response

# Show the test image in the notebook
display(Image(filename="driver_license.jpeg"))

image display
Talk_with_granite_3_2_vision('Describe the content of the given image','driver_license.jpeg')
You can see that it returns not only the text in the document but also a detailed description of the document itself.
Response: model='granite3.2-vision:latest' created_at='2025-03-15T01:30:57.858356Z' done=True done_reason='stop' total_duration=80162497333 load_duration=46421333 prompt_eval_count=5157 prompt_eval_duration=972000000 eval_count=808 eval_duration=79134000000 message=Message(role='assistant', content='\nThe image depicts a Philippine driver\'s license issued by the Department of Transportation Land Transportation Office (DOTLTO). This document is specifically labeled as a "Non-Professional Driver’s License." The front cover of the license features several key elements:\n\n1. **Country and Issuing Authority**: At the top, there is a flag of the Philippines, indicating that this is an official government document from the Philippines. Below the flag, it reads "REPUBLIC OF THE PHILIPPINES" followed by "DEPARTMENT OF TRANSPORTATION LAND TRANSPORTATION OFFICE."\n\n2. **License Type**: The title of the document is prominently displayed as "NON-PROFESSIONAL DRIVER’S LICENSE," which specifies that this license is intended for individuals who are not professionally licensed to drive but rather hold a provisional or temporary driver\'s license.\n\n3. **Personal Information**:\n - **Last Name**: DELA CRUZ\n - **First Name**: JUAN PEDRO GARCIA\n - **Nationality**: PHIL\n - **Sex**: M (Male)\n - **Date of Birth**: 1987/10/04\n - **Height (in meters)**: 1.55\n - **Weight (in kilograms)**: 70\n\n4. **Identification Number**: The license number is "N03-12-123456."\n\n5. **Address**: The address listed on the license is "AUTODEAL UNIT/HOUSE NO. BUILDING, STREET NAME: BARANGAY, CITY/MUNICIPALITY," which indicates that the holder of this license resides in Barangay, City/Municipality.\n\n6. **License Expiration Date**: The expiration date is "2022/10/04."\n\n7. **Restrictions and Conditions**: There are two restrictions listed:\n - Restrictions 1.2\n - Conditions NONE\n\n8. **Signature of Licensee**: At the bottom, there is a signature of the licensee, which reads "EDGAR C. 
SALVANTE," followed by his designation as "Assistant Secretary."\n\n9. **Seal and Logo**: The top right corner features a seal with the text "DOTLTO" inside it, indicating that this document is an official government-issued license.\n\n### Analysis:\nThis non-professional driver\'s license serves as proof of identity for individuals who are not professionally licensed to drive but hold a provisional or temporary license. The details provided on the license include personal information such as name, date of birth, height, weight, and address, which are essential for identification purposes. The expiration date ensures that the license remains valid for a specified period, after which it must be renewed or replaced.\n\nThe restrictions listed (Restrictions 1.2) and conditions (NONE) indicate specific rules or regulations that the holder of this license must adhere to. These could include age limits, vehicle type restrictions, or other legal requirements pertinent to driving in the Philippines. The signature of the Assistant Secretary verifies the authenticity of the document and confirms that it was issued by a legitimate authority within the Department of Transportation Land Transportation Office.\n\n### Conclusion:\nThis non-professional driver\'s license is an essential document for individuals in the Philippines who are not professionally licensed to drive but hold a provisional or temporary license. It contains all necessary personal and identification details, as well as restrictions and conditions that must be followed by the holder. The signature of the Assistant Secretary adds an additional layer of authenticity to the document.', images=None, tool_calls=None)
Execution time: 80.19 seconds

Monitoring the run with asitop
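The `*_duration` fields in the response above are reported by Ollama in nanoseconds, so they are easier to read once converted. The snippet below is a minimal sketch that turns those counters into seconds and an output rate; the helper name `summarize_timing` is my own and not part of the `ollama` library, and the numbers are copied from the first response.

```python
NS_PER_SECOND = 1_000_000_000  # Ollama reports durations in nanoseconds


def summarize_timing(total_duration, load_duration, prompt_eval_count,
                     prompt_eval_duration, eval_count, eval_duration):
    """Convert Ollama's nanosecond counters into seconds and tokens/second.

    Hypothetical helper for illustration; not part of the ollama package.
    """
    eval_seconds = eval_duration / NS_PER_SECOND
    return {
        "total_s": total_duration / NS_PER_SECOND,
        "load_s": load_duration / NS_PER_SECOND,
        "prompt_eval_s": prompt_eval_duration / NS_PER_SECOND,
        "eval_s": eval_seconds,
        "prompt_tokens": prompt_eval_count,
        "output_tokens": eval_count,
        "output_tokens_per_s": eval_count / eval_seconds if eval_seconds else 0.0,
    }


# Values copied from the first response above.
first_run = summarize_timing(
    total_duration=80162497333,
    load_duration=46421333,
    prompt_eval_count=5157,
    prompt_eval_duration=972000000,
    eval_count=808,
    eval_duration=79134000000,
)

for name, value in first_run.items():
    if isinstance(value, float):
        print(f"{name}: {value:.2f}")
    else:
        print(f"{name}: {value}")
```

For this run, nearly all of the 80 seconds went into generating the 808 output tokens, which works out to roughly 10 tokens per second, so a shorter, more targeted answer should reduce the generation part of the time.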
Now let's adjust the user query so that only the fields I need are pulled out of the OCR-extracted text.
Looking at the ID image, it contains the fields below. What I need is the data that corresponds to each one.
- License Type
- Last Name, First Name, Middle Name
- Nationality
- Sex
- Date of Birth
- Weight (in kg)
- Height (in m)
- Address
- License Number
- Expiration Date
- Agency Code
- Blood Type
- Eyes Color
- Restrictions
- Conditions
First, let's set user_query as follows and run it: 'OCR the text of the image. What is license type?'
user_query = 'OCR the text of the image. What is license type?'
response = Talk_with_granite_3_2_vision(user_query,'driver_license.jpeg')
Response: model='granite3.2-vision:latest' created_at='2025-03-15T01:51:00.248422Z' done=True done_reason='stop' total_duration=72551407958 load_duration=1374949917 prompt_eval_count=5162 prompt_eval_duration=66223000000 eval_count=54 eval_duration=4948000000 message=Message(role='assistant', content='\nThe license type indicated on the driver\'s license in the image is a "NON-PROFESSIONAL DRIVER\'S LICENSE." This is clearly stated at the top of the document, just below the Philippine flag and above the personal details section.', images=None, tool_calls=None)
Execution time: 72.57 seconds
The response says the license type is "NON-PROFESSIONAL DRIVER'S LICENSE."
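The same one-question-per-field pattern could be repeated for the rest of the list. The sketch below is only an illustration: `build_query`, `extract_fields`, and `fake_ask` are hypothetical names, and the model call is passed in as an `ask` callable so that the loop can be tried without a running Ollama server.

```python
FIELDS = [
    "License Type", "Last Name, First Name, Middle Name", "Nationality",
    "Sex", "Date of Birth", "Weight (in kg)", "Height (in m)", "Address",
    "License Number", "Expiration Date", "Agency Code", "Blood Type",
    "Eyes Color", "Restrictions", "Conditions",
]


def build_query(field):
    """Build one question per field, following the prompt pattern used above."""
    return f"OCR the text of the image. What is {field}?"


def extract_fields(ask, image_path, fields=FIELDS):
    """Ask one question per field and collect the raw answers.

    `ask` is any callable taking (query, image_path) and returning the
    model's reply as a string (an assumption made for this sketch).
    """
    answers = {}
    for field in fields:
        answers[field] = ask(build_query(field), image_path).strip()
    return answers


def fake_ask(query, image_path):
    # Stand-in for the model so the example runs offline.
    return f" (answer to: {query}) "


demo = extract_fields(fake_ask, "driver_license.jpeg",
                      fields=["License Type", "Sex"])
print(demo)
```

With the function defined earlier, `ask` could be something like `lambda q, img: Talk_with_granite_3_2_vision(q, img)['message']['content']`. At roughly 70 seconds per call in the run above, asking 15 separate questions would be slow, so a single prompt that requests all fields at once may be the more practical choice.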
Comparing the output with the original image and mapping the OCR results to the fields I need gives the following.
{
  "License": "NON-PROFESSIONAL DRIVER'S LICENSE",
  "name": "DELA CRUZ, JUAN PEDRO GARCIA",
  "sex": "M",
  "dateOfBirth": "1987/10/04",
  "weight": "70",
  "height": "1.55",
  "address": "AUTODEAL UNIT/HOUSE NO. BUILDING, STREET NAME, BARANGAY, CITY/MUNICIPALITY",
  "nationality": "PHL",
  "licenseNumber": "N03-12-123456",
  "expirationDate": "2022/10/04",
  "agencyCode": "N32",
  "bloodType": "O+",
  "eyesColor": "BLACK",
  "Restrictions": "1,2",
  "conditions": "NONE"
}

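If the model is asked to reply with a mapping like the one above, the reply still has to be parsed and its values converted before they can be used downstream. The following is a rough sketch under two assumptions: that the reply contains a single JSON object (possibly surrounded by extra prose), and that dates use the YYYY/MM/DD form seen on this sample. The function name `parse_license_record` and the `reply` string are made up for illustration; the field values are taken from the mapping above.

```python
import json
from datetime import datetime


def parse_license_record(reply_text):
    """Parse a JSON object out of a model reply and convert typed fields."""
    # Models sometimes wrap JSON in extra prose, so keep only the outermost {...}.
    start, end = reply_text.find("{"), reply_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    record = json.loads(reply_text[start:end + 1])

    # Dates on this sample use YYYY/MM/DD (assumed format).
    for key in ("dateOfBirth", "expirationDate"):
        if key in record:
            record[key] = datetime.strptime(record[key], "%Y/%m/%d").date()

    # Weight (kg) and height (m) are numeric.
    for key in ("weight", "height"):
        if key in record:
            record[key] = float(record[key])
    return record


# Hypothetical reply text; the values come from the mapping above.
reply = (
    'Here is the data: {"name": "DELA CRUZ, JUAN PEDRO GARCIA", '
    '"dateOfBirth": "1987/10/04", "weight": "70", "height": "1.55", '
    '"expirationDate": "2022/10/04"}'
)
record = parse_license_record(reply)
print(record["dateOfBirth"].isoformat(), record["weight"], record["height"])
```

Converting the values early makes it straightforward to add checks later, for example comparing `expirationDate` with today's date. A failed `strptime` or `float` call is also a cheap signal that the OCR result for that field needs a second look.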
Wow, it really is good!
