I watched the YouTube video that Teddy posted and followed along on an M1 MacBook, changing a few settings along the way to fit my environment.

https://www.youtube.com/watch?v=VkcaigvTrug&t=23s

# GPU monitoring
% sudo asitop

# Run LangServe
% python server.py

# Expose the service externally with ngrok
% ngrok http --domain=humble-curiously-antelope.ngrok-free.app 8000

# RAG over a PDF
% streamlit run main.py

Create a conda virtual environment and install the required modules from requirements.txt


conda create -n llm311 python=3.11

% conda env list
# conda environments:
#
base                  *  /Users/dongsik/miniconda
llm                      /Users/dongsik/miniconda/envs/llm
llm311                   /Users/dongsik/miniconda/envs/llm311

 

% conda activate llm311

% python -V
Python 3.11.9

% pip list
Package    Version
---------- -------
pip        23.3.1
setuptools 68.2.2
wheel      0.41.2

 

Fork the example GitHub repository to your own account, clone it to your PC, and modify it for your environment as you go.

 

Teddy's GitHub: https://github.com/teddylee777/langserve_ollama

My GitHub: https://github.com/dongshik/langserve_ollama

% ll
total 1000
drwxr-xr-x@ 12 dongsik  staff     384 Apr 20 09:22 .
drwxr-xr-x   4 dongsik  staff     128 Apr 19 16:35 ..
drwxr-xr-x@ 14 dongsik  staff     448 Apr 19 16:40 .git
-rw-r--r--@  1 dongsik  staff      50 Apr 19 16:35 .gitignore
-rw-r--r--@  1 dongsik  staff    3343 Apr 19 16:35 README.md
drwxr-xr-x@  8 dongsik  staff     256 Apr 19 16:35 app
drwxr-xr-x@  8 dongsik  staff     256 Apr 19 16:35 example
drwxr-xr-x@  3 dongsik  staff      96 Apr 19 16:35 images
drwxr-xr-x@  4 dongsik  staff     128 Apr 19 16:35 ollama-modelfile
-rw-r--r--@  1 dongsik  staff  481043 Apr 19 16:35 poetry.lock
-rw-r--r--@  1 dongsik  staff     659 Apr 19 16:35 pyproject.toml
-rw-r--r--@  1 dongsik  staff   14983 Apr 19 16:35 requirements.txt

 

pip install -r requirements.txt

% pip install -r requirements.txt
Ignoring colorama: markers 'python_version >= "3.11.dev0" and python_version < "3.12.dev0" and platform_system == "Windows"' don't match your environment

 

% pip list | grep lang
langchain                  0.1.16
langchain-community        0.0.32
langchain-core             0.1.42
langchain-openai           0.1.3
langchain-text-splitters   0.0.1
langchainhub               0.1.15
langdetect                 1.0.9
langserve                  0.0.51
langsmith                  0.1.47

% pip list | grep huggingface
huggingface-hub            0.22.2

 

Download the model from Hugging Face, register the EEVE Q5 model with Ollama, and run it

huggingface-cli download \
  heegyu/EEVE-Korean-Instruct-10.8B-v1.0-GGUF \
  ggml-model-Q5_K_M.gguf \
  --local-dir /Users/dongsik/GitHub/teddylee777/langserve_ollama/ollama-modelfile/EEVE-Korean-Instruct-10.8B-v1.0 \
  --local-dir-use-symlinks False
  
Consider using `hf_transfer` for faster downloads. This solution comes with some limitations. See https://huggingface.co/docs/huggingface_hub/hf_transfer for more details.
downloading https://huggingface.co/heegyu/EEVE-Korean-Instruct-10.8B-v1.0-GGUF/resolve/main/ggml-model-Q5_K_M.gguf to /Users/dongsik/.cache/huggingface/hub/tmpkuuur4ki
ggml-model-Q5_K_M.gguf:  37%|███████████████████████████▌                                               | 2.81G/7.65G [04:35<09:55, 8.13MB/s]

 

 

% ll -sh
total 14954512
       0 drwxr-xr-x@ 5 dongsik  staff   160B Apr 20 10:02 .
       0 drwxr-xr-x@ 4 dongsik  staff   128B Apr 20 10:02 ..
       8 -rw-r--r--@ 1 dongsik  staff   369B Apr 19 16:35 Modelfile
       8 -rw-r--r--@ 1 dongsik  staff   419B Apr 19 16:35 Modelfile-V02
14954496 -rw-r--r--  1 dongsik  staff   7.1G Apr 20 10:02 ggml-model-Q5_K_M.gguf

 

<path>/langserve_ollama/ollama-modelfile/EEVE-Korean-Instruct-10.8B-v1.0/Modelfile

FROM ggml-model-Q5_K_M.gguf

TEMPLATE """{{- if .System }}
<s>{{ .System }}</s>
{{- end }}
<s>Human:
{{ .Prompt }}</s>
<s>Assistant:
"""

SYSTEM """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions."""

PARAMETER TEMPERATURE 0
PARAMETER stop <s>
PARAMETER stop </s>

 

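The Modelfile settings are needed because, without them, the model may produce odd output once its answer is finished.

If a system prompt is set, the template places it at the {{ .System }} position; here, the text of the SYSTEM instruction is substituted there.

Next comes the user's (Human's) question, .Prompt, preceded by the <s> special token.

The model then answers as Assistant.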
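
To make the substitution described above concrete, here is a minimal Python sketch that assembles a prompt in the same shape as the TEMPLATE. The function name `render_prompt` is hypothetical, and the sketch does not reproduce the exact whitespace trimming that Ollama's Go templates apply to `{{-`; it only illustrates the order of the pieces.

```python
def render_prompt(prompt: str, system: str = "") -> str:
    """Approximate the Modelfile TEMPLATE (hypothetical helper, not Ollama's own code)."""
    parts = []
    if system:
        # Mirrors: {{- if .System }} <s>{{ .System }}</s> {{- end }}
        parts.append(f"<s>{system}</s>")
    # The user's question goes into the Human turn, wrapped in <s> ... </s>.
    parts.append(f"<s>Human:\n{prompt}</s>")
    # The template ends with an open Assistant turn for the model to complete.
    parts.append("<s>Assistant:\n")
    return "\n".join(parts)


if __name__ == "__main__":
    system_text = (
        "A chat between a curious user and an artificial intelligence assistant. "
        "The assistant gives helpful, detailed, and polite answers to the user's questions."
    )
    print(render_prompt("대한민국의 수도는 어디야?", system_text))
```

Because generation starts right after the open Assistant turn, the `PARAMETER stop <s>` and `PARAMETER stop </s>` lines are what end the answer when the model emits one of those tokens.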

 

※ Note

In the Modelfile, <s> is a special token that marks the beginning of a sentence (sequence). It lets the model identify where a turn starts and process it accordingly, and it is passed to the model as part of the tokenized input.

 

 

tokenizer.chat_template

{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% else %}{% set loop_messages = messages %}{% set system_message = 'You are a helpful assistant.' %}{% endif %}{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in loop_messages %}{% if loop.index0 == 0 %}{{'<|im_start|>system ' + system_message + '<|im_end|> '}}{% endif %}{{'<|im_start|>' + message['role'] + ' ' + message['content'] + '<|im_end|>' + ' '}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant ' }}{% endif %}

 

https://huggingface.co/yanolja/EEVE-Korean-Instruct-10.8B-v1.0


Prompt Template

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
Human: {prompt}
Assistant:

 

 

Check the Ollama model list

% ollama list
NAME    	ID          	SIZE  	MODIFIED
eeve:q4 	68f4c2c2d9fe	6.5 GB	7 days ago
gemma:2b	b50d6c999e59	1.7 GB	10 days ago

 

Check that Ollama is running.

% ps -ef | grep ollama
  501  3715  3691   0 Wed01PM ??         0:29.51 /Applications/Ollama.app/Contents/Resources/ollama serve
  501  4430     1   0 Wed01PM ??         0:00.03 /Applications/Ollama.app/Contents/Frameworks/Squirrel.framework/Resources/ShipIt com.electron.ollama.ShipIt /Users/dongsik/Library/Caches/com.electron.ollama.ShipIt/ShipItState.plist
  501 61608  3197   0 10:45AM ttys002    0:00.01 grep ollama

 

Register the newly downloaded model with Ollama.

ollama create eeve:q5 -f ollama-modelfile/EEVE-Korean-Instruct-10.8B-v1.0/Modelfile

When I tried to register it with the Modelfile above, I got "Error: unknown parameter 'TEMPERATURE'".
After changing the parameter name to lowercase temperature, the model was created.
If you hit the same error, try changing it to lowercase temperature.
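
If you would rather not edit the file by hand, the fix can be sketched in a few lines of Python. This is only an illustration: the helper name `lowercase_parameter_names` is hypothetical, and it assumes every line that starts with PARAMETER is an instruction (a line inside a TEMPLATE block that happens to start with that word would also be changed).

```python
def lowercase_parameter_names(modelfile_text: str) -> str:
    """Lowercase the parameter name on each PARAMETER line (hypothetical helper)."""
    fixed_lines = []
    for line in modelfile_text.splitlines():
        # Split into at most three pieces: instruction, parameter name, value.
        tokens = line.split(None, 2)
        if len(tokens) >= 2 and tokens[0].upper() == "PARAMETER":
            tokens[1] = tokens[1].lower()  # TEMPERATURE -> temperature
            line = " ".join(tokens)
        fixed_lines.append(line)
    return "\n".join(fixed_lines)


if __name__ == "__main__":
    original = "PARAMETER TEMPERATURE 0\nPARAMETER stop <s>\nPARAMETER stop </s>"
    print(lowercase_parameter_names(original))
```

The stop lines are left untouched because their parameter name is already lowercase; only the name is changed, never the value.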

 

% ollama create eeve:q5 -f ollama-modelfile/EEVE-Korean-Instruct-10.8B-v1.0/Modelfile

transferring model data
creating model layer
creating template layer
creating system layer
creating parameters layer
creating config layer
using already created layer sha256:b9e3d1ad5e8aa6db09610d4051820f06a5257b7d7f0b06c00630e376abcfa4c1
writing layer sha256:6b70a2ad0d545ca50d11b293ba6f6355eff16363425c8b163289014cf19311fc
writing layer sha256:1fa69e2371b762d1882b0bd98d284f312a36c27add732016e12e52586f98a9f5
writing layer sha256:3ab8c1bbd3cd85e1b39b09f5ff9a76e64da20ef81c22ec0937cc2e7076f1a81c
writing layer sha256:d86595b443c06710a3e5ba27700c6a93ded80100ff1aa808a7f3444ff529fa70
writing manifest
success

 

% ollama list
NAME    	ID          	SIZE  	MODIFIED
eeve:q4 	68f4c2c2d9fe	6.5 GB	7 days ago
eeve:q5 	0732d4a47219	7.7 GB	7 minutes ago
gemma:2b	b50d6c999e59	1.7 GB	10 days ago

 

ollama run eeve:q5

% ollama run eeve:q5
>>> 대한민국의 수도는 어디야?
안녕하세요! 대한민국의 수도에 대해 궁금해하시는군요. 서울이 바로 그 곳입니다! 서울은 나라의 북부에 위치해 있으며 정치, 경제, 문화의 중심지 역할을 하고 있습니다. 2019년 기준으로 약 970만 명의 인구를 가진 대도시로,
세계에서 가장 큰 도시 중 하나입니다. 또한 세계적인 금융 허브이자 주요 관광지로, 경복궁, 남산타워, 명동과 같은 다양한 역사적 및 현대적 명소를 자랑하고 있습니다. 서울은 활기찬 밤문화로도 유명하며, 많은 바와 클럽
관광객과 현지인 모두를 끌어들입니다. 대한민국의 수도에 대해 더 알고 싶으신 것이 있으신가요?

>>>

 

Running it from the local command line

 

Let's ask a question using the prompt below.

 

한국의 수도는 어디인가요? 아래 선택지 중 골라주세요.\n\n(A) 경성\n(B) 부산\n(C) 평양\n(D) 서울\n(E) 전주

>>> 한국의 수도는 어디인가요? 아래 선택지 중 골라주세요.\n\n(A) 경성\n(B) 부산\n(C) 평양\n(D) 서울\n(E) 전주
대한민국의 수도에 대한 질문에 답변해 주셔서 감사합니다! 정답은 (D) 서울입니다. 서울은 나라의 북부에 위치해 있으며 정치, 경제, 문화의 중심지 역할을 하고 있습니다. 2019년 기준으로 약 970만 명의 인구를
대도시로, 세계에서 가장 큰 도시 중 하나입니다. 또한 세계적인 금융 허브이자 주요 관광지로, 경복궁, 남산타워, 명동과 같은 다양한 역사적 및 현대적 명소를 자랑하고 있습니다. 서울은 활기찬 밤문화로도 유명하며,
많은 바와 클럽이 관광객과 현지인 모두를 끌어들입니다. 대한민국의 수도에 대해 더 알고 싶으신 것이 있으신가요?

 

The answer starts appearing almost as soon as the question is submitted. The speed is good, and so is the quality of the answers.

>>> 다음 지문을 읽고 문제에 답하시오.
...
... ---
...
... 1950년 7월, 한국 전쟁 초기에 이승만 대통령은 맥아더 장군에게 유격대원들을 북한군의 후방에 침투시키는 방안을 제안했다. 이후, 육군본부는 육본직할 유격대와 육본 독립 유격대를 편성했다. 국군은 포항과 인접한 장사동 지역에 상륙작
... 전을 수행할 부대로 독립 제1유격대대를 선정했다. 육군본부는 독립 제1유격대대에 동해안의 장사동 해안에 상륙작전을 감행하여 북한군 제2군단의 보급로를 차단하고 국군 제1군단의 작전을 유리하게 하기 위한 작전명령(육본 작명 제174호)
... 을 하달했다. 9월 14일, 독립 제1유격대대는 부산에서 LST 문산호에 승선하여 영덕군의 장사동으로 출항했다.
...
... 1950년 9월 15일, 독립 제1유격대대는 장사동 해안에 상륙을 시도하였으나 태풍 케지아로 인한 높은 파도와 안개로 인해 어려움을 겪었다. LST 문산호는 북한군의 사격과 파도로 인해 좌초되었고, 상륙부대는 09:00시경에 전원이
... 상륙을 완료하였다. 그 후, 15:00시경에 200고지를 점령하였고, 다양한 무기와 장비를 노획하였다. 9월 16일과 17일에는 독립 제1유격대대가 여러 위치에서 북한군과의 전투를 벌였으며, 미 구축함과의 연락 두절로 인해 추가적인
... 어려움을 겪었다.
...
... 장사동에서 위급한 상황에 처한 독립 제1유격대대를 구출하기 위해 해군본부는 LT-1(인왕호)를 급파했으나, LST 문산호의 구출에 실패했다. 해군본부는 상륙부대의 철수를 지원하기 위해 LST 조치원호를 현지로 보냈다. 9월 18일,
... 이명흠 부대장은 유엔 해군과의 협력 하에 부족한 식량과 탄약 지원을 받았다. 9월 19일, 유엔군의 함포지원과 함께 LST 조치원호가 도착하여 철수 작전을 시작했다. 스피어 소령은 직접 해안에 상륙하여 구조작전을 지시하였다. 9월 2
... 0일, 725명이 부산항으로 복귀했으나, 32명이 장사동 해안에 남아 북한군의 포로가 되었거나 탈출하여 국군에 합류하였다.
...
... 장사리 전투가 인천 상륙작전의 양동작전으로 알려졌으나, 이 전투가 드라마틱한 요소로 인해 과장되었으며, 실제로는 인천 상륙작전과 큰 관련이 없다. 또한, 북한이나 중국의 군사적 상황을 고려할 때, 장사리에서의 전투가 낙동강 전선에 영
... 향을 끼칠 가능성은 낮다.
...
... ---
...
... 문제
... 1. 지문에 나오는 지명을 모두 쓰시오.
... 2. 그중 대게로 유명한 곳은?
지문에 나오는 지명은 다음과 같습니다:
- 포항
- 장사동
- 영덕군
- 부산
- 문산호
- 조치원호
- 스피어 소령
- 낙동강 전선
대게로 유명한 곳은 영덕군입니다.

 

To leave the Ollama shell, use Ctrl + d or /bye.

 

The M1 Pro in the 2021 14-inch MacBook Pro has a 10-core CPU and a 16-core GPU.

The M1 in the 2020 13-inch MacBook Pro has an 8-core CPU and an 8-core GPU. (This is the one I have.)

 

 

It worked hard, using the CPU at up to 100%. (Good job.)

 

 

Serving the model with LangServe

langserve_ollama % ll app
total 40
drwxr-xr-x@  8 dongsik  staff   256 Apr 19 16:35 .
drwxr-xr-x@ 12 dongsik  staff   384 Apr 20 09:22 ..
-rw-r--r--@  1 dongsik  staff     0 Apr 19 16:35 __init__.py
-rw-r--r--@  1 dongsik  staff   549 Apr 19 16:35 chain.py
-rw-r--r--@  1 dongsik  staff   723 Apr 19 16:35 chat.py
-rw-r--r--@  1 dongsik  staff   328 Apr 19 16:35 llm.py
-rw-r--r--@  1 dongsik  staff  1444 Apr 19 16:35 server.py
-rw-r--r--@  1 dongsik  staff   559 Apr 19 16:35 translator.py
(llm311) dongsik@dongsikleeui-MacBookPro langserve_ollama %

 

 

Update the llm model name in the four files chat.py, chain.py, llm.py, and translator.py to match your environment.

# Use another chat model supported by LangChain. Here we use Ollama.
#llm = ChatOllama(model="EEVE-Korean-10.8B:latest")
llm = ChatOllama(model="eeve:q5")

 

Run server.py

(llm311) dongsik@dongsikleeui-MacBookPro langserve_ollama % cd app
(llm311) dongsik@dongsikleeui-MacBookPro app % pwd
/Users/dongsik/GitHub/teddylee777/langserve_ollama/app
(llm311) dongsik@dongsikleeui-MacBookPro app % ll
total 40
drwxr-xr-x@  8 dongsik  staff   256 Apr 19 16:35 .
drwxr-xr-x@ 12 dongsik  staff   384 Apr 20 09:22 ..
-rw-r--r--@  1 dongsik  staff     0 Apr 19 16:35 __init__.py
-rw-r--r--@  1 dongsik  staff   584 Apr 20 13:15 chain.py
-rw-r--r--@  1 dongsik  staff   758 Apr 20 13:15 chat.py
-rw-r--r--@  1 dongsik  staff   363 Apr 20 13:15 llm.py
-rw-r--r--@  1 dongsik  staff  1444 Apr 19 16:35 server.py
-rw-r--r--@  1 dongsik  staff   594 Apr 20 13:15 translator.py
(llm311) dongsik@dongsikleeui-MacBookPro app % python server.py

 

http://0.0.0.0:8000/prompt/playground/

 

Question and answer

 

 

Switching to RemoteRunnable to call LangServe

 

<path>/langserve_ollama/example

% ll
total 120
drwxr-xr-x@  9 dongsik  staff    288 Apr 20 13:50 .
drwxr-xr-x@ 12 dongsik  staff    384 Apr 20 09:22 ..
drwxr-xr-x@  3 dongsik  staff     96 Apr 19 16:35 .streamlit
-rw-r--r--@  1 dongsik  staff  12504 Apr 19 16:35 00-ollama-test.ipynb
-rw-r--r--@  1 dongsik  staff   4885 Apr 19 16:35 01-remote-invoke.ipynb
-rw-r--r--@  1 dongsik  staff   3775 Apr 19 16:35 02-more-examples.ipynb
-rw-r--r--@  1 dongsik  staff   6222 Apr 19 16:35 main.py
-rw-r--r--@  1 dongsik  staff  14708 Apr 19 16:35 requirements.txt

 

Change the address in 01-remote-invoke.ipynb to the local LangServe address.

from langserve import RemoteRunnable

# Set the remote address (the local LangServe server for now)

#chain = RemoteRunnable("<your own domain configured in ngrok>/prompt/")
# chain = RemoteRunnable("https://poodle-deep-marmot.ngrok-free.app/prompt/")
chain = RemoteRunnable("http://0.0.0.0:8000/prompt/")

# The topic means "Tell me about deep learning"
for token in chain.stream({"topic": "딥러닝에 대해서 알려줘"}):
    print(token, end="")

 

 

 

 

 

 

Port-forwarding the local LangServe with ngrok

Sign up for ngrok

https://dashboard.ngrok.com/cloud-edge/domains

 

Download and install the build for M1.

https://dashboard.ngrok.com/get-started/setup/macos

 

Set up a free domain

 

 

humble-curiously-antelope.ngrok-free.app

 

Forward the port LangServe is running on, specifying the ngrok domain

ngrok http --domain=humble-curiously-antelope.ngrok-free.app 8000

% ngrok http --domain=humble-curiously-antelope.ngrok-free.app 8000

ngrok                                                                                                                    (Ctrl+C to quit)

K8s Gateway API support available now: https://ngrok.com/r/k8sgb

Session Status                online
Account                       dongsik.lee (Plan: Free)
Version                       3.8.0
Region                        Japan (jp)
Latency                       45ms
Web Interface                 http://127.0.0.1:4040
Forwarding                    https://humble-curiously-antelope.ngrok-free.app -> http://localhost:8000

Connections                   ttl     opn     rt1     rt5     p50     p90
                              0       0       0.00    0.00    0.00    0.00

 

https://humble-curiously-antelope.ngrok-free.app/prompt/playground/

 

When I send a query through the ngrok URL, GPU usage on the local machine climbs to 100% while it generates the output.

 

Change the RemoteRunnable address in 01-remote-invoke.ipynb to the ngrok address and run it in VS Code.

from langserve import RemoteRunnable

# Set the ngrok remote address

#chain = RemoteRunnable("<your own domain configured in ngrok>/prompt/")
chain = RemoteRunnable("https://humble-curiously-antelope.ngrok-free.app/prompt/")
#chain = RemoteRunnable("http://0.0.0.0:8000/prompt/")

# The topic means "Tell me about deep learning"
for token in chain.stream({"topic": "딥러닝에 대해서 알려줘"}):
    print(token, end="")

 

It works well.

 

Additional examples

Translator

from langchain_community.chat_models import ChatOllama
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

# Use another chat model supported by LangChain. Here we use Ollama.
# llm = ChatOllama(model="EEVE-Korean-10.8B:latest")
llm = ChatOllama(model="eeve:q5")

# Set up the prompt
prompt = ChatPromptTemplate.from_template(
    "Translate following sentences into Korean:\n{input}"
)

# Build the chain with LangChain Expression Language (LCEL) syntax.
chain = prompt | llm | StrOutputParser()

 

Running the LLM as a Runnable

from langchain_community.chat_models import ChatOllama
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

# Use another chat model supported by LangChain. Here we use Ollama.
# llm = ChatOllama(model="EEVE-Korean-10.8B:latest")
llm = ChatOllama(model="eeve:q5")

 

Trying RAG over a PDF with Streamlit

To use OpenAIEmbeddings for the embeddings, OPENAI_API_KEY is loaded from a .env file.

% pip install python-dotenv

 

In main.py, set the OpenAI API key and update LANGSERVE_ENDPOINT to the ngrok address, then run it.
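
Before launching the app, it can help to confirm that both settings are actually present. The sketch below is an assumption-laden illustration: it supposes both values are exposed as environment variables (for example after python-dotenv has loaded the .env file), whereas main.py may define LANGSERVE_ENDPOINT as a plain Python constant, and the helper names `missing_settings` and `prompt_url` are hypothetical.

```python
import os

# Names taken from the text above; treating both as environment variables is an assumption.
REQUIRED_SETTINGS = ("OPENAI_API_KEY", "LANGSERVE_ENDPOINT")


def missing_settings(env=os.environ, required=REQUIRED_SETTINGS):
    """Return the names of required settings that are absent or blank."""
    return [name for name in required if not env.get(name, "").strip()]


def prompt_url(endpoint: str) -> str:
    """Join a base endpoint and the /prompt/ path without doubling slashes."""
    return endpoint.rstrip("/") + "/prompt/"


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("Chain URL:", prompt_url(os.environ["LANGSERVE_ENDPOINT"]))
```

The /prompt/ path here is the one used in the notebook examples; the path main.py actually calls may differ.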

% streamlit run main.py

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://192.168.0.10:8501

  For better performance, install the Watchdog module:

  $ xcode-select --install
  $ pip install watchdog

 

Example: SPRI_AI_Brief_2023년12월호_F.pdf

https://spri.kr/posts?code=AI-Brief

 

FileNotFoundError: [Errno 2] No such file or directory: 'pdfinfo'

% conda install poppler
Channels:
 - defaults
 - conda-forge
Platform: osx-arm64

% pip install pdftotext

 

FileNotFoundError: [Errno 2] No such file or directory: 'tesseract'

% brew install tesseract
==> Auto-updating Homebrew...
Adjust how often this is run with HOMEBREW_AUTO_UPDATE_SECS or disable with
HOMEBREW_NO_AUTO_UPDATE. Hide these hints with HOMEBREW_NO_ENV_HINTS (see `man brew`).

% brew install tesseract-lang
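
Both FileNotFoundError messages above come from command-line tools that were not on the PATH (pdfinfo ships with poppler). As a rough sketch, a check like the following could be run before starting the app; the helper name `missing_tools` is hypothetical, and the default tool list only covers the two binaries that failed here.

```python
import shutil


def missing_tools(tools=("pdfinfo", "tesseract")):
    """Return the external tools that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        # On this machine the fixes were `conda install poppler` and `brew install tesseract`.
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All external tools found.")
```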

 

UnicodeEncodeError: 'ascii' codec can't encode characters in position 22-23: ordinal not in range(128)

 

 

 

 

Finally, let's ask a question about the PDF above.

 

Having actually tried it, I can say this video packs in a tremendous amount.

- Ollama

- EEVE 양자화 모델

- LangServe

- ngrok

- Streamlit RAG

- Asitop

 

Thank you.

Monitoring my M1 with Asitop


% pip install asitop

% sudo asitop 

Enter the sudo password

 

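Previously I ran the eeve and gemma models on an M1 laptop using Ollama.

This time I run a model with llama.cpp and look at how it differs from using Ollama.

 

One caveat: I installed TensorFlow for this llama.cpp setup on M1 (llama-cpp-python itself does not list TensorFlow as a dependency), and it installed without problems only on Python 3.8, 3.9, and 3.10. On 3.11 and 3.12 my TensorFlow installation failed.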

% conda create -n llm python=3.10
% conda activate llm
% conda install -c apple tensorflow-deps
% python -m pip install tensorflow
% python -m pip install tensorflow-macos
% python -m pip install tensorflow-metal

# Check the installed libraries
% pip list | grep tensor
tensorboard                  2.16.2
tensorboard-data-server      0.7.2
tensorflow                   2.16.1
tensorflow-io-gcs-filesystem 0.36.0
tensorflow-macos             2.16.1
tensorflow-metal             1.1.0

# Verify the installation
% python
>>> import tensorflow as tf
>>> import keras
>>> print(tf.__version__)
2.16.1
>>> print(keras.__version__)
3.2.1
>>>

 

 

Follow the instructions on the model's Hugging Face page.

 

Usage

# GPU model
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose

# CPU
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose

pip install huggingface_hub

 

The install script as given failed for me; on M1, the build only succeeded after passing the macOS (osx) arm64 architecture options.

(고석현 Noah helped me with this in the "LLM RAG Langchain" integrated chat room.)

% CMAKE_ARGS="-DCMAKE_OSX_ARCHITECTURES=arm64 -DCMAKE_APPLE_SILICON_PROCESSOR=arm64 -DLLAMA_METAL=on" pip install --upgrade --verbose --force-reinstall --no-cache-dir llama-cpp-python
Using pip 23.3.1 from /Users/dongsik/miniconda/envs/llm/lib/python3.10/site-packages/pip (python 3.10)
Collecting llama-cpp-python
  Downloading llama_cpp_python-0.2.61.tar.gz (37.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 37.4/37.4 MB 10.1 MB/s eta 0:00:00
  Running command pip subprocess to install build dependencies
...
(omitted)
...
      Successfully uninstalled Jinja2-3.1.3
Successfully installed MarkupSafe-2.1.5 diskcache-5.6.3 jinja2-3.1.3 llama-cpp-python-0.2.61 numpy-1.26.4 typing-extensions-4.11.0

 

% pip install huggingface_hub
Collecting huggingface_hub
  Using cached huggingface_hub-0.22.2-py3-none-any.whl.metadata (12 kB)
...
(omitted)
...
Installing collected packages: tqdm, pyyaml, fsspec, filelock, huggingface_hub
Successfully installed filelock-3.13.4 fsspec-2024.3.1 huggingface_hub-0.22.2 pyyaml-6.0.1 tqdm-4.66.2

 

Verify the installation

% python -V
Python 3.10.14
% pip list
Package                      Version
---------------------------- --------------
...
huggingface-hub              0.22.2
llama_cpp_python             0.2.61
...
tensorboard                  2.16.2
tensorboard-data-server      0.7.2
tensorflow                   2.16.1
tensorflow-io-gcs-filesystem 0.36.0
tensorflow-macos             2.16.1
tensorflow-metal             1.1.0
...

 

Additionally, install JupyterLab so that I can work in notebooks.

% pip install jupyter jupyterlab
Collecting jupyter
  Using cached jupyter-1.0.0-py2.py3-none-any.whl.metadata (995 bytes)
...

 

Start JupyterLab and run the example code in a notebook.

 

from huggingface_hub import hf_hub_download
from llama_cpp import Llama

import time
from pprint import pprint

print(Llama)
<class 'llama_cpp.llama.Llama'>

 

# download model
model_name_or_path = "heegyu/EEVE-Korean-Instruct-10.8B-v1.0-GGUF" # repo id
# 4bit
model_basename = "ggml-model-Q4_K_M.gguf" # file name

model_path = hf_hub_download(repo_id=model_name_or_path, filename=model_basename)
print(model_path)
/Users/dongsik/.cache/huggingface/hub/models--heegyu--EEVE-Korean-Instruct-10.8B-v1.0-GGUF/snapshots/9bf4892cf2017362dbadf99bd9a3523387135362/ggml-model-Q4_K_M.gguf

 

# To run on the GPU, use the code below
lcpp_llm = Llama(
    model_path=model_path,
    n_threads=2, # CPU cores
    n_batch=512, # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.
    n_gpu_layers=43, # Change this value based on your model and your GPU VRAM pool.
    n_ctx=4096, # Context window
)
llama_model_loader: loaded meta data with 24 key-value pairs and 435 tensors from /Users/dongsik/.cache/huggingface/hub/models--heegyu--EEVE-Korean-Instruct-10.8B-v1.0-GGUF/snapshots/9bf4892cf2017362dbadf99bd9a3523387135362/ggml-model-Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = LLaMA v2
llama_model_loader: - kv   2:                       llama.context_length u32              = 4096
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 48
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 10000.000000
llama_model_loader: - kv  11:                          general.file_type u32              = 15
llama_model_loader: - kv  12:                       tokenizer.ggml.model str              = llama
...
생략
...
Model metadata: {'general.quantization_version': '2', 'tokenizer.chat_template': "{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% else %}{% set loop_messages = messages %}{% set system_message = 'You are a helpful assistant.' %}{% endif %}{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in loop_messages %}{% if loop.index0 == 0 %}{{'<|im_start|>system\n' + system_message + '<|im_end|>\n'}}{% endif %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}", 'tokenizer.ggml.add_eos_token': 'false', 'tokenizer.ggml.add_bos_token': 'true', 'tokenizer.ggml.padding_token_id': '2', 'tokenizer.ggml.unknown_token_id': '0', 'tokenizer.ggml.eos_token_id': '32000', 'tokenizer.ggml.bos_token_id': '1', 'tokenizer.ggml.model': 'llama', 'llama.attention.head_count_kv': '8', 'llama.context_length': '4096', 'llama.attention.head_count': '32', 'llama.rope.freq_base': '10000.000000', 'llama.rope.dimension_count': '128', 'general.file_type': '15', 'llama.feed_forward_length': '14336', 'llama.embedding_length': '4096', 'llama.block_count': '48', 'general.architecture': 'llama', 'llama.attention.layer_norm_rms_epsilon': '0.000010', 'general.name': 'LLaMA v2'}
Using gguf chat template: {% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% else %}{% set loop_messages = messages %}{% set system_message = 'You are a helpful assistant.' %}{% endif %}{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in loop_messages %}{% if loop.index0 == 0 %}{{'<|im_start|>system
' + system_message + '<|im_end|>
'}}{% endif %}{{'<|im_start|>' + message['role'] + '
' + message['content'] + '<|im_end|>' + '
'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
' }}{% endif %}
Using chat eos_token: <|im_end|>
Using chat bos_token: <s>

 

prompt_template = "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.\nHuman: {prompt}\nAssistant:\n"
text = '한국의 수도는 어디인가요? 아래 선택지 중 골라주세요.\n\n(A) 경성\n(B) 부산\n(C) 평양\n(D) 서울\n(E) 전주'

prompt = prompt_template.format(prompt=text)

start = time.time()
response = lcpp_llm(
    prompt=prompt,
    max_tokens=256,
    temperature=0.5,
    top_p=0.95,
    top_k=50,
    stop = ['</s>'], # Dynamic stopping when such token is detected.
    echo=True # return the prompt
)
pprint(response)
print(time.time() - start)
llama_print_timings:        load time =    7595.99 ms
llama_print_timings:      sample time =      49.12 ms /   159 runs   (    0.31 ms per token,  3236.90 tokens per second)
llama_print_timings: prompt eval time =    7595.51 ms /    83 tokens (   91.51 ms per token,    10.93 tokens per second)
llama_print_timings:        eval time =   24649.90 ms /   158 runs   (  156.01 ms per token,     6.41 tokens per second)
llama_print_timings:       total time =   33079.70 ms /   241 tokens
{'choices': [{'finish_reason': 'stop',
              'index': 0,
              'logprobs': None,
              'text': 'A chat between a curious user and an artificial '
                      'intelligence assistant. The assistant gives helpful, '
                      "detailed, and polite answers to the user's questions.\n"
                      'Human: 한국의 수도는 어디인가요? 아래 선택지 중 골라주세요.\n'
                      '\n'
                      '(A) 경성\n'
                      '(B) 부산\n'
                      '(C) 평양\n'
                      '(D) 서울\n'
                      '(E) 전주\n'
                      'Assistant:\n'
                      '한국은 동아시아에 위치한 국가로, 공식적으로 대한민국이라고 불립니다. 한국의 수도는 (D) '
                      '서울입니다. 서울은 나라의 북동부에 위치해 있으며 가장 큰 도시이자 정치, 경제, 문화의 '
                      '중심지입니다. 1948년 대한민국이 설립된 이래로 수도 역할을 해오고 있습니다.\n'
                      '\n'
                      '다른 선택지들은 다음과 같습니다:\n'
                      '(A) 경성 - 이 용어는 구식으로, 지금은 서울이라고 불립니다.\n'
                      '(B) 부산 - 한국의 중요한 도시지만 수도는 아닙니다.\n'
                      '(C) 평양 - 북한을 구성하는 도시 중 하나이지만 대한민국의 수도가 아닙니다.\n'
                      '(D) 전주 - 한국의 역사적인 도시로 중요하지만 수도는 아닙니다.'}],
 'created': 1713092665,
 'id': 'cmpl-c3bd8b09-3a89-4364-8c8d-41d60891160f',
 'model': '/Users/dongsik/.cache/huggingface/hub/models--heegyu--EEVE-Korean-Instruct-10.8B-v1.0-GGUF/snapshots/9bf4892cf2017362dbadf99bd9a3523387135362/ggml-model-Q4_K_M.gguf',
 'object': 'text_completion',
 'usage': {'completion_tokens': 158, 'prompt_tokens': 83, 'total_tokens': 241}}
33.08910894393921

 

An answer takes roughly 12 to 33 seconds.

The 33 seconds is the time taken on the first run; after two or three more runs it settles at about 11 to 12 seconds.
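
The tokens-per-second figures in the llama_print_timings output can be re-derived from the token counts and elapsed times in the same log. The short sketch below does that arithmetic; the function name `tokens_per_second` is just an illustrative helper.

```python
def tokens_per_second(token_count: int, elapsed_ms: float) -> float:
    """Convert a token count and an elapsed time in milliseconds to tokens/second."""
    return token_count / (elapsed_ms / 1000.0)


# Numbers copied from the llama_print_timings lines of the first run above.
prompt_tps = tokens_per_second(83, 7595.51)        # prompt eval: 83 tokens
generation_tps = tokens_per_second(158, 24649.90)  # eval: 158 runs

print(f"prompt eval: {prompt_tps:.2f} tokens/s")
print(f"generation:  {generation_tps:.2f} tokens/s")
```

Most of the first run's 33 seconds is generation time (about 24.6 s for 158 tokens), with prompt evaluation accounting for about 7.6 s.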

ggml-model-Q4_K_M.gguf: 100%
 6.51G/6.51G [10:54<00:00, 10.6MB/s]
/Users/dongsik/.cache/huggingface/hub/models--heegyu--EEVE-Korean-Instruct-10.8B-v1.0-GGUF/snapshots/9bf4892cf2017362dbadf99bd9a3523387135362/ggml-model-Q4_K_M.gguf

 

 

Next up: vLLM...

 

References


About GGUF

https://huggingface.co/docs/hub/gguf

 


 

Hugging Face GGUF Library

https://huggingface.co/models?library=gguf

 


 

 

Earlier I ran the lightweight Gemma model with Ollama. This time I try one of the lightweight models that support Korean:

 

https://huggingface.co/heegyu/EEVE-Korean-Instruct-10.8B-v1.0-GGUF

 


 

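The site describes the installation and test steps well. However, if the GPU's CUDA version does not match, following the guide as written raises an exception.

llama.cpp is a project that aims to run Llama models using 4-bit integer quantization, and it can be used from Python through low-level bindings. It is implemented in plain C/C++ with no dependencies and runs on macOS, Windows, and Linux.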

 

Comparison of the three downloadable models (columns are ggml-model-Q4_K_M.gguf, ggml-model-Q5_K_M.gguf, and ggml-model-f16.gguf):

Metadata                                 Q4_K_M     Q5_K_M     f16
Size                                     6.51 GB    7.65 GB    21.6 GB
version                                  3          3          3
tensor_count                             435        435        435
kv_count                                 24         24         23
general.architecture                     llama      llama      llama
general.name                             LLaMA v2   LLaMA v2   LLaMA v2
general.file_type                        15         17         1
general.quantization_version             2          2          -
llama.context_length                     4096       4096       4096
llama.embedding_length                   4096       4096       4096
llama.block_count                        48         48         48
llama.feed_forward_length                14336      14336      14336
llama.rope.dimension_count               128        128        128
llama.rope.freq_base                     10000      10000      10000
llama.attention.head_count               32         32         32
llama.attention.head_count_kv            8          8          8
llama.attention.layer_norm_rms_epsilon   1E-05      1E-05      1E-05
tokenizer.ggml.model                     llama      llama      llama
tokenizer.ggml.bos_token_id              1          1          1
tokenizer.ggml.eos_token_id              32000      32000      32000
tokenizer.ggml.unknown_token_id          0          0          0
tokenizer.ggml.padding_token_id          2          2          2
tokenizer.ggml.add_bos_token             TRUE       TRUE       TRUE
tokenizer.ggml.add_eos_token             FALSE      FALSE      FALSE

The following values are identical in all three files:

tokenizer.ggml.tokens [<unk>, <s>, </s>, <0x00>, <0x01>, ...]
tokenizer.ggml.scores [-1000, -1000, -1000, -1000, -1000, ...]
tokenizer.ggml.token_type [3, 3, 3, 6, 6, ...]
tokenizer.chat_template {% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% else %}{% set loop_messages = messages %}{% set system_message = 'You are a helpful assistant.' %}{% endif %}{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in loop_messages %}{% if loop.index0 == 0 %}{{'<|im_start|>system ' + system_message + '<|im_end|> '}}{% endif %}{{'<|im_start|>' + message['role'] + ' ' + message['content'] + '<|im_end|>' + ' '}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant ' }}{% endif %}
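
To put the three file sizes in perspective, the average number of bits stored per weight can be estimated from the size and the parameter count. The sketch below is approximate and rests on two assumptions: the model has about 10.8 billion parameters (taken from the "10.8B" in its name), and 1 GB means 10^9 bytes. It also ignores the metadata and tokenizer data stored in the file.

```python
# Assumption: ~10.8 billion parameters, read off the "10.8B" in the model name.
PARAMETER_COUNT = 10.8e9

# File sizes in GB as listed in the comparison above (1 GB assumed to be 1e9 bytes).
FILE_SIZES_GB = {"Q4_K_M": 6.51, "Q5_K_M": 7.65, "f16": 21.6}


def bits_per_weight(size_gb: float, parameter_count: float = PARAMETER_COUNT) -> float:
    """Estimate the average bits per parameter from the file size."""
    return size_gb * 1e9 * 8 / parameter_count


for name, size_gb in FILE_SIZES_GB.items():
    print(f"{name}: about {bits_per_weight(size_gb):.2f} bits per weight")
```

The f16 file comes out at about 16 bits per weight, as expected for 16-bit floats, while the Q4_K_M and Q5_K_M files land somewhat above 4 and 5 bits.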

 

Of the three, I download the smallest one, the Q4 model.

https://huggingface.co/heegyu/EEVE-Korean-Instruct-10.8B-v1.0-GGUF/resolve/main/ggml-model-Q4_K_M.gguf?download=true

 

Then create a Modelfile and register the downloaded model with Ollama.

The GGML build being downloaded is said to be a quantized implementation optimized for Apple M1 and M2 silicon.

 

To register the downloaded model file with Ollama, create a Modelfile.

 

Modelfile

FROM /Users/dongsik/GitHub/llm/eeve/EEVE-Korean-Instruct-10.8B-v1.0-GGUF/ggml-model-Q4_K_M.gguf

TEMPLATE """### User:
{{ .Prompt }}

### Assistant:
"""

PARAMETER temperature 0.1

PARAMETER num_ctx 4096
PARAMETER stop "</s>"
PARAMETER stop "### System:"
PARAMETER stop "### User:"
PARAMETER stop "### Assistant:"

 

The Modelfile and the downloaded GGUF file are now ready.

% ll
total 40
drwxr-xr-x  8 dongsik  staff   256 Apr 12 14:28 .
drwxr-xr-x  7 dongsik  staff   224 Apr 12 11:33 ..
drwxr-xr-x  5 dongsik  staff   160 Apr 11 00:11 .ipynb_checkpoints
drwxr-xr-x  3 dongsik  staff    96 Apr 10 23:55 EEVE-Korean-Instruct-10.8B-v1.0-GGUF
-rw-r--r--  1 dongsik  staff   325 Apr 12 14:27 Modelfile
-rw-r--r--  1 dongsik  staff  5957 Apr 11 00:26 ollama_eeve_gguf.ipynb

 

Register the eeve model with Ollama.

% ollama create eeve:q4 -f Modelfile
2024/04/12 14:28:50 parser.go:73: WARN Unknown command:
2024/04/12 14:28:50 parser.go:73: WARN Unknown command:
2024/04/12 14:28:50 parser.go:73: WARN Unknown command:
transferring model data
creating model layer
creating template layer
creating parameters layer
creating config layer
using already created layer sha256:5a79b80eb5e2eec5cf5d514dfa32187872dde1dae6a2b9c8
using already created layer sha256:c3de887d2d041bfea1bfed395834ea828839af278003269e
using already created layer sha256:e6b785eab1777ecfc57eab9a85f9b623931e6f1079ae6d75
using already created layer sha256:8b03799cdb5862e5cdfda70f0e116193aa07f2309015a158
writing manifest
success

 

Once registration succeeds, the model shows up in the model list.

 

Confirm that two models, gemma:2b and eeve:q4, are registered.

% ollama list
NAME       	ID          	SIZE  	MODIFIED
eeve:q4    	68f4c2c2d9fe	6.5 GB	8 seconds ago
gemma:2b   	b50d6c999e59	1.7 GB	2 days ago
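If the list is needed inside a script, the table can be split into rows. This is a rough sketch that assumes the columns are tab-separated, as in the output shown above; `parse_ollama_list` is a hypothetical helper, and other Ollama versions may format the table differently.

```python
def parse_ollama_list(text):
    """Parse the table printed by `ollama list` into a list of dicts (assumes tab-separated columns)."""
    rows = [line for line in text.splitlines() if line.strip()]
    header = [h.strip().lower() for h in rows[0].split("\t")]
    models = []
    for line in rows[1:]:
        cells = [c.strip() for c in line.split("\t")]
        models.append(dict(zip(header, cells)))
    return models


# Sample copied from the output above
sample = (
    "NAME       \tID          \tSIZE  \tMODIFIED\n"
    "eeve:q4    \t68f4c2c2d9fe\t6.5 GB\t8 seconds ago\n"
    "gemma:2b   \tb50d6c999e59\t1.7 GB\t2 days ago\n"
)
models = parse_ollama_list(sample)
print([m["name"] for m in models])
```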

 

To delete a model, use the rm command (ollama rm eeve:q4).

 

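Now ask the eeve:q4 model a question from the CLI.

Considering that this is an M1 laptop, it answers quite quickly, and the quality of the answer is also quite good.

For comparison, the results of asking the same question to ChatGPT 3.5 and to GPT-4 on Wrtn (뤼튼) are shown further below.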

 

Answer from eeve:q4

% ollama run eeve:q4 하늘은 왜 파란색일까요?
하늘이 파란색인 이유는 대기 중에 있는 공기 분자와 먼지 입자들이 햇빛을 산란시키기 때문입니다. 태양빛이 지구 대기를 통과할 때, 이 작은 입자들과 상호 작용하여 빛을 다양한 방향으로 흩어지게 합니다. 우리가 하늘에서 보는 빛은 주로
파란색과 보라색의 가시광선 스펙트럼에 집중되어 있습니다.

이 현상은 빛이 공기 분자와 먼지 입자에 부딪힐 때 일어나는 산란 효과 때문입니다. 이 작은 입자들은 태양빛을 다양한 방향으로 흩어지게 하여, 우리가 하늘에서 보는 색깔을 만들어냅니다. 파란색과 보라색 빛은 다른 색들보다 더
산란되기 때문에, 우리 눈에 가장 많이 도달하여 하늘을 파랗게 보이게 합니다.

또한 대기 중에 있는 물방울이나 얼음 결정도 햇빛의 산란에 역할을 할 수 있습니다. 이 입자들이 태양빛과 상호 작용할 때, 빛을 다양한 방향으로 흩어지게 하여 하늘을 더욱 파랗게 보이게 만듭니다.

하늘이 항상 파란색인 것은 아니며, 대기 조건에 따라 색깔이 변할 수 있다는 점을 주목하는 것이 중요합니다. 예를 들어, 일출이나 일몰 때는 대기 중에 있는 먼지 입자와 물방울의 양이 증가하여 햇빛을 더 많이 산란시켜 하늘에 주황색,
분홍색 또는 보라색을 만들 수 있습니다.

요약하자면, 하늘이 파란색인 이유는 공기 분자, 먼지 입자 및 대기 중 물방울이나 얼음 결정과 같은 다양한 요소가 태양빛의 산란에 기여하기 때문입니다. 이 상호 작용은 파란색과 보라색 빛의 높은 농도를 만들어내어 우리가 하늘에서 보는
색깔을 만들어냅니다.

 

Answer from GPT-3.5

 

Answer from Wrtn GPT-4

 

Next, let's ask a question through LangChain's Ollama integration to see how well the model reasons.

import logging
import time

# Configure basic logging
logging.basicConfig(level=logging.INFO)

try:
    # Imported inside the try block so that a missing package reaches the ImportError handler below
    from langchain_community.llms import Ollama

    llm = Ollama(model="eeve:q4")

    # Make sure the prompt is well defined (adjust it to the model's capabilities).
    # The Korean prompt asks: "What is the capital of Korea? Choose one of the options below."
    # Options: (A) Gyeongseong (B) Busan (C) Pyongyang (D) Seoul (E) Jeonju
    start = time.time()
    prompt = "한국의 수도는 어디인가요? 아래 선택지 중 골라주세요.\n\n(A) 경성\n(B) 부산\n(C) 평양\n(D) 서울\n(E) 전주"

    response = llm.invoke(prompt)
    print(response)
    # Elapsed time in seconds
    print(time.time() - start)

except ImportError:
    logging.error("Failed to import Ollama from langchain_community. Is the package installed?")
except Exception as e:
    logging.error(f"An unexpected error occurred: {e}")

 

정답은 (D) 서울입니다.
3.465108871459961
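The elapsed time above is measured by hand with `time.time()`. When several prompts are compared, the measurement could be wrapped in a small helper. A minimal sketch: `timed` and `fake_llm` are hypothetical names, and `fake_llm` only stands in for `llm.invoke` so that the example runs without Ollama.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def fake_llm(prompt):
    # Stand-in for llm.invoke so the sketch runs without a local model
    return f"echo: {prompt}"


answer, seconds = timed(fake_llm, "hello")
print(answer, f"{seconds:.3f}s")
```

With the real model, `timed(llm.invoke, prompt)` would return the answer together with the elapsed seconds.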

 

This post was written for a Mac (M1).

 

To run a large language model locally, we install Ollama, download gemma (Google's state-of-the-art lightweight open model), give it a quick try, and then connect it to LangChain.

 

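Last night, 권진영 from the "LLM RAG Langchain 통합" chat room kindly walked me through the installation and usage. I'm writing it up because I think others would also benefit from a quick install and a try.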

 

The Ollama GitHub page offers installers that you can download for your local environment.

https://github.com/ollama/ollama

 


 

 

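Download the installer for macOS and install it. The installation is so simple that it is almost surprising: once the download finishes, the installer appears in the download directory, and a double-click starts the installation.

The downloaded installer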

 

Running it starts the installation as shown below. Click Next.

Running the installer starts the installation

The installation finishes quickly.

 

When the installation is complete, open Terminal and run Ollama.

 

Gemma comes in two models; let's start with the smaller one.

% ollama run gemma:2b
pulling manifest
pulling c1864a5eb193...   5% ▕███                     ▏  87 MB/1.7 GB  8.0 MB/s   3m17s

 

When the download finishes, a prompt appears where you can send messages to the model.

% ollama run gemma:2b
pulling manifest
pulling c1864a5eb193... 100% ▕██████████████████████████████████████████████▏ 1.7 GB
pulling 097a36493f71... 100% ▕██████████████████████████████████████████████▏ 8.4 KB
pulling 109037bec39c... 100% ▕██████████████████████████████████████████████▏  136 B
pulling 22a838ceb7fb... 100% ▕██████████████████████████████████████████████▏   84 B
pulling 887433b89a90... 100% ▕██████████████████████████████████████████████▏  483 B
verifying sha256 digest
writing manifest
removing any unused layers
success
>>> hi
Hi! 👋 How can I assist you today? 😊

Is there anything I can help you with?

>>> Send a message (/? for help)

 

I simply greeted it with "hi".

 

Now leave the prompt by entering "/bye".

>>> /bye
Hello! 👋 It's nice to hear from you. How can I help you today? 😊

Is there anything I can do for you?

>>> /bye
(base) dongsik@dongsikleeui-MacBookPro ~ %

 

If it does not exit right away, enter "/bye" once more.

 

You can also check which models are installed.

% ollama list
NAME    	ID          	SIZE  	MODIFIED
gemma:2b	b50d6c999e59	1.7 GB	6 minutes ago

 

You can also delete an installed model. Installing and deleting are both very simple.

% ollama rm gemma:2b
deleted 'gemma:2b'
% ollama list
NAME	ID	SIZE	MODIFIED
%

 

Now let's install the model again and go over how to use it.

 

1. Running with LangChain

# Install LangChain
pip install langchain

 

import langchain
# Check the LangChain version
print('LangChain version:', langchain.__version__)

Result
LangChain version: 0.1.12

 

Configure the code to use the locally installed Ollama gemma:2b model and run it.

import logging

# Configure logging
logging.basicConfig(level=logging.INFO)

try:
    # Imported inside the try block so that a missing package reaches the ImportError handler below
    from langchain_community.llms import Ollama

    llm = Ollama(model="gemma:2b")

    # Make sure the prompt is well defined (adjust it to the model's capabilities)
    prompt = "Why is the sky blue?"

    response = llm.invoke(prompt)
    print(response)
except ImportError:
    logging.error("Failed to import Ollama from langchain_community. Is the package installed?")
except Exception as e:
    logging.error(f"An unexpected error occurred: {e}")

 

Result:
The sky appears blue due to Rayleigh scattering. This scattering process occurs when light interacts with molecules in the Earth's atmosphere. 

* **Blue light has a longer wavelength than other colors**. This means it can penetrate further into the atmosphere. 
* **Blue light waves have more energy** than other colors, so they are more likely to scatter. 
* **Water vapor molecules** in the atmosphere absorb blue light more efficiently than other colors. 
* **Scattered blue light** is scattered in all directions equally, giving the sky its blue color.

The amount and intensity of blue scattering depends on several factors, including:

* **Particle size and density of the particles**: Smaller particles scatter light more efficiently than larger particles. 
* **The wavelength of light**: Blue light is scattered more strongly than other colors. 
* **Atmospheric conditions**: Temperature, humidity, and air density can also affect scattering.

Overall, the scattering of sunlight in the atmosphere creates the blue color of the sky.

 

Call the model through its URL and stream the result to stdout.

from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain_community.llms.ollama import Ollama

# base_url points at the local Ollama server; the callback handler writes each token to stdout as it arrives
llm = Ollama(
    base_url="http://localhost:11434",
    model="gemma:2b",
    callback_manager=CallbackManager(
        [StreamingStdOutCallbackHandler()],
    ),
)

prompt = "Why is the sky blue?"
response = llm.invoke(prompt)
# The handler has already streamed the answer, so this prints the full text a second time
print(response)

 

Result (the answer appears twice: once streamed by the callback handler and once from print(response)):
The sky appears blue due to Rayleigh scattering. Rayleigh scattering is the scattering of light by particles of a shorter wavelength, such as blue light. This scattering causes longer wavelengths, such as red and yellow light, to be scattered more than blue light. As a result, the sky appears blue to us.The sky appears blue due to Rayleigh scattering. Rayleigh scattering is the scattering of light by particles of a shorter wavelength, such as blue light. This scattering causes longer wavelengths, such as red and yellow light, to be scattered more than blue light. As a result, the sky appears blue to us.

 

 

https://github.com/ollama/ollama/blob/main/docs/api.md

2. Calling /api/generate with curl from the command line

Streaming

Request
% curl http://localhost:11434/api/generate -d '{
  "model": "gemma:2b",
  "prompt":"Why is the sky blue?"
}'

Response
{"model":"gemma:2b","created_at":"2024-04-10T01:46:35.254492Z","response":"The","done":false}
{"model":"gemma:2b","created_at":"2024-04-10T01:46:35.291573Z","response":" sky","done":false}
{"model":"gemma:2b","created_at":"2024-04-10T01:46:35.325664Z","response":" appears","done":false}
... <omitted>
{"model":"gemma:2b","created_at":"2024-04-10T01:46:44.741546Z","response":" the","done":false}
{"model":"gemma:2b","created_at":"2024-04-10T01:46:44.775088Z","response":" atmosphere","done":false}
{"model":"gemma:2b","created_at":"2024-04-10T01:46:44.810226Z","response":".","done":false}
{"model":"gemma:2b","created_at":"2024-04-10T01:46:44.845784Z","response":"","done":true,"context":[106,1645,108,4385,603,573,8203,3868,235336,107,108,106,2516,108,651,8203,8149,3868,3402,577,153902,38497,235265,1417,38497,12702,1185,33365,113211,675,24582,575,573,10379,235303,235256,13795,235269,14076,24582,576,23584,578,16175,235265,109,235287,5231,5200,2611,919,476,5543,35571,688,1178,3868,2611,235265,1417,3454,674,1185,33365,30866,573,13795,235269,978,3868,2611,603,30390,3024,774,1167,2116,235265,108,235287,5231,10716,2611,919,476,25270,35571,688,578,603,30390,978,16347,1178,3118,2611,235265,108,235287,714,5231,10526,576,38497,688,12014,611,573,35571,576,573,2611,235265,11569,235269,3868,2611,603,30390,978,1178,1156,9276,235265,108,235287,714,13795,919,978,23584,24582,1178,16175,24582,235269,948,3454,674,978,3118,2611,603,30390,3024,235265,1417,603,3165,573,8203,8149,3868,235265,109,4858,708,1009,5942,4691,1105,573,3868,8203,235292,109,235287,714,3868,2881,603,5231,38131,576,5809,168428,1417,3454,674,573,8203,877,4824,3868,20853,576,1368,5342,689,7033,665,603,235265,108,235287,714,3868,2881,603,1170,5231,1665,10918,731,38636,168428,1417,3454,674,573,8203,877,4824,3868,793,4391,1368,1536,692,708,575,573,2134,235265,108,235287,714,3868,2881,603,476,5231,2667,576,2611,38497,168428,1417,3454,674,2611,603,30390,575,832,16759,731,24582,575,573,13795,235265,108,235287,714,3868,2881,576,573,8203,603,476,5231,28205,44299,168428,1417,603,1861,573,13795,603,780,13596,12876,235269,578,573,38497,2185,12014,611,573,6581,576,573,33365,8761,577,573,16071,575,573,13795,235265,107,108],"total_duration":13164771833,"load_duration":3454744833,"prompt_eval_count":15,"prompt_eval_duration":115430000,"eval_count":282,"eval_duration":9592944000}
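In streaming mode each line of the response is a separate JSON object carrying one fragment of the answer, and the last one has "done": true. The sketch below shows one way to stitch the fragments back together. `join_stream` is a hypothetical helper, and the final sample line is shortened (the context and timing fields are left out).

```python
import json


def join_stream(lines):
    """Concatenate the "response" fragments of a streamed /api/generate reply."""
    parts = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final object marks the end of the stream
    return "".join(parts)


# First three chunks from the response above, plus a shortened final chunk
sample_lines = [
    '{"model":"gemma:2b","created_at":"2024-04-10T01:46:35.254492Z","response":"The","done":false}',
    '{"model":"gemma:2b","created_at":"2024-04-10T01:46:35.291573Z","response":" sky","done":false}',
    '{"model":"gemma:2b","created_at":"2024-04-10T01:46:35.325664Z","response":" appears","done":false}',
    '{"model":"gemma:2b","created_at":"2024-04-10T01:46:44.845784Z","response":"","done":true}',
]
text = join_stream(sample_lines)
print(text)
```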

 

No streaming 

Request
% curl http://localhost:11434/api/generate -d '{
  "model": "gemma:2b",
  "prompt":"Why is the sky blue?",
  "stream": false
}'

Response
{"model":"gemma:2b","created_at":"2024-04-10T01:49:38.296228Z","response":"The sky is blue due to Rayleigh scattering. Rayleigh scattering is the scattering of light by particles of a shorter wavelength. This means that blue light has a greater wavelength and is scattered more than other colors. This is why the sky appears blue.","done":true,"context":[106,1645,108,4385,603,573,8203,3868,235336,107,108,106,2516,108,651,8203,603,3868,3402,577,153902,38497,235265,153902,38497,603,573,38497,576,2611,731,16071,576,476,25270,35571,235265,1417,3454,674,3868,2611,919,476,6561,35571,578,603,30390,978,1178,1156,9276,235265,1417,603,3165,573,8203,8149,3868,235265,107,108],"total_duration":1936533375,"load_duration":3180292,"prompt_eval_duration":272404000,"eval_count":49,"eval_duration":1658092000}
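The timing fields in the response are reported in nanoseconds, so the generation speed can be worked out from eval_count and eval_duration. A small sketch using the numbers from the response above; the function names are only illustrative.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000


def ns_to_seconds(duration_ns):
    """Convert a duration field from the response (nanoseconds) to seconds."""
    return duration_ns / NANOSECONDS_PER_SECOND


def tokens_per_second(eval_count, eval_duration_ns):
    """Generated tokens divided by the time spent generating them."""
    return eval_count / ns_to_seconds(eval_duration_ns)


# Values copied from the non-streaming response above
total_seconds = ns_to_seconds(1936533375)
rate = tokens_per_second(49, 1658092000)
print(f"total: {total_seconds:.2f} s, speed: {rate:.2f} tokens/s")
```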

 

3. Calling /api/chat with curl from the command line

 

Chat Request (Streaming)

Request
% curl http://localhost:11434/api/chat -d '{
  "model": "gemma:2b",
  "messages": [
    {
      "role": "user",
      "content": "why is the sky blue?"
    }
  ]
}'

Response
{"model":"gemma:2b","created_at":"2024-04-10T01:27:16.070998Z","message":{"role":"assistant","content":"The"},"done":false}
{"model":"gemma:2b","created_at":"2024-04-10T01:27:16.108371Z","message":{"role":"assistant","content":" sky"},"done":false}
{"model":"gemma:2b","created_at":"2024-04-10T01:27:16.142158Z","message":{"role":"assistant","content":" appears"},"done":false}
{"model":"gemma:2b","created_at":"2024-04-10T01:27:16.175229Z","message":{"role":"assistant","content":" blue"},"done":false}
{"model":"gemma:2b","created_at":"2024-04-10T01:27:16.207642Z","message":{"role":"assistant","content":" due"},"done":false}
... <omitted>
{"model":"gemma:2b","created_at":"2024-04-10T01:27:25.624649Z","message":{"role":"assistant","content":" higher"},"done":false}
{"model":"gemma:2b","created_at":"2024-04-10T01:27:25.658043Z","message":{"role":"assistant","content":" temperatures"},"done":false}
{"model":"gemma:2b","created_at":"2024-04-10T01:27:25.692536Z","message":{"role":"assistant","content":"."},"done":false}
{"model":"gemma:2b","created_at":"2024-04-10T01:27:25.725932Z","message":{"role":"assistant","content":""},"done":true,"total_duration":12244362334,"load_duration":2484924542,"prompt_eval_count":15,"prompt_eval_duration":103298000,"eval_count":286,"eval_duration":9654690000}

 

Chat request (No streaming)

Request
% curl http://localhost:11434/api/chat -d '{
  "model": "gemma:2b",
  "messages": [
    {
      "role": "user",
      "content": "why is the sky blue?"
    }
  ],
  "stream": false
}'

Response
{"model":"gemma:2b","created_at":"2024-04-10T01:30:16.821415Z","message":{"role":"assistant","content":"The sky appears blue due to Rayleigh scattering. This phenomenon occurs when sunlight interacts with molecules in the Earth's atmosphere.\n\n**Rayleigh Scattering:**\n\n* Sunlight is composed of all colors of the spectrum, including blue, violet, yellow, orange, and red.\n* When sunlight enters the atmosphere, it interacts with molecules such as nitrogen and oxygen molecules.\n* These molecules have different sizes and structures, which cause different wavelengths of light to scatter in different directions.\n* Blue light, with its shorter wavelengths, is scattered more strongly than other colors due to its shorter path length through the atmosphere.\n\n**Blue Sky:**\n\n* As a result, blue light is scattered in all directions from the Sun.\n* This scattering effect spreads out the Sun's light throughout the atmosphere, making the sky appear blue.\n* The intensity of blue light can vary slightly depending on factors such as altitude, temperature, and atmospheric conditions.\n\n**Other Factors:**\n\n* The scattering process depends on the size and density of the molecules, which is why the sky appears blue even though the Sun is a star of much greater temperature.\n* The atmosphere is composed of different gases with varying densities, which influences the scattering process.\n* Cloud and pollution can also affect the sky's color, with clouds reflecting blue light more efficiently than other colors.\n\n**Conclusion:**\n\nThe blue color of the sky is primarily caused by Rayleigh scattering of sunlight by molecules in the Earth's atmosphere. This scattering process spreads out the Sun's light throughout the sky, making it appear blue to us on Earth."},"done":true,"total_duration":11417865583,"load_duration":4883667,"prompt_eval_duration":270041000,"eval_count":324,"eval_duration":11140497000}
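The same non-streaming chat request can be sent from Python with only the standard library. This is a sketch under the assumption that Ollama is listening on localhost:11434 as in the curl examples. `build_chat_payload` and `post_chat` are hypothetical helper names, and the actual call is left commented out because it needs the local server.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_payload(model, user_text, history=None, stream=False):
    """Build the JSON body for /api/chat, appending the new user message to any earlier turns."""
    messages = list(history or [])
    messages.append({"role": "user", "content": user_text})
    return {"model": model, "messages": messages, "stream": stream}


def post_chat(payload, url=OLLAMA_CHAT_URL):
    """POST a non-streaming payload and return the assistant's text (needs a running Ollama server)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        body = json.loads(reply.read().decode("utf-8"))
    return body["message"]["content"]


payload = build_chat_payload("gemma:2b", "why is the sky blue?")
print(json.dumps(payload, indent=2))
# answer = post_chat(payload)  # uncomment when the local server is running
```

Passing earlier turns in `history` keeps the conversation context, since the request carries the full message list each time.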

1. Using export

# Register the key
% export OPENAI_API_KEY="sk-mX9fc .... "

# Check the registered key
% echo $OPENAI_API_KEY

# Check the environment settings
% env | grep OPENAI_API_KEY

# How to remove it from the environment
% unset OPENAI_API_KEY

 

 

 

2. Managing the key with a .env file

- Create a .env file in the root of the project you want to run and enter the key as shown below

OPENAI_API_KEY="sk-mX9fc .... "

 

Reading the .env file from code

import os
from dotenv import load_dotenv, find_dotenv

# Locate the .env file
print(find_dotenv())
# > /Users/dongsik/workspace/OpenAI/langchain-kr/.env

# Load the .env file that was found
print(load_dotenv(find_dotenv()))
# > True

# Print OPENAI_API_KEY
print(os.environ['OPENAI_API_KEY'])
# > sk-mX9fc ...
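Printing the whole key is handy for a quick check, but in shared logs or notebooks it may be safer to print a masked version and to fail with a clear message when the variable is missing. A minimal sketch: `get_api_key` and `mask` are hypothetical helpers, and the key set below is a dummy value used only so the example runs.

```python
import os


def get_api_key(name="OPENAI_API_KEY"):
    """Return the key from the environment, or raise a clear error if it is missing."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; export it or add it to the .env file")
    return key


def mask(secret, visible=4):
    """Keep the first few characters and replace the rest with asterisks."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return secret[:visible] + "*" * (len(secret) - visible)


# Dummy value for the demo only; a real key would come from export or .env
os.environ.setdefault("OPENAI_API_KEY", "sk-example-not-a-real-key")
print(mask(get_api_key()))
```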

 

 

 

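Around this time last year I asked the same question, and the answer named the wrong zodiac animal, which left me going "hmm".

Back then I tried AskUp and ChatGPT, and both got it wrong.

I wonder when they will finally get the zodiac animal of my birth year right.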

 

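Apparently ChatGPT 4 gets it right, though. I'm not on the paid plan at the moment, so I can't check it myself.

 

GPT-4 on Wrtn (뤼튼) also finds the right answer. I guess it takes money in the end.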

 

 

ChatGPT 3.5
Upstage solar mini chat

 

 

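Right after adding AskUp as a channel, before trying anything else, I wanted to compare it with OpenAI straight away, so my first question to both was the same kind of question.

 

I told both of them that I use Alibaba Cloud, yet both bring up CloudWatch (AWS). Hmm.

 

The answers are similar in content, but because AskUp is conversational, it picks up the intent of the question and the asker's previous sentence well and carries the conversation forward naturally, which I like. It seems fun enough to play with on your own by asking it this and that.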

 

Note

ADPS - Alternative Deposit Payment System


Askup

 


OpenAI 
