
Add lora loading to chat completions #13

Merged
edamamez merged 1 commit into lamini-v0.7.2 from ez-lora-load-chat-completion on Mar 12, 2025

Conversation

@edamamez (Contributor)

Train a mome_mini model

 % python deployments/helm/lamini-operator/example.py --train
Data pairs uploaded to local.

Your dataset id is: 79290ef13cb7553eaa290add748004019d1fc1b0c6340f68807d3b42b6a6d175 . Consider using this in the future to train using the same data. 
Eg: llm.train(data_or_dataset_id='79290ef13cb7553eaa290add748004019d1fc1b0c6340f68807d3b42b6a6d175')
Tuning job submitted! Check status of job 107 here: http://localhost:8000/train/107
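
For reference, a minimal sketch of submitting the same tuning job from the Lamini Python client, reusing the dataset id printed above. The base model name is an assumption; example.py determines the actual base used for mome_mini.

    from lamini import Lamini

    # Hypothetical base model name; substitute whatever base the deployment serves.
    llm = Lamini(model_name="meta-llama/Llama-3.1-8B-Instruct")

    # Reuse the dataset id from the upload above instead of re-uploading the pairs.
    llm.train(data_or_dataset_id="79290ef13cb7553eaa290add748004019d1fc1b0c6340f68807d3b42b6a6d175")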

Run chat completions

    from openai import OpenAI

    # OpenAI-compatible client; this base_url is an assumed local endpoint.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    response = client.chat.completions.create(
        model="mome_mini/fc080dd3b5e4aad5798c3da7c0d777e6fdb8b5936cc44ed0325fe5ca1023c7eb",
        messages=[{"role": "user", "content": "What color are mangoes?"}],
        max_tokens=100,
    )
% python deployments/helm/lamini-operator/example.py --inference
Response: ChatCompletion(id='chatcmpl-51b8343aef784bd488f131a8496cefc6', choices=[Choice(finish_reason='length', index=0, logprobs=None, message=ChatCompletionMessage(content="Mangoes usually have a yellow color when they are ripe. However, their skin can range from green to purple, depending on the variety and ripeness. Some mangoes may also have a red or a combination of red and yellow skin.\n\nHere are a few common mango skin colors:\n\n- Green: Immature mangoes or some varieties.\n- Yellow: The most common color of mature mangoes.\n- Red or reddish: Some varieties, such as the 'Tommy Atkins' or '", refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None))], created=1741817115, model='hosted_vllm/fc080dd3b5e4aad5798c3da7c0d777e6fdb8b5936cc44ed0325fe5ca1023c7eb', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=100, prompt_tokens=41, total_tokens=141, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None)
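
To pull just the completion text out of that object:

    # The generated text lives on the first choice's message, as in the repr above.
    print(response.choices[0].message.content)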

Logs

INFO 03-12 22:05:14 serving_chat.py:137] [Lamini] Completion request has LoRA request: lora_name='fc080dd3b5e4aad5798c3da7c0d777e6fdb8b5936cc44ed0325fe5ca1023c7eb' lora_path='/app/lamini/jobs/107'
INFO 03-12 22:05:14 serving_chat.py:138] [Lamini] Loading LoRA adapter...
INFO 03-12 22:05:14 serving_models.py:174] Loaded new LoRA adapter: name 'fc080dd3b5e4aad5798c3da7c0d777e6fdb8b5936cc44ed0325fe5ca1023c7eb', path '/app/lamini/jobs/107'
INFO 03-12 22:05:14 serving_chat.py:141] [Lamini] LoRA adapter loaded
INFO 03-12 22:05:14 serving_chat.py:142] [Lamini] Set model to LoRA name: fc080dd3b5e4aad5798c3da7c0d777e6fdb8b5936cc44ed0325fe5ca1023c7eb
INFO 03-12 22:05:15 chat_utils.py:332] Detected the chat template content format to be 'string'. You can set `--chat-template-content-format` to override this.
INFO 03-12 22:05:15 logger.py:39] Received request chatcmpl-51b8343aef784bd488f131a8496cefc6: prompt: '<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 26 Jul 2024\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nWhat color are mangoes?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n', params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=1.0, top_p=1.0, top_k=-1, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=100, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None), prompt_token_ids: None, lora_request: LoRARequest(lora_name='fc080dd3b5e4aad5798c3da7c0d777e6fdb8b5936cc44ed0325fe5ca1023c7eb', lora_int_id=2, lora_path='/app/lamini/jobs/107', lora_local_path=None, long_lora_max_len=None, base_model_name=None), prompt_adapter_request: None.
INFO 03-12 22:05:15 engine.py:275] Added request chatcmpl-51b8343aef784bd488f131a8496cefc6.
INFO 03-12 22:05:17 metrics.py:455] Avg prompt throughput: 8.2 tokens/s, Avg generation throughput: 20.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO 03-12 22:05:17 serving_chat.py:280] [Lamini] Completion request completed, unloading LoRA adapter...: lora_name='fc080dd3b5e4aad5798c3da7c0d777e6fdb8b5936cc44ed0325fe5ca1023c7eb' lora_path='/app/lamini/jobs/107'
INFO 03-12 22:05:17 serving_models.py:191] Removed LoRA adapter: name 'fc080dd3b5e4aad5798c3da7c0d777e6fdb8b5936cc44ed0325fe5ca1023c7eb'
INFO 03-12 22:05:17 serving_chat.py:283] [Lamini] LoRA adapter unloaded
INFO:     10.42.29.237:59070 - "POST /v1/chat/completions HTTP/1.1" 200 OK
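
The load/unload pair in serving_models.py corresponds to vLLM's dynamic LoRA endpoints, which are available when the server runs with VLLM_ALLOW_RUNTIME_LORA_UPDATING=True. A sketch of driving the same flow manually, with the server URL assumed:

    import requests

    base = "http://localhost:8000"  # assumed server address
    adapter = {
        "lora_name": "fc080dd3b5e4aad5798c3da7c0d777e6fdb8b5936cc44ed0325fe5ca1023c7eb",
        "lora_path": "/app/lamini/jobs/107",
    }

    # Register the adapter; subsequent requests can pass model=<lora_name>.
    requests.post(f"{base}/v1/load_lora_adapter", json=adapter).raise_for_status()

    # ... run chat completions against the adapter ...

    # Drop it again, mirroring what serving_chat.py does after each request.
    requests.post(
        f"{base}/v1/unload_lora_adapter", json={"lora_name": adapter["lora_name"]}
    ).raise_for_status()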

@edamamez merged commit 5f7d97d into lamini-v0.7.2 on Mar 12, 2025
@edamamez deleted the ez-lora-load-chat-completion branch on Mar 27, 2025 19:56
