Tiny LLM

Explored LLM architectures optimized for resource-constrained environments.


💬 Thoughts

This was the very first project I worked on after joining a research lab as an intern, and honestly, it felt like a proper first project. On my first day, I was handed a Jetson Nano and told to set it up. My first thought was: what is this? It's so slow and frustrating! That was basically my first real encounter with Ubuntu. It felt like dealing with an old computer, but surprisingly it supported CUDA (though not the latest versions).

At the time, I had no idea what I was even doing. Dataset? HuggingFace? What are those? I only started to get it after seeing others measuring accuracy: oh, so this is on-device AI! It's running entirely on this GPU without any server. That's when it clicked. When I saw models getting only 30% accuracy, I thought, is this even working? It feels like random guessing. It turned out that was simply because no fine-tuning had been done yet.

It was also fascinating to discover how many tiny LLMs are out there. The goal was to compare them on latency and accuracy and see which ones hold up best.
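For context, the comparison boiled down to a loop like this minimal sketch: load a small model from HuggingFace, time a generation, and inspect the output. This is a reconstruction under my own assumptions, not the lab's actual script; the model name and prompt are placeholders.

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; any small causal LM from the Hub works the same way.
MODEL_NAME = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

prompt = "Question: What is the capital of France?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Time a single generation; on a Jetson-class device this dominates everything.
start = time.perf_counter()
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=16)
latency = time.perf_counter() - start

print(f"latency: {latency:.2f}s")
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Accuracy would then come from running the same loop over a benchmark dataset and scoring the decoded answers.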

I also got introduced to WandB and started learning how to write automation scripts. It was actually my first time automating anything. Since the measurements were slow and involved lots of repetition, I finally understood why automation matters. Ever since then, I've preferred automating workflows in all my projects.
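The automation itself was nothing exotic; the pattern was roughly the sketch below, where a script loops over candidate models and logs each result to WandB instead of copying numbers by hand. The project name, model list, and `run_benchmark` helper here are all made up for illustration.

```python
import time

import wandb

# Illustrative lineup, not the actual models the lab compared.
MODELS = [
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "Qwen/Qwen2-0.5B-Instruct",
]


def run_benchmark(model_name: str) -> dict:
    # Stand-in for the real measurement (load the model, time generation,
    # score accuracy); here it just returns placeholder numbers.
    start = time.perf_counter()
    return {"latency_s": time.perf_counter() - start, "accuracy": 0.0}


for name in MODELS:
    run = wandb.init(project="tiny-llm-bench", name=name.split("/")[-1])
    run.log(run_benchmark(name))  # one run per model, charted automatically
    run.finish()
```

Even this much is enough to leave overnight and come back to a comparison table, which is what made the value of automation click.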

Looking back, I had no idea what I was doing and just followed along at first, but I think my professor intentionally gave me this as a starting point: explore whether LLMs could run on small edge devices and what kind of efficiency they could reach. Along the way, I ended up learning so many valuable things: Docker, WandB, automation, conda environments, data visualization, and more.

It might not seem like a big deal in hindsight, but for someone who started out knowing nothing, this was a really important project. It gave me a solid foundation and made me feel like I was finally part of a research group.

