On-Device AI?


Posted by 흰색텀블러 on 2025-02-10 at 22:21

What is On-Device AI?

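Smartphone AI features such as Circle to Search show how convenient AI can be in everyday life. On-device AI aims to deliver that kind of convenience by running the model on the device itself instead of on a server, so features can keep working even without an internet connection.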

Let's implement it (YOLO.pt -> YOLO.tflite)

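We take the YOLO implementation from Ultralytics, which packages the model well, and train it.
Training runs with hyperparameters you choose and produces the best checkpoint, best.pt.
To run best.pt on a device, we export it to the TFLite format.
When exporting, keeping the model lightweight matters as much as its accuracy, so we convert the weights from float32 to float16 (depending on the use case, int8 works too). This step is called quantization.
Now let's use the lightweight model on Android!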
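Here is what "quantization" means for a single value. Float16 simply stores each weight in 2 bytes instead of 4. The int8 option uses affine quantization: a float x is stored as q = round(x / scale) + zeroPoint, clamped to [-128, 127], and read back as (q - zeroPoint) * scale. The sketch below only illustrates that arithmetic. The helper names (`QuantParams`, `paramsForRange`, `quantize`, `dequantize`) and the simple min/max calibration rule are assumptions made for illustration. They are not the Ultralytics or TFLite implementation.

```kotlin
import kotlin.math.roundToInt

/** Hypothetical holder for an int8 tensor's quantization parameters. */
data class QuantParams(val scale: Float, val zeroPoint: Int)

/** Derives scale/zero point so that [minVal, maxVal] spans the int8 range [-128, 127]. */
fun paramsForRange(minVal: Float, maxVal: Float): QuantParams {
    // The range must contain 0 so that 0.0f is exactly representable
    val lo = minOf(minVal, 0f)
    val hi = maxOf(maxVal, 0f)
    require(hi > lo) { "empty range" }
    val scale = (hi - lo) / 255f
    val zeroPoint = (-128 - lo / scale).roundToInt().coerceIn(-128, 127)
    return QuantParams(scale, zeroPoint)
}

/** float -> int8: q = round(x / scale) + zeroPoint, saturated to [-128, 127]. */
fun quantize(x: Float, p: QuantParams): Byte =
    ((x / p.scale).roundToInt() + p.zeroPoint).coerceIn(-128, 127).toByte()

/** int8 -> float: x ≈ (q - zeroPoint) * scale, reading the byte as a SIGNED value. */
fun dequantize(q: Byte, p: QuantParams): Float = (q.toInt() - p.zeroPoint) * p.scale
```

Values outside the calibrated range saturate at the ends. This is one reason int8 can cost more accuracy than float16.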

Working in Android Studio (Kotlin)

Using the YOLO model requires a few classes and data types, such as the Detector and LetterboxInfo.

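companion object {
    // Model input size expected by the exported TFLite model
    private const val MODEL_INPUT_WIDTH = 384
    private const val MODEL_INPUT_HEIGHT = 640

    // Quantization parameters of the output tensor (these values depend on the exported model)
    private const val OUTPUT_SCALE = 2.8370347f
    private const val OUTPUT_ZERO_POINT = -115

    // Expression confidence threshold and NMS IoU threshold
    private const val EXPRESSION_THRESHOLD = 0.5f
    private const val NMS_THRESHOLD = 0.3f

    // Number of candidates (adjust for your model)
    private const val CANDIDATE_COUNT = 5040
}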

This companion object holds the constants the Detector uses frequently.
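Where does 5040 come from? Assume a YOLOv8-style anchor-free head that predicts one candidate per grid cell at strides 8, 16 and 32. That is an assumption about the exported model, so check your model's actual output shape. Under it, the count for a 384×640 input works out exactly. The hypothetical `candidateCount` helper below sums the grid cells over the three strides.

```kotlin
/**
 * Number of prediction candidates for an anchor-free YOLO head that emits one
 * candidate per grid cell at each stride (strides 8/16/32 assumed, YOLOv8-style).
 */
fun candidateCount(inputWidth: Int, inputHeight: Int, strides: List<Int> = listOf(8, 16, 32)): Int =
    strides.sumOf { s -> (inputWidth / s) * (inputHeight / s) }
```

For 384×640 this gives 48·80 + 24·40 + 12·20 = 3840 + 960 + 240 = 5040. The same rule gives the familiar 8400 for a 640×640 input. The other output dimension, 7, is 4 box values plus 3 class scores.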

private var inputBufferSize = if (isQuantized) {
    // Quantized model: 1 byte per channel
    1 * MODEL_INPUT_WIDTH * MODEL_INPUT_HEIGHT * 3
} else {
    // float32 model: 4 bytes per channel
    1 * MODEL_INPUT_WIDTH * MODEL_INPUT_HEIGHT * 3 * 4
}
private var inputBuffer: ByteBuffer = ByteBuffer.allocateDirect(inputBufferSize).apply {
    order(ByteOrder.nativeOrder())
}
private val pixels = IntArray(MODEL_INPUT_WIDTH * MODEL_INPUT_HEIGHT)

// Reusable ARGB_8888 bitmap and canvas for letterboxing; the RGB pixels are read from here
private val letterboxBitmap: Bitmap =
    Bitmap.createBitmap(MODEL_INPUT_WIDTH, MODEL_INPUT_HEIGHT, Bitmap.Config.ARGB_8888)
private val letterboxCanvas = Canvas(letterboxBitmap)

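Before copying the pixel colors in, we have to set inputBufferSize, because the buffer must match the tensor shape the AI model expects.
The color channels can decide whether object detection works at all, so make sure the channel order (RGB) matches what the model was trained on.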
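The buffer sizes follow directly from the input tensor shape [1, 640, 384, 3]. That is the NHWC, row-major order the pixel loop writes; treat this layout as an assumption about the exported model. The quantized model uses 1 byte per channel and the float32 model uses 4. Below is a plain-JVM sketch with hypothetical helper names. It produces a direct, native-order buffer, which is the kind of ByteBuffer TFLite's Java API handles most efficiently.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

/** Bytes needed for a [1, height, width, 3] input tensor. */
fun inputBytes(width: Int, height: Int, bytesPerChannel: Int): Int =
    1 * height * width * 3 * bytesPerChannel

/** Direct, native-order input buffer: 1 byte per channel if quantized, else 4 (float32). */
fun allocateInput(width: Int, height: Int, quantized: Boolean): ByteBuffer =
    ByteBuffer.allocateDirect(inputBytes(width, height, if (quantized) 1 else Float.SIZE_BYTES))
        .order(ByteOrder.nativeOrder())
```

For 384×640 this is 737,280 bytes quantized versus 2,949,120 bytes as float32. The float buffer still holds exactly 737,280 floats.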

val fileDescriptor = assetManager.openFd(modelPath)
val inputStream = fileDescriptor.createInputStream()
val fileChannel = inputStream.channel
val mappedByteBuffer = fileChannel.map(
    java.nio.channels.FileChannel.MapMode.READ_ONLY,
    fileDescriptor.startOffset,
    fileDescriptor.declaredLength
)

val fileDescriptor = assetManager.openFd(modelPath)
- Opens the file at modelPath (the .tflite file!) through the AssetManager.
- The return value is an AssetFileDescriptor, which holds the start offset, the length, and the underlying file descriptor.
val inputStream = fileDescriptor.createInputStream()
- createInputStream() creates an InputStream from that AssetFileDescriptor, i.e. a stream through which the file can be read.
val fileChannel = inputStream.channel
- inputStream.channel gets the FileChannel behind the InputStream. A FileChannel can read and write files, and it supports memory mapping.
val mappedByteBuffer = fileChannel.map(...)
- fileChannel.map(...) maps the file into memory and returns a MappedByteBuffer.
- java.nio.channels.FileChannel.MapMode.READ_ONLY: the mapping is read-only.
- fileDescriptor.startOffset: the byte offset at which the model's data starts in the underlying file.
- fileDescriptor.declaredLength: the length of the model data in bytes.
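The key idea is that an asset stored uncompressed in the APK is just a byte range (startOffset, declaredLength) of a larger file. FileChannel.map can map exactly that range without copying it onto the Java heap. Below is a plain-JVM sketch of the same call. It uses a hypothetical `mapRegion` helper on an ordinary file instead of an AssetManager, so it can run off-device.

```kotlin
import java.io.File
import java.io.RandomAccessFile
import java.nio.MappedByteBuffer
import java.nio.channels.FileChannel

/**
 * Maps [length] bytes of [file] starting at [startOffset] as a read-only buffer,
 * mirroring fileChannel.map(READ_ONLY, fd.startOffset, fd.declaredLength).
 */
fun mapRegion(file: File, startOffset: Long, length: Long): MappedByteBuffer =
    RandomAccessFile(file, "r").use { raf ->
        // A mapping stays valid after its channel is closed, so closing here is safe
        raf.channel.map(FileChannel.MapMode.READ_ONLY, startOffset, length)
    }
```

On device, the resulting MappedByteBuffer is what gets passed to `org.tensorflow.lite.Interpreter(ByteBuffer)`. If you use the TFLite Support library, `FileUtil.loadMappedFile(context, path)` wraps these same steps.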

fun detect(bitmap: Bitmap, viewMatrix: Matrix?): FaceExpressionResult {
    // 1) Preprocess: letterbox-resize the input Bitmap to the model input size and extract its pixels
    val lbInfo = preprocessBitmapToBuffer(bitmap)

    // 2) Inference
    val floatOutput: Array<Array<FloatArray>> = if (isQuantized) {
        val outputSize = 1 * 7 * CANDIDATE_COUNT // one byte per int8 output value
        val outputByteBuffer = ByteBuffer.allocateDirect(outputSize).apply {
            order(ByteOrder.nativeOrder())
        }
        interpreter.run(inputBuffer, outputByteBuffer)

        outputByteBuffer.rewind()
        // Dequantize: float = (q - zeroPoint) * scale.
        // A zero point of -115 means the output tensor is int8, so each byte is read as a
        // SIGNED value (masking with `and 0xFF` is only correct for a uint8 tensor).
        val outArray = Array(1) { Array(7) { FloatArray(CANDIDATE_COUNT) } }
        for (i in 0 until 1) {
            for (j in 0 until 7) {
                for (k in 0 until CANDIDATE_COUNT) {
                    val quantized = outputByteBuffer.get().toInt()
                    outArray[i][j][k] = (quantized - OUTPUT_ZERO_POINT) * OUTPUT_SCALE
                }
            }
        }
        outArray
    } else {
        // float32 model: the interpreter writes floats directly into the [1, 7, N] array
        val outBuffer = Array(1) { Array(7) { FloatArray(CANDIDATE_COUNT) } }
        interpreter.run(inputBuffer, outBuffer)
        outBuffer
    }

    // 3) Postprocess: convert bbox coordinates, apply softmax, then run NMS
    val detections = postProcess(
        output = floatOutput,
        lbInfo = lbInfo,
        originalWidth = bitmap.width,
        originalHeight = bitmap.height,
        viewMatrix = viewMatrix
    )
    val nmsDetections = nonMaximumSuppression(detections, NMS_THRESHOLD)

    return FaceExpressionResult(nmsDetections)
}

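Preprocessing
- preprocessBitmapToBuffer (shown later!) letterbox-resizes the input Bitmap to the TFLite model's input size and then extracts its pixels.
Inference
- What happens to the output depends on whether the model is quantized. For a quantized model, the raw integer outputs must be converted back to floats using the zero point and scale. For a float model, the interpreter writes floats directly, so no extra step is needed.
Postprocessing
- Each candidate's box is converted from letterbox coordinates back to the original image (or view), and its class scores are turned into a single label. Non-Maximum Suppression (NMS) then removes duplicate boxes so that only the best box per object remains.
- NMS relies on IoU (Intersection over Union): the area of the intersection of two boxes divided by the area of their union. In evaluation, IoU between a predicted box and the ground-truth box measures how accurate a detector is, and higher is better there. In NMS, IoU is computed between two predicted boxes, and a high IoU means they most likely cover the same object, so the lower-scoring one is discarded.
- The nonMaximumSuppression and IoU code follow right below!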
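The [1, 7, N] output is laid out channel-major, so value (channel c, candidate k) sits at flat index c·N + k. That is exactly the order in which the nested j/k loop in detect reads it. Here is a small self-contained sketch of the dequantization step, using a hypothetical `dequantizeOutput` helper. It assumes an int8 output tensor. A zero point of -115 is only possible for int8, because uint8 zero points lie in [0, 255].

```kotlin
import java.nio.ByteBuffer

/**
 * Converts a raw int8 output buffer of shape [1, channels, candidates] into floats
 * (batch dimension dropped). Each byte is read as a SIGNED int8 before applying
 * (q - zeroPoint) * scale; Array/FloatArray initializers run in index order,
 * so the buffer is consumed channel by channel.
 */
fun dequantizeOutput(
    raw: ByteBuffer,
    channels: Int,
    candidates: Int,
    scale: Float,
    zeroPoint: Int
): Array<FloatArray> {
    raw.rewind()
    return Array(channels) {
        FloatArray(candidates) { (raw.get().toInt() - zeroPoint) * scale }
    }
}
```

Masking the same byte with `and 0xFF` would turn -100 into 156. It would then produce (156 + 115) × scale instead of 15 × scale, which is why the mask only suits uint8 tensors.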

private fun nonMaximumSuppression(
    detections: List<FaceExpressionDetection>,
    iouThreshold: Float
): List<FaceExpressionDetection> {
    if (detections.isEmpty()) return emptyList()

    val sorted = detections.sortedByDescending { it.score }.toMutableList()
    val finalDetections = mutableListOf<FaceExpressionDetection>()

    while (sorted.isNotEmpty()) {
        val best = sorted.removeAt(0)
        finalDetections.add(best)

        val iterator = sorted.iterator()
        while (iterator.hasNext()) {
            val other = iterator.next()
            if (computeIoU(best.box, other.box) > iouThreshold) {
                iterator.remove()
            }
        }
    }
    return finalDetections
}

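As mentioned above, among overlapping boxes we want to keep only the one with the highest score, and this code handles that. The candidates are sorted by confidence, the best remaining box is kept, and every other box whose IoU with it exceeds iouThreshold is treated as a duplicate of the same object and dropped.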
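To see the greedy loop in isolation, here is a self-contained sketch of the same class-agnostic NMS. It runs on a plain `Box` data class, a hypothetical stand-in for RectF/FaceExpressionDetection, so it can run off-device. It keeps the highest-scoring box, drops every remaining box whose IoU with it exceeds the threshold, and repeats. `removeAll` replaces the explicit iterator but behaves the same.

```kotlin
/** Hypothetical stand-in for a detection: corners in pixels plus a confidence score. */
data class Box(val left: Float, val top: Float, val right: Float, val bottom: Float, val score: Float)

/** Intersection over union of two axis-aligned boxes (0 when either box is empty). */
fun iou(a: Box, b: Box): Float {
    val areaA = (a.right - a.left) * (a.bottom - a.top)
    val areaB = (b.right - b.left) * (b.bottom - b.top)
    if (areaA <= 0f || areaB <= 0f) return 0f
    val w = maxOf(0f, minOf(a.right, b.right) - maxOf(a.left, b.left))
    val h = maxOf(0f, minOf(a.bottom, b.bottom) - maxOf(a.top, b.top))
    val inter = w * h
    return inter / (areaA + areaB - inter)
}

/** Greedy, class-agnostic non-maximum suppression. */
fun nms(boxes: List<Box>, iouThreshold: Float): List<Box> {
    val remaining = boxes.sortedByDescending { it.score }.toMutableList()
    val kept = mutableListOf<Box>()
    while (remaining.isNotEmpty()) {
        val best = remaining.removeAt(0)
        kept += best
        // Suppress boxes that overlap the kept box too much (likely the same face)
        remaining.removeAll { iou(best, it) > iouThreshold }
    }
    return kept
}
```

With two heavily overlapping boxes and one separate box, only the stronger of the overlapping pair and the separate box survive.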
Next, the IoU calculation code!

private fun computeIoU(a: RectF, b: RectF): Float {
    val areaA = a.width() * a.height()
    val areaB = b.width() * b.height()
    if (areaA <= 0f || areaB <= 0f) return 0f

    val interLeft = maxOf(a.left, b.left)
    val interTop = maxOf(a.top, b.top)
    val interRight = minOf(a.right, b.right)
    val interBottom = minOf(a.bottom, b.bottom)
    val intersection = maxOf(0f, interRight - interLeft) * maxOf(0f, interBottom - interTop)
    return intersection / (areaA + areaB - intersection)
}

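This computes the IoU of two regions: a predicted box and the ground-truth box during evaluation, or two predicted boxes during NMS. The intersection is bounded by the inner-most edges of the two boxes and is clamped to 0 when they do not overlap. Go through it step by step and it is easy to follow!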
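Here is a quick worked example of the same formula. It uses two 2×2 boxes A = (0, 0, 2, 2) and B = (1, 0, 3, 2), written as (left, top, right, bottom):

$$
\begin{aligned}
\text{inter} &= \max(0,\ \min(2,3) - \max(0,1)) \cdot \max(0,\ \min(2,2) - \max(0,0)) = 1 \cdot 2 = 2 \\
\text{union} &= \text{area}(A) + \text{area}(B) - \text{inter} = 4 + 4 - 2 = 6 \\
\mathrm{IoU}(A, B) &= \frac{2}{6} = \frac{1}{3} \approx 0.33
\end{aligned}
$$

With NMS_THRESHOLD = 0.3f, this IoU is just above the threshold, so NMS would suppress the lower-scoring of the two boxes.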

private fun postProcess(
    output: Array<Array<FloatArray>>,
    lbInfo: LetterboxInfo,
    originalWidth: Int,
    originalHeight: Int,
    viewMatrix: Matrix? = null
): List<FaceExpressionDetection> {
    val detections = mutableListOf<FaceExpressionDetection>()
    val candidateCount = output[0][0].size

    for (i in 0 until candidateCount) {
        // First 4 outputs: center x, center y, box width, box height (normalized, so scale by input size)
        val cx = output[0][0][i] * MODEL_INPUT_WIDTH
        val cy = output[0][1][i] * MODEL_INPUT_HEIGHT
        val bw = output[0][2][i] * MODEL_INPUT_WIDTH
        val bh = output[0][3][i] * MODEL_INPUT_HEIGHT

        val x1 = cx - bw / 2f
        val y1 = cy - bh / 2f
        val x2 = x1 + bw
        val y2 = y1 + bh

        val letterboxRect = RectF(x1, y1, x2, y2)
        // Convert letterbox coordinates to original-image (or view) coordinates
        val originalRect = letterboxToOriginalCoords(letterboxRect, lbInfo, viewMatrix)

        // Three expression logits (e.g. Negative, Positive, Neutral)
        val exprLogits = floatArrayOf(
            output[0][4][i],
            output[0][5][i],
            output[0][6][i]
        )

        // Softmax probabilities; pick the most likely expression
        val exprScores = softmax(exprLogits)
        val maxScore = exprScores.maxOrNull() ?: 0f
        val maxIndex = exprScores.indexOfFirst { it == maxScore }

        // Only keep detections above the confidence threshold
        if (maxScore > EXPRESSION_THRESHOLD) {
            detections.add(
                FaceExpressionDetection(
                    expression = expressions[maxIndex],
                    score = maxScore,
                    box = originalRect
                )
            )
        }
    }
    return detections
}

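It goes through every candidate and evaluates each box in turn!
The yolo.tflite output has the shape [a, b, c], here [1, 7, 5040]: batch, channels, candidates. Along the second (channel) axis, indices 0 to 3 hold the box center x, center y, width and height, and the remaining indices hold the per-label scores.
Letterbox coordinates are not the real image coordinates, so they must be converted back to original-image or view coordinates (letterboxToOriginalCoords).
exprLogits collects the label outputs into a FloatArray. There are exactly three values because this model has three labels.
Softmax turns the three values into probabilities, and the box takes the label with the highest probability.
Not every box should be reported, so EXPRESSION_THRESHOLD keeps only the high-confidence boxes, and those are added to detections.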
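The per-candidate decoding can be sketched without Android types. This assumes the layout described above: rows 0 to 3 are normalized cx, cy, w, h, and rows 4 and up are the class scores. The names `Decoded` and `decodeCandidate` are hypothetical, and the softmax subtracts the maximum first for numerical stability. One hedged caveat: Ultralytics YOLOv8 exports usually already apply a sigmoid to each class score, and in that case the scores can be used directly. Softmax never changes which class wins. It does change the reported confidence, and that confidence is what gets compared against EXPRESSION_THRESHOLD.

```kotlin
import kotlin.math.exp

/** Hypothetical decoded candidate: box corners in model-input pixels, best class and its probability. */
data class Decoded(val x1: Float, val y1: Float, val x2: Float, val y2: Float, val classIndex: Int, val score: Float)

/** Numerically stable softmax (subtracting the max does not change the result). */
fun softmax(logits: FloatArray): FloatArray {
    val max = logits.maxOrNull() ?: return logits
    val exps = FloatArray(logits.size) { exp(logits[it] - max) }
    val sum = exps.sum()
    return FloatArray(exps.size) { exps[it] / sum }
}

/** Decodes candidate [i] from a [channels][N] output (batch dimension already dropped). */
fun decodeCandidate(out: Array<FloatArray>, i: Int, inputW: Int, inputH: Int): Decoded {
    val cx = out[0][i] * inputW
    val cy = out[1][i] * inputH
    val bw = out[2][i] * inputW
    val bh = out[3][i] * inputH
    val probs = softmax(FloatArray(out.size - 4) { out[4 + it][i] })
    val best = probs.indices.maxByOrNull { probs[it] } ?: 0
    return Decoded(cx - bw / 2f, cy - bh / 2f, cx + bw / 2f, cy + bh / 2f, best, probs[best])
}
```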

private fun preprocessBitmapToBuffer(original: Bitmap): LetterboxInfo {
    // Letterbox-resize to the model input size (targetWidth, targetHeight)
    val lbInfo = letterboxResize(original, MODEL_INPUT_WIDTH, MODEL_INPUT_HEIGHT)

    inputBuffer.rewind()
    lbInfo.bitmap.getPixels(
        pixels,
        0,
        MODEL_INPUT_WIDTH, // stride = 384
        0,
        0,
        MODEL_INPUT_WIDTH, // width = 384
        MODEL_INPUT_HEIGHT // height = 640
    )

    if (isQuantized) {
        // Store each pixel's R, G, B as one byte each (raw 0-255 values).
        // This matches a uint8 input tensor; an int8 input would need the input
        // tensor's own scale and zero point applied first.
        for (pixel in pixels) {
            val r = ((pixel shr 16) and 0xFF).toByte()
            val g = ((pixel shr 8) and 0xFF).toByte()
            val b = (pixel and 0xFF).toByte()
            inputBuffer.put(r)
            inputBuffer.put(g)
            inputBuffer.put(b)
        }
    } else {
        // Float model: normalize each channel to [0, 1] before storing
        val floatBuffer = inputBuffer.asFloatBuffer()
        for (pixel in pixels) {
            val r = ((pixel shr 16) and 0xFF).toFloat() / 255f
            val g = ((pixel shr 8) and 0xFF).toFloat() / 255f
            val b = (pixel and 0xFF).toFloat() / 255f
            floatBuffer.put(r)
            floatBuffer.put(g)
            floatBuffer.put(b)
        }
    }
    inputBuffer.rewind()

    return lbInfo
}

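This is the code that converts the Bitmap into the input buffer!
getPixels copies the letterboxed bitmap's pixels into the array, using the model's width and height.
For a quantized model, R, G and B are each stored as one byte via toByte(). Notice that the quantized inputBufferSize is four times smaller (1 byte instead of 4 per channel). Together with the cheaper integer arithmetic inside the model, that is a large part of why the quantized model is faster.
For a float model, the values are normalized to [0, 1] before being stored, which is why each channel is divided by 255.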
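Android packs each pixel of an ARGB_8888 bitmap into one Int as 0xAARRGGBB, which is why the loop shifts by 16, 8 and 0. Below is a self-contained sketch of the float branch using a hypothetical `writeNormalizedRgb` helper. It drops alpha and writes R, G, B in [0, 1], in the same row-major (NHWC) order as the code above.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder
import java.nio.FloatBuffer

/** Writes packed 0xAARRGGBB pixels into [dst] as normalized R, G, B floats (alpha is dropped). */
fun writeNormalizedRgb(pixels: IntArray, dst: FloatBuffer) {
    for (pixel in pixels) {
        dst.put(((pixel shr 16) and 0xFF) / 255f) // red
        dst.put(((pixel shr 8) and 0xFF) / 255f)  // green
        dst.put((pixel and 0xFF) / 255f)          // blue
    }
}
```

For example, the pixel 0xFF804020 becomes (128/255, 64/255, 32/255). The `and 0xFF` mask is needed because `shr` sign-extends when the alpha byte makes the Int negative.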

private fun letterboxResize(src: Bitmap, targetWidth: Int, targetHeight: Int): LetterboxInfo {
    val srcWidth = src.width
    val srcHeight = src.height

    val scale = min(targetWidth.toFloat() / srcWidth, targetHeight.toFloat() / srcHeight)
    val newWidth = (srcWidth * scale).toInt()
    val newHeight = (srcHeight * scale).toInt()

    val padLeft = (targetWidth - newWidth) / 2f
    val padTop = (targetHeight - newHeight) / 2f

    letterboxCanvas.drawColor(Color.BLACK)
    val dstRect = RectF(padLeft, padTop, padLeft + newWidth, padTop + newHeight)
    letterboxCanvas.drawBitmap(src, null, dstRect, null)

    return LetterboxInfo(letterboxBitmap, scale, padLeft, padTop)
}

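This resizes the bitmap to the target width and height while keeping its aspect ratio. The image is scaled by a single factor and centered, and the leftover area is filled with black padding.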
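The geometry behind letterboxResize can be checked with plain arithmetic. Below is a sketch with a hypothetical `LetterboxGeometry` type and the same formulas. Take a 1536×1280 landscape frame going into the 384×640 portrait input. The width is the limiting side, so scale = 0.25, the image becomes 384×320, and it gets 160 px of padding above and below.

```kotlin
import kotlin.math.min

/** Hypothetical result of letterbox geometry: uniform scale, resized size and padding. */
data class LetterboxGeometry(
    val scale: Float, val newWidth: Int, val newHeight: Int, val padLeft: Float, val padTop: Float
)

fun letterboxGeometry(srcWidth: Int, srcHeight: Int, targetWidth: Int, targetHeight: Int): LetterboxGeometry {
    // One scale for both axes keeps the aspect ratio; the tighter side decides it
    val scale = min(targetWidth.toFloat() / srcWidth, targetHeight.toFloat() / srcHeight)
    val newWidth = (srcWidth * scale).toInt()
    val newHeight = (srcHeight * scale).toInt()
    // Center the resized image; the remaining border is padding (black in letterboxResize)
    return LetterboxGeometry(
        scale, newWidth, newHeight,
        padLeft = (targetWidth - newWidth) / 2f,
        padTop = (targetHeight - newHeight) / 2f
    )
}
```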

private fun letterboxToOriginalCoords(
    box: RectF,
    lbInfo: LetterboxInfo,
    viewMatrix: Matrix? = null
): RectF {
    // Undo the padding, then undo the scale, to get original-image coordinates
    val x1 = (box.left - lbInfo.padLeft) / lbInfo.scale
    val y1 = (box.top - lbInfo.padTop) / lbInfo.scale
    val x2 = (box.right - lbInfo.padLeft) / lbInfo.scale
    val y2 = (box.bottom - lbInfo.padTop) / lbInfo.scale
    val origRect = RectF(x1, y1, x2, y2)

    if (viewMatrix != null) {
        // mapRect transforms all four corners and stores their bounding box, so the result
        // stays valid (left <= right) even when the matrix mirrors or rotates the image
        viewMatrix.mapRect(origRect)
    }
    return origRect
}

This converts a box in letterbox space back to the original image coordinates and, if a view matrix is given, on to view coordinates!
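Mapping back simply inverts the forward step: subtract the padding, then divide by the scale. Below is a self-contained sketch on plain floats. The hypothetical `Rect`, `toOriginal` and `mirrorX` names stand in for RectF and a view transform. `mirrorX` shows why corners must be re-sorted after a mirroring transform, such as a mirrored front-camera preview: mapping the two corners swaps left and right.

```kotlin
/** Hypothetical plain-float rectangle (left, top, right, bottom). */
data class Rect(val left: Float, val top: Float, val right: Float, val bottom: Float)

/** Letterbox-space box -> original-image box: undo the padding, then undo the scale. */
fun toOriginal(box: Rect, scale: Float, padLeft: Float, padTop: Float) = Rect(
    (box.left - padLeft) / scale,
    (box.top - padTop) / scale,
    (box.right - padLeft) / scale,
    (box.bottom - padTop) / scale
)

/** Horizontal mirror across a view of width [viewWidth]; corners are re-sorted afterwards. */
fun mirrorX(r: Rect, viewWidth: Float): Rect {
    val a = viewWidth - r.left
    val b = viewWidth - r.right
    return Rect(minOf(a, b), r.top, maxOf(a, b), r.bottom)
}
```

On Android, `Matrix.mapRect(RectF)` handles the general case. It maps all four corners and stores their bounding box, so the result is always sorted.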

That's it for today: we looked at the Detector needed to use the YOLO model.

Next time, I'll come back with how to put it to use.

Thanks for reading.

License: CC BY-NC-ND (Attribution, NonCommercial, NoDerivatives)

내일의 개발자 HJ's Blog
