::: TTA Standardization Committee :::

Standard No.	TTAK.KO-11.0280-Part2	Former Standard No.
Date Established/Revised	2021-12-08	Total Pages	31
Korean Title	검증용 데이터세트의 밸런스 기반 인공지능 소프트웨어 신뢰성 평가 방법 - 제2부: 이미지 타입 밸런스 데이터 설계
English Title	A Method for Evaluating the Reliability of Artificial Intelligence Software Based on the Balance of the Validation Dataset - Part 2: Design of Image Type Balanced Data
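Korean Summary (translated)	Software that incorporates artificial intelligence is increasingly being adopted in services across industries, and the key factor determining the quality of these services is the accuracy of the artificial intelligence embedded in the software. Accuracy is commonly verified by building a validation dataset separate from the training dataset and comparing it against the output of the artificial intelligence. When gaps in the validation sample cause data from a particular region to be produced in a heavily skewed way, or when the training data gathered during collection fails to represent the operational data of the deployed artificial intelligence software and over-represents a particular region, the result is called sampling bias or sampling error caused by an improperly balanced dataset. Because sampling bias is a major cause of distorted accuracy verification results, an evaluation of whether the software can handle diverse real-world scenarios needs to assess the confidence level using a balanced evaluation dataset. This standard therefore defines a balanced data design method that reflects the features of 'image' data from multiple viewpoints, so that, when an 'image' balanced dataset for evaluation is built, the distribution of the dataset matching the input data conditions is organized evenly.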
English Summary	Software that includes artificial intelligence is spreading across services in many industries, and the key factor that determines the quality of these services is the accuracy of the artificial intelligence embedded in the software. As a general method of verifying accuracy, a separate validation dataset is prepared in addition to the training dataset and compared with the output values of the artificial intelligence. If a sample gap in the validation data causes data in a specific area to be excessively concentrated, or if the training data gathered during data collection does not properly represent the operational data of the deployed artificial intelligence software and a specific area is over-represented, this is called sampling bias or sampling error caused by an improper balance of the dataset. Since sampling bias is a major cause of distorted accuracy verification results for artificial intelligence software, it is necessary to evaluate the confidence level with a balanced evaluation dataset when judging whether the software can handle a variety of real-world scenarios. Therefore, to distribute evenly the data that meets the input data conditions when constructing an 'image' balanced dataset for evaluation, this standard defines a balanced data design method that reflects the characteristics of 'image' data from various viewpoints.
International Standard
Related File	TTAK_[1].KO-11.0280-Part2.pdf
