
Multilingual Character Detection Dataset

Version: V1.0
Latest Update: 2025-10-15
Samples: 15,000 records
File Size: 4.1 GB
Format: JPG/PNG/JSON
Data Domain: Image
Holder: MOBIUSI INC
Industry Scope: Quality Control | Language Compliance | Export Verification
Applications: Character Recognition | Image Classification

Brief Introduction

Industrial producers face challenges in ensuring that product text complies with export language requirements. Existing solutions often fall short in accuracy and efficiency, which can lead to errors in product labeling and communication. This dataset addresses the need for high-precision character detection in multilingual contexts, so that products can be checked against export language standards.

Images were captured in a range of industrial environments and labeled with the text they contain and its language. Quality control included multiple rounds of annotation, consistency checks, and expert review. Images are provided in JPG/PNG format, with structured metadata (JSON) for access and analysis.

Sample Examples

| File Name | Resolution | Language Type | Character Count | Text Position | Font Size | Font Type | OCR Recognition Accuracy | Distortion Level |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 5c7c120a68d7fc68702a79e30b0fdfc0.png | 1280*1516 | English, Japanese, Chinese | Approximately 80 characters | Top, Right | Medium-small font | Sans-serif | 0.85 | No distortion |
| 1a150c1da065ef048bc4202bf4693d7a.png | 1280*1387 | Chinese, English | About 50 | Top, Center, Bottom | Medium | Sans-serif | 0.95 | No Distortion |
| 2a7109194c0b29e7214fb66159bc44ca.png | 1280*1411 | Chinese, English | About 120 | Top, Center, Bottom | Medium | Sans-serif | 0.95 | No Distortion |
| 1ceb4f11a0dc00b6982863bc9472042e.png | 1280*1552 | Chinese, English | 40 | Top, Center, Bottom | Small to Medium Size | Regular Sans-serif | 0.95 | No Distortion |

Data Structure

| Field | Type | Description |
| --- | --- | --- |
| file_name | string | File name |
| quality | string | Resolution |
| language_type | string | The type of language contained in the image, such as English, Chinese, French, etc. |
| character_count | int | The total number of characters in the image. |
| text_position | string | The specific position of characters within an image, such as top, center, bottom, etc. |
| font_size | int | The font size of characters within an image. |
| font_type | string | The specific font type used in an image. |
| ocr_accuracy | float | The accuracy of optical character recognition, ranging from 0 to 1. |
| distortion_level | string | The clarity and distortion level of characters in an image, such as no distortion, slight distortion, severe distortion. |

Compliance Statement

| Item | Content |
| --- | --- |
| Authorization Type | CC-BY-NC-SA 4.0 (Attribution–NonCommercial–ShareAlike) |
| Commercial Use | Requires exclusive subscription or authorization contract (monthly or per-invocation charging) |
| Privacy and Anonymization | No PII, no real company names, simulated scenarios follow industry standards |
| Compliance System | Compliant with China’s Data Security Law / EU GDPR / supports enterprise data access logs |
