Independent Office Meeting Minutes Text Extraction Dataset

#Text Classification #Information Extraction #Natural Language Processing #Corporate Meeting Minutes #Administrative Document Management #Automated Report Generation

500 records
1.3G
TXT
CATL
MOBIUSI INC

Updated: 2026-06-20

AI Analysis & Value Prop

Meeting minutes are a routine part of enterprise operations, but extracting their key content is difficult: the documents are information-dense, and downstream systems need structured data. Existing workflows rely largely on manual review, which is slow and error-prone. This dataset is built to support automated information extraction from meeting minutes, so that their content can be classified and managed efficiently.

The data was collected in collaboration with multiple enterprises, using transcription devices and software to capture real meeting minutes in natural office environments. Annotation went through multiple rounds with consistency checks, and a team of more than five experts with linguistic backgrounds reviewed the data for accuracy and reliability. Preprocessing covered text cleaning, syntactic analysis, and semantic annotation, using natural language processing techniques to make the data better suited to model training. All data is stored in a uniform TXT format, with entries classified using embedded labels, which simplifies both information lookup and model training.

Technical Specifications

Field	Type	Description
file_name	string	File name
meeting_topic	string	The main topic or agenda mentioned in the meeting record.
participants	string	The list of attendees in the meeting.
key_decisions	string	Important decisions or conclusions reached during the meeting.
action_items	string	To-dos or follow-up tasks assigned from the meeting.
meeting_location	string	The specific location where the meeting took place.
meeting_duration	string	The total duration of the meeting.
meeting_summary	string	A brief overview of the meeting content.
content	text	The main body of the document, including key meeting details (time, venue, participants, etc.) and the detailed discussions, resolutions, and plans.
title	string	The title of the meeting minutes document, identifying its core theme along with the department, cycle, and meeting type, so the document's purpose and ownership can be recognized quickly.
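
As a rough illustration of how the fields in the table above might be represented in code, the Python sketch below defines a record type and a small validation helper. The names MeetingRecord and record_from_dict are hypothetical, and the sample values are invented for illustration. This page does not document how the embedded labels are laid out inside the TXT files, so the sketch assumes each record has already been parsed into a dictionary.

```python
from dataclasses import dataclass, fields


@dataclass
class MeetingRecord:
    """One record, mirroring the field names in the specification table."""
    file_name: str
    meeting_topic: str
    participants: str
    key_decisions: str
    action_items: str
    meeting_location: str
    meeting_duration: str
    meeting_summary: str
    content: str
    title: str


def record_from_dict(raw: dict) -> MeetingRecord:
    """Build a MeetingRecord, raising ValueError if a documented field is missing.

    Assumption: `raw` is a dictionary produced by some earlier parsing step;
    the on-disk TXT layout is not specified here.
    """
    names = [f.name for f in fields(MeetingRecord)]
    missing = [n for n in names if n not in raw]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return MeetingRecord(**{n: str(raw[n]) for n in names})


# Illustrative values only; not taken from the dataset.
example = record_from_dict({
    "file_name": "example_001.txt",
    "meeting_topic": "Quarterly budget review",
    "participants": "Attendee A; Attendee B",
    "key_decisions": "Approve revised budget",
    "action_items": "Attendee A to circulate the final figures",
    "meeting_location": "Conference Room 2",
    "meeting_duration": "45 minutes",
    "meeting_summary": "The team reviewed and approved the revised budget.",
    "content": "Full text of the minutes goes here.",
    "title": "Finance Department Quarterly Meeting Minutes",
})
print(example.meeting_topic)
```

Checking for missing fields up front keeps incomplete records from reaching a training pipeline unnoticed. Every field is kept as a string, matching the types listed in the table.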

Compliance Statement

Authorization Type	Proprietary - Commercial AI Training License (No Redistribution)
Commercial Use	Requires an exclusive subscription or authorization contract (billed monthly or per invocation)
Privacy and Anonymization	No PII, no real company names, simulated scenarios follow industry standards
Compliance System	Compliant with China's Data Security Law / EU GDPR / supports enterprise data access logs

Frequently Asked Questions

What types of text data does this dataset mainly include?
This dataset mainly includes text data from day-to-day meeting records of independent offices.

In which scenarios can this dataset be used?
This dataset can be used to improve extraction and classification of meeting record texts and is suitable for text analysis applications in everyday office scenarios.

What are the main benefits of using this dataset?
The main benefits of using this dataset include improved automation of independent office meeting records and enhanced accuracy in text classification and information extraction.

How can this dataset be used for text classification research?
Researchers can use the meeting records provided in the dataset to train models and develop more accurate text classification algorithms.

What is the significance of this dataset for natural language processing?
For NLP researchers, this dataset aids in developing and testing new information extraction and classification models, optimizing natural language text analysis techniques.

Cite this Work

@dataset{Mobiusi2026,
  title={Independent Office Meeting Minutes Text Extraction Dataset},
  author={MOBIUSI INC},
  year={2026},
  url={https://www.mobiusi.com/datasets/27f0cf4da25a439cdea933b34fccf7be?cate=3},
  urldate={2026-02-04},
  keywords={Meeting Minutes Dataset, Text Extraction, Text Classification, Information Management, NLP Dataset},
  version={1.0}
}

Using this in research? Please cite us.
