How to Automatically Detect the Language of a Text

Reading time: 8 min

How automatic language detection works, plus a free detection tool.

What language detection is and what it's for

Language detection automatically identifies the language a text is written in. It is essential for translation (Google Translate detects the source language), content routing, moderation, data analysis, and SEO verification.

Detect any text's language with the NexTools language detector.

How language detection works

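1. N-grams: The detector analyzes sequences of 2-3 characters and compares their frequencies with each language's profile. "th" is typical of English and "sch" of German; "qu" is common in Spanish but also in French, Italian, and Portuguese, so no single n-gram decides on its own.

2. Character frequency: Some characters are strong hints. ñ points to Spanish, ü to German (also Turkish and Hungarian), ç to French or Portuguese.

3. Common words: Frequent function words give a fast signal: "the/and/is" → English, "el/de/que" → Spanish.

4. Machine learning: FastText's (Facebook) pretrained language-identification model recognizes 176 languages, with reported accuracy of 95% or more on typical text.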
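To make signals 1 and 3 concrete, here is a toy Python sketch that scores a text by character-trigram overlap and by the share of common function words, then picks the best-scoring language. The stopword lists and the one-sentence sample "profiles" are hypothetical and far too small for real use; real detectors learn n-gram statistics from large corpora for many more languages, and this is not how FastText works internally.

```python
import re
from collections import Counter

# Hypothetical, tiny reference data for illustration only.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in", "it"},
    "es": {"el", "la", "de", "que", "y", "en", "es"},
    "de": {"der", "die", "und", "ist", "nicht", "das", "ein"},
}
SAMPLES = {
    "en": "the cat is on the table and the dog is in the garden",
    "es": "el gato está en la mesa y el perro está en el jardín",
    "de": "die katze ist auf dem tisch und der hund ist im garten",
}


def trigrams(text: str) -> Counter:
    """Count character trigrams, padding each word with spaces."""
    counts: Counter = Counter()
    for word in re.findall(r"\w+", text.lower()):
        padded = f" {word} "
        for i in range(len(padded) - 2):
            counts[padded[i:i + 3]] += 1
    return counts


PROFILES = {lang: trigrams(sample) for lang, sample in SAMPLES.items()}


def score(text: str, lang: str) -> float:
    words = re.findall(r"\w+", text.lower())
    grams = trigrams(text)
    # Signal 1: share of the text's trigrams that also occur in the profile.
    known = sum(n for g, n in grams.items() if g in PROFILES[lang])
    gram_score = known / max(sum(grams.values()), 1)
    # Signal 3: share of words that are common function words of the language.
    word_score = sum(w in STOPWORDS[lang] for w in words) / max(len(words), 1)
    return gram_score + word_score


def detect(text: str) -> str:
    """Return the language code with the highest combined score."""
    return max(STOPWORDS, key=lambda lang: score(text, lang))
```

With these toy profiles, detect("der hund und die katze") returns "de". Adding the two scores is a simplification; real systems weight features statistically or use a trained classifier.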

Count words with the NexTools word counter.

Accuracy: how much text is needed

Text length | Typical accuracy
1 word | ~50-70%
1 sentence | ~90-95%
1 paragraph | ~98-99%
1 page | ~99.9%

Hard cases: similar languages (Spanish/Portuguese), very short texts, mixed-language texts, and CJK languages written in Latin script (romanization, e.g. Japanese romaji).

Libraries and APIs for language detection

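JavaScript: franc (works offline, 187 languages). Calling franc(text) returns an ISO 639-3 code such as 'eng' for English, or 'und' (undetermined) when the text is too short to decide.

Python: langdetect (a Python port of Google's language-detection library) and FastText (the most accurate of these).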

APIs: Google Cloud Translation, AWS Comprehend, Azure Text Analytics.

NexTools runs detection in the browser, without external APIs. See our Base64 guide for encoding text before API calls.

Language detection in multilingual sites

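Accept-Language header: The browser sends the user's preferred languages, ranked with quality values (for example es-MX, es;q=0.9, en;q=0.8). This is the most reliable signal.

GeoIP: The user's location suggests a language. Less reliable (tourists, VPNs, multilingual countries).

URL-based: NexTools uses subdirectories (/es/, /ja/), so the language is explicit in the URL. Best for SEO.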

Cookie: Save user preference for future visits.
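One way to combine these signals on the server is sketched below in Python. The precedence order (URL subdirectory, then cookie, then Accept-Language, then a default), the SUPPORTED set, and the function names are assumptions for illustration; GeoIP is left out because it needs an external database.

```python
from typing import List, Optional

SUPPORTED = {"en", "es", "ja"}  # hypothetical set of site languages
DEFAULT = "en"                  # hypothetical fallback


def parse_accept_language(header: str) -> List[str]:
    """Return language tags from an Accept-Language header, highest q first."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        pieces = part.strip().split(";")
        tag = pieces[0].strip().lower()
        if not tag:
            continue
        q = 1.0  # tags without an explicit q-value default to 1
        for param in pieces[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat malformed weights as "not acceptable"
        if q > 0:
            # Sort by descending q; header order breaks ties.
            weighted.append((-q, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


def resolve_language(path: str, cookie_lang: Optional[str], accept_language: str) -> str:
    # 1. Explicit URL subdirectory such as /es/ or /ja/
    first_segment = path.strip("/").split("/")[0]
    if first_segment in SUPPORTED:
        return first_segment
    # 2. Preference saved in a cookie on an earlier visit
    if cookie_lang in SUPPORTED:
        return cookie_lang
    # 3. Browser preferences; "es-MX" matches "es" via its primary subtag
    for tag in parse_accept_language(accept_language):
        primary = tag.split("-")[0]
        if primary in SUPPORTED:
            return primary
    # 4. No usable signal (GeoIP omitted in this sketch)
    return DEFAULT
```

For example, resolve_language("/tools", None, "fr-FR, es-MX;q=0.8, en;q=0.5") returns "es": French is unsupported, so the next-ranked tag wins.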

See our slug and SEO guide for URL translation.

Advanced use cases

1. Content moderation: Apply the filters for the correct language.

2. Support classification: Route tickets to the team that speaks the customer's language.

3. Web scraping: Classify scraped content by language.

4. Sentiment analysis: NLP models need to know the language before analyzing sentiment; an English model won't work on Spanish text.
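For ticket routing (use case 2), here is a minimal Python sketch under assumed names: the queue names, the 0.8 confidence threshold, and the `detect` callable (any detector that returns a language code and a confidence score) are all hypothetical. Low-confidence results go to a triage queue instead of being guessed.

```python
from typing import Callable, Dict, Tuple

# Hypothetical mapping from language codes to support queues.
QUEUES: Dict[str, str] = {"en": "support-en", "es": "support-es", "ja": "support-ja"}
FALLBACK_QUEUE = "support-triage"  # a human sorts out uncertain tickets


def route_ticket(
    text: str,
    detect: Callable[[str], Tuple[str, float]],
    min_confidence: float = 0.8,
) -> str:
    """Choose a queue from the detected language, or fall back when unsure."""
    lang, confidence = detect(text)
    if confidence < min_confidence:
        return FALLBACK_QUEUE
    # Languages without a dedicated team also go to triage.
    return QUEUES.get(lang, FALLBACK_QUEUE)
```

Passing the detector in as a parameter keeps the routing rule independent of whichever library or API you choose.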

How to use the NexTools language detector

The NexTools detector: paste text and get automatic identification with a confidence level. Everything runs in the browser, so your text never leaves your computer.

Limitations and false positives

Proper nouns: "Madrid" is spelled the same in many languages, so it gives little signal.

Technical text: Source code mixed with comments confuses detection.

Transliteration: Japanese written in romaji may be detected as another language.

Dialects: Detectors can distinguish Spanish from Portuguese but usually not Mexican from European Spanish.
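Part of the technical-text problem can be reduced by cleaning the input before detection. The Python sketch below strips URLs, e-mail addresses, inline code, code-like tokens, and numbers. The regular expressions are illustrative assumptions and will also remove some legitimate text (for example "and/or"); the sketch does nothing about proper nouns or romanized text.

```python
import re

# Illustrative patterns (assumptions). Order matters: URLs go before code tokens.
URL = re.compile(r"https?://\S+|www\.\S+")
EMAIL = re.compile(r"\S+@\S+\.\w+")
INLINE_CODE = re.compile(r"`[^`]*`")
CODE_TOKEN = re.compile(r"\S*[_(){}\[\]=<>;/\\]\S*")  # snake_case, f(x), a=b, paths
NUMBER = re.compile(r"\b\d+(?:[.,]\d+)*\b")


def clean_for_detection(text: str) -> str:
    """Remove tokens that carry no language signal, then collapse whitespace."""
    for pattern in (URL, EMAIL, INLINE_CODE, CODE_TOKEN, NUMBER):
        text = pattern.sub(" ", text)
    return " ".join(text.split())
```

Pass the cleaned string to the detector, but keep the original text for display and storage.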

Change text format with the NexTools case converter.


Frequently asked questions

How many words does the detector need to identify the language?

One sentence (5-10 words): ~90-95% accuracy. One paragraph: ~98-99%. Single words: ~50-70%.

Can it detect the language of very short texts?

Texts of 1-3 words have low reliability: 'Hello' could be English or a proper noun. Detection on short text is an estimate, not a certainty.
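A common safeguard is to refuse to classify texts that are too short and return "und" (the ISO 639-2 code for "undetermined") instead. In this Python sketch the 10-letter threshold and the `detect` callable are assumptions; scripts such as Japanese or Chinese carry more information per character, so a per-script threshold may work better.

```python
from typing import Callable

MIN_LETTERS = 10  # hypothetical threshold; tune it on your own data


def detect_or_undetermined(
    text: str, detect: Callable[[str], str], min_letters: int = MIN_LETTERS
) -> str:
    """Return 'und' (undetermined) for texts too short to classify reliably."""
    letters = sum(ch.isalpha() for ch in text)  # ignore spaces, digits, punctuation
    if letters < min_letters:
        return "und"
    return detect(text)
```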

Can it detect multiple languages in one text?

Basic detectors identify only the DOMINANT language. Advanced ones (such as Google CLD3) can identify several languages within a text and estimate how much of it each one covers.
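If your detector only returns the dominant language, a simple workaround is to split the text into sentences and detect each one. This Python sketch is an illustration under assumptions, not how CLD3 works internally: splitting on punctuation is naive, and `detect` stands for any single-language detector.

```python
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

# A "sentence" is a run of non-terminal characters plus any closing punctuation.
# Naive: abbreviations like "e.g." will be split too.
SENTENCE = re.compile(r"[^.!?。！？]+[.!?。！？]*")


def language_shares(
    text: str, detect: Callable[[str], str]
) -> Tuple[List[Tuple[str, str]], Dict[str, float]]:
    """Detect each sentence separately and report each language's share of characters."""
    sentences = [s.strip() for s in SENTENCE.findall(text) if s.strip()]
    labeled = [(s, detect(s)) for s in sentences]
    chars: Counter = Counter()
    for sentence, lang in labeled:
        chars[lang] += len(sentence)
    total = sum(chars.values()) or 1  # avoid division by zero on empty input
    return labeled, {lang: n / total for lang, n in chars.items()}
```

Very short sentences inherit the low reliability described above, so in practice you may want to merge neighbouring short sentences before detecting.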

Does NexTools send my text to a server?

No. Detection runs in your browser using JavaScript. Your text never leaves your computer.

Can it distinguish Spanish from Portuguese?

Yes, given enough text (one sentence or more). Their n-grams and vocabulary differ enough: words like 'não' and 'você' clearly indicate Portuguese.
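The orthographic contrasts behind this can be shown with a toy Python check: digraphs used by only one of the two languages (Spanish ñ and ll, Portuguese nh and lh), Spanish ¿ and ¡, the suffix -ción versus -ção, and letters Spanish does not use (ã, õ, ç). The marker lists are hand-picked assumptions; a real detector learns such differences from data and weighs them against everything else in the text.

```python
# Hand-picked orthographic markers (assumptions, not a trained model).
# Portuguese -ção is covered by the single letters ç and ã.
SPANISH_MARKERS = ("ñ", "ll", "ción", "¿", "¡")
PORTUGUESE_MARKERS = ("nh", "lh", "ã", "õ", "ç")


def spanish_or_portuguese(text: str) -> str:
    """Return 'es', 'pt', or 'unknown' by counting marker occurrences."""
    lowered = text.lower()
    es = sum(lowered.count(m) for m in SPANISH_MARKERS)
    pt = sum(lowered.count(m) for m in PORTUGUESE_MARKERS)
    if es == pt:
        return "unknown"  # no markers, or a tie: not enough evidence
    return "es" if es > pt else "pt"
```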

What algorithm is most accurate for language detection?

FastText (Facebook; 176 languages, 95%+ accuracy) and Google CLD3. For simpler projects, franc (JavaScript) or langdetect (Python) is sufficient.