Wikisource:Tesseract OCR

The Tesseract OCR tool adds a toolbar button to the Page namespace. The button extracts text from the current page's image using the Tesseract.js OCR engine, a JavaScript port of the popular Tesseract OCR engine.

Note that only a limited set of languages is supported.
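For readers curious about what the button does under the hood, here is a minimal sketch of an OCR call, assuming the Tesseract.js Tesseract.recognize( image, langs, options ) API, whose result exposes the recognized text as data.text. The helper name ocrPageImage is hypothetical and this is not the gadget's actual code. The lang argument is a Tesseract traineddata code such as 'eng', which is where the limited language support comes from; which codes the gadget actually accepts is not stated on this page. The library object is passed in as a parameter so the sketch can be tried with a stub.

```javascript
// Hypothetical helper, not taken from TesseractOCR.js: OCR one page image
// with Tesseract.js and return the trimmed text, or null if nothing was found.
// The Tesseract.js library object is injected so the function can be
// exercised with a stub outside the browser.
async function ocrPageImage( Tesseract, imageUrl, lang ) {
	// Tesseract.recognize( image, langs, options ) resolves to a result
	// whose data.text holds the recognized plain text.
	const result = await Tesseract.recognize( imageUrl, lang, {
		// Progress reports arrive as { status, progress } objects.
		logger: ( report ) => console.log( report.status, report.progress )
	} );
	const text = result.data.text.trim();
	// An empty result corresponds to the "no text retrieved" situation.
	return text.length > 0 ? text : null;
}
```

In a browser where Tesseract.js is loaded as the global Tesseract, a call might look like ocrPageImage( Tesseract, imageUrl, 'eng' ).then( ( text ) => { /* insert text into the edit box */ } ).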

Setup

To enable Tesseract OCR, add the following line to your common.js:

mw.loader.load( '//wikisource.org/w/index.php?title=User:Putnik/TesseractOCR.js&action=raw&ctype=text/javascript' );

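To translate the gadget's messages into your language, define a tesseractOcrI18n object above the loading line, as follows. The keys on the left identify each message; replace the English strings on the right with your translations.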

var tesseractOcrI18n = {
	'loading tesseract core': 'Loading Tesseract core',
	'initializing tesseract': 'Initializing Tesseract',
	'loading language traineddata': 'Loading language traineddata',
	'initializing api': 'Initializing API',
	'recognizing text': 'Recognizing text',

	'no text': 'No text retrieved from Tesseract',
	'image not found': 'No image found on this page',
	'button label': 'Get text via Tesseract OCR',
	'loading indicator': 'Animated loading indicator',
};

mw.loader.load( '//wikisource.org/w/index.php?title=User:Putnik/TesseractOCR.js&action=raw&ctype=text/javascript' );
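The first five keys above appear to match the status strings that Tesseract.js reports to its logger callback while it works, which suggests they label the progress indicator. The sketch below shows one way such a table could be applied; it is an assumption, not the gadget's own code. In particular, falling back to the English text for keys you leave out is assumed behavior, and the names defaultMessages, resolveMessages and formatProgress are hypothetical.

```javascript
// Hypothetical sketch, not the gadget's own code. Subset of the English
// defaults from the example above (the progress-related keys only).
const defaultMessages = {
	'loading tesseract core': 'Loading Tesseract core',
	'initializing tesseract': 'Initializing Tesseract',
	'loading language traineddata': 'Loading language traineddata',
	'initializing api': 'Initializing API',
	'recognizing text': 'Recognizing text'
};

// Overlay the user's translations on the defaults, so any key the user
// leaves out keeps its English text.
function resolveMessages( userMessages ) {
	return Object.assign( {}, defaultMessages, userMessages || {} );
}

// Turn a Tesseract.js progress report ({ status, progress }) into a line
// of status text; statuses without a message are shown as-is.
function formatProgress( report, messages ) {
	const label = messages[ report.status ] || report.status;
	const percent = Math.round( ( report.progress || 0 ) * 100 );
	return label + ' (' + percent + '%)';
}

// Example: only one message translated (Japanese).
const messages = resolveMessages( { 'recognizing text': 'テキストを認識中' } );
console.log( formatProgress( { status: 'recognizing text', progress: 0.42 }, messages ) );
// → "テキストを認識中 (42%)"
console.log( formatProgress( { status: 'initializing api', progress: 1 }, messages ) );
// → "Initializing API (100%)"
```

Overlaying a partial table on the defaults means a translator can start with just a few messages without leaving blank labels in the interface.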

Development

oldwikisource:Wikisource:Tesseract OCR
Script: oldwikisource:User:Putnik/TesseractOCR.js