How do I make a PPC Dictionary for myself? - Windows Mobile Development and Hacking General


Can anyone provide advice, tools or pointers on how to go about making a dictionary for PPC?
I'd like to have a language translation dictionary that I can use on my PPC (I-Mate Jam). I'd prefer to buy one for PPC (as I had on Palm), but the particular language pair I want is not available on the PPC platform, at least not with a decent-sized lexicon.
I have a very good dictionary for that language pair as a Windows XP program (Ultralingua, specifically). So I guess that means I've got a word database sitting there in its lexicon file (.uld format, but it just looks like a garbled text file when I open it with Notepad).
So I'd like to take that lexicon and create a dictionary database file that I can use with an existing PPC dictionary program, or even just a generic PPC reader with dictionary capability.
Can anyone give me some advice on how to do this for myself?
- what tools/programs do I need to look for?
- what do I need to do to execute this?
Any advice is helpful. Thanks!

It's not really that simple. Here are some posts about it:
http://forum.xda-developers.com/viewtopic.php?t=8231&highlight=dictionary
http://forum.xda-developers.com/viewtopic.php?t=15059&highlight=dictionary
http://forum.xda-developers.com/viewtopic.php?t=12721&highlight=dictionary
http://forum.xda-developers.com/viewtopic.php?t=9191&highlight=dictionary
http://forum.xda-developers.com/viewtopic.php?t=15875&highlight=dictionary
http://forum.xda-developers.com/viewtopic.php?t=3839&highlight=dictionary

I'm not talking about adding words to T9 or something like that. I'm learning another language, and I use my phone/PDA as a portable dictionary when I read foreign publications for practice.
In other words, what I'm talking about is taking a 20,000-30,000 word database from a Windows program, grabbing the words and the index table, and then porting them to a format that can be read by a PPC reader or dictionary of some sort.

I know. As it says in the links,
you should be able to change the Pocket Word dictionary,
but Outlook uses a different one and I haven't heard of
anybody being able to change it.
Of course, there are links to third-party programs in some of the posts;
maybe they can.

Just looking at a FreeCab website and came across this:
- DM Dictionary 4.0.1
(Added: 08.09.04)
http://dmdictionary.sourceforge.net/index.php?pageid=4
(Allows you to use a simple text file (.txt) to find
the translation of every word you need. It finds
a word in 3 seconds on a file with 11'000 words)
ARM: DM_Dictionary_ARM_4_0_1.zip (419KB)
MIPS: DM_Dictionary_Mips_4_0_1.zip (582KB)
SH3: DM_Dictionary_SH3_4_0_1.zip (464KB)
Language Files:
http://dmdictionary.sourceforge.net/index.php?pageid=4

Related

Hebrew localization for Universal /WM5.0

Hi all,
For quite some time now I have been looking for a Hebrew localization pack for my Universal, and apparently there is no cheap solution. (Well, there is a $130 one, but it is too expensive for me.)
I've searched the net for a cheap solution, but it seems that the software companies that develop this kind of localization don't know much about Hebrew, so I get software that writes in a mirror-like way, plus some smaller problems (no Hebrew email/SMS support, no Hebrew in PIE, etc.).
All I want is to be able to send/receive email and SMS and to have the Hebrew keyboard layout (in addition to the English one), so that I can write documents, SMS and emails (PIE is less important to me) in the correct Hebrew direction (right to left).
I am looking for someone who can give me some guidelines on how to start this Hebrew localization project. If I finish it, I will share it freely; I am not looking for profit.
Thank you to all who can help me.

Hi Yol!
As I understand it, Hebrew is a right-to-left language, which Windows Mobile does not support (unlike its big brother, where it is built in).
Now it's a noble idea you have to create a free version, but I suspect there is a reason software companies charge as much as they do without going out of business.
Basically, any localization has 3 parts:
1) keyboard - that's relatively easy. It's just a DLL that implements a COM object. There is an article on this at www.pocketpcdn.com with code sample. Look under 'SIP' (Software Input Panel).
2) The interface or MUIs: these are DLLs that contain the interface resources in your language. You can create them from a ROM dump using Visual Studio or any other resource editor. Search for Asukal's posts on this forum; he is an expert at this.
3) This is the tricky part: for a right-to-left language you need to reverse the system's text input and output somehow. This is beyond my level of programming, sorry.
Well, I hope that gets you started. By the way, do you have much programming experience? This sounds like a really ambitious undertaking. 8)
Good luck!

Hebrew for Qtek:
www.eyron.net
Works perfectly with my Qtek 9000.

Dictionary

Hola guys;
I looked around for a quick dictionary (something that lets me use my PDA to translate the words I see written while I walk down the street).
Of course, I searched everywhere, and apart from some really nice paid software for $29.90, I didn't find what I was looking for.
Since it seemed really simple, I decided to try making one myself. This is my first attempt at WM programming; use it if you like it.
In short: it's just a CSV (comma-separated values) reader that lets you search for an entry easily.
After installation, only the fr<->en dictionary is installed. You're free to add others and remove it.
Options:
* "Filter" lets you choose the column that is searched (i.e. the one on which words are filtered by the entered text).
"start/contain" switches between filtering words that "start with" or "contain" the entered text.
* "Dictionaries" lets you choose one of the dictionary files you installed.
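The Filter and start/contain options amount to a linear scan over the rows of the file. A minimal Python sketch of that idea follows, assuming a plain two-column comma-separated layout (word,translation); the function names are hypothetical, and the real dicts.info files and the UDic internals may differ.

```python
import csv
import io


def load_dictionary(text):
    """Parse dictionary text into rows, skipping blank lines.

    Assumes one entry per line in the form word,translation
    (a hypothetical layout; adjust the delimiter if your file uses tabs).
    """
    return [row for row in csv.reader(io.StringIO(text)) if row]


def search(rows, text, column=0, mode="start"):
    """Return the rows whose `column` matches `text`, ignoring case.

    mode="start"   keeps values that start with the text,
    mode="contain" keeps values that contain it anywhere.
    """
    needle = text.casefold()
    hits = []
    for row in rows:
        if column >= len(row):
            continue  # skip malformed lines that lack the filter column
        value = row[column].casefold()
        if mode == "start" and value.startswith(needle):
            hits.append(row)
        elif mode == "contain" and needle in value:
            hits.append(row)
    return hits


rows = load_dictionary("chat,cat\nchien,dog\nmachine,machine\n")
print(search(rows, "ch"))             # [['chat', 'cat'], ['chien', 'dog']]
print(search(rows, "dog", column=1))  # [['chien', 'dog']]
```

A scan like this touches every entry on every keystroke, which fits the author's warning that the draft slows down on big dictionaries.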
I used the same format as this website: http://www.dicts.info/uddl.php
You can download the dictionary content you want there.
Note:
- The website has a lot of dictionaries but they're not really complete.
- Once you've downloaded the file, name it "AA-BB.dic", where AA and BB are the language references (fr-en.dic, français-anglais.dic, french-english.dic, ... as you wish), then copy it to the installation folder (the one that contains fr-en.dic at the beginning).
- For the dictionary to be usable you have to restart the program.
If you know a better source for dictionaries data, tell me, I'll use it.
The program is just a draft: it will perform poorly with big dictionaries.
If somebody is really interested, I can fix some stuff.
Here it is : http://www.4shared.com/file/61316832/644de7d3/UDic.html

You rock! I love simple and brilliant software.
How many words by the way?

I don't know; it depends on the dictionary you download. The French-English one should have between 5,000 and 7,000 words.
But many words are missing... that's why I'm looking for another source of dictionary data.

Developer's Notepad

Heyas!
Here's a rough draft of the notepad of my dreams.
Right now it has:
support for reading ANSI, UTF-8 and UCS-2
UTF-8 when saving to a file
syntax highlighting for C/C++/Java and a bit of HTML
(light) testing on megabyte-sized files
some slowness in certain actions
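Reading ANSI, UTF-8 and UCS-2 usually comes down to sniffing a byte-order mark (BOM) and falling back to a guess. The Python sketch below shows one common approach; it is not the editor's actual code. It assumes that a file without a BOM is UTF-8 if it decodes cleanly and otherwise an 8-bit "ANSI" code page (cp1252 here, an arbitrary choice), and it decodes UCS-2 with Python's UTF-16 codecs, which give the same result for characters in the Basic Multilingual Plane.

```python
import codecs


def decode_text(data: bytes, ansi_codepage: str = "cp1252") -> tuple[str, str]:
    """Guess a file's encoding and decode it; returns (text, encoding)."""
    # 1. A byte-order mark is the most reliable signal.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8"), "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le"), "ucs-2le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be"), "ucs-2be"
    # 2. No BOM: accept the bytes as UTF-8 only if they decode cleanly.
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # 3. Otherwise assume an 8-bit "ANSI" code page (an arbitrary default).
        return data.decode(ansi_codepage, errors="replace"), ansi_codepage


def encode_for_save(text: str) -> bytes:
    """Always save as UTF-8 (written without a BOM in this sketch)."""
    return text.encode("utf-8")


print(decode_text(b"caf\xe9"))  # ('café', 'cp1252')
```

Checking strict UTF-8 before the code page matters: almost any byte sequence is "valid" cp1252, so trying it first would misread real UTF-8 files.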
I just want to ask everyone for an opinion: does it deserve to live with the editor's current performance? Yes, it should also be equipped with:
editing of multiple documents simultaneously (tabbed)
a customizable highlighter for more languages
actions bound to hardware keys
"recent files" list
search and replace
or even more IDE stuff: autocomplete, quick navigation through source code, external compiler support
(...)
But that will come later. I'm not sure whether it's too slow and too big for source code editing on a pocket device. Or maybe there is already a good, feature-rich notepad/IDE with everything I've mentioned above?

Nice idea. I'll try it out now, and see what happens.

I was able to open HTCHomeSettings.xml (59.3 KB), which is too big for Total Commander or Notepad on my phone.
Now I have a tool to tweak my Manila2D settings while on the road.
I did notice the soft buttons are not in English, but everything else shows up in English.
By the way, I used UPX standard compression and got your file down to 144 KB before I put it on my phone.

Nice one! This goes into my "useful apps" folder on my card.

Really nice fast app.
I currently use cke (http://www.animaniak.com/cke/cke_main.asp) as my text/code editor on my WinMo device.
Have a look at the website and this app for some really good ideas to implement into yours.
The biggest problem I have with yours is that I can't browse the full file system.

It's a great start, and it sounds like you have a plan to make it a great piece of software.
It looks like you made space for the line numbers, but they are not showing. Is that a bug or something you just haven't finished yet?
An integrated FTP uploader/downloader might also be really useful, so you can easily edit your website on the go.

Alternative dictionaries

The Nook dictionary is pretty handy in that you can look up words directly from the book that you are using. There are any number of dictionary applications that you could install, but none of them have the integration of the built-in dictionary.
It has been worked out how to replace the built-in dictionary with one generated from public domain bilingual dictionary databases. freedict.org is one source.
So, is there interest in say, French/English, German/English, Spanish/English dictionaries?
The downside is that you can't swap dictionaries in real time. Pick one.

Where do I download the dictionaries?

I would like an English->French dictionary. Do you have a link to something about it?

Oh, great, there is some interest!
I haven't posted the converter or any of the databases yet. As a newbie here I wasn't sure about posting links or 15 Meg attachments.
The dictionaries from freedict.org are pretty basic word translations. There are some multi-word phrases, but of course the lookup on the Nook only works on single words.
The original dictionary also has a database of inflected words, so that a lookup of "cows" will go to the headword "cow". I know of no source for inflected words in other languages.
For an English/French dictionary you could keep the original English inflected word database and replace the headword database with an English/French database.
I also want to make a companion EPUB book dictionary so that you could search (if necessary) arbitrarily for words.

Ok, Dimdamm, you can try this:
It's an English->French dictionary. It only has 7461 entries, sorry.
Download it, unzip it to eng-fra.db, and push it with ADB to /system/media/reference/basewords.db.
For safety:
Do this after a reboot and before loading the reader app.
Backup the old basewords.db
Good luck.
I'm not sure how it will react to finding words in the inflected database but not finding the corresponding entry in the headword database.
Note: Deleted obsolete file. See below for newer version.

Nice! I downloaded the fr-en dictionary from freedict.org and opened it on my PC with GoldenDict to be able to compare.
So:
- With some words, it's working, but there is still a bug, see below.
- With some words (mostly the different verb forms, but not only those), the window appears but is empty, and the word doesn't exist in the dictionary on my PC. There is probably a conflict with the remaining B&N files.
- There are also a lot of "word not found" results, but that's not a bug.
The bug: it does not show what is on the 2nd/3rd/... line in the dict file. And if there are different definitions on the same line, separated by ";", it shows them on different lines on the Nook. Here's a picture, because with my terrible English I'm not sure you'll understand: s4.noelshack.com/old/up/bug-b7cc55012.png -> the two words from the 2nd line are missing.
I'm using another en->fr dictionary with CoolReader, with a lot more words, in StarDict format. Can you convert it? It's attached to my post.

Yes, a blank page will show up if you lookup a word that is in the inflected database but not in the headword database. I might write something to prune down the English inflected words so that it will just say "not found".
Ok, thanks for the feedback. I missed the case where there are multiple senses, like in "madness". I'll get on that.
I'm still mystified why on that PNG that you posted the pronunciation is not there. I guess that that is not an actual screen shot?
Thanks for your help.
I'll look at that file, but there is the question of format and copyright.
P.S. Oh! I know why I missed that multiple sense. All the dictionaries are quite different. The first one I worked on just had everything under all the same sense.

Renate NST said:
I'm still mystified why on that PNG that you posted the pronunciation is not there. I guess that that is not an actual screen shot?
Yes, the Nook part is from Paint. The pronunciation is there, even if pretty buggy.

I've got it so that it is picking up all the senses.
I'm working on making nice markup but still keeping the size minimal.
The HTML display for definitions does not support <ul> or <ol>
The pronunciation is "buggy" because the Ascender Sans font that is used does not have the full IPA pronunciation glyph set. The actual HTML is correct.
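Since <ul> and <ol> are not available, one way to keep multiple senses readable and still small is to number them by hand. The Python sketch below assumes that <b>, <i> and <br> render in the Nook definition view (the stock dictionary does use <b>); render_entry and its parameters are hypothetical names, not the converter described in this thread.

```python
from html import escape


def render_entry(headword, pos, senses):
    """Build compact HTML for one entry without <ul>/<ol>.

    `senses` is a list of senses, each a list of translations.
    With more than one sense, each is numbered by hand ("1.", "2.", ...).
    """
    parts = [f"<b>{escape(headword)}</b>"]
    if pos:
        parts.append(f" <i>{escape(pos)}</i>")
    numbered = len(senses) > 1
    for n, translations in enumerate(senses, start=1):
        label = f"<b>{n}.</b> " if numbered else ""
        parts.append("<br>" + label + escape("; ".join(translations)))
    return "".join(parts)


print(render_entry("madness", "n", [["folie"], ["démence", "fureur"]]))
# <b>madness</b> <i>n</i><br><b>1.</b> folie<br><b>2.</b> démence; fureur
```

Putting each sense on its own line, with translations inside a sense joined by "; ", is one way to avoid the problem reported above where the second line of an entry went missing.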

Ok, here's a new version of the English-French dictionary with all the senses in it. I hope that it has sufficient madness in it for you.

Renate NST said:
Ok, here's a new version of the English-French dictionary with all the senses in it. I hope that it has sufficient madness in it for you.
You can do:
mv /system/media/reference /system/media/reference_
ln -s /data/reference /system/media/reference
And put all new stuff into /data/reference; there is a lot more space there.
Although, I think the NST built-in dictionary browser is junk.
Better get something else, but you’ll need an alternative reader as well.

Mmm, linking to another directory is not a bad idea, but since the original dictionary is 53 Megs and the biggest dictionary that I've converted yet is only 17 Megs, I'm not sure about space considerations.
Yes, the supplied English dictionary is quite stinky. The problem is that they did a very haphazard conversion from a PDF document. Much of the markup does not have any whitespace between elements. e.g. Cows<b>are</b>animals should be Cows <b>are</b> animals. There are also problems that they render café as caf{eacute}.

Renate NST said:
Yes, the supplied English dictionary is quite stinky.
Hyperlinks don't work, tons of other things are wrong - no need to discuss.
Yes, it's relatively simple to convert DSL dictionary to NST.
But after short time, I'm sure, you'll come to the same conclusion - better use something else...

Yes, the lookup has a problem with either links in general or else supporting the custom dictionary:// protocol.
The Nook native reader mostly does what I want, that is, display sequential pages with a minimum of flashing. The OverDrive application works on DRM library books (through a different path, not using Adobe Digital Editions), but flashes much more. It seems to me that any other reader would require cracking the DRM of the books that you want to read.
I started this thread mostly to get a line on free and distributable dictionaries that could be converted and posted. I began this project on a small dictionary that I had bought years ago.
Not having heard of it before, I Googled "DSL dictionary". The Dictionary of Scots Language looks very interesting!

Renate NST said:
Not having heard of it before, I Googled "DSL dictionary". The Dictionary of Scots Language looks very interesting!
From Wiki: DSL, the format of user friendly dictionaries for ABBYY Lingvo before compilation into LSD format
http://informationworker.ru/lingvo.en/dsl_main_dlg.htm
You can find many dictionaries in DSL format, then convert them to something else.

Hi all,
I'm Spanish and I'd like to replace the English dictionary with a Spanish one.
First, I have to say that I have no idea about dictionary formats and I'm totally lost... but I've found that the "dict" format is pretty widespread.
I have a Spanish dictionary in dict format, so... is there any way to convert this "dict" format (.dz .idx .ifo) to the Nook database format (.db)?
Thanks in advance!
Edit: Also... would it be possible to convert an ebook dictionary (ePub, MOBI, or whatever) to the Nook dictionary format?

Right now, the converter I have works on the TEI-XML format, which is what freedict.org uses. Freedict.org only does bilingual dictionaries.
There are always dictionaries floating around, but the question is which of them are really free?
Were you looking for a straight (not bilingual) Spanish dictionary?
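For reference, a freedict TEI entry with several senses can be walked with Python's standard ElementTree. This sketch assumes the TEI P5 layout with <entry>, <orth>, <sense>, <cit type="trans"> and <quote> elements; older freedict files used different element names, and nested senses are simply merged into their parent. It strips namespaces so it works with or without the TEI namespace.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip a '{namespace}' prefix so TEI files with or without a namespace work."""
    return tag.rsplit("}", 1)[-1]


def descendants(elem, name):
    return [e for e in elem.iter() if local(e.tag) == name]


def parse_tei(xml_text):
    """Yield (headword, senses) pairs; each sense is a list of translations."""
    root = ET.fromstring(xml_text)
    for entry in descendants(root, "entry"):
        orths = descendants(entry, "orth")
        if not orths or not orths[0].text:
            continue  # no headword, nothing to index
        senses = []
        # Only direct <sense> children; nested senses are merged into their parent.
        for sense in (c for c in entry if local(c.tag) == "sense"):
            quotes = []
            for cit in sense:
                # Keep translations only; skip examples and other citations.
                if local(cit.tag) == "cit" and cit.get("type") == "trans":
                    quotes += [q.text.strip() for q in cit
                               if local(q.tag) == "quote" and q.text]
            if quotes:
                senses.append(quotes)
        yield orths[0].text.strip(), senses


sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<entry><form><orth>madness</orth></form>
<sense><cit type="trans"><quote>folie</quote></cit></sense>
<sense><cit type="trans"><quote>démence</quote></cit>
       <cit type="trans"><quote>fureur</quote></cit></sense>
</entry></body></text></TEI>"""
print(list(parse_tei(sample)))
# [('madness', [['folie'], ['démence', 'fureur']])]
```

Keeping each <sense> as its own list preserves the distinction that was lost in the first English-French conversion, where everything ended up under a single sense.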

I've found this one:
http://code.google.com/p/tokland/downloads/detail?name=drae-2009-1.tgz&can=2&q=
It was scraped by a script from the official online dictionary of the Spanish language (www.rae.es).

Mmm, since they sell books and CDs of this and they don't offer download, a scraped version from a website is probably not really legit.
I will probably buy a real dictionary on CD and convert it for my own use, but I wouldn't distribute it.
As far as I know, freedict.org is about the best choice right now.
I'm looking into using Wiktionary.

I am Spanish too. I created my own...
An English-Spanish dictionary from Google Translate. The script I wrote would also work for creating dictionaries from English to any other language, such as French or German.
It works great with the special characters, and it is fully integrated. I used the original Nook word list to make the requests, so there is no problem at all.
Now I am working on a Spanish definition dictionary, using a similar system with a Spanish word list (around 100,000 words) and RAE for the definition requests.
The problem here is the inflected words... I am trying to program the orthography rules so I can generate all the regular forms from the basic word list, but it produces more errors than I would like. How are you handling this table when you change the original baseword list?

Disclaimer 1: I am neither a lexicographer nor a linguist. The dictionaries I have put together or organized are as much demonstrations as they might be practical tools (maybe more demonstrations). I've tried not to introduce any errors, but I am not responsible for erroneous material that was already present.
Disclaimer 2: No attempt has been made to alter or "improve on" the internal structure of the dictionaries, which are modeled after the stock dictionary and comparable to it in file size. These dictionaries may not be suitable for all users.
Note: An earlier forum thread with the most information on alternative dictionaries is here.
I first became interested in the structure and potential production of dictionaries for the NST/G when I was working on updating the UK version of the ROM for FW 1.2.2. I saw that dictionary management was built into the Settings app and that got me thinking. Of course it doesn't work on the UK ROM and I now think it's doubtful that there ever were any non-English dictionaries available for download. Still, the seed was planted.
In conjunction with the release of a Dictionary Management app for the NST/G, I am making available a set of single-language dictionaries and three sets of translation dictionaries for the languages originally supported on the UK ROM. These dictionaries are NOT, however, for the UK ROM but rather for the more common US version of the OS.
The first set of dictionaries I built from scratch using the "translation" table of Wiktionary databases and an adapted Python routine I discovered while researching. These contain more words than the second or third sets of dictionaries, but they include incomplete entries as well as complete ones. The simplest entries are just word→word with no other information. The more complete entries include a short contextual sense in the first language (practically a definition), the part of speech, gender (if applicable and available), as well as a list of possible words in the second language.
Just as I had wrapped up my "final" pair of dictionaries my wandering searches delivered me to Wikdict. There the clever people had done some amazing cross referencing of the three databases that are involved with each language pair. From this they generated translation dictionaries in Stardict format. After decompiling one I was so impressed I started again on another set of dictionaries. This second set covers fewer words since only the complete entries from Wiktionary are used.
And then there is the Wiktionary site of Matthias Buchmeier. Among others, he has assembled translation dictionaries in the ding format (text--yeah, I never heard of it either). I liked these also, and although they were more difficult to work with, I gave them a whirl to produce a third set of translation dictionaries. Now I'm in rehab.
Edit: and from my little padded cell I managed to sneak a peek at the WWW and finally found a source for single-language dictionaries based on Wiktionary, thanks to Mickaël Schoentgen et al. Although the Italian and Spanish dictionaries are on the small side, these are in a format similar to the Oxford English Dictionary, in which all senses and part-of-speech variations of a word are in a single citation, so you can't really compare "word" counts with the translation dictionaries.
Each dictionary consists of two files: basewords.db and inflectedwords.db. If you're not interested in the Dictionary Management app, you can still use these dictionaries. However, contrary to some of the posts in the long-ago thread on alternative dictionaries, it is not wise to simply replace the stock files with these dictionaries (even after backing up the stock files). During the development of the Dictionary Management app I noticed that the available space in /system decreased over time during dictionary swaps. It took me a lot of fooling around and research to sort out the remedy, so if you elect to skip the app, be sure to follow the manual dictionary installation instructions carefully. All of the English→whatever basewords.db files use the same inflectedwords.db (not the same as the stock file). If the dictionary goes from whatever→English, there is an inflectedwords.db specifically for that language (the same inflectedwords files are used for all three sets of dictionaries).
For each dictionary you must download one basewords file and one inflectedwords file. There is only one inflectedwords file for each language, however, so you don't need to download duplicates. You want an inflectedwords file for the first language in a pair.
Single Language (Wiktionary, after Mickaël Schoentgen et al.)
basewords.db:
French (437,983 entries!)
German (135,658 entries)
Italian (52,626 entries)
Spanish (55,631 entries)
(you also need an inflectedwords.db file from below matching the language)
Translation
basewords.db:
Set 1 (direct from Wiktionary translation table)
English→French (114,000 entries, 70,244 complete)
English→German (81,200 entries, 68,874 complete)
English→Italian (74,190 entries, 65,281 complete)
English→Spanish (67,647 entries, 65,623 complete)
French→English (103,760 entries, 90,230 complete)
German→English (113,542 entries, 73,286 complete)
Italian→English (54,743 entries, 29,204 complete)
Spanish→English (51,124 entries, 13,101 complete)
Set 2 (Wiktionary via Wikdict)
English→French (44,208 entries)
English→German (41,233 entries)
English→Italian (35,971 entries)
English→Spanish (39,855 entries)
French→English (73,751 entries)
German→English (52,115 entries)
Italian→English (21,112 entries)
Spanish→English (8,726 entries)
Set 3 (Wiktionary after Matthias Buchmeier)
English→French (81,464 entries)
English→German (79,068 entries)
English→Italian (59,474 entries)
English→Spanish (72,176 entries)
French→English (93,969 entries)
German→English (83,569 entries)
Italian→English (150,625 entries)
Spanish→English (108,852 entries)
inflectedwords.db:
English (105,660 forms)
French (284,435 forms)
German (631,222 forms)
Italian (313,537 forms)
Spanish (488,956 forms)
For comparison with the values shown above, the stock basewords.db contains 86,301 entries. That is supplemented by the biographical and geographical dictionary which contains 15,745 entries. The dictionaries based on Wiktionary databases contain bio/geo entries (too many, if you ask me...) along with the other words, so for typical words you might look up the number of entries is somewhat inflated.
Other dictionaries
There are many, many, many language pairs among the databases and files from Wiktionary and Wikdict. I didn't look very closely and some might be quite small like Spanish (or even smaller!), but if you have an interest in trying your hand at a different combination, there is more information in the technical section on making dictionaries. There are single-language databases but they don't include definitions or even senses. They seem to mostly focus on statistics and also sometimes include genders and parts of speech. It is possible to construct a very simple dictionary using, say, the German-English pair, ignoring the English translations in the construction of the database. This would only work for the complete citations, of course, so you'd have smallish dictionaries. However, I think the data represented in the single-language dictionaries by M. Schoentgen are much more extensive.
Problems, Probleme, problèmes, i problemi, problemas
Single-word lookup for translation is rife with challenges, especially where multi-part verbs are concerned. It's also true that some words don't have single-word equivalents in some languages. Anyone who has ever struggled with a translation dictionary while enrolled in a first-year language course knows this already. Therefore the more contextual information available in the lookup citation, the better. That's one reason why I like the Wiktionary databases--when the citation is complete. Like Wikiwhatever, Wiktionary is, however, a work in progress. Data dumps are annual and one might expect some progress each year. Exactly why Spanish has received so little love is a mystery to me, considering the large number of Spanish speakers around the world. The small size of the Italian databases is a little easier to understand.
For the most part, the Lookup function behaves as you might expect, but French presents some problems. Lookup is equipped to deal with hyphenations that span lines (there are hyphenation dictionaries for various languages buried in the NST), but as far as I can tell, Lookup bracketing stops at any other punctuation. Because written French makes rather extravagant use of the apostrophe (or what looks like an apostrophe; sorry, my French is limited to culinary and operatic), only part of a word will sometimes be selected. It is possible to drag the selection brackets across an apostrophe and on to the end of a word, but it is difficult.
Finally, the function of the inflectedwords.db is spotty. Based on my tests, even the stock dictionary sometimes fails to pick up an inflection and properly direct it to the baseword ("word not found"), even though I can see with my own eyes the correct inflection and referral in the database. In your native language it's easy enough to back up from an ending you recognize to find the baseword, but in a translation dictionary you may not be as familiar with the inflections. Your mileage may vary.
Irony \ˈīrənē\
While working with the UK ROM got me started down this path, I have not been successful in creating any dictionaries for it. There is more discussion of this in the technical section on stock dictionaries. Supposedly it has been done in the past (see: https://forum.xda-developers.com/t/...nario-en-espanol-para-nook-glowlight.3554472/). While the post is not specifically about the NST/G, the dictionary format is apparently the same. I sent the member a message but as he has not been seen since 2019, the likelihood of a reply is pretty small.
Edit: 9-6-22--Thanks to @backup2 for refreshing the download link to the Spanish language dictionary. This should work on the UK version of the ROM. Manual dictionary installation instructions below should be adapted to the appropriate directories. The name of the dictionary file must remain ox_en_GB.db or the NST will not recognize it.
Manual dictionary installation
My key discovery in the development of the Dictionary Management app is that com.bn.nook.reader.activities must be killed prior to moving any dictionary files. This process generally runs in the background and it has tendrils attached to the dictionary databases. If you don't kill the process, you may not get back all of the free space due when removing the stock dictionaries, or removing custom dictionaries and replacing with the stock dictionaries. At some point, you will not even have enough free space left to restore the stock dictionaries, even though there should be plenty. If reading this makes your head hurt already, you'd be happier installing the app. Otherwise, read on.
You cannot safely swap dictionaries without ADB (or the app)
Copy the basewords and inflectedwords files from the dictionary you want to the root of your sdcard.
Connect to your device with ADB (via USB or WiFi). Execute the following:
Code:
adb shell pidof com.bn.nook.reader.activities
[a four digit number is returned*]
adb shell kill [the four digit number--no square brackets]
*If you get a blank response rather than a 4-digit number, it is likely that com.bn.nook.reader.activities has died a natural death. If so, you can skip the "kill" command.
Now you need to backup the stock dictionary. If you use an sdcard that is easiest:
Code:
adb shell mkdir /sdcard/Dictionary
adb shell mount -o rw,remount /dev/block/mmcblk0p5 /system
adb shell cp /system/media/reference/basewords.db /sdcard/Dictionary/basewords.db
adb shell cp /system/media/reference/inflectedwords.db /sdcard/Dictionary/inflectedwords.db
If you are uneasy about this, check with a file manager that the operations above did what they should have (don't leave the ADB connection or you might have to kill com.bn.nook.reader.activities again!)
Now to delete the stock dictionary (not the backup):
Code:
adb shell rm /system/media/reference/basewords.db
adb shell rm /system/media/reference/inflectedwords.db
Now to copy the new dictionary to its proper place:
Code:
adb shell cp /sdcard/basewords.db /system/media/reference/basewords.db
adb shell cp /sdcard/inflectedwords.db /system/media/reference/inflectedwords.db
adb shell chmod 644 /system/media/reference/basewords.db
adb shell chmod 644 /system/media/reference/inflectedwords.db
That should do it. When you access a book, com.bn.nook.reader.activities will restart and you'll be using a new dictionary.
To restore the stock dictionary:
Code:
adb shell pidof com.bn.nook.reader.activities
[a four digit number is returned]
adb shell kill [the four digit number--no square brackets]
adb shell mount -o rw,remount /dev/block/mmcblk0p5 /system
adb shell rm /system/media/reference/basewords.db
adb shell rm /system/media/reference/inflectedwords.db
adb shell cp /sdcard/Dictionary/basewords.db /system/media/reference/basewords.db
adb shell cp /sdcard/Dictionary/inflectedwords.db /system/media/reference/inflectedwords.db
adb shell chmod 644 /system/media/reference/basewords.db
adb shell chmod 644 /system/media/reference/inflectedwords.db
You can remove the backup files in /sdcard/Dictionary with a file manager.
Technical (stock dictionaries)
The stock dictionary for the US version of the NST/G consists of four databases: basewords, inflectedwords, bgwords, and fwp. The files are found in /system/media/reference. The first two files are of primary interest. "basewords.db" contains the words that will first be checked once the Lookup function in the stock reader has selected a word.
If no match is found, then the word is sought in "inflectedwords.db". This table consists of variations on the basewords based on case, number, tense, etc. (in other languages, gender also). If a variation is found it will point to an uninflected baseword. Some inflected forms are targeted to specific basewords to reduce spurious usages. For example, the inflected form "butterflied" points specifically to the verb baseword which means to spread out flat, as in "to butterfly a leg of lamb". The inflected form "butterflies", on the other hand, points to the noun baseword which refers to the insect as well as those funny feelings in your stomach. Many inflected forms are not so targeted.
Lookup also checks bgwords.db, which contains biographical and geographical names. Finally, the fwp.db contains "common" foreign words and phrases which one might encounter in English. As the Lookup function can only select one word at a time (without a lot of fuss), this database is mostly useless. These last two databases can remain in place when alternate basewords and inflectedwords databases are substituted for stock. They will continue to function normally.
The structures of the four databases are similar only in that the first field is populated by the potential Lookup word(s). In inflectedwords.db, the second field is populated by the baseword pointers, with [n] (where "n" is an integer) following pointers that are targeted to specific baseword senses.
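As a rough illustration of the lookup order just described, here is a minimal Python sketch. The two dictionaries stand in for basewords.db and inflectedwords.db, and their contents are made up. The firmware's actual code is not known, so this only models the described behaviour: basewords first, then inflected forms, with an [n] pointer narrowing the result to one sense.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical stand-ins for the two databases. Keys in BASEWORDS may carry
# an [n] sense number; values in INFLECTED are baseword pointers, optionally
# targeted at one sense with [n].
BASEWORDS = {
    "butterfly[1]": "noun: insect; fluttery stomach",
    "butterfly[2]": "verb: to split and spread flat",
}
INFLECTED = {
    "butterflies": "butterfly[1]",   # targeted at the noun sense
    "butterflied": "butterfly[2]",   # targeted at the verb sense
}

SENSE_RE = re.compile(r"^(?P<word>.+?)(?:\[(?P<n>\d+)\])?$")


def split_sense(key: str) -> Tuple[str, Optional[int]]:
    """Split 'butterfly[2]' into ('butterfly', 2); plain keys give None."""
    m = SENSE_RE.match(key)
    if m is None:
        raise ValueError(f"empty key: {key!r}")
    n = m.group("n")
    return m.group("word"), (int(n) if n else None)


def senses_for(word: str) -> List[str]:
    """All baseword keys whose bare headword is `word`, in sense order."""
    keys = [k for k in BASEWORDS if split_sense(k)[0] == word]
    return sorted(keys, key=lambda k: split_sense(k)[1] or 0)


def lookup(word: str) -> List[str]:
    # 1. basewords first: every sense of the headword is returned.
    hits = senses_for(word)
    if hits:
        return [BASEWORDS[k] for k in hits]
    # 2. Otherwise follow the inflectedwords pointer, if there is one.
    pointer = INFLECTED.get(word)
    if pointer is None:
        return []
    base, n = split_sense(pointer)
    if n is not None:  # targeted pointer: only that one sense
        return [BASEWORDS[pointer]] if pointer in BASEWORDS else []
    return [BASEWORDS[k] for k in senses_for(base)]  # untargeted: all senses
```

A targeted pointer such as butterfly[2] yields one sense; an untargeted pointer yields every sense of the baseword.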
In the other three databases, the first field is likewise populated by potential Lookup words. The second field, however, holds binary data (a BLOB, in SQlite terminology) containing the explanatory information (part of speech, pronunciation, definition, etc.). That information is formatted as HTML and then (pk)zipped. The same Lookup word with a different sense (take the "butterfly" example from above) will have a numerical indication after it, e.g., butterfly[1] and butterfly[2]. In the HTML structure this number appears as a superscript before the word. So if one looks up "butterfly", both senses will be displayed, but in separate groupings. However, if an inflected form is the Lookup word, only one sense may be shown, since the inflected form might be targeted at the one sense for which that inflection makes sense (as in the butterfly example).
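The sense-number convention can be sketched in Python as below. The exact HTML markup B&N uses is not given here, so the `<p>`/`<b>`/`<i>` layout in `entry_html` is a hypothetical example. Only the idea of turning butterfly[2] into a superscript 2 before the word comes from the description above.

```python
import re
from html import escape

SENSE_RE = re.compile(r"^(?P<word>.+?)(?:\[(?P<n>\d+)\])?$")


def headword_html(key: str) -> str:
    """Render a lookup key such as 'butterfly[2]' as '<sup>2</sup>butterfly'."""
    m = SENSE_RE.match(key)
    if m is None:
        raise ValueError(f"empty lookup key: {key!r}")
    word = escape(m.group("word"))
    n = m.group("n")
    return f"<sup>{n}</sup>{word}" if n else word


def entry_html(key: str, pos: str, definition: str) -> str:
    """Hypothetical minimal entry: headword, part of speech, definition."""
    return (f"<p><b>{headword_html(key)}</b> <i>{escape(pos)}</i></p>"
            f"<p>{escape(definition)}</p>")
```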
In the UK version of the NST/G, there is a single database file which serves the same purposes as the four files in the US version. The database is also located in /system/media/reference/. The filename appears to be somehow "locked", and the OS will not recognize any file that is not named "ox_en_GB.db". Also, there is a table in the database called "nook_metadata" that contains information which the OS apparently looks for when deciding whether to accept the database or not. These are mostly suppositions, but are reasonable in light of the behavior of the system when a different database is provided. The Nook App for Android uses a dictionary of the same structure, although it is a version of the Merriam-Webster Collegiate, 11th ed. (mw_11_en_US.db). That dictionary is not recognized by the UK version of the NST/G unless it is renamed and the nook_metadata table is replaced by the one from the stock dictionary.
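To see which tables a given .db file contains, or to do the nook_metadata swap described above, Python's built-in sqlite3 module is enough. This is only a sketch: `copy_metadata` assumes the stock file has a table literally named nook_metadata, and `CREATE TABLE ... AS` copies the column names and rows but not any constraints of the original table. Work on copies, never on the only copy of the stock dictionary.

```python
import sqlite3
from typing import List


def list_tables(path: str) -> List[str]:
    """Return the table names in an SQLite file, e.g. to spot nook_metadata."""
    con = sqlite3.connect(path)
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        con.close()
    return [name for (name,) in rows]


def copy_metadata(src: str, dst: str) -> None:
    """Replace nook_metadata in `dst` with the copy found in `src`."""
    con = sqlite3.connect(dst)
    try:
        con.execute("ATTACH DATABASE ? AS stock", (src,))
        con.execute("DROP TABLE IF EXISTS main.nook_metadata")
        # Copies column names and rows only, not constraints.
        con.execute("CREATE TABLE main.nook_metadata AS "
                    "SELECT * FROM stock.nook_metadata")
        con.commit()
        con.execute("DETACH DATABASE stock")
    finally:
        con.close()
```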
At first glance the structure of the ox_en_GB.db seems familiar, as if the basewords and inflectedwords tables as used in the US version have simply been combined in one database. Indeed, the structures of the two tables in the database do seem to be like the structures of the two individual databases in the US dictionary. But they are not.
There are subtle differences. For example, there are no targeted inflected forms. This can be seen from the fact that there are no [n] values after any baseword pointers. So it should not be surprising that there are no [n] values after any of the basewords. Instead, all possible senses of the baseword are contained in the BLOB. If you look up "butterflied", you're going to read about insects, fluttery stomachs, AND legs of lamb.
Biographical and geographical names are included in the basewords. There are a few foreign words and phrases listed also, but because of the one-word Lookup, you're unlikely to encounter them.
Finally, the BLOB is (g)zipped, not (pk)zipped. Substitution of the wrong zip format results in an unreadable file.
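Because the two firmware versions reportedly use different compression, it can help to check which one a BLOB actually uses before substituting anything. The sketch below guesses from the leading magic bytes. The post does not say whether "(pk)zipped" means a full ZIP archive or a bare zlib stream, so both are checked. Which one the US files really use is an assumption to verify against a stock BLOB.

```python
import gzip
import io
import zipfile
import zlib


def inflate_blob(blob: bytes) -> bytes:
    """Decompress an entry BLOB, guessing the container from its magic bytes."""
    if blob[:2] == b"\x1f\x8b":                  # gzip member
        return gzip.decompress(blob)
    if blob[:4] == b"PK\x03\x04":                # PKZIP archive
        with zipfile.ZipFile(io.BytesIO(blob)) as zf:
            return zf.read(zf.namelist()[0])     # assumes a single member
    if blob[:1] == b"\x78":                      # usual first byte of zlib data
        return zlib.decompress(blob)
    raise ValueError("unrecognised compression header")


def deflate_gzip(html: str) -> bytes:
    """gzip an HTML string, as reported for the UK-style database."""
    return gzip.compress(html.encode("utf-8"))
```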
The UK version of the ROM also has the theoretical ability to download additional single-language dictionaries and exchange them for the stock dictionary without any technical magic on the part of the user. There is a Settings page for managing the dictionaries (which is how I got started on all of this), and the Lookup window also shows an option to change dictionaries. From a scan of the Reader smali files it is clear that downloaded dictionaries were supposed to end up in /data/media/B&N Downloads/Dictionary and that Oxford versions of German, French, Spanish and Italian were planned. Also, the Merriam-Webster English variant is mentioned. The two English dictionaries have code numbers, very long strings reminiscent of the strings used to identify B&N downloaded books. The same number string is still used to identify the M-W dictionary in the Nook App for Android. But that's about as far as I got. I can't say so definitively, but I suspect there never were any other dictionaries available (so the "All dictionaries are free" declaration on the Settings page is a classic case of an ironic "you get what you pay for"). Had there been, there surely would have been talk of them on the forum or in other similar venues online.
The Nook App for Android still touts the ability to download non-English dictionaries (for free...), but when I tried it out, nothing was ever offered but the default M-W. I called B&N and eventually spoke to a human being who quoted chapter-and-verse from the user guide when I asked about this. I replied that I did exactly what it said but saw no other dictionaries. Then the person quickly pivoted and said "oh, you can't do that". Of course, I knew that already. Perhaps you have to be outside the US? I didn't ask, and because I had significant doubts about the existence of said (free) dictionaries, I didn't bother testing the idea over a VPN.
The UK ROM does not recognize other dictionaries placed in /data/media/B&N Downloads/Dictionary, not even the mw_11_en_US.db. I'm guessing that this is because downloaded dictionaries would be entered into the various reader databases just like downloaded books. Without the proper entries (including that long number, no doubt), there is no recognition. So the dictionary management capability of the ROM is moot until someone a lot more clever than I am can suss out the information from the smali files.
Edit: 9-6-22--I recently acquired another NSTG and had to go through the usual stuff to get it registered and so forth. One day I was poking around the directories and came to /data/media/B&N Downloads. Generally there are folders there like "Books", "Magazines" etc. What I didn't expect was "Dictionary" since the US ROM has no way of dealing with additional dictionaries. Inside the dictionary folder was another folder and in that one a file. So, like this: /data/media/B&N Downloads/Dictionary/2940043623508/dictionary.db. The long number string is the one associated with the MW Collegiate dictionary and is mentioned in several places in the code for the reader (there's a different number for the Oxford dictionary). I looked at the reader databases but saw no indication that this file existed. I've never seen this in any other update to 1.2.2, so it's pretty odd. Anyway, I copied the relevant folders and files to a device running the UK ROM and rebooted. But the settings app with the dictionary management feature did not see it. So now we have a structure for where additional dictionaries are supposed to be but still no idea of how to get the device to see them.
Spoiler: Technical (making dictionaries)
Finding free-use databases that make this endeavor worthwhile is not easy. Having a way to manipulate and eventually make this data into the basewords.db and inflectedwords.db which the NST/G uses is also a significant challenge. I did a lot of searching before happening on a Python routine for making dictionaries from tab-separated text files: https://github.com/geoRG77/nook-dictionary. As I know nothing about Python, I was a bit skeptical of the author's claim that the script could be readily customized to suit! It turns out that with a good understanding of if/else conditionals and loop structures (and a more-than-healthy dose of chutzpah), it was possible for me to make the changes I needed, much trial and error later. I eventually also added a one-item table, nook_metadata, which holds source/date or edition information. This is for display purposes in my Dictionary Management app and has no effect on the function of the database.
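The script at the geoRG77 link is the real tool and should be preferred. Purely to illustrate the general shape (TSV in, SQLite out, plus a one-row nook_metadata table), here is an independent minimal sketch. The table name `words`, its columns, and the nook_metadata column `info` are hypothetical; copy the real names from the stock database's schema. It gzips each HTML entry, which matches what is reported for the UK file; a US-style file would need the (pk)zip variant instead.

```python
import csv
import gzip
import sqlite3

# Hypothetical schema; replace with the names found in the stock database.
WORDS_DDL = "CREATE TABLE IF NOT EXISTS words (term TEXT, entry BLOB)"
META_DDL = "CREATE TABLE IF NOT EXISTS nook_metadata (info TEXT)"


def build_basewords(tsv_path: str, db_path: str, source_note: str) -> int:
    """Load 'headword<TAB>html' lines into db_path; return rows written.

    Every non-blank line must have exactly two tab-separated fields.
    """
    with open(tsv_path, encoding="utf-8", newline="") as fh:
        # QUOTE_NONE keeps double quotes inside HTML attributes intact.
        reader = csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)
        rows = [(word, gzip.compress(html.encode("utf-8")))
                for word, html in (r for r in reader if r)]  # skip blanks
    con = sqlite3.connect(db_path)
    try:
        con.execute(WORDS_DDL)
        con.executemany("INSERT INTO words VALUES (?, ?)", rows)
        # One-row table with source/edition info (display only).
        con.execute(META_DDL)
        con.execute("DELETE FROM nook_metadata")
        con.execute("INSERT INTO nook_metadata VALUES (?)", (source_note,))
        con.commit()
    finally:
        con.close()
    return len(rows)
```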
The next happy discovery was that there were annual SQlite database dumps of Wiktionary language pair data: https://download.wikdict.com/dictionaries/sqlite/. The "translation" table in the database was in many cases all the data that was needed to construct simple dictionaries. For languages in which nouns are gendered, the information was sometimes found in the translation table (Spanish) or sometimes only in the single-language database. In some cases gender information was missing altogether and I had to scrounge around for something online and integrate the genders of any matching nouns into the database. I did most of my work with a combination of Notepad++ and SQlite Database Browser, moving back and forth between the two formats as needed (SQlite can import and export TSV text and the software includes a place to enter sqlite commands directly). Only a few SQlite manipulations seemed to require SQlite from the command prompt. These either involved pragma changes or comparisons of data in two different tables which appeared to overtax the SQlite Database Browser and result in spinning circles and "Not Responding" messages (although sometimes a walk-away for a snack seemed to make all that eventually work out). Even Notepad++ sometimes found the large text files a bit unwieldy (BTW, Notepad++ can be configured to show tabs and spaces--very handy).
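Folding genders from a second source into matching nouns is the kind of two-table comparison mentioned above. As an example of the pattern, here it is through Python's sqlite3 module. The table and column names (`translation.word/pos/gender`, `genders.word/gender`) are hypothetical and will not match the Wikdict schema exactly. Only nouns that still lack a gender are touched; if the gender source has several rows for one word, the subquery takes whichever row SQLite returns first.

```python
import sqlite3


def merge_genders(con: sqlite3.Connection) -> int:
    """Fill missing noun genders in a hypothetical `translation` table from a
    hypothetical `genders(word, gender)` table; return rows updated."""
    cur = con.execute(
        """
        UPDATE translation
           SET gender = (SELECT g.gender FROM genders AS g
                          WHERE g.word = translation.word)
         WHERE pos = 'noun'
           AND (gender IS NULL OR gender = '')
           AND EXISTS (SELECT 1 FROM genders AS g
                        WHERE g.word = translation.word)
        """
    )
    con.commit()
    return cur.rowcount
```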
When I had finished with my "last" dictionary I discovered that a data dump for 2022 had arrived but I didn't go back to compare. Instead I continued to scrounge around and found two other Wiki sources worthy of notice. The first is the Wiktionary site of Matthias Buchmeier. There are various formats of dictionaries built from Wiktionary data found there. The text dictionaries in ding format are fairly straightforward and in the general format:
baseword {part-of-speech/gender} \pronunciation\ [some contextual info about source, auxiliaries for verbs, etc.] :: translation
("nouns" are not explicitly identified; instead a gender is given--a problem in languages that use neuter gender as both the gender and "noun" are given the same symbol). There's just barely enough markup to allow creative search/replace to generate a TSV structure (or even HTML, I guess), although there is not always a part-of-speech/gender entry or the [...] entry so that would need to be dealt with. Also, these dictionaries are sometimes significantly larger than others. Spanish→English, one of the smallest I have made from the translation table, contains 108,000 entries! So while these are quite simple, they may be worth a look. In fact, they are so straightforward you could probably do everything you needed to prepare them for the Python script with Notepad++ alone if you are very clever and careful. I was neither (or not enough of either) and these turned out to be very frustrating with lots of botched lines. But it can be done.
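The search/replace approach can also be done with one regular expression per line. The sketch below assumes every line follows the general layout shown above, with the {...}, \...\ and [...] parts optional; real files may contain variations (several alternatives on one line, for example) that it does not handle, and it drops the pronunciation. Lines that do not match return None, so they can be logged and fixed by hand rather than silently botched.

```python
import re
from typing import Optional

DING_RE = re.compile(
    r"^(?P<word>[^{\\\[]+?)\s*"        # headword, up to the first marker
    r"(?:\{(?P<pos>[^}]*)\}\s*)?"      # optional {part-of-speech/gender}
    r"(?:\\(?P<pron>[^\\]*)\\\s*)?"    # optional \pronunciation\
    r"(?:\[(?P<info>[^\]]*)\]\s*)?"    # optional [contextual info]
    r"::\s*(?P<trans>.+?)\s*$"         # translation after ::
)


def ding_to_tsv(line: str) -> Optional[str]:
    """Convert one line to 'word<TAB>pos/gender<TAB>info<TAB>translation'."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank line or comment
    m = DING_RE.match(line)
    if m is None:
        return None  # does not fit the expected layout: fix by hand
    fields = (m.group("word"), m.group("pos") or "", m.group("info") or "",
              m.group("trans"))
    return "\t".join(f.strip() for f in fields)
```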
Perhaps most interesting on that page is the link to Wikdict. The team there has very creatively cross-referenced the three databases for each language pair (well, four, I guess, if you count English) and created StarDict format dictionaries that are pretty impressive.
In the end I decided to create yet another set of dictionaries based on the Wikdict output along with the simpler dictionaries I prepared myself from the database translation tables and the dictionaries after Matthias Buchmeier. In addition to the two other tools mentioned above I used stardict-editor for Windows. That was needed to decompile the dictionary from Wikdict. This results in a TSV file of the form baseword→HTML. The HTML is almost usable as is if the Python routine is altered yet again to accept it as the already-complete HTML string. That's what I did but not before changing a few things. The most egregious problem is that all proper nouns are listed as "pronouns". This required a case-sensitive SQlite approach to fix (looking for capitalized words listed as "pronouns"). I also removed the pronunciation guides with their arcane symbols few can interpret (and for which the NST/G might lack font support). I objected to grammatical genders being given as "male, female, neutral". Never in my various studies of languages have I ever encountered those terms. I opted instead for m, f, n. I also removed all instances of \n (probably a newline character). Finally I tightened up the HTML, preferring <p> to <div> which otherwise adds a lot of white space to a citation without some css to calm it down. If you were going to do this yourself and were not so persnickety, you could just replace all \n with <br/> and you'd have a working HTML citation string.
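The clean-up steps listed above map fairly directly onto a few string substitutions. This is a hypothetical Python sketch, not the procedure actually used (that was done with SQlite and Notepad++). Note the caveats in the comments: matching bare words like "male" or "pronoun" will also hit ordinary definition text unless the patterns are anchored on whatever markup surrounds the labels, and capitalized pronouns would be mislabelled.

```python
import re

GENDER_ABBR = {"male": "m", "female": "f", "neutral": "n"}
GENDER_RE = re.compile(r"\b(male|female|neutral)\b")


def clean_wikdict_html(headword: str, html: str) -> str:
    """Apply the described fixes to one headword/HTML pair."""
    # Capitalized headwords tagged "pronoun" are assumed to be proper nouns.
    # Caveat: real pronouns such as English "I" or German "Sie" are capitalized.
    if headword[:1].isupper():
        html = re.sub(r"\bpronoun\b", "proper noun", html)
    # Abbreviate genders. Caveat: also hits "male" etc. in definition text.
    html = GENDER_RE.sub(lambda m: GENDER_ABBR[m.group(1)], html)
    # Literal backslash-n sequences from the decompiled TSV become line breaks.
    html = html.replace("\\n", "<br/>")
    # Prefer <p> to <div> (attributes on <div> are dropped).
    html = re.sub(r"<div(\s[^>]*)?>", "<p>", html).replace("</div>", "</p>")
    return html
```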
Edit: and then...So I finally stumbled on what I had been looking for all along: single language dictionaries! These, too, are based on Wiktionary, but not on the database dumps, rather the massive complete data dumps of the site. The github site of Mickaël Schoentgen contains a number of dictionaries in Kobo, Stardict and other formats. The data is actually updated nightly! Of course, it's not a simple step from the Stardict format to basewords.db, but it's not that bad (except for the French dictionary which is HUGE). My only bone to pick is the lack of part-of-speech data.
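For the step from Stardict to a TSV file, the StarDict index format is simple enough to read directly. In the standard layout each .idx entry is a UTF-8 headword ending in a NUL byte, followed by a big-endian offset (32-bit unless the .ifo sets idxoffsetbits=64) and a 32-bit big-endian size pointing into the .dict file. A .dict.dz file is dictzip, which ordinary gzip can read. The sketch assumes the .ifo declares a sametypesequence, so each definition is stored bare; check the .ifo of the dictionary you actually download.

```python
import gzip
import struct
from typing import Iterator, Tuple


def read_stardict(idx_bytes: bytes, dict_bytes: bytes,
                  offset_bits: int = 32) -> Iterator[Tuple[str, str]]:
    """Yield (headword, definition) pairs from .idx and .dict contents."""
    fmt = ">QI" if offset_bits == 64 else ">II"   # offset, size (big-endian)
    step = struct.calcsize(fmt)
    pos = 0
    while pos < len(idx_bytes):
        end = idx_bytes.index(b"\0", pos)         # headword is NUL-terminated
        word = idx_bytes[pos:end].decode("utf-8")
        offset, size = struct.unpack_from(fmt, idx_bytes, end + 1)
        pos = end + 1 + step
        yield word, dict_bytes[offset:offset + size].decode("utf-8")


def load_dict_file(path: str) -> bytes:
    """Read a .dict file; a .dict.dz (dictzip) file is gzip-compatible."""
    opener = gzip.open if path.endswith(".dz") else open
    with opener(path, "rb") as fh:
        return fh.read()
```

Definitions may contain tabs or line breaks, so they still need cleaning before being written out one per line.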
A general outline of how each of the sources was prepared for the Python routine would involve a lot of steps and I'm not proposing to list them here. If you are interested in trying the process yourself, let me know and I will provide more detailed information. Suffice it to say that the goal is to prepare a tab-separated text file consisting of whatever information you want to retain and free of any words or characters that either Windows or Python finds objectionable (there are quite a few...).
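Which characters caused trouble isn't listed, so this is only a guess at the usual suspects for a one-entry-per-line, tab-separated file: stray tabs and line breaks inside a field, other control characters, and a byte-order mark left by a Windows editor. A minimal sketch under that assumption:

```python
import unicodedata


def sanitize_field(text: str) -> str:
    """Make one field safe for a one-entry-per-line, tab-separated file."""
    text = text.replace("\ufeff", "")                 # byte-order mark
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = text.replace("\n", "<br/>")                # keep breaks as HTML
    text = text.replace("\t", " ")                    # tabs would add fields
    # Drop any remaining control characters (Unicode category "Cc").
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cc")
    return text.strip()


def tsv_line(headword: str, html: str) -> str:
    """One output line: sanitized headword, a tab, sanitized HTML."""
    return f"{sanitize_field(headword)}\t{sanitize_field(html)}\n"
```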
Inflected forms are essential for a dictionary that is not to be endlessly frustrating, unless you are working with a language that is not inflected. I got lucky with this by closely reading an old XDA thread. There are some inflected word lists here: https://github.com/Tvangeste/dsl2mobi/tree/master/wordforms. These are in the format baseword:form1,form2,form3..... It was a simple matter to convert all the punctuation to tabs and then rejigger the Python routine to create an inflectedwords.db for a given language. In the case of English, I fell back on the stock database, but I needed to strip out the targets to specific senses since there would be no way of knowing if these were even valid in the various dictionaries. That part was easy but left me with multiple entries of the same thing. I eventually found a way using SQlite to eliminate all but one of each duplicate set. This is the inflectedwords.db that should be used with all of these dictionaries where English is the first language.
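Converting the dsl2mobi word-form lists, stripping the [n] targets and removing duplicates can be sketched together. The line format baseword:form1,form2,... comes from the post; the `inflected(form, base)` table is a hypothetical stand-in for the real inflectedwords.db layout. The DELETE keeps the lowest rowid of each duplicate (form, base) pair, which is one common way to do this in SQLite, not necessarily the one used here.

```python
import re
import sqlite3
from typing import Iterable, Iterator, Tuple

TARGET_RE = re.compile(r"\[\d+\]$")   # trailing sense target such as [2]


def wordform_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Turn 'baseword:form1,form2' lines into (form, baseword) pairs."""
    for line in lines:
        line = line.strip()
        if not line or ":" not in line:
            continue
        base, forms = line.split(":", 1)
        base = base.strip()
        for form in forms.split(","):
            form = form.strip()
            # A form identical to its baseword adds nothing: basewords is
            # checked first anyway.
            if form and form != base:
                yield form, base


def dedupe_inflected(con: sqlite3.Connection) -> None:
    """Strip [n] targets, then keep one row per (form, base) pair."""
    rows = con.execute("SELECT rowid, base FROM inflected").fetchall()
    con.executemany("UPDATE inflected SET base = ? WHERE rowid = ?",
                    [(TARGET_RE.sub("", b), rid) for rid, b in rows])
    con.execute("""DELETE FROM inflected WHERE rowid NOT IN
                   (SELECT MIN(rowid) FROM inflected GROUP BY form, base)""")
    con.commit()
```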
If you want to create a dictionary you need an inflected word list unless your baseword source includes inflected forms (very unlikely) or the language is not inflected. If your choice is not among those in the link I gave in the previous paragraph, the hunt is on. First, check the single-language databases at Wiktionary. These are mostly statistical but also contain the word list used to create the translation dictionaries and sometimes gender and part-of-speech data. Also, the "form" table in the database sometimes has a table of inflected forms. It may need some massaging, but if it's there you will have a way to create a dictionary. If your language is not inflected you will still need a database with at least a single entry in the table or the NST/G will not recognize the dictionary at all.
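For the uninflected case, one way to get a valid but nearly empty file is to copy the table definitions from the stock inflectedwords.db and insert a single row. A minimal sketch, assuming the stock file is at hand and that one row is all the firmware needs (which is what the post reports). The table name and the row values are whatever the stock schema requires; indexes are not copied.

```python
import sqlite3
from typing import Sequence


def clone_with_one_row(stock_path: str, new_path: str,
                       table: str, row: Sequence[str]) -> None:
    """Recreate the stock file's tables (empty) in new_path, then insert one
    placeholder row into `table`; `row` needs one value per column."""
    src = sqlite3.connect(stock_path)
    try:
        # Internal sqlite_* tables (e.g. sqlite_sequence) cannot be created.
        ddl = [sql for (sql,) in src.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table' "
            "AND sql IS NOT NULL AND substr(name, 1, 7) != 'sqlite_'")]
    finally:
        src.close()
    dst = sqlite3.connect(new_path)
    try:
        for statement in ddl:
            dst.execute(statement)
        marks = ", ".join("?" for _ in row)
        dst.execute(f'INSERT INTO "{table}" VALUES ({marks})', tuple(row))
        dst.commit()
    finally:
        dst.close()
```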

Nice work.

Does this work with a Nook Glowlight Plus BNRV500 (2015)?

Johny3x said:
Does this work with a Nook Glowlight Plus BNRV500 (2015)?
I do not think so. I would have to see the dictionary structure. Let me try and find an update zip and check.

nmyshkin said:
I do not think so. I would have to see the dictionary structure. Let me try and find an update zip and check.
Thank you very much, I will appreciate that. I want to install a Spanish dictionary; what I have found so far is this thread, but the download link is dead:
Spanish Dictionary for Nook Glowlight (Diccionario en Español para Nook Glowlight), forum.xda-developers.com
Hi, I made this database dictionary, which contains 93076 entries and 505277 inflected words. The idea was to keep the internal Reader but with a custom dictionary, because the owner of the device doesn't have an address in the US. I was reading the post...

nmyshkin said:
I do not think so. I would have to see the dictionary structure. Let me try and find an update zip and check.
Can you read the thread I linked above? That person explains what he did to make the dictionary. I don't know how to do that, but maybe you can help me out after you read it!

Johny3x said:
Can you read the thread I linked above? That person explains what he did to make the dictionary. I don't know how to do that, but maybe you can help me out after you read it!
I tried to contact the OP awhile back to find out more details, but have not heard from him.
If you have a rooted device and can see the contents of /system/media/reference then you can maybe tell me most of what I need to know.
Is there one file there or four?

nmyshkin said:
I tried to contact the OP awhile back to find out more details, but have not heard from him.
If you have a rooted device and can see the contents of /system/media/reference then you can maybe tell me most of what I need to know.
Is there one file there or four?
I am using ES File Explorer (I can see hidden files), and all the reader has is /system/media, and inside the media folder there is only a file named bootanimation.zip.

Johny3x said:
I am using ES File Explorer (I can see hidden files), and all the reader has is /system/media, and inside the media folder there is only a file named bootanimation.zip.
So no /system/media/reference? According to the OP in the thread you referenced that is where the dictionary was for the BNRV500.

nmyshkin said:
So no /system/media/reference? According to the OP in the thread you referenced that is where the dictionary was for the BNRV500.
Yeah, weird, because it is the same model, but as I mentioned before, all I have inside media is that bootanimation.zip file :/ I don't know if it has something to do with the Nook being up to date. He mentioned in the post that he was using a different version; I guess that was the latest version at that time.

Johny3x said:
Yeah, weird, because it is the same model, but as I mentioned before, all I have inside media is that bootanimation.zip file :/ I don't know if it has something to do with the Nook being up to date. He mentioned in the post that he was using a different version; I guess that was the latest version at that time.
Right. So I have no idea. There were no manual update zips for those models and beyond, as far as I can find, so there's no way to peek at things. But it seems like the dictionary must be hidden somewhere else, and that means we know nothing about it. You could keep poking around for it, I suppose, but I'm not sure where to begin looking. Maybe /data/media/B&N Downloads, if that even exists.
Sorry I can't be of more help.

nmyshkin said:
Right. So I have no idea. There were no manual update zips for those models and beyond, as far as I can find, so there's no way to peek at things. But it seems like the dictionary must be hidden somewhere else, and that means we know nothing about it. You could keep poking around for it, I suppose, but I'm not sure where to begin looking. Maybe /data/media/B&N Downloads, if that even exists.
Sorry I can't be of more help.
I will look for it in every folder lol. I just want to have a Spanish dictionary. Yeah, my Nook is rooted and I could use Koreader, I know, but I don't like the fact that Koreader does not have real page numbers like the default Nook reader.

Johny3x said:
I will look for it in every folder lol. I just want to have a Spanish dictionary. Yeah, my Nook is rooted and I could use Koreader, I know, but I don't like the fact that Koreader does not have real page numbers like the default Nook reader.
You know I just tried out Koreader on the NST running CM11 and was impressed. I might be misremembering, but it seems to me there were page numbers (like "8/256") but you have to turn off a lot of stuff and then the footer cycles through various kinds of info every time you tap it.

Would searching for *.db shorten the time needed to locate the folder where the dictionary is?

SJT75 said:
Would searching for *.db shorten the time needed to locate the folder where the dictionary is?
There are only folder_app.db, appinfo.db, appinfo.db-journal and folderapp.journal.

nmyshkin said:
You know I just tried out Koreader on the NST running CM11 and was impressed. I might be misremembering, but it seems to me there were page numbers (like "8/256") but you have to turn off a lot of stuff and then the footer cycles through various kinds of info every time you tap it.
I mean it shows page turns, which is what I don't like to see lol. I'd rather the page number stay the same while I tap 2 or 3 times before it changes, but sadly I don't know much about modifying stuff; I am good at following instructions lol.... If it's possible, can you tell me what to turn off or what to do to get real page numbers in Koreader?

Johny3x said:
I mean it shows page turns, which is what I don't like to see lol. I'd rather the page number stay the same while I tap 2 or 3 times before it changes, but sadly I don't know much about modifying stuff; I am good at following instructions lol.... If it's possible, can you tell me what to turn off or what to do to get real page numbers in Koreader?
Ah....I see what you mean. I hadn't noticed that. But aren't page numbers sort of a fluid artifact for ereaders? If I open the same book in three different reader apps I get three different page totals. But you're right, this is the first time I recall seeing the "page number" change with every screen. I think I actually like that. It gives me a better sense of how far I have to go.
I haven't played enough with Koreader to know if there is another option, but if I hit on it, I'll let you know.
As to the suggestion of @SJT75 above, he is spot on. I never use the "search" function of ES File Explorer so I didn't think of it. I'm sure we have different versions, but I've attached a few screenshots below to illustrate a search of /system for .db files. The result is nearly immediate and although the initial display does not give the full path, a long-press on the file name brings up a context menu that includes "Properties" and there the full path is shown.
If /system yields no results for you the obvious next place is /data.
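If you would rather search from a PC, you can copy a directory tree off the device with `adb pull` (assuming adb has permission to read it) and scan the copy. A minimal Python sketch; the local folder name is up to you:

```python
import os
from typing import List


def find_db_files(root: str) -> List[str]:
    """Return paths of all *.db files under `root`, e.g. a local copy of the
    device's /system or /data made with adb pull."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".db"):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```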

I am not sure what appinfo.db is, but if it is a database of the applications installed on the device, even that could help: digging through it, you could find out whether there is a dictionary on the device and possibly where it resides. Those ".journal" files could also be handy if they are what I think. If they are app log files, analyzing them should lead you to the location of the dictionary .db file. Happy hunting!

Johny3x said:
I mean it shows page turns, which is what I don't like to see lol. I'd rather the page number stay the same while I tap 2 or 3 times before it changes, but sadly I don't know much about modifying stuff; I am good at following instructions lol.... If it's possible, can you tell me what to turn off or what to do to get real page numbers in Koreader?
Alright, so it looks like that is possible but only if the ebook supports it. See https://www.mobileread.com/forums/showthread.php?t=337616.
None of the books I currently have loaded up in Koreader show the "Reference page" option.

nmyshkin said:
Alright, so it looks like that is possible but only if the ebook supports it. See https://www.mobileread.com/forums/showthread.php?t=337616.
None of the books I currently have loaded up in Koreader show the "Reference page" option.
Thank you very much! i will try this
