XIP - XDA-developer encyclopedia

In computer science, execute in place ("XIP") is a method of executing programs directly from long-term storage rather than copying them into RAM. It is an extension of using shared memory to reduce the total amount of memory required.
Its general effect is that the program text consumes no writable memory, saving it for dynamic data, and that all instances of the program run from a single copy. (Wikipedia)

I am a new hand at this.

What is "XIP" an abbreviation of? Good description... People posting about an acronym or any term should at least say a line about the term and then refer to a source, like 02xda2 has done.

Related

Size matters - how to make png's, exe's, tsk's and cabs smaller

Hello, this is one of my first posts. I don't know if you have any interest in this, but here it goes:
fact 1: EXEs and DLLs may be compressed
fact 2: PNGs may be compressed without any quality loss
Based on that, I will explain what I do to make smaller CABs and not waste so much space once they are installed.
tools used:
UPX (http://upx.sf.net)
pngout (http://advsys.net/ken/util/pngout.exe)
msceinf (http://www.codeppc.com/telechargements/msceinf/msceinf.htm)
cabwiz (from Microsoft)
let's put this together with a simple example with HTC Audio Manager CAB file:
size before: 1.135.294
size after: 327.204
(yes, the original is about 3.5× the size of the result - roughly a 71% reduction - and everything still works)
1. used msceinf to decompile the CAB and decompress it to a directory
2. used the command "upx *.exe *.dll" in the decompressed directory
3. used the command "for %i in (*.png) do pngout "%i" /kp" in the same directory
4. recreated the CAB using "cabwiz "Audio Manager.inf" /compress"
that's it. Up to now I have regained a lot of space. Of course, using EXE compression makes programs a bit slower, but I believe there are more advantages than disadvantages.
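For anyone scripting these steps, here is a rough sketch of the same pipeline in Python. The tool names and flags (upx, pngout's /kp, cabwiz /compress) are the ones from the post; the tools are assumed to be on PATH, and the helper names are mine:

```python
import subprocess
from pathlib import Path

def reduction_pct(before, after):
    """Size reduction as a percentage of the original file size
    (1,135,294 -> 327,204 bytes is roughly a 71% reduction)."""
    return 100.0 * (before - after) / before

def repack_cab(workdir, inf_name):
    """Steps 2-4 above; assumes msceinf has already decompressed the
    CAB into workdir, and that upx, pngout and cabwiz are on PATH."""
    workdir = Path(workdir)
    # step 2: compress all EXEs and DLLs in place with UPX
    exes = [str(p) for p in workdir.glob("*.exe")] + [str(p) for p in workdir.glob("*.dll")]
    if exes:
        subprocess.run(["upx", *exes], check=True)
    # step 3: losslessly recompress every PNG (/kp as used in the post)
    for png in workdir.glob("*.png"):
        subprocess.run(["pngout", str(png), "/kp"], check=True)
    # step 4: rebuild the CAB from its .inf
    subprocess.run(["cabwiz", inf_name, "/compress"], check=True, cwd=workdir)
```
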
On Themes you can also achieve better compression with the same technique: decompile the file, run pngout on the images, and recreate the CAB.
some results on a theme file:
Htc_New_Default.tsk original: 106.828
Htc_New_Default.tsk optimized: 44.835
I also saw problems in some icon packages: sometimes the authors compile them with thumbs.db inside, resulting in a complete waste of space.
Regards.
interesting. I never thought about it this way.
Optimized rom chefs, start your engines!!!!
In other news, dig the title.
It should be pointed out that one of the main negatives of upx'ing files is that they take up more memory. For example, a 100k program upx'd to 50k takes up 150k when run, and not all of that RAM is released. This is typically why everyone doesn't just go crazy compressing everything in sight. If you need the space and are willing to sacrifice some memory, then upx it... otherwise you'd be better off leaving it be. Also, if you are going to use upx or one of the other utils, make sure you are using the most recent version to get the best performance.
Yes, that's why I wrote that last sentence regarding EXE compression. Some figures for CommManager, which occupies 508 KB as a process; it was tested by reading the in-use memory while CommManager was running and after I forced its close (to check real memory usage, since the process memory reads the same):
using UPX'd EXE: 33.77 - 32.89 = 0.88 MB
original EXE: 33.67 - 32.89 = 0.78 MB
so it's about a 100 KB difference for this EXE (since I don't usually run a lot of programs at the same time, I don't care about this), but for PNGs, which are used more and more in Pocket PC programs, it really makes a difference.
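The tradeoff being measured here can be put into a toy model using the rule of thumb from the previous post (a 100 KB program packed to 50 KB needs roughly 150 KB when run). The function names and the model itself are mine, purely illustrative:

```python
def upx_peak_footprint(original_size, packed_size):
    """Rough peak memory while running a UPX'd program: the packed image
    is loaded and then unpacked, so roughly packed + original bytes."""
    return packed_size + original_size

def upx_tradeoff(original_size, packed_size):
    """Storage saved vs. extra RAM used at run time, per the rule of thumb."""
    storage_saved = original_size - packed_size
    extra_ram = upx_peak_footprint(original_size, packed_size) - original_size
    return storage_saved, extra_ram
```
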
I've updated this with how to optimize Themes as well...
BullGates said:
I've updated this with how to optimize Themes as well...
You know there is an upx --ultra-brute switch that basically tries everything to get the smallest size?
Yes, thanks. But as you pointed out well, it's better to use EXE and DLL compression after evaluating your needs. I've rebuilt many CABs concentrating on graphics compression, and I'm quite happy with the results. Smaller CABs mean shorter install times (not very noticeable, but a fact) and less need for storage - if you build your setups based on extended-rom storage, this is quite handy. You also get better performance, since fewer bytes are involved in loading (so better loading times).
Actually, I was looking to do exactly the same!
I had already found upx and was going to combine this with infocabxp and my own simple parser for the *.000/_setup.xml files, until I found the free msceinf tool - and about an hour later I find your post.
I just wish I had found your post first; it would have saved me a lot of time.
It is probably best to apply upx compression to the EXEs and DLLs of standalone applications that you start manually or use for a short time, and not to all background processes as well, but I see there are already some thoughts on that subject.
Thanks
How do I do this step?
3. used command "for %i in (*.png) do pngout "%i" /kp" on the same directory
Would you please explain more?
Regards
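For reference, the cmd.exe loop in step 3 just runs pngout once for every .png file in the current directory. A Python equivalent of the same idea (the /kp flag is the one used in the original post; pngout must be on PATH for the second function to actually run):

```python
import glob
import subprocess

def png_files(directory="."):
    """The files `for %i in (*.png)` would iterate over."""
    return sorted(glob.glob(f"{directory}/*.png"))

def optimize_pngs(directory="."):
    """Run pngout on each PNG, like the batch loop in step 3."""
    for path in png_files(directory):
        subprocess.run(["pngout", path, "/kp"], check=True)
```
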
my 3 cents..
there's no reason to upx small files, lol...
40 KB of space saved may equal 40 KB less free memory if the DLL is persistent/resident (runs all the time) - also, not every DLL/EXE may be upxed; some break after upx (e.g. iSilo).
.NET apps CANNOT be upxed for now.
DO NOT upx Today plugins' DLLs! they may lose stability / you may have loading problems etc. + memory loss, of course.
you can upx CPLs too.
always crunch at the maximal pack setting!
do not pack EXE icons or relocations.
do not try to make XIP modules from upxed files!
gfx resources like BMPs may become smaller after a palette decrease (quite useful while reshacking - e.g. commgr may be smaller = faster launch times etc.), regardless of saving method (24/32-bit).
almost 100% of the PNGs used in software/the system lose size after a re-edit = faster software loading/working, less memory used (dialpad resources, home plugin, etc.) - palette decreasing usually works the same way as with BMPs: smaller file size (yeah, I know PNG is saved as 8bpp, but sometimes the palette may be 4-bit, for example).

Batch upx compress (compress whole folders of exes)

Hi folks,
In the course of making my own ROM (something I admit to being new at) I found it rather tedious to individually compress exes with the upx utility and batches included with HyperCore kitchen. I've made a batch job that will look through all files in an input folder and compress all the exes (so you can just copy your entire SYS and OEM folder out of your kitchen, compress their contents and copy back). Perhaps something to do this already exists, but if so I couldn't find it. I've also included a more recent version of upx with this than was in HyperCore.
Hopefully this will save someone a little time.
To use:
1. Copy/Move folders containing exes (I suggest just copying the whole SYS and OEM folders) into the input folder.
2. Run compress.bat
3. Copy folders in output back to your kitchen.
While UPX is good at compressing PE files (which results in a smaller output file), one should point out that it's a tradeoff. You get a smaller file, but higher startup latency and sometimes a tad larger memory footprint. So UPXing is not good for the responsiveness of a device.
UPXing small .exe files may then be beside the point: you'll gain e.g. 5 KB but you'll lose a second every time you load the file.
UPXing usually makes sense on executable files larger than 200-400 KB and smaller than 15 MB.
You can google for UPX performance for more information.
Just my 5 cents, and I think it's important for cookers.
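The size window quoted above fits in a one-line helper; the 200 KB and 15 MB thresholds are this post's rule of thumb, not anything from the UPX documentation:

```python
def worth_upxing(size_bytes, min_bytes=200 * 1024, max_bytes=15 * 1024 * 1024):
    """True if an executable falls in the size range where, per the post
    above, the storage saved tends to outweigh the startup latency."""
    return min_bytes <= size_bytes <= max_bytes
```
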
NIXin said:
but a higher startup latency
mehehehe.
heh. it was not really good text, imo (maybe it is on an e-105... but even OMAPs have VERY fast decrunch speed).
4 cents are ok, in general...
my one cent:
test EVERY upxed DLL and EVERY upxed EXE; some things will be broken after the operation (iSilo, for example).
a very good idea is to pack the Office pack EXEs (much space freed for useful soft while cooking).
.NET apps CANNOT be upxed for now.
imo, a batch upxer is just NOT a good idea...
Good input from both of you, particularly in regard to testing the executables. There is an option to force byte-identical decompression, perhaps it would be a good idea to enable this? I am aware that there is a trade-off in performance with compression, but reading the readme on upx one notes that decompression is very fast. Even assuming our ppcs are 10 times slower than the Pentium 133 decompression would still be 1MB/sec, and most of our exes are much smaller than that.
Just curious if this case where files don't work after compression is a common one (noting I'm not compressing dlls as they are unable to share data when compressed; this is noted in the upx documentation, where they comment it's probably only a good idea to compress things like plugin dlls). Perhaps someone with more experience can comment on this?
caeci11ius said:
Good input from both of you, particularly in regard to testing the executables. There is an option to force byte-identical decompression, perhaps it would be a good idea to enable this? I am aware that there is a trade-off in performance with compression, but reading the readme on upx one notes that decompression is very fast. Even assuming our ppcs are 10 times slower than the Pentium 133 decompression would still be 1MB/sec, and most of our exes are much smaller than that.
Just curious if this case where files don't work after compression is a common one (noting I'm not compressing dlls as they are unable to share data when compressed; this is noted in the upx documentation, where they comment it's probably only a good idea to compress things like plugin dlls). Perhaps someone with more experience can comment on this?
While compression is necessary to fit some things in, it really ought to be a last resort. I recall the author stating that a 100k program compressed to 50k requires 150k of memory to expand and run, and it doesn't free up all of the extra memory needed... so there has to be a balance, or you are going to use up all your memory to save storage. If the app HAS to be installed to phone storage then upx is something to consider, but I wouldn't recommend globally upx'ing all the apps on your miniSD, for instance.
.net apps CANNOT be uxped for now.
Because .NET files don't have the standard PE header, which is a requirement for UPX (or any other executable packer). (PL: besides that, greetings )
Mr. famewolf just confirmed what I have said and even added some more. It's perfectly okay for PCs, where you don't notice the difference, but PPCs are just way too slow and the gain is too little (if any). The batch method is not the way to go, unless you want to "cripple" your PPC.

UPX and paging, performance effect?

Hi,
I am thinking about the sense behind UPX for mobile devices.
From what I understand about the mechanism in WM5/6, the following should be correct. I post this not as fact, but as my personal thesis on the effects of UPX, so please correct me if I am wrong.
The devices have a small pagepool, and rely on discarding and reloading of code segments.
So the operating system needs to "forget" parts of a loaded executable if space in the pagepool runs low, and reload those parts if they are to be executed again. With UPX this could be a real pain, as those parts of the file are not stored in the ROM in their directly executable form but need decompression first. So the device is either not able to discard parts of upxed files, resulting in less pagepool space for other files, or needs to expand them at reload time, resulting in a lot of CPU load, as parts of large executables may need to be expanded several times over the execution span of a program.
Has anyone measured the effects of the use of a lot of UPX-ed executables in a device? I would be very interested in the effect upxing has on modules.
Regards,
TG.
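The discard/reload cycle described in this thesis can be sketched as a toy LRU model: a fixed-size pool of pages, where touching a non-resident page costs a fresh read (plus decompression for lzx/xpr/UPX-ed code). The model and its numbers are purely illustrative, not a measurement:

```python
from collections import OrderedDict

def count_loads(page_accesses, pool_pages):
    """Count how often a page must be read (and decompressed) into a
    fixed-size LRU paging pool; hits on resident pages are free."""
    pool = OrderedDict()
    loads = 0
    for page in page_accesses:
        if page in pool:
            pool.move_to_end(page)  # page is resident, just refresh LRU order
            continue
        loads += 1                  # page fault: read + decompress
        pool[page] = True
        if len(pool) > pool_pages:
            pool.popitem(last=False)  # discard the least recently used page
    return loads

# The same access pattern causes more (re)loads with a smaller pool:
accesses = [1, 2, 3, 1, 2, 3, 4, 1, 2, 3]
```
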

Is there any way around the WM5/WM6 emulator storage space limitations?!?!

This has come up many times in many forums, but still cannot find an answer.
We develop commercial apps for windows mobile. We would like to be able to use emulators for testing. I realize we cannot rely totally on emulators for testing and do need to test on real devices as well, but it would be very beneficial to do some testing without having to buy umpteen devices for every developer.
Currently we can only use emulators for PPC2003. The problem is that WM5 and WM6 emulators only allocate 32 MB for storage space on the device, which is ridiculous. My HTC Fuze has over 290 MB for storage.
32MB is not enough to store our application plus the database.
Before someone tries to tell me about MEMSIZE parameters and settings in visual studio to increase memory, that is RAM, not storage space. Microsoft seems to have made no provision for changing the amount of storage space.
You can of course set up a virtual storage card using a folder on your hard drive. The problem is that the driver is buggy and does not work correctly for storing a database. I have seen references to this bug in relation to SQLCE and SQLite and it also affects DB2E (the database we use).
So is there either:
- A way to get more than 32MB storage space (not RAM) in the emulator
- A fix for the virtual storage card bugs
This pretty much renders the emulator useless for serious testing.
Workaround using Ramdisk
I have found sort of a workaround that lets me do testing using an emulator.
If you go to this thread there is a discussion of a Ramdisk driver that allows for large ramdisks.
Unlike the previous ramdisk driver, which would only give me a 17 MB RAM disk, with this one I could get up to a 127 MB RAM disk.
The original source of this new RamDisk is gone, but if you go to that thread you can download a zip containing the executable and DLL but none of the other files.
I did some work and figured out the lnk files you need and attached them to this post.
The exe and dll go into \Windows
The Ramdisk.lnk goes into \Windows\Startup. The link I provided sets up a 64 MB RAM Disk. You can edit the text in the lnk file to change that.
The Ramdisk-Unload.lnk is used to unload the RamDisk.
The downside is that you lose the contents when soft resetting, but that is not often with an emulator.
I know we had some trouble with the storage size limitation as well, because we use a large database, but creating a folder on the hard drive and using that as a "storage card" worked. Ours isn't an SQL database, though; it's a binary one. We have noticed that writing a lot to the "storage card" on the hard drive tends to fail every once in a while, not sure why. But if you try again two seconds later, it works.
So beyond doing good try...catches and maybe having a retry method, as far as I know there isn't a way of changing that.

[Info] Boot Time per Paging Pool size depending on imgfs compression

I want to share with you some findings on how the size of the paging pool influences the usability of a WM device.
Setup:
Device: HTC Tornado (Smartphone, original WM5, 64MB RAM, 64MB ROM, TI OMAP 850, DiskOnChip from M-Systems)
ROM: my WM6 build (based on Nitrogen's WM6 - see my signature)
ROM Kitchen: OS Builder 1.48
Paging Pool changer: PPSmartchanger (adapted for use without ULDR partition)
Method:
Create builds with OSB selecting the relevant imgfs compression options:
lzx and xpr
with and without having the code sections of modules uncompressed
Change Paging Pool with PPSmartchanger (works only for devices with M-Systems DiskOnChip - so old stuff only - changed with pdocwrite)
retrieve boot-time via devhealth call after device has rebooted to selected standard (for all tests) - 2 repeats have proven enough as there is very little deviation between reboots.
I have taken the boot time because this is easily repeatable and devhealth delivers standardized results.
Results:
lzx delivers the smallest ROM builds but also takes longest time to boot
xpr delivers reasonable sized ROM builds and is the fastest
lzx with uncompressed code sections of modules has large ROMs (larger than xpr) and takes longer time to boot
xpr with uncompressed code sections of modules has giant ROMs (had to cut out all optional parts not impacting the procedure) and is not faster than xpr for all modules
Discussion of the results:
You may ask why that setup first? To learn about the paging pool, look also here.
The Paging Pool serves as a dedicated (limited) area of RAM where the current executed code sections of the loaded modules reside. During boot a lot of modules are loaded. While the boot process continues, the demanded use of the paging pool will intermittently exceed its size and pages (parts of the code sections) are discarded to make room for new code to be loaded to the paging pool. As the discarded pages can easily be re-read from their original location on the imgfs - this is no loss of information.
The time it takes to read a module from imgfs, decompress it (lzx, xpr or no compression) and load it into the paging pool for execution determines the total boot time. So if you have an unlimited paging pool, or you can afford to switch off demand paging, then your boot time is limited only by read + decompress.
If now the paging pool is smaller than the maximum sequentially loaded modules need it to be, then the discarding of pages in the paging pool will eventually require a re-load of pages for active modules from the imgfs again. The probability for this to happen will increase with a smaller paging pool, so the boot-times will rise with a smaller paging pool. This is what you see for all 4 setups.
Why is lzx imgfs slower than xpr? lzx files are smaller than xpr on imgfs, so they should be read faster from imgfs! True, but obviously the overhead of the decompression to the paging pool is taking more time over all here for my device. So lzx takes more CPU than xpr for decompression.
Why is xpr + uncompressed code sections slower than if the whole imgfs is xpr? The saved overhead of decompressing the code sections should give an advantage here! So here you see that there needs to be balance of imgfs-read performance and CPU power to decompress. For this device it seems that uncompressed larger code sections are no advantage as they take longer to read from imgfs.
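The read-vs-decompress balance discussed here can be put into a toy cost model; all the size and bandwidth figures below are made up purely to illustrate why a smaller (lzx) image can still boot slower than a larger (xpr) one:

```python
def load_cost(compressed_size, read_bw, decompress_bw, loads=1):
    """Time to bring a module's pages in `loads` times: each load pays
    an imgfs read plus a decompression pass over the compressed bytes."""
    return loads * (compressed_size / read_bw + compressed_size / decompress_bw)

# Hypothetical module: lzx packs it smaller but decompresses slower than xpr.
lzx_cost = load_cost(800,  read_bw=10, decompress_bw=4)   # smaller image, CPU-heavy
xpr_cost = load_cost(1000, read_bw=10, decompress_bw=20)  # bigger image, CPU-light
```

With these illustrative numbers the xpr module loads faster despite being larger on imgfs, matching the measured result above; on a device with a faster CPU or slower storage the balance could tip the other way.
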
Your device's results may vary, but it seems that you only need to measure for one paging pool setup to judge the different options of OS Builder.
Nice comparison. Which boot time marker did you look at?
Just the first in the summary:
<Perf Gates>
<Gate Name=Bootup>27813</Gate>
<Gate Name=Memory>33894400</Gate>
<Gate Name=Storage>13049856</Gate>
</Perf Gates>
This is the same as later:
Bootup Time Markers (Milliseconds):
Available RAM on boot = 34242560
Calupd: Exiting = 30226
Initialized appt ssupdate = 29832
Initialized msgcount ssupdate = 29764
SSUpdate: Thread now in message loop = 29231
Calupd: Starting = 28781
SSUpdate: Shell Ready = 28717
Start Menu Icons Fetched = 28687
Appman: Finished Fetching Icons = 28664
Appman: Started Fetching Icons = 27831
Home Painted = 27813
Cprog: Done Booting = 25188
Last Boot Type = 2
I intend to use that setup also to look a little deeper in the devhealth memory report later, to possibly learn about differences that other OSB options are setting to relevant memory allocation of modules - especially dealing with code that can be paged out or not.
From the MSDN/Kernel Blog articles I have read, there are also other (internal) properties of the code that determine if paging may occur or not. Usually SW production tools set the relevant flags of those automatically so you don't have to care yourself.
I am not enough expert to judge here, but from a glance at the devhealth reports I see that e.g. gwes.exe process has no pageable parts (due to OSB setting it like that?) while other processes (e.g. home.exe) do have pageable code.
I was thinking of e.g. setting rilgsm.dll on the list of modules to prevent demand paging (to have the device immediately respond to incoming calls), but I noticed in the report that it already has no pageable (p) pages listed.
To avoid urban legends I wonder if there is a way to determine or even guide which modules/files are candidates for the list of advanced operations in OSB. It could also help to indicate which are clearly not taking any benefit of those.
From what I understand, OSB has a bunch of options that deal with demand paging one or the other way:
Set Kernel Flag 0x00000001 (MSDN: Demand paging is disabled when the first bit of ROMFLAGS is set. When demand paging is disabled, a module is fully loaded into RAM before running. For OEMs that need real-time performance, turning off demand paging eliminates page faults that can cause poor performance.)
Disable demand paging list (need to find out how this is done - dump directory files are still identical to the SYS folder) - I suspect that relevant code section properties are set here. gwes.exe is internally added automatically (see your build log).
imgfs compression options (see the initial compare in the first post) with the related setting to not compress code sections in modules. It may be helpful to balance ROM size against potential benefit here if there were an option to select the modules whose code sections should stay uncompressed. As seen from the comparison above, it is no benefit for me - but it may be for others.
paging pool size (bigger is better - if you can afford it - even set to "unlimited")
In the context of the above, I still think that devices with plenty of RAM and variable use (like smartphones) profit more from PP = 0 than from Kernelflag = 0x00000001, because an "endless" PP still has the potential recovery of paging out when the RAM limit is reached, while disabled demand paging prevents further loading of modules at all. The latter may be suitable for fixed-purpose devices like navigation devices.
Nice, I found this useful. I'm looking forward to your future benchmarks.
I'm wondering how much lzx does impact loading applications after the device has booted-up.
Jackos said:
I'm wondering how much lzx does impact loading applications after the device has booted-up.
As long as applications just have to start once, it does not matter much imho. For such cases I even prefer to UPX them, to get even more compression than with the imgfs itself; here only a one-time read + decompress has to be taken into account.
The key to the boot time (and to the related loading of modules and later discarding of pages from the paging pool) is that this happens repeatedly and automatically, without you being able to steer it.
---------- Post added at 11:37 AM ---------- Previous post was at 11:30 AM ----------
tobbbie said:
I am not enough expert to judge here, but from a glance at the devhealth reports I see that e.g. gwes.exe process has no pageable parts (due to OSB setting it like that?) while other processes (e.g. home.exe) do have pageable code.
I was thinking of e.g. setting rilgsm.dll on the list of modules to prevent demand paging (to have the device immediately respond to incoming calls), but I noticed in the report that it already has no pageable (p) pages listed.
I looked up the devhealth report of an old build and the gwes.exe is just the same (from first glance). So I suspect that devhealth is just depicting the current state of the device and not which pages could be paged out (or not). On first fast search I have not found resources to clarify on that yet.
You don't need to run devhealth to get the boot-time, just look at HKCU\Performance\millisecondstoidlethread.
I actually run a mortscript that records each time I reset my device, and how long it takes (there's a shortcut to it in the startup folder). This is the script, if you want to try something similar. It's kind of nice; I've had a few times where my startup time increased a lot, and I like being able to tell when it happened.
Code:
MkDir("\Performance")
If(FileExists("\Performance\bootlog.txt"))
  writefile("\Performance\bootlog.txt","^NL^",True)
  writefile("\Performance\bootlog.txt",RegRead("HKLM","Comm","BootCount"),True)
  writefile("\Performance\bootlog.txt"," boots, booted ",True)
  writefile("\Performance\bootlog.txt",FormatTime("m:d:Y:H:i:s a"),True)
  writefile("\Performance\bootlog.txt",", reboot time ",True)
  sleep(30000)
  writefile("\Performance\bootlog.txt",RegRead("HKCU","Performance","Millisec to idle thread"),True)
else
  writefile("\Performance\bootlog.txt",RegRead("HKLM","Comm","BootCount"))
  writefile("\Performance\bootlog.txt"," boots, booted ",True)
  writefile("\Performance\bootlog.txt",FormatTime("m:d:Y:H:i:s a"),True)
  writefile("\Performance\bootlog.txt",", reboot time ",True)
  sleep(30000)
  writefile("\Performance\bootlog.txt",RegRead("HKCU","Performance","Millisec to idle thread"),True)
endif
If(FileExists("\Performance\ramlog.txt"))
writefile("\Performance\ramlog.txt","^NL^",True)
writefile("\Performance\ramlog.txt",RegRead("HKLM","Comm","BootCount"),True)
writefile("\Performance\ramlog.txt"," boots-reset, RAM ",True)
writefile("\Performance\ramlog.txt",FreeMemory(MB),True)
writefile("\Performance\ramlog.txt"," MB, ",True)
writefile("\Performance\ramlog.txt",FormatTime("m:d:Y:H:i:s a"),True)
else
writefile("\Performance\ramlog.txt",RegRead("HKLM","Comm","BootCount"))
writefile("\Performance\ramlog.txt"," boots-reset, RAM ",True)
writefile("\Performance\ramlog.txt",FreeMemory(MB),True)
writefile("\Performance\ramlog.txt"," MB, ",True)
writefile("\Performance\ramlog.txt",FormatTime("m:d:Y:H:i:s a"),True)
endif
This is what the output looks like (the time is in milliseconds, so we're looking at ~45 seconds per boot):
2 boots, booted 08:11:2011:12:24:24 pm, reboot time
3 boots, booted 09:04:2011:08:04:31 am, reboot time 49245
4 boots, booted 09:04:2011:09:18:14 am, reboot time 46642
5 boots, booted 09:04:2011:13:39:35 pm, reboot time 42994
6 boots, booted 09:12:2011:15:31:33 pm, reboot time 45625
7 boots, booted 09:12:2011:17:30:11 pm, reboot time 45564
Edit: Oops, I forgot, only the top part writes the boot log. The second part logs the device free ram whenever I wake it up. I just like to follow that for the hell of it. It's pretty cool to have that log when you go 24 days between soft resets, lol.
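A small parser for the bootlog.txt format shown above (assuming the exact line layout the script writes; the first entry after installing the script has no reboot time yet):

```python
import re

def parse_bootlog(text):
    """Parse lines like '3 boots, booted 09:04:2011:08:04:31 am, reboot time 49245'
    into (boot_count, timestamp, milliseconds-or-None) tuples."""
    entries = []
    pattern = re.compile(r"(\d+) boots, booted (.+?), reboot time ?(\d+)?\s*$")
    for line in text.splitlines():
        m = pattern.match(line)
        if m:
            count, stamp, ms = m.groups()
            entries.append((int(count), stamp, int(ms) if ms else None))
    return entries
```
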
...thanks for the scripts - maybe useful, as they take very little time to execute. I do a reset every night on my device, so long-term RAM "loss" is easily worked around this way. I need to adjust for the keys present in my \Performance path, though (old WM5 native kernel with WM6 OS on top) - I have not seen the one you read from there.
I wanted to utilize the devhealth logs later as well to compare more between different settings - this is why I had chosen devhealth to read it for me.
I noticed that some performance gates are linked to the amount of appointments or tasks. These log entries are after "Home painted". So for my compare I simply picked the devhealth selection of "Home painted" as the performance gate for bootup.
This is not the true time until you can use your device, but a good one to compare the influence of a single parameter (the PP size or the imgfs compression).
BTW: I checked whether the latest XIP compression from OSB has an influence on boot time - it does not. I have only applied it, as recommended, to the 2 big ones in my XIP - not sure yet if I want to squeeze the remaining bytes from there.
The following post was initially done in the OSB thread, but I thought it is off topic there, so moved it here:
I read about the use of the paging pool in this blog post, but in essence I thought that mostly module code would be a candidate for that memory region. Looking at the output of devhealth, there are also other parts (besides modules) marked as "p", which I suspect means either "paged out" or "pageable" (I have not found a description here).
For me the relevant question are:
Can normally loaded executables (from imgfs as files, or elsewhere, e.g. a storage card) utilize the paging pool? I guess yes, as the OS acts according to the information present in the executable's structure, just like it does for the modules. If so, what is the difference then, regarding use of the paging pool, between a certain dll/exe being in "module" format or being a file? Are the sections' attributes in a file incarnation set differently regarding use of the paging pool?
Does a potential UPX compression of a file (exe/dll) then impact the utilization of the paging pool and the later performance of the exe/dll when paging applies? In wild speculation, and by analogy with the imgfs decompression needed when LZX-compressed code sections are paged in again: would an upx-ed exe/dll need to be re-read and decompressed from storage in case paging of code applied? I guess no, as this would severely impact performance, and I have not noticed it for my UPXed (usually BIG) files.
So I wonder if I sacrifice OS memory-management flexibility when I UPX-compress a large executable like Opera: that it e.g. needs more RAM, and that parts may be pageable if not UPX-ed and not pageable when UPX-ed. My guess (and hope) is that dll/exe files do not have pageable sections that are compressed. Possibly something to discuss with the UPX team.
Can I do any tests myself to find out? I could use e.g. Opera and load it one time as UPX-compressed file and another time as normal file. Will the devhealth memory report be suitable to compare these cases? Is the "p" in the devhealth report telling that these pages in memory are "pageable" (so could be paged out if needed) or that they are "paged out" at the time of report?
tobbbie said:
Can I do any tests myself to find out? I could use e.g. Opera and load it one time as UPX-compressed file and another time as normal file. Will the devhealth memory report be suitable to compare these cases? Is the "p" in the devhealth report telling that these pages in memory are "pageable" (so could be paged out if needed) or that they are "paged out" at the time of report?
...well I just tried it (using PocketPlayer from Conduits) and the differences are there - and they are as expected:
The normal exe delivers a map that has most memory parts marked either as "-: reserved" or "p: page", and the exe ends at a size of 2 882 088 bytes.
The UPX-ed exe delivers a map where the executable parts are marked as "E", ending at a size of 2 775 552 bytes.
Strange to notice that the devhealth dump delivers maps that are missing certain memory regions within the sequence of the report. For the normal report the lines for 2e160000, 2e200000, 2e240000, 2e470000, 2e850000 are missing compared to the UPX-ed one. No clue what these "gaps" actually mean - probably memory that is simply not allocated to the process.
I cannot interpret any detail here, but the obvious difference is that the UPX-ed version seems to allocate its memory in a fixed way (i.e. it cannot be dynamically reclaimed by the OS), while the "normal" version seems to allocate memory in a way that allows paging to happen.
UPX:
plus side: no paging happens for the main executable parts - much similar to the OSB module option "exclude from demand paging".
minus side: in total more RAM is required as the whole executable size is loaded (to RAM?).
Normal:
plus side: less RAM is needed as the executable seems to utilize the paging pool
minus side: the paging pool is used, so it should be dimensioned to cope with the parallel use of all normal running applications, not just the modules from ROM.
My speculations are based on the assumption that the "p" indication in the DevHealth report depicts the utilization of the paging pool. If so, then the total number of "p" must not exceed the defined paging pool size. I have no easy way to count these "p" and skip the others in the report - so no way to confirm that.
Take-home message: UPX your dll/exe if you want to exclude them from demand paging.
See attachment with the relevant excerpts from the DevHealth reports.
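Counting those "p" markers by hand is indeed hopeless, but a few lines of script can do it. The report format assumed below (a hex base address, a colon, then one symbol per page) is my guess from the excerpts; adjust the parsing to your actual DevHealth output:

```python
from collections import Counter

PAGE_SIZE = 4096  # WinCE pages are 4 KB

def tally_map_symbols(report_lines):
    # Tally the per-page symbols ("p", "E", "-", ...) from the
    # memory-map part of a DevHealth report (format assumed, see above).
    counts = Counter()
    for line in report_lines:
        addr, sep, pages = line.partition(":")
        if not sep:
            continue
        try:
            int(addr.strip(), 16)  # only lines starting with a hex base address
        except ValueError:
            continue
        counts.update(pages.strip())
    return counts

# invented two-line excerpt, just to show the idea
sample = ["2e160000: ppppEE--", "2e200000: pp------"]
counts = tally_map_symbols(sample)
print(counts["p"], "p-pages =", counts["p"] * PAGE_SIZE, "bytes")
```

Summing the "p" count over a whole report and comparing it against the pool size from the report header would confirm (or refute) the assumption above.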
tobbbie, that's too long a way to disable paging for selected exes. You just have to set the "unpageable" flag on all sections using an app like LordPE.
ultrashot said:
tobbbie, that's too long a way to disable paging for selected exes. You just have to set the "unpageable" flag on all sections using an app like LordPE.
Well - for the pros that only want demand paging disabled, your suggested way of hacking the exe/dll is viable as well. The UPX method has a positive side effect however - the main purpose of UPX is to make the exe/dll smaller, and it does that very well. So you get a smaller footprint, faster load time and exclusive RAM with just one step - and it can be done by anyone, without hacking!
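For the curious: what LordPE calls "unpageable" is the IMAGE_SCN_MEM_NOT_PAGED bit (0x08000000) in each PE section header's Characteristics field. A minimal sketch of that edit, operating on an in-memory copy of the file (run it on a backup copy, never the original):

```python
import struct

IMAGE_SCN_MEM_NOT_PAGED = 0x08000000  # the section flag LordPE labels "unpageable"

def mark_sections_not_paged(data):
    # Set IMAGE_SCN_MEM_NOT_PAGED on every section header of a PE image
    # held in the bytearray `data`. Returns the number of sections touched.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)      # offset of "PE\0\0"
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file")
    num_sections, = struct.unpack_from("<H", data, e_lfanew + 6)
    opt_size, = struct.unpack_from("<H", data, e_lfanew + 20)
    sec_table = e_lfanew + 24 + opt_size                    # first section header
    for i in range(num_sections):
        off = sec_table + i * 40 + 36                       # Characteristics field
        chars, = struct.unpack_from("<I", data, off)
        struct.pack_into("<I", data, off, chars | IMAGE_SCN_MEM_NOT_PAGED)
    return num_sections
```

This only flips the flag; whether the WinCE loader honors it for a given file is exactly what the devhealth comparison above is meant to check.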
The real downside for those with tight memory (like myself) is that such exclusive memory allocation eats much more precious RAM than letting the exe/dll's code be paged. So you have to balance snappy application behavior (UPX) against the number of concurrently running applications demanding memory.
For my part (small RAM) I have drawn the following conclusions:
Free RAM is not the goal to optimize for. A small paging pool makes free RAM grow, but it limits the amount of paged-in parts of concurrent running programs.
UPX-ing a dll/exe is only useful if you have plenty of RAM to always keep the whole code in (virtual) memory. So if you have LARGE programs (like my examples Pocket Player or Opera) it is advisable NOT to UPX them, even though their storage footprint will be larger. Put these programs on the memory card instead and accept longer loading times. Their RAM footprint will be lower, as the OS loads the code into the paging pool (not normal RAM) and can discard non-accessed code parts from the paging pool and reload them from the file again if needed.
Optimize the size of the paging pool for the concurrently running programs. This does not mean minimizing the paging pool so that you can still somehow "use" your device, but finding a balance for the case that all your concurrent applications are loaded. Mind that UPX-ed programs (or those marked as excluded from demand paging) do not count against the paging pool, as their code is put elsewhere! So make sure first that your programs are loaded from a non-UPX file.
Sigh - all my efforts to get the smallest ROM footprint were in vain, as the price paid is simply the amount of RAM used when such exe/dll are loaded. I need to re-think many optimizations and find a better balance of ROM size against RAM use.
It is really worth noting that all normally loaded applications have their CODE part loaded into the paging pool, while their other memory demand (data, heap) goes to other virtual memory. The OS demand paging utilizes this limited amount of RAM (real RAM, not just virtual memory) in an optimal way for the concurrently running programs. While these execute (and the instruction pointer passes along the memory pages, reading the code for execution), the pages have to be in RAM. The overall minimal RAM use (without paging activity) is thus defined only by the active code traces along the memory pages. All code that is NOT accessed while the programs run can stay on discarded pages (so it occupies no RAM).
Now for a part that is mysterious to me: the allocations are done for "virtual memory". As you know, each process has 32MB of contiguous virtual memory in its assigned slot - but how does that map to used RAM? How can I tell how much precious real RAM is utilized for a certain process? I guess the RAM use for the paging pool is easily understood and also visible in the devhealth report (the "p" - if my assumptions are true), but what about RAM outside the paging pool? Is it that any part marked in the devhealth report (except the "-" for reserved, which I would count as discarded code pages) maps virtual memory to real RAM 1:1 - so the advantage for the application is just the contiguous address space, which in reality may be backed by RAM pages located anywhere in the real RAM address space? From other OSes you may remember "paging files" which can also back virtual memory - but these do not exist in WinCE - so is it just a 1:1 mapping of VM allocation to RAM allocation?
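One piece of the slot puzzle is at least mechanical: on the older kernels each process slot is 0x02000000 bytes (32MB), so any virtual address splits into a slot index and an offset within that slot. A tiny sketch (the slot-size constant reflects the pre-6.5 layout; 6.5 moves modules to higher slots):

```python
SLOT_SIZE = 0x02000000  # 32 MB per process slot (pre-6.5 kernel layout)

def decompose(va):
    # Split a virtual address into (slot index, offset within the slot).
    return va // SLOT_SIZE, va % SLOT_SIZE

# 0x2e160000 from the devhealth excerpt above lands in slot 23, offset 0x160000
slot, offset = decompose(0x2E160000)
print(slot, hex(offset))
```

This says nothing about RAM backing, of course - it only locates an address in the slot layout, which helps when reading the report's base addresses.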
I remember reading that modules loaded from several processes are not loaded several times in the relevant process slot but just referenced from all the processes to their slot1 - so what about the applications data in their slot?
I would really like to learn more about the DevHealth report meaning for the various symbols used when drawing the virtual memory maps. Any hints for me to MSDN or other sources?
tobbbie said:
I remember reading that modules loaded from several processes are not loaded several times in the relevant process slot but just referenced from all the processes to their slot1 - so what about the applications data in their slot?
All module sections except r/w occupy only one space in VM (higher slots on 6.5 kernel, slots 1/0 for older kernels). R/W section is copied to every process using this dll (if that r/w section isn't shareable), but, unfortunately, every process NOT using it will still have this memory reserved.
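As a toy illustration (not a measurement), that sharing rule can be turned into a back-of-the-envelope RAM cost for a dll: code pages are mapped once, however many processes use it, while the r/w section is duplicated per using process unless it is shareable:

```python
PAGE = 4096

def pages(nbytes):
    # round a byte count up to whole 4 KB pages
    return -(-nbytes // PAGE)

def dll_ram_pages(code_bytes, rw_bytes, num_users, rw_shared=False):
    # Code pages are mapped once regardless of how many processes load
    # the dll; the r/w section gets one copy per user unless shareable.
    rw_copies = 1 if rw_shared else num_users
    return pages(code_bytes) + pages(rw_bytes) * rw_copies

# a hypothetical dll: 100 KB of code, an 8 KB r/w section, 3 user processes
print(dll_ram_pages(100 * 1024, 8 * 1024, 3))        # per-process r/w copies
print(dll_ram_pages(100 * 1024, 8 * 1024, 3, True))  # shareable r/w section
```

The model ignores the reservation in non-using processes, since - as discussed below - reserved pages cost address space, not RAM.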
btw, if you want some more interesting reading, here is a nice link
ultrashot said:
All module sections except r/w occupy only one space in VM (higher slots on 6.5 kernel, slots 1/0 for older kernels). R/W section is copied to every process using this dll (if that r/w section isn't shareable), but, unfortunately, every process NOT using it will still have this memory reserved.
...well, but reserved pages eat no real RAM. A reservation just keeps the process from allocating contiguous memory where the reserved pages are located in VM. As the dlls are loaded top-down and the process allocates bottom-up, the free VM sits in the middle of the 32MB slot. A very nice tool to visualize this is here: http://www.codeproject.com/KB/windows/VirtualMemory.aspx The "source code" link at the top of the article also contains compiled versions - otherwise I could not have used it.
ultrashot said:
btw, if you want some more interesting reading, here is a nice link
...oops - looks like a source rip of an old wiki from 2007 - quite hard to get anything out of it. I did however find what I needed on DevHealth here:
https://evnet.svn.codeplex.com/svn/C9Classic/WebSite/Wiki/WikiBases/CEDeveloper/BSPVMDevHealth.wiki
tobbbie said:
...well but reserved pages eat no real RAM.
exactly.
10 char
ultrashot said:
exactly.
10 char
instead of 4k page size, or?
Ultrashot, if you feel bored sometime, you could give further insight into the VM use of a running device's memory snapshot with your devhealthanalyzer tool. The following could just as well be discussed in that linked thread of yours, if you prefer. For me it is still mainly linked to the paging pool - so I will start here.
I guess the sum of all "p" gives the currently committed pages from the paging pool, so you can see the current percentage of paging pool utilization (as you know the total size from the header of the report). Usually this should be 100% (or close), but if you have a huge paging pool it may be less.
I noticed that the devhealth output already reports on the pages utilized per module and process - even giving totals for the system. The use for a 5MB paging pool (1285 pages) is 1267 pages, so 98.6% for a sample I took.
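For reference, that utilization figure is just simple arithmetic (the page counts below are taken from my sample report as-is):

```python
def pool_utilization(pool_pages, used_pages):
    # percentage of the paging pool currently committed
    return 100.0 * used_pages / pool_pages

# figures quoted above: a 1285-page pool with 1267 pages in use
print(round(pool_utilization(1285, 1267), 1))  # -> 98.6
```

A devhealthanalyzer feature that prints this percentage per snapshot would make comparing pool sizes across ROM builds much easier.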
Now the problem is to put that utilization in ratio to the potential demand. You could estimate this by comparing the total code size of all loaded exe/dll files against the given paging pool size.
Evaluating, even per loaded exe/dll, the ratio of total code to code loaded into the paging pool could tell how much RAM you would lose/gain if you decide to exclude it from demand paging.
Furthermore, the ratio of code (utilizing the paging pool) to other data (stack, heap) eating the remaining memory may give advice on the recommended ratio of paging pool to RAM for a given devhealth snapshot. I think that on average this ratio should guide the dimensioning of the paging pool - like "reserve 20% of your available RAM for the paging pool".
I don't know which pages from the report should be put in relation to estimate on the above. I had guessed the ratio of "- reserved" to "p" could do it, but the reserved pages are also listed outside the areas which I would attribute to potential paging pool use.
Wouldn't the following be a way to guide such an estimate?
I can grow the paging pool as long as no other memory demand would be limited for my use-case. I know that some applications (video playback, web browsing) have dynamic needs for RAM, but that would be visible in the snapshot taken with devhealth. For an honest report you must not use any memory cleanup tools before taking the snapshot, obviously.
Growing the paging pool beyond the size that all loaded code needs makes no sense and is wasting RAM for a pool that is not exploited.
Shrinking the paging pool to sizes where even the OS (and fancy UI for newer devices) fight for allocated RAM in the paging pool on their own behalf is the lower limit and makes no sense either.
My mistake was to assume that you should go as small as possible - but what good is a small paging pool that makes your device slow due to a lot of OS demand paging activity, if you still have RAM available that your loaded processes do not utilize for heap and data?!
There are several means to manipulate memory use (mark R/W sections as shared) and to avoid paging pool use (kernel flags, paging pool size = "0", exclusion from demand paging - via section flags or simply UPX). I think that only a tool-based evaluation of the devhealth output allows the consequences to be discussed. Side-by-side comparisons of different devhealth reports are hard to get insight from - at least for non-pros like myself.
Your module report already gives something, but I wonder what conclusions to draw from what you list there (module - in use - size - pageable [Y, N, Partly]). It is not as much detail as I would need to answer the questions raised further up.