Related
http://www.pcworld.com/businesscent...x_kernel_patch_delivers_huge_speed_boost.html
http://forum.xda-developers.com/showthread.php?t=844458
could this be worked into Epic 4G kernels as well?
tyl3rdurden said:
http://www.pcworld.com/businesscent...x_kernel_patch_delivers_huge_speed_boost.html
http://forum.xda-developers.com/showthread.php?t=844458
could this be worked into Epic 4G kernels as well?
WOW. I am seriously impressed by your "keeping up with the times" mentality. Good job on noticing this!
So...
"In tests by Galbraith, the patch reportedly produced a drop in the maximum latency of more than 10 times and in the average latency of the desktop by about 60 times. Though the merge window is now closed for the Linux 2.6.37 kernel, the new patch should make it into version 2.6.38."
Along with an Overclocked Froyo kernel (once source is out) this should REALLY improve our experiences.
I mentioned in another thread that I am in talks with Paragon software
http://www.paragon-software.com/exp...ocs/technologies/Paragon_UFSD_for_Android.pdf
for NTFS and HFS+ access. I think that it is POSSIBLE that this is actually a software patch, although it may need to be placed into the kernel itself as a driver. I promise to update as soon as they get back to me, as I just spoke to the devs there yesterday.
Looks like our experience is about to improve dramatically!
Already in IntersectRaven's latest kernel and wildmonk's latest beta kernels for the Nexus One. Check the threads.
From the other xda thread someone mentioned that some kernels have already implemented it. I am sure some of them would be glad to share how it is implemented and how easily it can be done. I know it is different phones/kernels but the idea behind it should be similar.
Dulanic said:
From the other xda thread someone mentioned that some kernels have already implemented it. I am sure some of them would be glad to share how it is implemented and how easily it can be done. I know it is different phones/kernels but the idea behind it should be similar.
We don't have Froyo kernel source yet to do this with. Someone correct me if I am wrong, please.
Edit: I can't find anything mentioning this patch. If anyone has a link, post it. I don't believe this is implemented anywhere yet.
I found the below info here:
http://www.reseize.com/2010/11/linux-kernel-patch-that-does-wonders.html
Below is the video of the Linux desktop running the kernel with the patch in question applied but disabled:
As you can see, compiling the Linux kernel with so many jobs is rather troubling to the Linux desktop experience. At no point in the video was the 1080p sample video paused, but that was just to show where the current mainline Linux kernel is at with 2.6.37. There was also some stuttering with glxgears and some lost responsiveness elsewhere. This is even with all of the Linux 2.6.37 kernel improvements up to today. If recording a video of an older kernel release, the experience is even more horrific! Now let's see what happens when the patch's new scheduler code is enabled:
It is truly a night and day difference. The 1080p Ogg video now played smoothly a majority of the time while still compiling the Linux kernel with 64 jobs. Glxgears was also better, and the window movements and desktop interactivity were far better. When compiling the Linux kernel with 128 jobs or other workloads that apply even greater strain, the results are even more dramatic, but it is not great for a video demonstration; the first video recorded under greater strain made the "before" look appear like a still photograph.
This could be potentially patched into our Eclair kernel if the changes aren't too intrusive, and by the sounds of it they're not.
The mainline patch was against the 2.6.39 kernel; however, our Froyo kernel will be 2.6.32 and Eclair is 2.6.29 - so we're several revisions behind on Eclair.
It's definitely interesting, but it's geared toward desktops using the group scheduler - absolutely worth a try if that scheduler works with Android easily (most of the community kernels are using the BFS scheduler, however).
cicada said:
This could be potentially patched into our Eclair kernel if the changes aren't too intrusive, and by the sounds of it they're not.
The mainline patch was against the 2.6.39 kernel; however, our Froyo kernel will be 2.6.32 and Eclair is 2.6.29 - so we're several revisions behind on Eclair.
It's definitely interesting, but it's geared toward desktops using the group scheduler - absolutely worth a try if that scheduler works with Android easily (most of the community kernels are using the BFS scheduler, however).
Sniff...
It did sound a little too good to be true. Well, eventually we will get 2.6.38, which has it built in - if the desktop group scheduler can even be used at all, it seems.
But because it's in other people's kernels, can't it be easily ported into ours?
tyl3rdurden said:
But because it's in other people's kernels, can't it be easily ported into ours?
It's very possible to patch in. If it's been done before, anyway.
But, because it is based on the .39 kernel, it might be a little buggy. Or a lot buggy. You wanna link me to a kernel that has it and I'll look into it? I probably will wait for Froyo source for at least the .32 kernel.
Here's what Linus himself had to say about the patch:
Yeah. And I have to say that I'm (very happily) surprised by just how small that patch really ends up being, and how it's not intrusive or ugly either.
I'm also very happy with just what it does to interactive performance. Admittedly, my "testcase" is really trivial (reading email in a web-browser, scrolling around a bit, while doing a "make -j64" on the kernel at the same time), but it's a test-case that is very relevant for me. And it is a _huge_ improvement.
It's an improvement for things like smooth scrolling around, but what I found more interesting was how it seems to really make web pages load a lot faster. Maybe it shouldn't have been surprising, but I always associated that with network performance. But there's clearly enough of a CPU load when loading a new web page that if you have a load average of 50+ at the same time, you _will_ be starved for CPU in the loading process, and probably won't get all the http requests out quickly enough.
So I think this is firmly one of those "real improvement" patches. Good job. Group scheduling goes from "useful for some specific server loads" to "that's a killer feature".
DevinXtreme said:
It's very possible to patch in. If it's been done before, anyway.
But, because it is based on the .39 kernel, it might be a little buggy. Or a lot buggy. You wanna link me to a kernel that has it and I'll look into it? I probably will wait for Froyo source for at least the .32 kernel.
Devin - I agree with waiting until the Froyo source is out before attempting to implement this. I'm not sure that group scheduling is even an option in the Android kernel. But I don't think anyone has done this, so I doubt any links are coming your way.
Edit: Found this here- http://groups.google.com/group/android-kernel/browse_thread/thread/f47d9d4f4e6a116a/ab1a8ab42bb0b84a
Android is using CFS, combined with RT scheduling.
When playing the audio and video service, the platform changes the scheduling policy and the scheduling priority.
Search the platform code: Dalvik has the policy and priority setting code, and so does the framework code related to audio and video.
Check init.rc and the cutils folder.
You need to search the platform code from after the Eclair release (Froyo).
cicada said:
(most of the community kernels are using the BFS scheduler, however)
Actually, no Epic kernel uses BFS. It isn't stable on our hardware, and it's not worth porting. Android uses CFS by default, plus the CFQ I/O scheduler I think, but most have switched from a CFS/CFQ to a CFS/BFQ combination. I know mine & Devin's kernels have.
Geniusdog254 said:
Actually, no Epic kernel uses BFS. It isn't stable on our hardware, and it's not worth porting. Android uses CFS by default, plus the CFQ I/O scheduler I think, but most have switched from a CFS/CFQ to a CFS/BFQ combination. I know mine & Devin's kernels have.
Ok then, so in your professional opinion is this patch a possibility still?
From lkml.org:
Subject [RFC/RFT PATCH] sched: automated per tty task groups
From Mike Galbraith <>
Date Tue, 19 Oct 2010 11:16:04 +0200
Greetings,
Comments, suggestions etc highly welcome.
This patch implements an idea from Linus, to automatically create task groups
per tty, to improve desktop interactivity under hefty load such as kbuild. The
feature is enabled from boot by default. The default setting can be changed via
the boot option ttysched=0, and it can be turned on or off on the fly via
echo [01] > /proc/sys/kernel/sched_tty_sched_enabled.
Link to code: http://forums.opensuse.org/english/...ernel-speed-up-patch-file-mike-galbraith.html
Thanks for the clarification Geniusdog254.
ZenInsight, any chance you can prune down that post and just use a link? The patch is all over the web right now, and it's hard to scroll by on a phone
ZenInsight said:
Ok then, so in your professional opinion is this patch a possibility still?
I'm sure it's possible, I just haven't looked at it yet. Like I stated before, until we get the 2.6.32 FroYo kernel source I'm not doing any devving besides app work (maybe).
EDIT: Devin said on the last page that he'll look into it. I know IntersectRaven's Nexus kernel has it, but I haven't looked into any reports of how much it helps.
Also found this:
Phoronix recently published an article regarding a ~200-line Linux kernel patch that improves responsiveness under system strain. Well, Lennart Poettering, a Red Hat developer, replied to Linus Torvalds on a mailing list with an alternative to this patch that does the same thing, yet all you have to do is run 2 commands and paste 4 lines in your ~/.bashrc file. I know it sounds unbelievable, but apparently someone even ran some tests which prove that Lennart's solution works. Read on!
Lennart explains you have to add this to your ~/.bashrc file (important: this won't work on Ubuntu. See instructions for Ubuntu further down the post!):
CODE:
if [ "$PS1" ] ; then
mkdir -m 0700 /sys/fs/cgroup/cpu/user/$$
echo $$ > /sys/fs/cgroup/cpu/user/$$/tasks
fi
Then, as root, run this in a Linux terminal:
mount -t cgroup cgroup /sys/fs/cgroup/cpu -o cpu
mkdir -m 0777 /sys/fs/cgroup/cpu/user
Furthermore, a reply to Lennart's email states that his approach is actually better than the actual kernel patch:
I've done some tests and the result is that Lennart's approach seems to work best. It also _feels_ better interactively compared to the vanilla kernel and in-kernel cgroups on my machine. Also it's really nice to have an interface to actually see what is going on. With the kernel patch you're totally in the dark about what is going on right now.
-Markus Trippelsdorf
The reply also includes some benchmarks you can see @ http://lkml.org/lkml/2010/11/16/392
Found all this here (Ubuntu patch info too):
http://www.webupd8.org/2010/11/alternative-to-200-lines-kernel-patch.html
Preface: I'm no expert and I've had a beer or two, but this is something that's been bugging me for a while. = P
Obviously there is a huge difference in Linpack scores between the Hummingbird and the Snapdragon, but 2.3 does not improve that (2.3 does appear to slightly improve the Linpack score, but I've seen more variation between various 2.2 ROMs than between something like Bionix V 1.2 or EDT's 2.2.1 stable beta and CM7). Both use the ARMv7 ISA, so a performance difference would come from the hardware implementation of the instructions used, meaning an optimization would amount to calling the instructions that perform better for a specific task on a specific architecture. But since the actual instructions are the same, an 'optimization' for one hardware implementation would hurt another implementation of that same instruction set.
This sounds unlikely.
First, it is unlikely that Google is trying to optimize which instructions are called in which order just to improve a specific hardware implementation.
Second, without really bloating Android or leaving this up to the hardware manufacturer's kernel development, this is not practical. If it's left to the hardware manufacturer developing the kernel, then they should already be trying to optimize. Otherwise, Google wants to cover as much hardware as possible with as little code as possible (and thinking more about this, it's probably left up to the manufacturer; to elaborate, the current kernel for Ubuntu 10.10 is 139MB, which is the size of some Android ROMs and covers a vast array of hardware, while most kernels for the Vibrant are in the 5-10MB range).
Now this leads to JIT, which really falls into the same pitfalls. Android apps are based on Java, and Java is a hybrid language that is both compiled and interpreted. JIT basically compiles the parts of code that are commonly run in an application to reduce interpretation (because interpretation is slower). But in compiling it, the code is still eventually turned into ARMv7 instructions. I'm not exactly sure how the Dalvik JIT is implemented, but my guess is that since APKs are already 'compiled' to bytecode, it takes the parts that would normally be interpreted and compiles them to machine code for faster execution. In which case, this is still the ARMv7 ISA, and the difference between the Hummingbird and Snapdragon is hardware implementation, not software.
So, i don't see how some optimization specifically for the Hummingbird is likely or plausible (and keep in mind that Moto uses the OMAP, which would require optimization as well).
There's my rambling, for now.
I already posted this in general, but this might be a more suitable place.
The way I see it, Android (especially the newer versions) is capable of distributing several processes over the dual cores. Apps, however, don't utilise both cores.
Another question: what kind of apps would benefit from using both cores? I could imagine that heavy games and home launchers could benefit from using both cores.
Are there tools available for Android that enable multithreaded apps? And what are the average prices of app development tools?
I'm currently working in Java, but how do most people write Android apps?
Just found some information that confirms my thoughts on how android uses parallel structures, but this still leaves open my other questions:
"Android 2.2 already takes advantage of multicore. Anything that multitasks and multithreads already takes advantage of multicore. But this exploitation is a matter of degrees.
Android 2.3 takes further advantage of the multicore because, unlike Android 2.2, 2.3's file system is a multithreaded one, not single-threaded. When it comes to file I/O or database searches, 2.3 will be a lot faster.
Android 2.4, or 3.1 as rumored to be, will take even greater advantage of multicore with further "architecting" of parts of the OS to use more threads."
Android 2.3 has concurrent garbage collection which I imagine will take advantage of dual core phones. This should really help to reduce any lag or stuttering in apps and games.
First, if you develop something that requires much CPU power, then you should always try to do it using multiple threads. This is a general rule, not only related to Android.
Second, main thread of an app is UI thread and you should never run CPU-consuming tasks in it. So actually you are forced to use multi-threading in Android apps.
Third, there are many things that Android itself could run in parallel to your app: garbage collector, UI changes, animations, background apps, etc.
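To make that concrete, here is a minimal sketch of the pattern, with made-up names (MyActivity, computeResult) rather than code from any real app: the heavy work runs on a background Thread, and only the cheap UI update is posted back to the main thread with runOnUiThread().
CODE:
import android.app.Activity;
import android.os.Bundle;
import android.widget.TextView;

public class MyActivity extends Activity {
    private TextView resultView;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        resultView = new TextView(this);
        setContentView(resultView);

        // Never do the heavy work here: onCreate() runs on the UI thread.
        new Thread(new Runnable() {
            @Override
            public void run() {
                final long result = computeResult(); // CPU-heavy part, off the UI thread
                // Post only the cheap view update back to the main thread.
                runOnUiThread(new Runnable() {
                    @Override
                    public void run() {
                        resultView.setText("Result: " + result);
                    }
                });
            }
        }).start();
    }

    // Placeholder for whatever expensive computation the app really does.
    private long computeResult() {
        long sum = 0;
        for (long i = 0; i < 100000000L; i++) {
            sum += i;
        }
        return sum;
    }
}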
Brut.all said:
First, if you develop something that requires much CPU power, then you should always try to do it using multiple threads. This is a general rule, not only related to Android.
Second, main thread of an app is UI thread and you should never run CPU-consuming tasks in it. So actually you are forced to use multi-threading in Android apps.
Third, there are many things that Android itself could run in parallel to your app: garbage collector, UI changes, animations, background apps, etc.
Thanks for your quick answer! Could I use a Java solution that makes multithreading obsolete to make this all easier? And then just pack it into an apk? Sorry, but I'm pretty new to this.
Stitch! said:
Thanks for your quick answer! Could I use a Java solution that makes multithreading obsolete to make this all easier? And then just pack it into an apk? Sorry, but I'm pretty new to this.
I don't understand. How does Java make multithreading obsolete? Besides, MT isn't really that hard if you have good tools for asynchronous processing of tasks. Java/Android gives you such tools.
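One of those tools, for example, is AsyncTask - a rough sketch with placeholder names (WordCountTask, and word counting standing in for real CPU-heavy work): doInBackground() runs on a worker thread, onPostExecute() runs back on the UI thread where it is safe to touch views.
CODE:
import android.os.AsyncTask;
import android.widget.TextView;

class WordCountTask extends AsyncTask<String, Void, Integer> {
    private final TextView output;

    WordCountTask(TextView output) {
        this.output = output;
    }

    @Override
    protected Integer doInBackground(String... texts) {
        // Worker thread: do the expensive part here.
        int words = 0;
        for (String t : texts) {
            words += t.split("\\s+").length;
        }
        return words;
    }

    @Override
    protected void onPostExecute(Integer result) {
        // UI thread again: safe to update the view.
        output.setText("Words: " + result);
    }
}

// From an Activity, on the UI thread:
//   new WordCountTask(someTextView).execute("one chunk of text", "another chunk");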
Brut.all said:
I don't understand. How does Java make multithreading obsolete? Besides, MT isn't really that hard if you have good tools for asynchronous processing of tasks. Java/Android gives you such tools.
Not Java, but a Java extension called Ateji Parallel Extensions. There is a demo here: http://www.youtube.com/watch?v=8MDbqTgCDIA
I was just wondering if it would be worth developing for Android. The video is a demo on a dual core, and the new quad-core dev kit just came in. Additions to the demo that I thought about are a timer and perhaps some other figures that can indicate the difference. Do you have any ideas on this?
Really appreciate your input, thanks!
Stitch! said:
Not Java, but a Java extension called Ateji Parallel Extensions. There is a demo here: http://www.youtube.com/watch?v=8MDbqTgCDIA
I was just wondering if it would be worth developing for Android. The video is a demo on a dual core, and the new quad-core dev kit just came in. Additions to the demo that I thought about are a timer and perhaps some other figures that can indicate the difference. Do you have any ideas on this?
Really appreciate your input, thanks!
Yes, it would be worth it to develop for Android. The newer Android phones' dual-core processors are utilized by games, but only once a newer version of Android (Ice Cream Sandwich and later, so I have read) is able to support multiple processors. Also, Android really needs some 3D HD games like what Apple has made for the iPhone. I hope you decide to develop for Android.
I still don't understand why it's so important to you. You don't need Ateji to utilize multiple cores; actually, their demo is just a few lines of pure Java code. Ateji could make things easier, but it doesn't do any magic.
Stitch! said:
Another question: what kind of apps would benefit from using both cores? I could imagine that heavy games and home launchers could benefit from using both cores.
Anything involving image processing is a good candidate. For example, if you want to sharpen a photo, you can have one core processing the top half and one core processing the bottom half. Saying that though, I've found the single threaded performance of newer processors is fast enough for typical image filters.
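Roughly, that split looks like this in plain Java (no Android APIs, and a trivial per-pixel invert stands in for a real sharpen kernel): each thread processes half of the rows, and join() waits for both halves before the result is used.
CODE:
public class HalfSplitFilter {
    public static void main(String[] args) throws InterruptedException {
        final int width = 1024, height = 768;
        final int[] pixels = new int[width * height]; // source image (dummy data)
        final int[] out = new int[width * height];    // filtered result

        Thread top = new Thread(new Runnable() {
            public void run() { filter(pixels, out, width, 0, height / 2); }
        });
        Thread bottom = new Thread(new Runnable() {
            public void run() { filter(pixels, out, width, height / 2, height); }
        });
        top.start();
        bottom.start();
        top.join();    // wait for both halves before using the result
        bottom.join();
        System.out.println("done, first pixel = " + out[0]);
    }

    // Trivial per-pixel "filter" standing in for a real convolution kernel.
    static void filter(int[] src, int[] dst, int width, int fromRow, int toRow) {
        for (int y = fromRow; y < toRow; y++) {
            for (int x = 0; x < width; x++) {
                int i = y * width + x;
                dst[i] = ~src[i];
            }
        }
    }
}
With a real sharpen convolution the halves would need a small overlap of rows at the seam, but the threading structure is the same.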
This one will be of more interest to kernel developers.
Link to Spreadsheet: Results
RcrdBrt asked me to see which memory allocator performed best, and provided me with four identical kernels except for the allocators being SLQB, SLAB, SLUB, and SLOB.
SLUB turns out to have a slight performance advantage, as RcrdBrt had suspected.
The rank was:
SLUB - 980.56
SLQB - 972.17
SLAB - 967.72
SLOB - 967.72
Not much difference really, but every little helps, as they say at Tesco (Britain's leading supermarket/religion).
Thanks to RcrdBrt for trusting me with this nice little study, and to Chainfire for the very useful benchmarking app. (See here)
Methodology
Technical detail (it's not necessary to read this!)
I used CF-Bench to obtain values for:
Native MALLOCS
Native Mem Read
Native Mem Write
Java Mem Read
Java Mem Write
...and used statistical methods to boil 10 passes for each allocator down to one final score: 10 passes is enough to get the mean and median within one percent of each other, i.e. it establishes confidence in the mean, assuming normally distributed data. Three standard deviations are subtracted from the mean, to show the minimum score we'd expect 99.7% of results to be above. This is to penalize variability (high peaks are less significant than good consistency). The geometric mean is taken of the five end results to provide one final score.
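For anyone who wants to reproduce the arithmetic, here is a rough sketch of that scoring in Java - the sample numbers are invented, not the real CF-Bench data.
CODE:
public class AllocatorScore {
    // Per-metric score: mean of the passes minus three (population) standard deviations.
    static double meanMinusThreeSigma(double[] passes) {
        double mean = 0;
        for (double p : passes) mean += p;
        mean /= passes.length;
        double var = 0;
        for (double p : passes) var += (p - mean) * (p - mean);
        double stddev = Math.sqrt(var / passes.length);
        return mean - 3 * stddev;
    }

    // Final score: geometric mean of the per-metric scores.
    static double geometricMean(double[] scores) {
        double logSum = 0;
        for (double s : scores) logSum += Math.log(s);
        return Math.exp(logSum / scores.length);
    }

    public static void main(String[] args) {
        // One row of 10 passes per metric (MALLOCS, native read/write, Java read/write).
        double[][] metrics = {
            {980, 975, 982, 978, 981, 979, 977, 983, 980, 976},
            {1010, 1005, 1012, 1008, 1011, 1009, 1007, 1013, 1010, 1006},
            {995, 990, 997, 993, 996, 994, 992, 998, 995, 991},
            {960, 955, 962, 958, 961, 959, 957, 963, 960, 956},
            {1002, 997, 1004, 1000, 1003, 1001, 999, 1005, 1002, 998},
        };
        double[] perMetric = new double[metrics.length];
        for (int i = 0; i < metrics.length; i++) {
            perMetric[i] = meanMinusThreeSigma(metrics[i]);
        }
        System.out.printf("final score: %.2f%n", geometricMean(perMetric));
    }
}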
Bedalus can't live without benchmarks. Closed UX benchmark yesterday and opened a new one today
One more instructive and helpful benchmark buddy
Most kernel developers prefer using SLQB. Any thoughts on why this is the case? Do they have a special reason for that or just a matter of not knowing better? Cheers!
RcrdBrt said it was the fashion! Tbh I don't know the differences between them, but some casual googling showed that SLUB has had success in other Linux arenas.
apatal said:
Most kernel developers prefer using SLQB. Any thoughts on why this is the case? Do they have a special reason for that or just a matter of not knowing better? Cheers!
Buzzwordism. The other 3 are in the mainline kernel.
Sent from my Nexus Prime.
morfic said:
Buzzwordism. The other 3 are in the mainline kernel.
You mean all those allocators are present but they mention only SLQB because it's the trend?
Sent from my Nexus S
apatal said:
You mean all those allocators are present but they mention only SLQB because it's the trend?
Sent from my Nexus S
Correct.
Sent from my Nexus Prime.
Man, bedalus is really fast!
@morfic:
So, in the ICS stock 3.0.8 kernel, how many and which allocators are present?
Am I correct to say that all those four allocators are present, but only one is used?
Sorry for these dumb questions.
glennkaonang said:
Man, bedalus is really fast!
@morfic:
So, in the ICS stock 3.0.8 kernel, how many and which allocators are present?
Am I correct to say that all those four allocators are present, but only one is used?
SLAB, SLUB, and SLOB are in the mainline Linux kernel; SLQB is patched in.
bedalus, did you run this on GB too?
morfic said:
bedalus, did you run this on GB too?
I can do it if I have 4 identical GB kernels (except for the allocator)... If you'd like it tested, you'll have to build them for me.
Some guys found a huge optimization for the Linux kernel, and for Dalvik as well, on ARM platforms. Actually it is not achieved by optimizing the code itself, but by changing the way GCC compiles it, and it increased performance by 30% to 100%. There is a little video of them running a benchmark on two identical Android development platforms. Here it is:
So the ROM used is the same one used by the Galaxy Nexus, and CyanogenMod now uses it to gain these 30%~100%. What are your feelings about it? Are you pessimistic or optimistic about the implementation, for example for stock Atrix ROMs? Or community ROMs maybe? Also, tell us if you have some news about it.
So this optimization was made mainly for the Linux kernel on ARM devices, which means it will be way more efficient on ARM computers/servers. This is a great step forward for Linux on embedded platforms. They also worked on Dalvik, so now even Android apps will run faster.
(Sorry for my grammar if I made some mistakes, just tell me I'll correct them.)
Very impressive performance increase. Looking forward to seeing these optimizations make their way into custom ROMs.
Can you post more info about how this works? Or a link to the original GCC discovery?
Linaro is a hot topic in the Samsung forums. Even the OG SGS (basically half the specs of the Atrix) users are begging for support for it...
Sent from my MB860 using XDA Premium App
It will be implemented in the aokp #39 release.
Sent from my Atrix using Tapatalk
AkaGrey said:
It will be implemented in the aokp #39 release.
Sent from my Atrix using Tapatalk
how do you know that???
facuxt said:
how do you know that???
http://www.droid-life.com/2012/06/1...stem-performance-boosts-are-quite-noticeable/
Sent from my MB860 CM7.2 RC3 36p Radio
It seems it is not easy to get this to work on every CM device. Some people report issues with this patch: http://r.cyanogenmod.com/#/c/17535/
v.k said:
It seems it is not easy to get this to work on every CM device. Some people report issues with this patch: http://r.cyanogenmod.com/#/c/17535/
It may not be easy, but I got a feeling that many people from CM teams everywhere are gonna work round the clock to get this to work on their devices.
I'm definitely not a "benchmark guy", and generally shrug off those topics, but even I was blown away after watching this video...
Sent from my MB860 using XDA Premium App
Wow. Awesome!
Imagine that guy talking one on one with a girl though......
Sent from my MB860 using XDA
rancur3p1c said:
Can you post more info about how this works? Or a link to the original GCC discovery?
Well, it is all about the compilation: they didn't change the code, but the way GCC compiles it, so it is now optimized for the ARM instruction set. So, why wasn't it optimized before this "discovery"? Well, I think they didn't spend enough time on build optimization when they first made GCC work with ARM. It was made for x86, x64, etc. first. These are other instruction sets - other lists of commands the CPU is able to work with. Imagine the instruction sets as different languages. x64, the first one, has a rich vocabulary, and ARM, the second one, has a more restricted vocabulary, but the two languages have the same syntax. The difference is that you will need to use more words with ARM than with x64 to describe something complex, so it has to be optimized to use the fewest words possible to be faster. And that's basically what the Linaro team did.
So the optimization has been applied to the Android system (Linux kernel + Dalvik, etc.), but it can also be used for any other ARM program. This is a great step forward for ARM computers too, and maybe ARM servers, which will continue to use less energy for bigger tasks because of the optimization.
Slymayer said:
Well, it is all about the compilation: they didn't change the code, but the way GCC compiles it, so it is now optimized for the ARM instruction set. So, why wasn't it optimized before this "discovery"? Well, I think they didn't spend enough time on build optimization when they first made GCC work with ARM. It was made for x86, x64, etc. first. These are other instruction sets - other lists of commands the CPU is able to work with. Imagine the instruction sets as different languages. x64, the first one, has a rich vocabulary, and ARM, the second one, has a more restricted vocabulary, but the two languages have the same syntax. The difference is that you will need to use more words with ARM than with x64 to describe something complex, so it has to be optimized to use the fewest words possible to be faster. And that's basically what the Linaro team did.
So the optimization has been applied to the Android system (Linux kernel + Dalvik, etc.), but it can also be used for any other ARM program. This is a great step forward for ARM computers too, and maybe ARM servers, which will continue to use less energy for bigger tasks because of the optimization.
You lost me at compilation...lol
Sent from my MB860 using XDA
So they found a way to optimize compilation for the ARM architecture, yielding massive performance boosts over current standards... do want =D
These dudes rock.
Sent from my MB860 using XDA
michaelatrix said:
You lost me at compilation...lol
Sent from my MB860 using XDA
They found a way to talk to the system by saying less. Like if I said to you, "hello, how are things in your life", but now I say, "how's things", and you understand both phrases mean the same thing. You get to the conclusion faster because you process less information but reach the same outcome. It takes less processing for the shorter phrase and improves overall response time.
Sent from my MB860 using xda premium
I don't think it'll be easy to use it on our beloved Atrix; the Linaro code uses a 3.2 kernel, and we're still stuck on the crappy Froyo 2.6.32 kernel =/