We are having a problem, and hopefully someone in this
newsgroup with experience of WinCE internals can help.
Platform: National Semiconductor's SC1200 Geode (actually
an SC1200UCL-266, now owned by AMD), 266 MHz, 64MB SDRAM,
NTSC video out, no hard drives, USB network.
Executive summary: Browser performance of Windows CE.NET
4.2 is 50% of the performance of Windows ME. While there
are smaller contributing factors (temp files and the CE
filesystem compression, USB network drivers a bit
slower), the largest factor appears to be the division of
the CPU between running the browser and spending time in
various threads of device.exe.
This characteristic will impact any application that
interacts with the screen.
According to the folks who actually build the image, the
platform is set up in release mode, not debug mode.
The root question is "How can this be fixed or tweaked?"
Details (of which I've eliminated a lot in an effort to
cut to the chase):
One critical question: "Is the CPU running as fast in
both environments?"
To answer this, a Sieve of Eratosthenes benchmark program
was found on the web, compiled, and put into the CE
build. All the sieve tests calculated primes up to 28
million.
With output going to the screen:
On Windows ME, it ran in 16 seconds.
On Windows CE, it ran in 20 seconds.
With output piped to a text file:
On Windows ME, it ran in 16 seconds.
On Windows CE, it ran in 9 seconds.
If you telnet into the WinCE system and run the sieve
with its output going to the telnet session, it also runs
in 9 seconds.
The sieve prints a message every time it finds 2000
primes, about 20 KB of text in total. Almost all of it
goes to a single line that is overwritten in place, so
the screen does not scroll during the calculations.
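
For anyone who wants to reproduce the output pattern, here
is a minimal sketch of a sieve that reports progress the
way described above. It is not the benchmark source that
was actually used; the function name, the message text,
and the use of a carriage return to overwrite the line are
assumptions made for illustration.

```python
import io
import sys


def sieve_with_progress(limit, out=sys.stdout, report_every=2000):
    """Count primes <= limit, reporting progress every `report_every` primes.

    Each progress message starts with a carriage return so that it
    overwrites the previous message instead of scrolling the console.
    (Message format and overwrite mechanism are assumptions.)
    """
    if limit < 2:
        return 0
    is_composite = bytearray(limit + 1)
    count = 0
    for n in range(2, limit + 1):
        if is_composite[n]:
            continue
        count += 1
        if count % report_every == 0:
            out.write("\r%d primes found" % count)
        # Mark multiples of n, starting at n*n; smaller multiples
        # were already marked by smaller primes.
        for m in range(n * n, limit + 1, n):
            is_composite[m] = 1
    out.write("\r%d primes found up to %d\n" % (count, limit))
    return count
```

Passing an in-memory stream such as io.StringIO() as `out`
corresponds to the "piped to a file" case, and sys.stdout
to the "printing to the screen" case. A limit well below
28 million is advisable when trying this in Python.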
When the saved output file is sent to the screen with the
cmd.exe prompt's type command, it takes less than 2
seconds to display the entire file.
The sum of the parts (9 seconds to crunch, 2 to display)
is significantly less than the whole (20 seconds to
crunch and display at once). This is despite the two-step
case doing extra work that the combined case does not:
opening the file, writing and compressing the data,
closing the file, then reopening, reading, and expanding
it.
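
Spelled out as arithmetic, using only the WinCE timings
quoted above (the variable names are just labels, and the
2 seconds is an upper bound since the display took "less
than 2 seconds"):

```python
# WinCE timings in seconds, taken from the measurements in this post.
crunch_only = 9    # sieve with output piped to a file (or to telnet)
display_only = 2   # upper bound: "type" of the saved file took under 2 s
combined = 20      # sieve printing straight to the CE console

sum_of_parts = crunch_only + display_only
unexplained = combined - sum_of_parts
slowdown = combined / sum_of_parts

print("sum of parts: %d s" % sum_of_parts)
print("unexplained:  %d s" % unexplained)
print("slowdown:     %.2fx" % slowdown)
```

So at least 9 of the 20 seconds are not accounted for by
either the computation or the display taken separately.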
There is something about WinCE and how it interacts with
the screen that makes a program that is both crunching
data and displaying information iteratively cost nearly
twice as much time as a program that first does all the
crunching and then all the displaying in one burst.
The iterative nature of a browser (get the HTML, display
some text, spawn two threads getting two graphics, #1
arrives so render it while #2 is still arriving, #1 is
done, start downloading #3, #2 is now here so render
it, ...) follows the same compute-display-compute-display
pattern the sieve is going through and pays the same
penalty for this behavior.
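
To make the two patterns concrete, here is a small sketch
contrasting them. Everything in it is hypothetical
(CountingWriter, work_unit, and the message text are
stand-ins), and it only counts how many separate display
requests each pattern issues; it does not model CE's
actual display path.

```python
import io


class CountingWriter(io.StringIO):
    """In-memory stream that counts how many write() calls it receives."""

    def __init__(self):
        super().__init__()
        self.calls = 0

    def write(self, text):
        self.calls += 1
        return super().write(text)


def work_unit(i):
    # Stand-in for a slice of real computation (hypothetical workload).
    return sum(range(i % 100))


def interleaved(steps, out):
    """Compute a little, display a little, repeat."""
    for i in range(steps):
        work_unit(i)
        out.write("\rstep %d" % i)


def batched(steps, out):
    """Do all the computing first, then display everything in one burst."""
    chunks = []
    for i in range(steps):
        work_unit(i)
        chunks.append("\rstep %d" % i)
    out.write("".join(chunks))
```

Both functions produce identical text; the interleaved
version issues one display request per step, while the
batched version issues exactly one.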
To confirm this and try to isolate it, the sieve to a
file (fast), the sieve to the screen (slow), and browsing
(slow) were each run while thread activity was captured
with the Windows CE Remote Kernel Tracker.
With sieve printing to the screen, the CPU alternates
between sieve.exe and device.exe, looking like a square
wave with 70% of the time in device.exe and 30% of the
time in sieve.exe. I have the actual screen captures and
data files if anyone is interested.
When the output is piped to a file, 95% of the time is
spent in sieve.exe, with what look like very short
"clicks" (as when viewing audio on an oscilloscope) of
CPU time going to device.exe.
Curiously, a modified version of sieve that doesn't print
anything until the final number of primes is found
visually looks the same as when the output was piped to a
file.
It almost looks like when a process wants to do something
to the screen, it calls some CE driver that packages the
screen activity into chunks of commands, which are then
queued on another process's input queue. When that
other process wakes up, it takes the messages off its
input queue and actually renders the screen data. That
message passing overhead and scheduling between the two
processes seems absent when the whole screen activity is
sent out all at once, as when sending the saved output
text file to the screen.
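
As a toy model of that guess (and it is only a guess about
what CE does), the hand-off could look like the sketch
below. It uses two Python threads and a queue rather than
any CE API; run_display_model and display_worker are
made-up names.

```python
import queue
import threading


def run_display_model(messages):
    """Toy model: the app enqueues draw commands, a second thread renders them."""
    commands = queue.Queue()
    rendered = []

    def display_worker():
        while True:
            cmd = commands.get()   # blocks until the app posts something
            if cmd is None:        # sentinel: no more commands
                break
            rendered.append(cmd)   # stand-in for the actual drawing

    worker = threading.Thread(target=display_worker)
    worker.start()
    for msg in messages:
        # One hand-off, and potentially one wake-up of the worker,
        # per message posted.
        commands.put(msg)
    commands.put(None)
    worker.join()
    return rendered
```

In this model, many small messages mean many hand-offs and
wake-ups, while one large message means a single hand-off,
which would match the difference seen between the sieve
printing as it goes and the type command.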
Device.exe has > 50 threads -- if someone can explain how
to translate a thread ID into something more meaningful
(like a .DLL name, or driver name, or anything) I'd love
to answer exactly where most of the CPU time is spent and
have further clues as to where to look. As it stands, the
CPU bounces among ~8 of device.exe's threads, with lots
of "Wait for multiple objects" synchronization icons
shown in the Kernel Tracker.
WinCE's Internet Explorer exhibits similar though much
more complicated patterns, as it is getting data off the
network and has many internal threads (12) bouncing all
around. Bottom line is IE looks like it is suffering from
lack of CPU due to device.exe, just like sieve is.
Performance was measured by repeatedly downloading the
same data from the same server over time. This confirms
that CE runs at 50% of the speed of Windows ME on the
same platform.
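
For anyone wanting to repeat this kind of comparison, a
rough sketch of the measurement is below. It is not the
tool that was used here; `fetch` is a hypothetical
callable that downloads the test data and returns it as
bytes, and the injectable `clock` is only there to make
the function easy to check.

```python
import time


def measure_throughput(fetch, repeats=5, clock=time.perf_counter):
    """Call fetch() `repeats` times and return average bytes per second.

    `fetch` is a hypothetical callable returning the downloaded bytes.
    """
    total_bytes = 0
    start = clock()
    for _ in range(repeats):
        total_bytes += len(fetch())
    elapsed = clock() - start
    return total_bytes / elapsed if elapsed > 0 else float("inf")


def relative_speed(rate_a, rate_b):
    """Express rate_a as a fraction of rate_b."""
    return rate_a / rate_b
```

Running measure_throughput once per operating system
against the same server and feeding the two results to
relative_speed gives the kind of ratio quoted above.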
I really expected WinME on a hard drive to be slower, not
faster than WinCE running entirely out of RAM.
If anyone has any ideas on how to make things work
better, any assistance is greatly appreciated.
Thanks in advance!
David Soussan
My Email is the first two letters of my first name, the
first letter of my last name, at yahoo dot com.
-------
Other interesting performance characteristics found
during this investigation:
The impact of WinCE's filesystem compression on network
throughput was measured by downloading / uploading both a
very large .zip file [expensive to run compression on]
and a file filled with 00 bytes [very cheap to run
compression on].
Pushing data into WinCE, we measured 353 KB/s for a
zero-filled file and 211 KB/s for the big .ZIP file. The
cost of compression cut the throughput to about 60% of
what the zero-filled file achieved.
The reverse was not true; for pulling data out of WinCE
we measured ~253 KB/s no matter which file was pulled.
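
The ratios between these rates can be checked directly
from the figures quoted above (the variable names are
just labels for those figures):

```python
push_zero_kbps = 353   # pushing the zero-filled file into CE, KB/s
push_zip_kbps = 211    # pushing the large .zip file into CE, KB/s
pull_kbps = 253        # pulling either file out of CE, KB/s (approximate)

zip_vs_zero = push_zip_kbps / push_zero_kbps
pull_vs_zero = pull_kbps / push_zero_kbps

print("zip push vs zero-filled push: %.0f%%" % (100 * zip_vs_zero))
print("pull vs zero-filled push:     %.0f%%" % (100 * pull_vs_zero))
```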
-------
For both ME and CE, net browser throughput is the same
whether the graphics compress easily or are actual
graphics that do not compress as well. It also makes no
difference whether the image is actually fully rendered
to the screen -- shrinking the window so the image
doesn't have to be displayed did not improve the page
refresh rate by a significant amount.
-------