I'm getting units back from the field with corrupt flash cards, so I've
been doing some reading in the news groups for "wince corrupt" to see
what I can learn.

First, is a DOC (disk on chip) effectively the same thing as a compact
flash card? I know they are different technology, but from WinCE's
view?

Okay, back to the corruption issues. Our units *may* be turned off
periodically with no thought or concern as to what the unit is doing.
Obviously, if I'm in the middle of a write operation, the file content
may be hosed, but is it possible to lose the file system itself or a
portion of some other file? We have seen three types of corruption:

1) File system: CF is no longer bootable, garbage characters in the
filenames, etc.
2) Portions of executable files: Our main application file has lost
two major portions. I'm still analyzing this for patterns of any
sort, but the only time this file is accessed is when the system
boots.
3) Corrupt nk.bin - system is on its way back to us.

Ideas or suggestions?

Thanks

Re: Yet another random FAT corruption posts - WinCE 4.1 by Bruce

Bruce
Tue Jan 03 15:26:42 CST 2006

With simple FAT, it is very possible to corrupt a disk.

There are alternative file systems available that are less susceptible to
corruption. TFAT (Transaction-Safe FAT) works on a second copy of the FAT
and commits changes transactionally, which is better. Datalight has their
Reliance file system, which is transaction based and very good.

It is probably too late for this, but you might want to consider a scheme
that includes a battery backup that would allow you to shut down gracefully
when power is lost.

I am not an expert on DOC, but I think that it works very much like a CF
Memory card.

--
Bruce Eitman (eMVP)
Senior Engineer
beitman AT applieddata DOT net

Applied Data Systems
www.applieddata.net
An ISO 9001:2000 Registered Company
Microsoft WEP Gold-level Member




Re: Yet another random FAT corruption posts - WinCE 4.1 by Bill

Bill
Tue Jan 03 21:40:00 CST 2006

We have a similar situation and found that corruptions occurred exclusively
with Toshiba "MPG" compact flash cards. What kind of CF cards are you using?
What file system are you using: FAT12, FAT16, FAT32, or TFAT?

Also, we are using CE 5.0 and it has auto scandisk capability. This proved
very useful in identifying when a FATFS became corrupted, and it also
attempted to fix it. Frequently the FS gets corrupted but the corruption is
not detected for a very long time. If 4.1 provides this, or at least a
scandisk capability, you may want to consider using it.
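
Independent of scandisk, corruption in files that should never change (the main executable, for instance) can be noticed early by checking them against stored checksums at boot. The following is only a minimal sketch of that idea in Python; a CE device would do the equivalent in C/C++. The function names and the manifest layout are hypothetical and are not part of Windows CE or of any tool mentioned in this thread.

```python
import zlib


def file_crc32(path, chunk_size=4096):
    """Compute the CRC32 of a file without loading it all into memory."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)
    return crc


def find_corrupt_files(manifest):
    """manifest maps path -> expected CRC32.

    Returns the paths that are unreadable or whose contents no longer match.
    """
    bad = []
    for path, expected in manifest.items():
        try:
            if file_crc32(path) != expected:
                bad.append(path)
        except OSError:
            # A missing or unreadable file counts as corrupt as well.
            bad.append(path)
    return bad
```

The manifest itself has to live somewhere that is not rewritten in the field, otherwise it is exposed to the same corruption it is meant to detect.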

A two-partition CF card architecture had been considered, where static files
(nk.bin, config files, etc.) would be on one partition and all files that
get written routinely would be on a second partition. But we did not have
solid info indicating that this would in fact be better, and did not pursue
it because it did not address the underlying problem.





Re: Yet another random FAT corruption posts - WinCE 4.1 by Piet

Piet
Wed Jan 04 02:45:48 CST 2006

If you're changing the file system, your FAT gets written too, and if
that gets corrupted you can lose your filesystem completely.

What appears strange to me is that your nk.bin gets corrupted, because I
don't suppose you ever overwrite that.

Do you know if your flash supports atomic writes? I think all recent
DOCs do but I'm not sure.

DOC is a flash implementation from M-Systems that offers its own
filesystem called TrueFFS. It offers extra functionality but to CE it
looks like any other flash. Is this what you use?

Two remarks:
1) I agree with Bruce that you should try very hard to avoid losing
power completely during a write operation.
2) Minimizing the number of writes that actually happen could have a
positive effect too. Of course not all applications allow this.

Regards,
Piet


Re: Yet another random FAT corruption posts - WinCE 4.1 by Remi

Remi
Wed Jan 04 03:58:49 CST 2006

I roughly agree with Bruce, Bill and Piet.

1) CF and DOC have the same underlying technology, i.e. NAND-based Flash
memory. However, M-Systems uses patented TrueFFS algorithms to operate DOC's
Flash, while CF uses different algorithms. Furthermore, algorithms vary from
one CF brand to another, with different speeds and robustness. The only way
to evaluate these parameters is extensive testing with different brands.
During these tests, be prepared to find your CF completely unusable after
you power off the CF in the middle of a write: this happens with many
brands! In the past, we used the now-obsolete SanDisk "Industrial Grade"
devices. We now use Silicon Systems CF, which appear to behave correctly.

2) To get rid of 'logical' issues, running Scandisk (or equivalent) is not a
bad idea, but make sure you have a fixed version of this code, which had
quite a severe bug in it (search Google for "wince scandisk bug" for more
information.)

3) Restricting the number of writes to the CF is a very good idea, if
feasible... If not, you might consider using pre-allocated, fixed-size files
that you can operate as a FIFO. This is convenient for holding fixed-size
records and can greatly improve CF longevity, as write accesses to the FAT
are not needed anymore. In any case, keep in mind that NAND Flash devices
are far from being eternal...
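
The pre-allocated FIFO file from point 3 could look roughly like the sketch below. It is written in Python only to show the logic; the record size, slot count, on-disk layout, and function names are all assumptions made for illustration. The file is created once at its full size, and every later write overwrites the oldest slot in place, so the file never grows and no new clusters are allocated. Each slot carries a sequence number and a CRC so that a slot torn by a power cut can be recognized and skipped. Whether the FAT driver still touches the directory entry (the timestamp, for example) on such writes depends on the implementation and is not covered here.

```python
import os
import struct
import zlib

RECORD_SIZE = 32                     # payload bytes per record (assumed)
SLOT_COUNT = 8                       # number of slots in the file (assumed)
SLOT_FMT = "<II%ds" % RECORD_SIZE    # sequence number, CRC32, payload
SLOT_SIZE = struct.calcsize(SLOT_FMT)


def _crc(seq, payload):
    # The CRC covers the sequence number and the payload together.
    return zlib.crc32(struct.pack("<I", seq) + payload)


def create_log(path):
    """Pre-allocate the whole file once; later writes never change its size."""
    with open(path, "wb") as f:
        f.write(b"\x00" * (SLOT_SIZE * SLOT_COUNT))
        f.flush()
        os.fsync(f.fileno())


def read_records(path):
    """Return (sequence, payload) for every slot with a valid CRC, oldest first."""
    records = []
    with open(path, "rb") as f:
        for _ in range(SLOT_COUNT):
            seq, crc, payload = struct.unpack(SLOT_FMT, f.read(SLOT_SIZE))
            if seq != 0 and _crc(seq, payload) == crc:
                records.append((seq, payload))
    return sorted(records)


def append_record(path, data):
    """Overwrite the oldest slot in place and return the new sequence number."""
    payload = data.ljust(RECORD_SIZE, b"\x00")[:RECORD_SIZE]
    existing = read_records(path)
    seq = existing[-1][0] + 1 if existing else 1
    slot = (seq - 1) % SLOT_COUNT
    with open(path, "r+b") as f:
        f.seek(slot * SLOT_SIZE)
        f.write(struct.pack(SLOT_FMT, seq, _crc(seq, payload), payload))
        f.flush()
        os.fsync(f.fileno())
    return seq
```

A reader that finds a slot with a bad CRC simply ignores it, so the worst case after a power cut is the loss of the single record that was being written.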

HTH
--
Remi de Gravelaine
gravelaine at aton dash sys dot fr



Re: Yet another random FAT corruption posts - WinCE 4.1 by Charlie

Charlie
Wed Jan 04 09:11:22 CST 2006

Well, my initial response to all of the posts is "oh, holy crud..." -
more in a bit.


Re: Yet another random FAT corruption posts - WinCE 4.1 by Paul

Paul
Wed Jan 04 10:27:25 CST 2006

It's the same as if you were trying to prepare for having the power to your
desktop PC cut in the middle of writes to your hard disk. CE has more
options for trying to protect you, but the only real way to protect against
it is with a different hardware design (short-term battery backup allowing
you to shut down cleanly when A/C power goes away), or maybe an
uninterruptible power supply.

Paul T.

"Charlie" <cgilley@bravesw.com> wrote in message
news:1136389778.190910.86480@g43g2000cwa.googlegroups.com...
> Well, my initial response to all of the posts is "oh, holy crud..." -
> more in a bit.
>



Re: Yet another random FAT corruption posts - WinCE 4.1 by Charlie

Charlie
Wed Jan 04 10:48:26 CST 2006

Bruce,

Appreciate the feedback. The CE device we are using is an off-the-shelf
device from a third party, so we likely have little control over this
area. One thing I can pursue is investigating adding a more reliable FS
to our image.

chg


Re: Yet another random FAT corruption posts - WinCE 4.1 by Charlie

Charlie
Wed Jan 04 11:05:45 CST 2006

I could live with a two-partition approach. My application writes
very, very little to the CF unless we are in debug (logging) or some
error occurs (dumping forensics). The majority of the writes are to a
user settings file that provides persistence across power cycles.
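
For a small settings file like this, one common way to survive a power cut during a write is to keep two copies in a fixed-size file and always overwrite the older one. The sketch below shows that idea in Python; it is an illustration under assumptions, not what the application in this thread does, and the slot size, header layout, and function names are hypothetical. Each copy has a generation counter and a CRC, so on load the newest copy that passes its CRC wins, and a torn write costs at most the latest change.

```python
import os
import struct
import zlib

SLOT_SIZE = 256                              # bytes reserved per copy (assumed)
HEADER_FMT = "<IIH"                          # generation, CRC32, payload length
HEADER_SIZE = struct.calcsize(HEADER_FMT)
MAX_PAYLOAD = SLOT_SIZE - HEADER_SIZE


def _crc(gen, payload):
    # The CRC covers generation, length, and payload.
    return zlib.crc32(struct.pack("<IH", gen, len(payload)) + payload)


def _read_slot(f, index):
    """Return (generation, payload) for one slot, or None if it is invalid."""
    f.seek(index * SLOT_SIZE)
    raw = f.read(SLOT_SIZE)
    if len(raw) < SLOT_SIZE:
        return None
    gen, crc, length = struct.unpack_from(HEADER_FMT, raw)
    if gen == 0 or length > MAX_PAYLOAD:
        return None
    payload = raw[HEADER_SIZE:HEADER_SIZE + length]
    return (gen, payload) if _crc(gen, payload) == crc else None


def load_settings(path):
    """Return (generation, payload) of the newest valid copy, or (0, b'')."""
    if not os.path.exists(path):
        return 0, b""
    with open(path, "rb") as f:
        valid = [s for s in (_read_slot(f, 0), _read_slot(f, 1)) if s]
    return max(valid) if valid else (0, b"")


def save_settings(path, payload):
    """Write the new copy into the slot that does not hold the newest valid copy."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("settings too large for slot")
    if not os.path.exists(path):
        with open(path, "wb") as f:          # pre-allocate both slots once
            f.write(b"\x00" * (2 * SLOT_SIZE))
    gen = load_settings(path)[0] + 1
    header = struct.pack(HEADER_FMT, gen, _crc(gen, payload), len(payload))
    with open(path, "r+b") as f:
        f.seek((gen % 2) * SLOT_SIZE)        # generations alternate between slots
        f.write((header + payload).ljust(SLOT_SIZE, b"\x00"))
        f.flush()
        os.fsync(f.fileno())
    return gen
```

This only protects the contents of the settings file. It does nothing for the FAT itself or for a card whose controller misbehaves when power is removed mid-write.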

CF Cards we have used include SanDisk and some Lexar. We've lost file
systems on both.


Re: Yet another random FAT corruption posts - WinCE 4.1 by Charlie

Charlie
Wed Jan 04 11:18:21 CST 2006

I can alter some of my code to minimize writes to the flash. But I
really cannot predict when the power might go off. The loss of power
is usually not a random event - something is wrong in the system, and
we need to restart. It is highly unlikely that we are in the middle of
a write operation when the power is pulled. Even if I restructure
things, the fact is that the OS will do what it needs to do to the FAT
- this is transparent to me, as it should be. I'm kind of surprised
that one would even bother to include FATFS in the system at all, given
its obvious issues.

However, what I am more concerned about is that WinCE may "do things"
in the background. Now, I have no idea what, but it IS the operating
system and as such is considered the intelligence in the hw. It has
its own threads, and I have little direct control over it. Is it
reasonable to be concerned about some background operation nailing my
file system?


Re: Yet another random FAT corruption posts - WinCE 4.1 by Charlie

Charlie
Wed Jan 04 11:31:05 CST 2006

Paul - I understand what you are saying. I can rework my code to
minimize the chance of this happening.

Do you think it is reasonable for me to be concerned about the OS doing
a FAT operation independent of any of my application code? Maybe I'm
being overly pessimistic....


Re: Yet another random FAT corruption posts - WinCE 4.1 by Paul

Paul
Wed Jan 04 12:33:05 CST 2006

Probably not a write, unless you are using a hive-based registry. This is a
problem that really can only be fixed with the assistance of hardware. If
you need to be 100% sure, that's the *only* way you can get there.

Paul T.

"Charlie" <cgilley@bravesw.com> wrote in message
news:1136395865.044637.67530@g49g2000cwa.googlegroups.com...
> Paul - I understand what you are saying. I can rework my code to
> minimize the chance of this happening.
>
> Do you think it is reasonable for me to be concerned about the OS doing
> a FAT operation independent of any of my application code? Maybe I'm
> being overly pessimistic....
>



Re: Yet another random FAT corruption posts - WinCE 4.1 by Charlie

Charlie
Wed Jan 04 13:08:58 CST 2006

Paul,

Okay, I'll not worry about what the OS is doing behind my back (if
anything) for the moment. I agree, if I want 100% certainty, I need
to make h/w changes. We'll see how much pain it takes to trigger that
decision.....

Everyone - thank you for your comments and experiences....

chg


Re: Yet another random FAT corruption posts - WinCE 4.1 by Piet

Piet
Thu Jan 05 04:56:49 CST 2006

Two things from your posts:

> My application writes very, very little to the CF unless
> we are in debug (logging) or
> some error occurs (dumping forensics)

> The loss of power is usually not a random event - something is wrong in the system,
> and we need to restart.

So when something is wrong, you log some information and restart. Are
you certain that the write operation is complete before you restart?
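
The "is the write complete?" question can be made concrete: the restart should be requested only after the log data has been flushed out of the application's buffer and the OS has been asked to commit its cache. A minimal sketch in Python follows; the function names and the restart callback are hypothetical. On Windows CE the corresponding Win32 call would, I believe, be FlushFileBuffers. Even then, the card's own controller may still be busy for a short time after the call returns, so this narrows the window rather than closing it.

```python
import os


def write_forensics(path, text):
    """Append one log entry and return only after it has been pushed to the device."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(text + "\n")
        f.flush()               # drain the application-level buffer
        os.fsync(f.fileno())    # ask the OS to commit its cached data


def log_then_restart(path, text, restart):
    """Write the entry first, then call `restart` (a hypothetical reboot hook)."""
    write_forensics(path, text)
    restart()
```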

Regards,
Piet


Re: Yet another random FAT corruption posts - WinCE 4.1 by Charlie

Charlie
Thu Jan 05 06:17:06 CST 2006

Piet,

My system isn't that simple :). My touch screen panel is powered
from another embedded system. There is no signal (yet) between us
telling my device it is about to lose power. So, even though I write
very little to the flash (a relative thing, I know), it is possible that
I am in the middle of a write operation when I'm powered down. The
corruptions we have seen have all been in non-debug mode.

chg