QA Tech-Tips: OOPS! - When disaster strikes
Safely removing large USB Flash Drives (Part 5 of a Series)

Hello again!

Everyone has, and loves, USB flash drives (also known as "thumb-drives" or "keychain drives") because they are both small in size and, relatively speaking, massive in capacity.

Right now, 128 gigabyte flash drives are common, and I would not be surprised if 256 gig flash drives are available when I make my next visit to Micro Center.

And this is good. I can pick up an entire copy of either my own or my wife's e-mail store and move it from one machine to another - sans network. Likewise, I can go visit a client and carry what used to be a whole briefcase of CDs and DVDs in my pocket. I can even place entire operating systems on relatively small flash media - I have a 16 gig thumb-drive that can cold-boot any one of eight different Linux images (four different versions, in both 32 and 64 bit) of Mint 17.1.

And so on. . . . .

We all know to "safely remove" (unmount / eject) the flash drive before we just yank it out, to prevent the data from being scrambled. We do the "safety dance" routine and remove our media, confident that by the time the little task-bar pop-up comes along, we're golden. Right?

Ahhh. . . Not really. . . .

You see, it's not that simple anymore. Especially with flash drives.

Long ago the computer's "silicon" - processing chips - became faster than the hard drives. To reduce hard drive latency and I/O bottlenecks, operating systems implemented a policy of "Lazy" writes, and cached data.

What this means is that the operating system would allocate a fairly large chunk of memory as a very fast buffer for frequently used data. The operating system would read and write to the memory buffer and then - later on, when things weren't so crazy - it would write any changes back to the hard drive. So when the computer was "busy" - refreshing the screen, or doing something complicated - it would postpone updating the disk until it was finished and waiting for you to do something like typing or moving the mouse.
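
For the programmers in the audience, here is a minimal sketch of the lazy-write idea in Python. It is a toy model, not how any real operating system implements its cache: the class name WriteBackCache and the dictionary standing in for the hard drive are my own inventions for illustration.

```python
class WriteBackCache:
    """Toy model of a lazy-write (write-back) cache in front of a slow disk."""

    def __init__(self, disk):
        self.disk = disk      # a dict standing in for the slow hard drive (assumption)
        self.cache = {}       # fast in-memory copies of blocks
        self.dirty = set()    # blocks changed in memory but not yet on disk

    def read(self, block):
        # Serve from memory if we can; otherwise fetch from the disk once.
        if block not in self.cache:
            self.cache[block] = self.disk.get(block)
        return self.cache[block]

    def write(self, block, data):
        # Returns immediately; the disk is NOT touched yet.
        self.cache[block] = data
        self.dirty.add(block)

    def flush(self):
        # Later, when things are quiet, push the changes to the disk.
        for block in sorted(self.dirty):
            self.disk[block] = self.cache[block]
        self.dirty.clear()


disk = {0: "old"}
cache = WriteBackCache(disk)
cache.write(0, "new")
print(cache.read(0), disk[0])   # prints: new old
cache.flush()
print(disk[0])                  # prints: new
```

Between the write() and the flush(), the program sees "new" while the disk still holds "old". Pull the plug in that window and the change is gone.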

However. . . .

As I mentioned in a previous article - The 2000 Gigabyte Gorilla - hard drives themselves began implementing a "lazy write" technology to help ease the I/O burden. Fast cache memory on the drive makes the drive look faster than it really is. The result is that the hard drive would report success before the data was actually and truly written.

Oops!

So. . . . The operating system people and the hard drive people got together and implemented a new hard drive command - flush data - which is supposed to absolutely, positively, guarantee that any changed data was written before the "success" signal was received.

The whole idea behind "safely remove" (or "shutdown") is that the user's command to remove the media (or shut down the computer) would force a data cache flush to the physical drive platters, updating everything before the drive was removed and guaranteeing consistent data on the drive.
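
A program can ask for the same kind of flush on its own files. Here is a hedged Python sketch: flush() and os.fsync() are real standard-library calls, but the helper name write_and_flush and the demo file are made up for illustration. Keep in mind that os.fsync() only asks the operating system to hand the data to the device; what the device does with it after that is the device's business.

```python
import os
import tempfile


def write_and_flush(path, data):
    """Write data, then ask the OS to push it all the way to the device."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # Python's own buffer -> operating system cache
        os.fsync(f.fileno())   # operating system cache -> storage device
    return os.path.getsize(path)


# Demo on a temporary file. On a real flash drive the path would point at
# the mounted drive instead (wherever your system mounts it).
demo_path = os.path.join(tempfile.mkdtemp(), "notes.txt")
print(write_and_flush(demo_path, b"important data"))   # prints: 14
```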

However, as the author of the article "How important is the hard drive cache?" mentions, the things that make this true for actual, physical hard drives are no longer true for NAND / NOR flash memory drives.

Double oops!

The wherefore behind the "why" in all of this is the way flash memory works.

Flash memory is organized into "read blocks", larger "write blocks" which are groups of many read blocks, and huge "erase blocks" which contain a very large number of write blocks.
"Read blocks" are tiny blocks of data that you can read wherever and whenever you want.
"Write blocks" are large groups of "read blocks". To write even the smallest amount of data, you have to read and modify the entire write block, copy it to an unused space within its local erase block, and mark the old write block as "dirty" (unusable until erased). Once a write block has been written to, it cannot be reclaimed and re-written until the entire erase block is purged. The only thing you can do is mark it "dirty" and try to find another unused write block. If you have a lot of data to copy, you do this write block by write block, over and over again.
"Erase blocks" are gigantic groups of write blocks. If there are no free write blocks within the existing erase block, you have to copy the entire erase block to a free area, (an entirely empty erase block), with the new data included - and then erase the entire old area, since you can't erase anything smaller than an entire "erase block".

So, for any kind of serious data access where there is both reading and writing going on, this can become a non-trivial, time-consuming process. And flash drives hide this process from the O/S.
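
To make the block shuffling described above a bit more concrete, here is a toy Python model of a single erase block. Everything in it is an assumption for illustration: the EraseBlock class, the four-write-block size, and especially the shortcut of holding the live data in memory while erasing in place, where a real drive would copy it to a spare, empty erase block first. Real flash controllers are far more elaborate.

```python
FREE, VALID, DIRTY = "free", "valid", "dirty"


class EraseBlock:
    """Toy model of one erase block holding a handful of write blocks."""

    def __init__(self, n_write_blocks=4):
        self.state = [FREE] * n_write_blocks
        self.data = [None] * n_write_blocks
        self.where = {}      # logical name -> index of its current write block
        self.writes = 0      # physical write-block writes performed
        self.erases = 0      # whole-erase-block erases performed

    def _program(self, name, value):
        slot = self.state.index(FREE)          # first unused write block
        self.state[slot], self.data[slot] = VALID, value
        self.where[name] = slot
        self.writes += 1

    def _erase_and_rewrite(self):
        # Simplification: keep the live data aside, wipe the whole erase
        # block, then write the live data back into it.
        live = {name: self.data[slot] for name, slot in self.where.items()}
        self.state = [FREE] * len(self.state)
        self.data = [None] * len(self.data)
        self.where = {}
        self.erases += 1
        for name, value in live.items():
            self._program(name, value)

    def write(self, name, value):
        if name in self.where:                 # old copy can't be overwritten,
            self.state[self.where.pop(name)] = DIRTY   # only marked dirty
        if FREE not in self.state:             # no room: reclaim dirty blocks
            self._erase_and_rewrite()
        if FREE not in self.state:
            raise RuntimeError("erase block is full of live data")
        self._program(name, value)


blk = EraseBlock(n_write_blocks=4)
blk.write("photo", "never changes")
for i in range(9):
    blk.write("note", f"version {i}")
print(blk.writes, blk.erases)   # prints: 12 2
```

Ten requested writes turned into twelve physical writes plus two full erases, because the unchanged "photo" had to be rewritten every time the block was reclaimed. None of that extra work is visible to whoever asked for the writes.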

When flash drives were small - measured in megabytes, or even small numbers of gigabytes - this time lag was not really a problem, because the memory array was small enough that the overhead was not noticeable. Now that flash drives have become hundreds and hundreds of gigabytes in size, the overhead for writing - and the associated time lag - can become huge, even when measured by human time standards.

What this all adds up to is that the operating system - and this includes Windows, OS X, and Linux - can no longer reliably predict when the flash media has been fully updated, and the "safe to remove" prompt may not be true anymore.

So far, (as far as I can tell), the ONLY way you can tell if the flash drive is really ready to be removed is to watch the activity LED and wait for it - eventually - to stop. And that might not even be long enough, as I have seen the LED on a flash drive flash and stop on a "remove" command, and then suddenly re-start just as I was about to remove it.
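
For the Linux folks, there is one extra clue you can check, though it only covers the operating system's side of the story. The kernel reports how much cached data is still waiting to be written in the "Dirty" and "Writeback" lines of /proc/meminfo. The little Python sketch below parses those two lines; the function name pending_writeback_kb and the sample text are made up for illustration. It cannot see what the flash drive is doing internally, so it does not replace watching the LED.

```python
import re


def pending_writeback_kb(meminfo_text):
    """Return (dirty_kb, writeback_kb) parsed from /proc/meminfo-style text."""
    values = {}
    for key in ("Dirty", "Writeback"):
        # Match e.g. "Dirty:      1234 kB" at the start of a line.
        match = re.search(rf"^{key}:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
        values[key] = int(match.group(1)) if match else 0
    return values["Dirty"], values["Writeback"]


# Made-up sample text in the same layout as /proc/meminfo.
sample = (
    "MemTotal:     8000000 kB\n"
    "Dirty:           1234 kB\n"
    "Writeback:         56 kB\n"
    "WritebackTmp:       0 kB\n"
)
print(pending_writeback_kb(sample))   # prints: (1234, 56)
```

On a real Linux machine you would feed it the contents of /proc/meminfo instead of the sample. Both numbers dropping to (or near) zero after an eject suggests the operating system has handed everything over to the drive.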

The result of all of this is that unless you are very, very careful when removing flash media, you can inadvertently end up with a corrupted flash drive.

What say ye?

Jim (JR)