CodeWalrus

Title: Prizmpocalypse: The mystery of the mass Casio fx-CG10/20 bricking
Post by: Dream of Omnimaga on July 01, 2016, 03:14:10 AM
A few years ago, if you visited Cemetech on a regular basis, you probably read many user reports about Casio fx-CG10 and fx-CG20 graphing calculators mysteriously ceasing to work after being turned OFF and never turning ON again. There was a lot of speculation and rumor about the cause, with people suggesting the following possibilities:

-OS 2.00 bugs (kinda like TI-Nspire OS 2.0.0, 3.0.1 and HP Prime Firmware v10077)?
-Overclocking causing capacitor damage?
-Bugs specific to certain third-party ASM/C add-ins?
-Faulty hardware on early batches of Prizm calculators?
-Transferring files over USB too often?
-Powering the calculator via USB instead of batteries?
-Turning the calculator OFF requiring more power and triggering one of the aforementioned hardware issues?
-Very short flash chip lifespan only allowing a few rewrites (like on the early Algebra FX 2.0 and FX 1.0 Plus hardware versions)?

It was not very clear to most of us (unless you visited Cemetech Prizm-related topics on a regular basis) what the exact cause was, but in 2015, TeamFX might have found it: a certain hardware component located near the Reset button (https://www.cemetech.net/forum/viewtopic.php?p=234487#234487). In this post, it is suggested that overclocking the calculator just once could shorten its lifespan significantly, but that only the Prizm seemed to be affected, despite other models having the same faulty component. According to his picture, a new hardware revision that fixes the issue came out in 2015.

Recently, @Juju got his Casio Prizm back and started developing software again, only to see his calc die for good two days later. The symptoms were similar to those of past broken calculators, which sparked a lot of discussion in #codewalrus IRC with @gbl08ma. He seems to conclude that what TeamFX suggested might be the cause and mentions that on Cemetech, no bricked Prizm case ever got solved. Judging by what seems to cause the bricking and shorten this calculator model's lifespan, and by how I used mine in the past, I suspect that my own Casio PRIZM will suffer the same fate within the next year or two (or sooner, since I'm about to lend it to Juju for software development).

So what we suggest is this: if you own a Casio Prizm manufactured before 2015, don't overclock it, and avoid add-ins that overclock the calculator automatically, especially those that use the old overclocking method, which is less safe than the new one (it is unclear which add-ins in particular do). We also suggest powering the calculator via USB and transferring files only when necessary.

Hopefully, more accurate info can be compiled in the near future to clarify the issue and prevent the bricking of more fx-CG10/CG20 calculators.
Title: Re: Prizmpocalypse: The mystery of the mass Casio fx-CG10/20 bricking
Post by: gbl08ma on July 01, 2016, 10:35:26 AM
Yesterday I built a comparison table of the known Prizm brick cases, but I can't draw a clear conclusion because there's too much data missing. I sent it to TeamFX, who told me to send it to Simon Lothar, which I have yet to do. Anyway, both already said, long ago, what they think about this: that it's a hardware defect, perhaps coupled with the hardware not being designed for overclocking (much less the "improper" overclocking that only increases the main CPU clock without increasing the other clocks to match). This was their opinion even before the faulty component near the reset circuitry was identified; when it was identified, it only confirmed their suspicions. I don't think they are especially available to look at this "mystery" again.

Most bricked Prizms seemed to have my Utilities add-in running or installed at some point, but there have been many versions of Utilities and we can't be sure which Utilities version was in use (except in a few cases like Juju's, which was a self-compiled build already sent to me for posterity).
v1.3 and later versions included changes that are the only thing that could even remotely be causing bricks (v1.3 was released in March 2014).

TeamFX suggested that the fact that Utilities is installed on many of the bricked calcs is coincidental, because the group of advanced users who go on calculator forums like Cemetech and CodeWalrus has a strong overlap with the group that would know about Utilities and find it useful. Furthermore, Utilities wasn't installed on all of the bricked calculators (at least one calculator bricked even before the first version of Utilities was published).

I would post a link to the table here, but first I need to redact the calculator product IDs (serial numbers) as I'm not sure people are OK with them being made public. I only have the Product ID for one or two bricks, anyway.
Title: Re: Prizmpocalypse: The mystery of the mass Casio fx-CG10/20 bricking
Post by: Dream of Omnimaga on July 02, 2016, 07:47:44 AM
Yeah, I'm thinking that the hardware is just not made for overclocking, unlike certain other calculators. Some other calcs are probably more tolerant of overclocking than the PRIZM. It could also be a big bad batch of capacitors or calculators. After all, this happens in every domain (e.g. some iMacs from 2011 or so and the mass Toyota car recalls a few years ago).


And thanks for explaining how the previous overclocking method was flawed. I didn't realize the PRIZM also had multiple clocks like the Nspire CX.

As for Utilities, I think I might have used it in the past, in addition to overclocking my calculator several times, but I could be wrong. All I know, though, is that either Raptor, Pong or Rainbow Dash Cloud Attack overclocked the calculator, and one of them also forgot to set the speed back to its previous state.


Oh and I played those games a lot and used Overclui a lot. I suspect that my calculator only has about 100 hours of use left before it finally goes belly up too. I don't blame the authors, though, because there was barely any calculator documentation back then and no evidence at the time that the old overclocking method was dangerous. What is strange, too, is that most bricking happened after 2013. It's almost as if the calculators were set to self-destruct only after a certain date. Also, most bricking started happening immediately after OS 2.00 came out, which is why at first I thought that OS 2.00 was the direct culprit.
Title: Re: Prizmpocalypse: The mystery of the mass Casio fx-CG10/20 bricking
Post by: gbl08ma on July 02, 2016, 10:33:27 AM
Here is the comparison table with the various known cases of Prizm bricks: https://docs.google.com/spreadsheets/d/1hXEGg71-iG36w7dvmOoWzMYhH0939VkrDagIivW3wmw/edit?usp=sharing

Indeed, most bricks seemed to happen after 2013. In 2014 alone, there were 5 bricks, but in 2015, zero bricks. There are explanations for this that are more likely than planned obsolescence or OS versions. For one, when people see new reports of bricked Prizms, they tend to report theirs too, where otherwise they would just assume it died of old age or because it fell to the floor X weeks ago, etc. For example, user jubjub449 at Cemetech bricked two Prizms, one in April and another in May 2014, but only bothered to post on the forum when the second one broke. Had the replacement kept working fine, we might not have heard about the first brick at all. Throughout 2015 there were no reports of "mysterious" bricks; this may be because by then the brick reports thread was already buried, and I doubt it is the first thing that appears on Google searches for "broken prizm" or something like that. Also note that the Prizm doesn't seem to be particularly popular in English-speaking countries, and people from e.g. Portugal, Spain or France may not be as inclined to create an account and post on an English-speaking website.

For similar reasons, we also only hear reports from users who already have forum accounts and also tend to have many add-ins (including Utilities) installed. Most other people, even if they visited some calculator community to download games, will not bother to look for a forum thread where people complain about bricks, much less post there.

Finally, even if there is a strong correlation between OS 2.00 coming out and more Prizms becoming bricks, this may not be due to version 2.00 per se, but to the fact that many of the calculators that got updated to that version had already gone through lots of OS updates. Consider that many people bought their calculators at a time when they came preloaded with OS 1.02, and that people on calculator forums usually keep the OS on their calcs updated to the latest version (or at least, the last version to support custom programs/Ndless/whatever). This means that to reach OS 2.00, they might have been updated three times: from 1.02 to 1.03, then to 1.04 and finally to 2.00. If we link this with the reports from Simon Lothar that components on the Prizm board got scorched when he repeatedly flashed custom (and non-custom) operating systems, we may have an explanation for why many bricks coincided with the release of OS 2.00.

As for why the bricks seem to happen in the process of entering standby: well, that may be because entering standby requires saving at least 600 KB of RAM to the flash, which corresponds to five flash sectors that need to be erased and written. This certainly requires some additional power compared to just making calculations or even writing small files with eActivity and add-ins, putting additional pressure on power supply circuitry that may already be weakened from OS updating, overclocking or whatever.
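
As a rough check of the "600 KB means five sectors" figure, here is a minimal Python sketch. The 128 KiB erase-sector size is an assumption (it is not stated in this thread; it is simply a value consistent with 600 KB needing five sectors), and `sectors_needed` is a made-up helper name.

```python
import math

SECTOR_SIZE_KIB = 128  # ASSUMED erase-sector size; not confirmed in this thread

def sectors_needed(data_kib, sector_kib=SECTOR_SIZE_KIB):
    """Number of whole flash sectors that must be erased to hold data_kib."""
    return math.ceil(data_kib / sector_kib)

# Size of the RAM image saved on standby, as quoted above
print(sectors_needed(600))
```

Flash is erased a whole sector at a time, so the count is rounded up: a save that only partly fills the last sector still costs a full erase of it.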

Also note that none of the bricks for which we know the hardware revision correspond to the newer revision 001V04.
Title: Re: Prizmpocalypse: The mystery of the mass Casio fx-CG10/20 bricking
Post by: Dream of Omnimaga on July 02, 2016, 07:38:37 PM
I have the feeling only 1% of people have publicly reported bricking issues with their calculator. Most calc owners don't know there's an online community centered mostly around them.

I wonder if limited flash rewrites could be the culprit? After all, the 1999 FX 1.0 and AFX 2.0 calcs stopped working after just a few years due to the flash chip getting worn out too fast. The 2001 hardware revision solved the issue.
Title: Re: Prizmpocalypse: The mystery of the mass Casio fx-CG10/20 bricking
Post by: gbl08ma on July 02, 2016, 07:53:48 PM
TeamFX believes that isn't the case, as the flash is NOR. The flash chips used guarantee an endurance rating of 100,000 erase/program cycles and data retention of at least 20 years; that is, if used properly, of course: if under- or over-powered, anything can happen.
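
To put the 100,000-cycle rating in perspective, here is a back-of-the-envelope Python sketch. It assumes the worst case where every standby erases the same sectors exactly once (no wear spreading at all); the shutdown counts per day are hypothetical and `years_until_worn` is a made-up helper name.

```python
ENDURANCE_CYCLES = 100_000  # rated erase/program cycles, as quoted above

def years_until_worn(cycles_per_day, endurance=ENDURANCE_CYCLES):
    """Years before a sector erased cycles_per_day times a day reaches its rating."""
    return endurance / cycles_per_day / 365.25

# Hypothetical usage patterns: standby entries per day
for per_day in (1, 10, 50):
    print(per_day, "per day:", round(years_until_worn(per_day), 1), "years")
```

Even at ten standby cycles a day, the rated endurance would last roughly 27 years under this assumption, which fits the view that plain flash wear-out is an unlikely explanation for calculators dying after only a few years.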

Personally I think that if failing flash chips were the culprit we'd see different types of errors before the final hang on shutdown: corrupted documents, disappearing add-ins (due to checksums not matching), OS errors in places other than the shutdown routine, etc.
Title: Re: Prizmpocalypse: The mystery of the mass Casio fx-CG10/20 bricking
Post by: Yuki on July 02, 2016, 11:17:23 PM
Yeah, the weird thing is that nothing foreshadowed it, plus the fact that the part that breaks did not immediately shut down the calc, instead making the shutdown process longer? Seems that piece isn't essential to keep the calc powered on, but the bootloader/OS chokes on it, preventing a boot.
Title: Re: Prizmpocalypse: The mystery of the mass Casio fx-CG10/20 bricking
Post by: gbl08ma on July 03, 2016, 09:40:08 AM
The component in question could be a voltage regulator. If it malfunctions, its output level may go higher or lower than desired. The OS wouldn't detect this, because the ADC used to measure the battery voltage sits before any regulator. If the flash chip is supplied with a voltage that is too high or too low, it may behave unpredictably. It could just lock up, and the OS and/or the CPU (which can execute code directly from the flash) will wait forever for the last operation to complete. Or it could begin returning garbage, and the OS wouldn't know what to make of it either, especially because all the error handlers are in the flash, which is no longer available. So I can see why the OS would just lock up.

As to why it seems to happen more on shutdown, it may be because of what I already explained: entering standby might be more power-demanding (and in a short amount of time, so it's a demand peak) than most other operations which only require reading data from the flash, and probably more than operations on the filesystem, which as far as we know are carefully designed to decrease flash wear, minimizing write cycles. This additional demand may cause the output of an already failing regulator to go too far off from the accepted levels.
Of course, the best way to validate this would be to connect an ammeter between the batteries and the Prizm, and measure the consumption during different operations. Supposedly, the Prizm has those two big capacitors precisely to minimize the effect of these "demand peaks", but I don't know if they go before or after the voltage regulators, which means they might do nothing to smooth the output of a failing regulator.

Also, I don't think the regulators necessarily become visibly scorched when they break, which may explain why some Prizms look "normal" on the outside. Taking some measurements on a working Prizm and then on a broken Prizm and comparing them would be useful.

Also useful to know is that at least some of the bricks would keep consuming batteries even without visibly powering on. There are at least two possible explanations for this. One, very simple, is that some failed component is shorting the batteries. The other, more elaborate, says the CPU gets stuck in a loop trying to run an error handler on an erased sector. I think it has been determined before that 0xFFFF (the contents of an erased 16-bit word in flash - erasing sets all bits to 1, writing lets you choose which bits to set to zero) is an invalid instruction, and so the CPU would keep trying to run the "invalid opcode" handler, except that is also 0xFFFF, and so on. This means that at least the power supply for the CPU was working, or at least, "good enough" for it to waste battery.
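
The erase/write behavior mentioned above (erasing sets all bits to 1, writing can only clear bits to 0) can be illustrated with a toy Python model. This is only a sketch of the general NOR flash rule, not the Prizm's actual flash driver; the 4-word sector size and the function names are made up for illustration.

```python
ERASED_WORD = 0xFFFF  # every bit set: a 16-bit word right after an erase

def erase_sector(sector):
    """Erasing sets every bit of every word in the sector to 1."""
    for i in range(len(sector)):
        sector[i] = ERASED_WORD

def program_word(sector, index, value):
    """Programming can only clear bits (1 -> 0); it never sets a 0 back to 1."""
    sector[index] &= value & 0xFFFF

sector = [0x0000] * 4            # toy 4-word "sector" (hypothetical size)
erase_sector(sector)
program_word(sector, 0, 0x1234)  # first write after erase stores the value as-is
program_word(sector, 0, 0xFF00)  # writing again without erasing only clears more bits
print([hex(w) for w in sector])
```

In this model, any word that was erased but never written back still reads as 0xFFFF, which is the situation described above if a sector is erased and the following write never completes.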
Title: Re: Prizmpocalypse: The mystery of the mass Casio fx-CG10/20 bricking
Post by: Yuki on July 03, 2016, 06:37:11 PM
Nice theory here, but I guess you'd need a chip programmer to figure it out?
Title: Re: Prizmpocalypse: The mystery of the mass Casio fx-CG10/20 bricking
Post by: gbl08ma on July 03, 2016, 06:44:21 PM
At Cemetech people found out where to bring out all the pins of the flash, but the problem is getting some sort of device that can connect to all of them. Since no one had a proper programmer, and since Arduinos etc. don't have enough pins, some tests were made using an oscilloscope, but only garbage data could be read from a bricked Prizm. It could be that the data read (only a few bytes, anyway) was indeed what was in the flash, or that some mistakes were made.

Without a reliable way to unbrick Prizms to then try to brick them again and see what causes the bricking, there's not much that can be done to confirm these theories... but we can't manage to unbrick any Prizm because we're not even sure of what bricked them or what components are affected, so we're stuck in this catch-22.
Title: Re: Prizmpocalypse: The mystery of the mass Casio fx-CG10/20 bricking
Post by: Dream of Omnimaga on July 04, 2016, 01:16:59 AM
Do we know at what speed the calculator runs during the shutdown process (where the Casio logo appears), without the user having applied any overclocking beforehand?
Title: Re: Prizmpocalypse: The mystery of the mass Casio fx-CG10/20 bricking
Post by: gbl08ma on July 04, 2016, 10:54:04 AM
I don't think the OS changes any of the clocks of the chip other than when initializing them when the calculator first powers on. I have not confirmed this, but I believe Simon or TeamFX would have noticed this when inspecting the OS. Also, when overclocking calculators we noticed the clocks are preserved even after going into standby, so if the OS changes them it is careful enough to preserve the previous settings and restore them, instead of just setting them back to the 58 MHz setting.

When the CPU starts it's always running at about 3 MHz. It's up to the user code to change that. As far as I know the OS sets the main clock to 58 MHz and never changes from that. Some years ago, I suggested on a Cemetech post that the OS could decrease the CPU clock when idling (e.g. while waiting for input between calculations) to save battery. TeamFX replied that the hardware is not really designed for dynamic clocking, which means it's really meant to run always at the same frequency. Because of this, I doubt Casio is over- or under-clocking the CPU at any point, much less on shutdown.
Title: Re: Prizmpocalypse: The mystery of the mass Casio fx-CG10/20 bricking
Post by: Dream of Omnimaga on July 06, 2016, 03:59:58 AM
I see. I was worried that the Casio OS was overclocking the calc at unsafe speeds to prevent the shutdown sequence from taking as long as the TI-Nspire CX booting sequence. I know that some people complained about how long it takes for the Nspire to boot/shutdown before.

I am curious whether, if Casio ever switches to eZ80 or ARM, they will make more use of changing the clock speed when idle. IIRC the TI-84 Plus CE runs at 6 MHz during non-heavy operations and 48 MHz otherwise, but I could be wrong.
Title: Re: Prizmpocalypse: The mystery of the mass Casio fx-CG10/20 bricking
Post by: gbl08ma on July 06, 2016, 10:37:09 AM
They could be making use of dynamic clocks on SH4 if they had designed the hardware to support it - that is, assuming the hardware indeed can't cope with it; if it already can, then it's just a matter of software. And I don't see Casio switching to Z80 or eZ80, as it would mean a huge architecture downgrade. Casio already makes great use of stuff like the MMU, and their OS is written mostly in C and C++. It would also need to be quite a fast Z80 (probably clocked higher than 100 MHz) to have the same MIPS as the SH4. This would be a requirement, as I don't see them optimizing or rewriting their math core or picture routines to fit on a slower CPU.

They'll either keep using SH4 through their partnership with Renesas (and we'll eventually get to a point where the only client for SH4 processors is Casio), or switch to ARM. On a less likely possibility, they'll go for a more "exotic" solution such as very-low-power x86 chips (but I don't think anyone makes those anymore), like they did with the AFX.
Title: Re: Prizmpocalypse: The mystery of the mass Casio fx-CG10/20 bricking
Post by: Dream of Omnimaga on July 08, 2016, 08:36:41 PM
Ah, I didn't know that SH4 was so much more powerful than the eZ80. I guess I was misled by how the Prizm's speed didn't seem much better than the TI-84+CE's, but that's most likely Casio optimizing their OS even less than TI.

As for switching from SH4, my main worry was this: https://www.cemetech.net/forum/viewtopic.php?p=250125#250125 . What if Renesas decided all of a sudden to stop making SuperH processors altogether and that even Casio was forced to switch to something else?

I also didn't know the AFX used something similar to x86. Is that why the AFX calculators used files similar to exe's? IIRC, they had a .exe extension or maybe it was .lec.
Title: Re: Prizmpocalypse: The mystery of the mass Casio fx-CG10/20 bricking
Post by: Yuki on July 09, 2016, 09:11:42 AM
Renesas switching to ARM would make sense to me. Almost every embedded application nowadays (that isn't trying to be compatible with something made 20 years ago, eh TI?) uses ARM, and their processors would definitely sell a lot more if they used that instruction set. As stated on Cemetech, plans to make a SH5 most likely failed, and what do you do when plans fail? Abandon it, then either go bankrupt or manufacture something similar to the competition, which is, luckily in our case, an already successful product that happens to license the instruction set to anyone asking.

In short, TI is making ARM calcs, HP too, it's used on most Android phones, it's on the Raspberry Pi and the likes too, if you can't make a CPU that can compete with ARM, better make ARM CPUs.

Note that it also works for Intel CPUs, but Intel was less willing to license the instruction sets. That didn't stop a bunch of companies from making Intel clones and compatible CPUs, but, as far as I know, only AMD is still successful at it. And probably Zilog, making Intel 8080-compatible CPUs since the 70s.
Title: Re: Prizmpocalypse: The mystery of the mass Casio fx-CG10/20 bricking
Post by: Dream of Omnimaga on July 11, 2016, 11:36:24 PM
In the event that Casio switches to ARM processors, I think it might be time to write a new MLC. Claw language could maybe be it, but first I think we would need some library thing to allow users to make programs for the TI-Nspire CX, TI-84 Plus CE, fx-CG10/20 and, once a third-party OS is available, the HP Prime. It would be much less work for people who don't mind not using the entire screen (for example, if the PRIZM LCD is 384x216 pixels and everything else 320x240, then just use 320x216).
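
The "just use 320x216" idea can be sketched in a few lines of Python. The resolutions are the ones quoted above; the dictionary labels and the helper names `common_area` and `centering_offset` are made up for illustration and do not belong to any existing library.

```python
# Resolutions as quoted above (width, height); the keys are just labels.
SCREENS = {
    "fx-CG10/20 (PRIZM)": (384, 216),
    "other calculators": (320, 240),
}

def common_area(screens):
    """Largest width x height that fits on every listed screen."""
    width = min(w for w, _ in screens.values())
    height = min(h for _, h in screens.values())
    return width, height

def centering_offset(screen, area):
    """Top-left offset that centers `area` on `screen`."""
    return (screen[0] - area[0]) // 2, (screen[1] - area[1]) // 2

area = common_area(SCREENS)
print(area)
for name, size in SCREENS.items():
    print(name, centering_offset(size, area))
```

A cross-platform library could draw everything inside that shared rectangle and use the offsets to center it, leaving unused borders on whichever axis a given screen is larger.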


Also I wonder if Classpad calculators also have bricking issues? It would be interesting to see if the hardware is similar to the older Prizms that are at risk. I also wonder what the future of the Classpad is. I can't see this calc being a big seller, because it's not sold in the USA and even if stores sold it, it would be banned at many tests, and in Europe it's friggin expensive.