Friday, October 27, 2017

Testing Memory or "What did I come in here for?"

**DISCLAIMER - I am not an electronics expert. Some explanations here may not be 100% accurate. Purists/pedants, feel free to write in the comments with corrections if you feel the urge! Thank you.**

A little while ago I acquired a faulty A501. This is the official Commodore A500 memory expansion but, sadly, it just does not work. When plugged into my A500+ it just does not register that it is there. There are many reasons why this might be and I have done my best to work out what the problem is. Despite there being some battery damage (curse you Varta!) this wasn't too bad and there are no damaged tracks despite it looking a bit crappy.

Faulty A501 - Bottom Chips Removed

I have come to the conclusion that there must be one or more faulty memory chips on this board. There are 16 memory chips on this particular revision of the A501 (Rev 5). Later revisions had eight chips, as each chip had twice the memory capacity.

Our chips are Sanyo LM33256G-12 (the -12 means 120ns, which is the 'speed', or access time, of the memory. The lower the number, the faster the chip. These ones are average.) I couldn't find the datasheet for these exact chips but I did find an equivalent, the NTE 21256, which, after much poring over the A501 and A500+ schematics, seems to be pin and functionally equivalent to the LM33256G.

Now to the point of all this. I wondered if there was any way to test the RAM chips and, if possible, identify which ones were faulty. After some Googling, it became clear that I was not the only one to have had this thought. A couple of sites showed Arduinos running relatively simple software to test each bit of vintage RAM chips.

I have in my drawer an Arduino Nano clone that I bought from China a year or two ago. They were so cheap that they weren't even sold singly; I had to buy two, which is why I have a spare. The other sites had used an Arduino Uno (the original style/size) or just the microcontrollers on their own, but I was sure that the Nano could do everything required.

There is one important thing about this type of memory. It's actually called DRAM or Dynamic Random Access Memory. The 'D' bit means that the contents need to be continually refreshed to keep hold of their data. Typically, this is required every few milliseconds. Having read through the information I can find, executing loops to continually read/write data should mean I don't actually have to worry about this since the act of reading or writing provides refreshes to the rows/columns. We shall see...

First problem: the pin numbering of the Arduino Nano seems to bear no relation to the physical pins. While trying to shamelessly blag the code of one of the other chaps who has done this, I noticed that the pin allocated to the 'DONE' LED seemed to have been used already. A lot of googling and staring at images of the Arduino Uno and Nano later, I worked out that the Nano's pinout can be described in many, many different ways. I had assumed the numbers I was looking at were the physical pin numbers, starting from top left with the USB port at the bottom. Not so. The numbers refer to the pins of the ATmega328 microcontroller, and not to its physical pin numbers but to the 'port' pin numbers for the Dx pins. So pin 2 was actually PD2 (shown as D2 on the Nano). After this penny dropped it became a little easier to work out the connections between the Nano and the 33256 chip.
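For anyone else who trips over this, here is a rough sketch of the mapping in Python (just to show the idea, it is not Arduino code). It assumes the standard ATmega328-based Uno/Nano layout, where D0 to D7 sit on port D, D8 to D13 on port B, and A0 to A5 (also usable as D14 to D19) on port C. The function name `arduino_pin_to_port` is made up for this example.

```python
# Map an Arduino digital pin number to its ATmega328 port pin name.
# Assumed layout: D0-D7 = PD0-PD7, D8-D13 = PB0-PB5, A0-A5 (D14-D19) = PC0-PC5.

def arduino_pin_to_port(pin):
    """Return the port pin name (e.g. 'PD2') for an Arduino digital pin number."""
    if 0 <= pin <= 7:
        return f"PD{pin}"
    if 8 <= pin <= 13:
        return f"PB{pin - 8}"
    if 14 <= pin <= 19:
        return f"PC{pin - 14}"
    raise ValueError(f"no such digital pin: {pin}")


for pin in (2, 9, 13, 14):
    print(f"D{pin} -> {arduino_pin_to_port(pin)}")
```

Note that none of these are physical pin positions on the Nano module; for those you still need a pinout picture.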

In any case, here's a professional grade* drawing of the actual pinout for an Arduino Nano to the 33256 (or equivalent).

Pin Diagram 

*may not look professional

The 'C' program I used came from here. I pretty much used it exactly as it came. It uses a couple of nested for loops to step through the column and row addresses and write either a '1' or a '0' at each one. The act of writing (or reading) the bits in this way does have the effect of refreshing the rows and columns so that it's not necessary to provide separate refresh code. Thank goodness for that.
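To give a feel for what a fill-and-check test does, here is a small simulation in Python. It is not the program I used, just a sketch of the same idea: write a pattern to every cell, read it all back, and fail on the first mismatch. The `SimulatedDram` class, the `stuck_at` fault and the exact alternating pattern are all assumptions made up for the example.

```python
class SimulatedDram:
    """Toy model of a rows x cols x 1-bit DRAM (hypothetical, for illustration)."""

    def __init__(self, rows=512, cols=512, stuck_at=None):
        self.rows, self.cols = rows, cols
        self.cells = bytearray(rows * cols)
        self.stuck_at = stuck_at  # if set, every read returns this value

    def write(self, row, col, bit):
        self.cells[row * self.cols + col] = bit

    def read(self, row, col):
        if self.stuck_at is not None:
            return self.stuck_at
        return self.cells[row * self.cols + col]


def fill_and_check(chip, pattern):
    """Write pattern(row, col) to every cell, then read back; True if all match."""
    for row in range(chip.rows):
        for col in range(chip.cols):
            chip.write(row, col, pattern(row, col))
    for row in range(chip.rows):
        for col in range(chip.cols):
            if chip.read(row, col) != pattern(row, col):
                return False
    return True


# Assumed test patterns; the alternating one simply flips on every column.
PATTERNS = {
    "all zeros": lambda row, col: 0,
    "all ones": lambda row, col: 1,
    "alternating": lambda row, col: col & 1,
}


def run_tests(chip):
    """Run every pattern in turn, stopping at the first failure."""
    return all(fill_and_check(chip, p) for p in PATTERNS.values())


print(run_tests(SimulatedDram()))            # healthy chip
print(run_tests(SimulatedDram(stuck_at=0)))  # output stuck low
```

A chip whose output is stuck at '0' sails through the all-zeros fill and only fails once '1's are expected, which is one reason a zero fill on its own proves very little.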

Finally wired up

Having wired everything up I took the first chip that I had extracted and plugged it in. After turning it on, the 'L' LED on the Nano flashes. What should happen is, when all tests are completed, one of the discrete LEDs flashes - the one on the furthest right on the white breadboard. I only used red LEDs but the code says what colours the original author used. If the tests fail then the 'RED' LED lights constantly - which is what happened.

I couldn't believe that the first chip I had was faulty so I wired up the address lines (A0 to A8 from the Nano to the RAM) to some other LEDs. They counted up in binary - very quickly - as expected. I rewired it back to accept a chip but modified the code by commenting out all tests except the 'fillzero'. At least this would be a start.

Same result.

I went through eight chips and finally found one where the LED flashed to indicate tests were completed. After uncommenting the other tests and reloading the Nano, I ran the code again with the same chip.

Success! All tests completed and the LED flashes indicating a working chip.

I was finding it difficult to believe that I had only one working chip out of eight so I got out my DIY oscilloscope which I built from a kit. I hooked it up to the 'DOUT' line on the RAM as I knew that this was where the check was made in the code to see if the address being read contained what was expected. If the DOUT didn't have what was expected, then the LED would be lit constantly.

On the working chip it was easy to see what was happening. Pulses on the DOUT line could be seen indicating '1' output. On the alternating '1' and '0' fill it was obvious that there were '1's followed by '0's. And it was very cool seeing it in action too.

Working DRAM - DOUT Line

Reading the all '1's test 

Reading the alternate '0's and '1's test

Reading the '0's test - not much to see but no '1's!

So I then picked a chip that had been failed by the code and tried the same. This chip's DOUT stayed resolutely at 0V, even when it should have been pulsing '1's. Dead chip.

On to another one. This was more interesting because there were some '1's shown during the fill test but there were also random '0's too. This suggests that this one is giving read errors on at least some of its bits.

Of the remaining five chips, two more came out working (yay!) while the others were variations of the above. One just seemed completely dead while the others each showed missing '1's on the fill test to varying degrees. A hit rate of 5/8 faulty chips seems high but, to be fair, the three chips that worked just keep working - every time.
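The three kinds of behaviour I saw on the scope (working, completely dead, and missing '1's) could be told apart in software too. Here is a hypothetical sketch in Python that takes the bits read back after an all-'1's fill and puts a label on the chip. The function name `classify` and the label wording are invented for this example; the test code I actually ran only reports pass or fail.

```python
def classify(readback):
    """Label a chip from the list of bits read back after an all-ones fill."""
    ones = sum(readback)
    total = len(readback)
    if ones == total:
        return "pass"
    if ones == 0:
        return "dead (output stuck at 0)"
    return f"flaky ({total - ones} of {total} bits read as 0)"


print(classify([1] * 16))                  # every '1' came back
print(classify([0] * 16))                  # nothing but '0's
print(classify([1, 1, 0, 1, 1, 1, 0, 1]))  # some '1's went missing
```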

I will post updates on this if I find out any more once I get the other eight chips off the A501 board. :)

Below is a badly written explanation of what goes on in these chips (this was originally at the start of this blog but I thought it would put a lot of people off):

Each chip on this board is a 262,144 word x 1 bit RAM, which basically means (262,144 / 8) x 1 bytes of memory, i.e. 32,768 bytes or 32KB per chip. 32 times 16 equals the 512KB of expansion memory.

(A quick note on terminology here. When I talk about kilobytes I actually mean 1024 bytes. This is the way I learned in the early eighties. Any talk of 'kibibytes' will result in a tantrum. A kilobyte is 1024 bytes to me and always will be even if it is not totally accurate to IEC, ISO or IEEE or whoever decides this stuff. So there.)
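If you want to check the sums, here they are as a few lines of Python, using 1024 bytes to the kilobyte as promised. The variable names are just for this example.

```python
BITS_PER_CHIP = 262_144  # 262,144 words x 1 bit
CHIPS = 16               # Rev 5 A501

bytes_per_chip = BITS_PER_CHIP // 8
kb_per_chip = bytes_per_chip // 1024
total_kb = kb_per_chip * CHIPS

print(bytes_per_chip)  # 32768
print(kb_per_chip)     # 32
print(total_kb)        # 512
```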

So, to access this memory, we need a way to address each bit. With 262,144 bits, dividing by 512 gives 512. So, imagine 512 rows and 512 columns. Each intersection (bit) can be described by a row and column address (simple enough). To write a '1' in the chip we tell it we want to write and then where to write it with the correct row and column addresses. But you may have noticed that the chips on the A501 don't have hundreds of pins. They have 16 pins. So how do you get 512 x 512 out of sixteen pins?


Use binary. And do everything twice.

Let me explain.

You can represent numbers in base 2 (for anyone under the age of about 45, number bases were removed from school maths lessons in the early '80s). Base 2 only has two digits, 0 and 1. So to represent the number one we just say '1'. But to represent the number two we need to carry over to the next column. So number 2 becomes 10. Number 3 becomes 11 since we just add 1 to 2. When we hit number four we need to move to the next column again so it becomes 100. Five is 101 since we just add 1 to 4. There are loads of resources about binary (and other number bases such as octal) on the internet which explain it far better than I ever could. Google is your friend.
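If you would rather see the carrying done for you, here is a little Python sketch that converts a number to binary by repeatedly dividing by 2 and collecting the remainders. The function name `to_binary` is my own; Python's built-in `format(n, "b")` gives the same answer.

```python
def to_binary(n):
    """Convert a non-negative whole number to a string of binary digits."""
    if n == 0:
        return "0"
    digits = ""
    while n > 0:
        digits = str(n % 2) + digits  # the remainder is the next digit, right to left
        n //= 2
    return digits


for n in range(1, 6):
    print(n, to_binary(n))
```

This prints 1, 10, 11, 100 and 101 for the numbers one to five, matching the counting above.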

Back to our chip. If we want to number 512 rows, how many binary digits do we need? The simplest way is to keep taking powers of 2: 2^0 is 1, 2^1 is 2, 2^2 is 4 and so on, up to 2^9, which is 512. So 9 binary digits give us 512 different values (0 to 511), enough to give every row its own number. These could easily be allocated to 9 pins on a chip. But the chips we have only have 16 pins. Don't we need 9 for rows and 9 for columns?

Yes. This is why I said we do things twice (not strictly true but there are two operations). Imagine a little man in the chip. You send him the row address using 9 lights. He puts a little marker next to it on his 512x512 grid and clears the lights. Then you send the column address and he puts a little marker next to that. Finally, you tell him either to place a '1' or a '0' at that row and column location.

When we want to write to a memory address we send the row address to the 9 address pins on the chip. Then we send it a signal to store that. Then we send the column address and a signal to store that. Finally, we say what we want to write - either a '0' or '1' via a different pin.

To read it back we do the same but instead send a signal that we want to read rather than write to the row/column address.
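Here is the little man as a toy Python class, purely to illustrate the two-step addressing. All the names (`LittleMan`, `latch_row`, `latch_col` and so on) are invented for this sketch; on real chips of this type the two "store that" signals are, as far as I can tell from the datasheet, the RAS and CAS pins, and the timing between them matters a great deal, which this sketch ignores completely.

```python
class LittleMan:
    """Toy model of a 512 x 512 x 1-bit DRAM with nine shared address pins."""

    ROWS = COLS = 512
    ADDRESS_PINS = (ROWS - 1).bit_length()  # 9 pins cover 0..511

    def __init__(self):
        self.grid = [[0] * self.COLS for _ in range(self.ROWS)]
        self.row = None  # marker for the stored row address
        self.col = None  # marker for the stored column address

    def _check(self, address):
        if not 0 <= address < 2 ** self.ADDRESS_PINS:
            raise ValueError("address does not fit on the address pins")
        return address

    def latch_row(self, address):
        """First operation: store whatever is on the address pins as the row."""
        self.row = self._check(address)

    def latch_col(self, address):
        """Second operation: store the same pins again, this time as the column."""
        self.col = self._check(address)

    def write(self, bit):
        self.grid[self.row][self.col] = bit

    def read(self):
        return self.grid[self.row][self.col]


def write_bit(chip, row, col, bit):
    chip.latch_row(row)
    chip.latch_col(col)
    chip.write(bit)


def read_bit(chip, row, col):
    chip.latch_row(row)
    chip.latch_col(col)
    return chip.read()


chip = LittleMan()
write_bit(chip, 3, 500, 1)
print(read_bit(chip, 3, 500))  # the cell we wrote
print(read_bit(chip, 500, 3))  # a different cell, still untouched
```

The same nine pins carry both halves of the address, which is how 262,144 locations fit on a 16-pin chip.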