Computers are binary things, right?  A bit is either "1" or "0".

Except when it isn't.

Computers are built from logic gates, which are built from silicon, which is governed by physics. In physics, things are rarely clear-cut, and silicon gates are no exception. Signals take time to transition between "0" and "1". That is why we have clocks: they mark the instants (the clock edges) at which every other signal is guaranteed to have settled to a valid value.

If you have two clocks, they might not be perfectly in sync. A signal launched by one clock may still be moving between "0" and "1" at the exact moment the other clock samples it. The receiving flip-flop can then fail to capture either value cleanly, a condition called metastability. This can happen even when the clocks are supposedly synchronized, such as when one clock is derived from another.
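
To make that failure mode concrete, here is a toy Python model. It classifies what a single flip-flop captures, depending on when its input changes relative to the sampling edge. The setup and hold values are made-up placeholders, not iCE40 datasheet numbers. Real metastability is also probabilistic rather than a clean three-way answer.

```python
from enum import Enum


class Sample(Enum):
    NEW = "new value"          # input settled before the sampling window opened
    OLD = "old value"          # input changed after the sampling window closed
    METASTABLE = "metastable"  # input changed inside the window: undefined


def sample(edge_ns: float, transition_ns: float,
           setup_ns: float, hold_ns: float) -> Sample:
    """Classify what a flip-flop captures when its clock edge arrives at
    edge_ns and its data input changes at transition_ns.

    The input must be stable from (edge - setup) until (edge + hold);
    setup_ns and hold_ns here are illustrative placeholders.
    """
    if transition_ns <= edge_ns - setup_ns:
        return Sample.NEW
    if transition_ns >= edge_ns + hold_ns:
        return Sample.OLD
    return Sample.METASTABLE
```

For example, assume a 0.5 ns setup time and a 0.2 ns hold time. Then `sample(10.0, 10.1, 0.5, 0.2)` lands inside the window and returns `Sample.METASTABLE`.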

This is exactly the problem I ran into with the Fomu USB stack. In ValentyUSB, the USB stack for Fomu, we have two clock domains: usb_48 and usb_12. We split it this way because the iCE40UP5K FPGA is too slow to run everything in the 48 MHz domain. So we put the timing-critical logic there and leave the CPU, which is far more complex, in the 12 MHz domain. This means that USB communication requires two clock-domain crossings: once for receive, and once again for transmit.

In Fomu, usb_12 is derived from usb_48. I therefore assumed I wouldn't need to worry about clock-domain crossing issues, particularly when crossing from the slower domain into the faster one. That assumption turned out to be wrong, and we ran into subtle issues.
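
Here is a rough Python sketch of why "derived" doesn't automatically mean "safe". It assumes usb_12 is usb_48 divided by 4, so every fourth 48 MHz edge nominally lines up with a 12 MHz edge. It then shifts that alignment by two hypothetical quantities:

- `skew_ns`, the divider's own delay.
- `path_delay_ns`, the logic and routing delay.

The setup and hold numbers are placeholders, not real iCE40 figures.

```python
F_FAST_MHZ = 48.0
T_FAST_NS = 1000.0 / F_FAST_MHZ  # usb_48 period, about 20.83 ns
T_SLOW_NS = 4 * T_FAST_NS        # usb_12 = usb_48 / 4, about 83.33 ns


def arrival_phase_ns(skew_ns: float, path_delay_ns: float) -> float:
    """Where, within one usb_48 period, a signal launched on a usb_12 edge
    changes at the input of a usb_48 flip-flop.

    usb_12 edges nominally coincide with every fourth usb_48 edge, so only
    the extra delay (hypothetical divider skew plus logic/routing) sets the phase.
    """
    return (skew_ns + path_delay_ns) % T_FAST_NS


def violates_window(skew_ns: float, path_delay_ns: float,
                    setup_ns: float, hold_ns: float) -> bool:
    """True if the data changes inside some usb_48 edge's setup/hold window."""
    phase = arrival_phase_ns(skew_ns, path_delay_ns)
    too_late_for_next_edge = phase > T_FAST_NS - setup_ns  # setup violation
    too_soon_after_last_edge = phase < hold_ns             # hold violation
    return too_late_for_next_edge or too_soon_after_last_edge
```

Assume `setup_ns=0.5` and `hold_ns=0.2`. A 1 ns skew plus a 5 ns path lands comfortably mid-cycle. The same skew with a 19.6 ns path lands just before the next 48 MHz edge and violates setup. The clocks didn't change; only the path delay did. That is the kind of thing that can move around from one place-and-route run to the next.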

With clock issues, all bets are off. It's like "undefined behavior" in C, except things get even weirder.

The version of the Fomu bootloader that I synthesized worked very well on my development machine.  It even worked reasonably well on the factory test jig.  And it worked spectacularly well on the sample Fomu that I had.

Unfortunately, that's where things stopped being so nice.

About 80% of the boards that I tested worked acceptably well with the test jig, but didn't work in my home server.  70% worked in the server but not in my desktop. 10% worked in every device I tested, and some didn't work at all.

To track this down, I started generating bitstreams randomly. About 70% of them worked, but the other 30% failed to enumerate. In the failing builds, the solver had come up with a placement that made the timing issue worse. The result was a bitstream that technically met timing but still misbehaved.
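
The following Python sketch shows one way to run that kind of sweep. It assumes the randomness comes from nextpnr's `--seed` option. The file names, the `uwg30` package, and the `passes` callback are all hypothetical placeholders. The callback stands in for flashing a board and checking that it enumerates.

```python
from typing import Callable, Iterable, List


def nextpnr_cmd(seed: int, json_path: str = "top.json",
                pcf_path: str = "top.pcf", asc_path: str = "top.asc",
                package: str = "uwg30") -> List[str]:
    """Build a nextpnr-ice40 command line for one placement seed.

    File names and package are illustrative placeholders.
    """
    return ["nextpnr-ice40", "--up5k", "--package", package,
            "--json", json_path, "--pcf", pcf_path,
            "--asc", asc_path, "--seed", str(seed)]


def failure_rate(seeds: Iterable[int], passes: Callable[[int], bool]) -> float:
    """Fraction of seeds whose build fails the (hypothetical) hardware test."""
    results = [passes(seed) for seed in seeds]
    if not results:
        raise ValueError("no seeds given")
    return results.count(False) / len(results)
```

Each command from `nextpnr_cmd(seed)` could be handed to `subprocess.run`. `failure_rate(range(200), passes)` then summarizes how many seeds produced a misbehaving board.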

Once I had a "reproducible" test case, I started looking for potential problems. I noticed that there was no synchronization at all between usb_12 and usb_48. Signals from the 12 MHz domain fed the 48 MHz NRZI encoder directly:

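```python
# Before the fix: signals are wired straight into the NRZI encoder,
# with no clock-domain crossing in between
self.comb += [
    nrzi.i_valid.eq(self.i_bit_strobe),
    nrzi.i_data.eq(fit_dat),
    nrzi.i_oe.eq(fit_oe),
]
```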

The fix was very simple: insert a multi-stage register (a synchronizer) between the two domains:

```python
from migen import Signal
from migen.genlib import cdc

# Cross the data from the 12 MHz domain to the 48 MHz domain
nrzi_dat = Signal()
nrzi_oe = Signal()
cdc_dat = cdc.MultiReg(fit_dat, nrzi_dat, odomain="usb_48", n=3)
cdc_oe  = cdc.MultiReg(fit_oe, nrzi_oe, odomain="usb_48", n=3)
self.specials += [cdc_dat, cdc_oe]
self.comb += [
    nrzi.i_data.eq(nrzi_dat),
    nrzi.i_oe.eq(nrzi_oe),
]
```
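
Here is a rough behavioral picture of what `MultiReg` does here. It is a plain-Python stand-in, not Migen's actual implementation. The synchronizer is a shift register whose flip-flops are all clocked by the destination domain, usb_48. The model can't reproduce metastability itself. The point is that each extra stage gives a flip-flop that went metastable roughly another 48 MHz cycle to settle before the value reaches the NRZI encoder.

```python
from collections import deque
from typing import Deque


class MultiRegModel:
    """Behavioral model of an n-stage synchronizer: a chain of flip-flops
    that are all clocked by the destination clock."""

    def __init__(self, n: int = 3, reset: int = 0) -> None:
        if n < 1:
            raise ValueError("need at least one stage")
        # stages[0] is the first flop (sees the async input), stages[-1] the last
        self.stages: Deque[int] = deque([reset] * n, maxlen=n)

    def tick(self, i: int) -> int:
        """Advance one destination-clock edge and return the synchronized output."""
        self.stages.appendleft(i)  # first flop captures the input; oldest value drops off
        return self.stages[-1]     # last flop drives the output
```

The cost is latency. With `n=3`, a change takes three 48 MHz cycles (3 × 20.83 ns = 62.5 ns) to reach the output. That is still shorter than one 83.33 ns usb_12 period.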

With this change in place, I let the toolchain do its thing again. It generated over 200 bitstreams, and none of them had any reliability problems.
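
Here is a quick back-of-the-envelope Python sketch of why 200 clean builds is convincing. It treats each build as an independent trial. That is an assumption: seeds aren't truly independent samples of every possible placement, and only the boards actually tested are covered.

```python
def prob_all_pass(n_builds: int, failure_rate: float) -> float:
    """Chance that n independent builds all pass if each fails with failure_rate."""
    return (1.0 - failure_rate) ** n_builds


def upper_bound_95(n_builds: int) -> float:
    """One-sided 95% upper bound on the failure rate after n builds with zero
    failures: the largest p for which (1 - p) ** n is still at least 0.05."""
    return 1.0 - 0.05 ** (1.0 / n_builds)


def rule_of_three(n_builds: int) -> float:
    """The common 3/n approximation of upper_bound_95."""
    return 3.0 / n_builds
```

Suppose the old 30% failure rate were still present. Then `prob_all_pass(200, 0.3)` is about 1e-31, so 200 clean builds effectively rule it out. `upper_bound_95(200)` comes out to about 0.0149 (the rule of three gives 3/200 = 0.015). That means the clean builds support a per-build failure rate below roughly 1.5%.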

I re-generated the Fomu bootloader, loaded it onto a USB drive, and had the factory install Foboot v1.8.7 on all of the boards.

Timing issues are scary, particularly because they only show up sometimes and can vary between different compilation runs of your Verilog code. I'm now confident that I've solved the timing problems in the Fomu USB stack. I eagerly await others getting their production Fomu boards, and seeing what people do with them!