Successful Firmware Design Tips
Firmware can be a tricky jump for anyone with a software background. Many students I've helped have struggled with the inherently parallel execution of firmware, compared with the sequential execution of software, and with the fact that we still write firmware sequentially, as text files.
How do you arrive at a successful firmware design? In my opinion, along with understanding the theory of digital systems, it is more about a rigorous methodology than hours and hours of design.
So top tips time...
Draw up your specifications...
Functionality
Clocking
Resets
Interrupts
Configuration
Fabric Side Inputs/Outputs
External Pin Inputs/Outputs
Use of any existing protocols/standards (SPI, I2C, SerDes etc)
Use of IPcores, either 3rd party or your own
Define ALL states in state machines
Define whether data is launched and captured on the same clock edge (pos-pos) or on opposite edges (pos-neg)
Utilise Design for Test (DFT) Methodology
Utilise Design for Adaptability (DFAD) Methodologies
Consider hierarchical and modular designs
Consider your physical and timing constraints
Consider all the different clock domains and how to cross them
If you can't draw it, then go back to your original specifications
Draw a top level block diagram of the block
Flesh out sub-blocks with logic diagrams
Initial Design
Simple functionality first; there is no need to jump right in with complex states or odd iteration modes of counters
Force a good personal coding style
Source header, revision history, to do list, simulation outcomes, synthesis outcomes, error/warning lists etc
Add any source dependencies, list data sheets, user manuals, schematics
Add any support or forum queries or external web links etc
Ensure all registers have defined initial states, preferably all zeros
Code all state machines to use Gray code or one-hot (unary) code for robustness
Ensure all state machines are synchronous
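As a rough illustration of the two state encodings mentioned above, here is a small Python sketch (a behavioural model, not HDL). The state names and the four-state ring are hypothetical, chosen only to show how many bits change per transition and how many state-register codes are left unused under each encoding.

```python
# Behavioural sketch only: state names are hypothetical, not from a real design.
STATES = ["IDLE", "LOAD", "SHIFT", "DONE"]

def gray_encode(index: int) -> int:
    """Reflected binary Gray code of a state index."""
    return index ^ (index >> 1)

def one_hot_encode(index: int) -> int:
    """One-hot (unary) code: exactly one bit set per state."""
    return 1 << index

def bits_changed(a: int, b: int) -> int:
    """Number of bit positions that differ between two state codes."""
    return bin(a ^ b).count("1")

gray_codes = {name: gray_encode(i) for i, name in enumerate(STATES)}
one_hot_codes = {name: one_hot_encode(i) for i, name in enumerate(STATES)}

def max_bits_changed(codes: dict) -> int:
    """Worst-case bit flips over the cyclic sequence IDLE -> ... -> DONE -> IDLE."""
    ring = STATES + STATES[:1]
    return max(bits_changed(codes[a], codes[b]) for a, b in zip(ring, ring[1:]))

def unused_codes(codes: dict, width: int) -> list:
    """State-register values that no named state uses; each needs a defined exit."""
    return [c for c in range(1 << width) if c not in codes.values()]
```

In this sketch the Gray-coded ring only ever changes one bit per transition, which holds only while the machine steps through the states in counting order. The one-hot version changes two bits per transition and leaves 12 of the 16 possible 4-bit codes unused, which is one reason to define ALL states, including the ones you never expect to reach.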
Functional Simulation
Can all registers be reset?
Can all registers be set?
Is base functionality met?
Correct data clock edges?
Correct state evolution?
What if block never gets an initial reset?
What if block slips into an unused state code?
What if an input is delayed by a random time period?
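The "what if" questions above about a missing reset and unused state codes can be tried out on a behavioural model before touching any HDL. The following is a minimal Python sketch, assuming a hypothetical three-state machine on a 2-bit state register with one unused code; the names and transitions are illustrative only.

```python
# Hypothetical 3-state FSM on a 2-bit register; 0b10 is deliberately unused.
IDLE, LOAD, RUN = 0b00, 0b01, 0b11
VALID = {IDLE, LOAD, RUN}

def next_state(state: int, start: bool) -> int:
    """Next-state logic with an explicit fallback for unused codes."""
    if state == IDLE:
        return LOAD if start else IDLE
    if state == LOAD:
        return RUN
    if state == RUN:
        return IDLE
    # Default branch: any unused code recovers to IDLE on the next clock.
    return IDLE

def simulate(initial: int, starts: list) -> list:
    """Clock the FSM once per entry in `starts` and return the state trace."""
    trace = [initial]
    state = initial
    for start in starts:
        state = next_state(state, start)
        trace.append(state)
    return trace

def recovers_without_reset() -> bool:
    """Check that every possible power-up value is valid after one clock."""
    return all(simulate(code, [False])[1] in VALID for code in range(4))
```

Calling `simulate(0b10, [False])` starts the model in the unused code with no reset applied, so the trace shows whether the default branch brings it back to a defined state.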
Co-Simulation with either other blocks or models of connected blocks
Check data transfer progression
Use data valid flags to aid handshaking
Use formal handshaking protocols if advantageous
Use known patterns such as 0xFFFF, 0xFAFF or 0xABCD in the data stream to indicate when the block is in an erroneous state or a waiting state. 0x0000 can lead to debugging difficulties, as it looks the same as a register sitting at its all-zeros initial state.
Try to use edge triggered events rather than level triggered
Try to always be synchronous to your system clock, even if this requires synchronously registering an asynchronous external signal.
Design for code reuse and parameterise the code, for example make an SPI module N-bit rather than 16-bit
Utilise the generate statement and compile-time flags to aid programmability
Check all declared wires/registers for connectivity
Check all registers have inputs, outputs, clock, reset and hold conditions
Remember the compiler will perform trimming if it thinks a register/wire is redundant
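Two of the tips above, preferring edge-triggered events and registering asynchronous inputs onto the system clock, can be sketched together. This is a cycle-based Python model, assuming a two-register synchroniser followed by a rising-edge detector; the class and signal names are my own for illustration, and a model like this cannot show metastability, only the clock-cycle behaviour.

```python
class SyncEdgeDetector:
    """Two-flop synchroniser followed by a rising-edge detector (cycle model)."""

    def __init__(self) -> None:
        # All registers start at zero, as recommended for initial states.
        self.ff1 = 0  # first synchroniser stage (may go metastable in hardware)
        self.ff2 = 0  # second synchroniser stage (treated as safe to use)
        self.ff3 = 0  # previous value of ff2, used for edge detection

    def clock(self, async_in: int) -> int:
        """Apply one system-clock edge and return the one-cycle edge pulse."""
        # All three registers update together from their previous values.
        self.ff3, self.ff2, self.ff1 = self.ff2, self.ff1, async_in & 1
        # Pulse is high only when ff2 has just gone from 0 to 1.
        return self.ff2 & (self.ff3 ^ 1)

def run(samples: list) -> list:
    """Feed a list of input samples, one per clock, and collect the pulses."""
    det = SyncEdgeDetector()
    return [det.clock(s) for s in samples]
```

In this model an input that goes high and stays high for several clocks produces a single one-cycle pulse, two clocks after it is first sampled, rather than a level that downstream logic might act on repeatedly.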
Check Syntax
Hunt out and remove any latches (unless explicitly coded)
Perform an initial synthesis (only progress once clean)
Remove all errors, going back to the initial design and functional simulation as needed
Remove simple warnings first as they often create ripple warnings
Tackle compiler bit trimming as a priority
Aim for a pre-route synthesis frequency at least 20% higher than your specification; for 100 MHz operation, make sure your synthesis report says something like 120 to 130 MHz. The routed design is almost always slower than this, because place and route adds routing delays that are not accounted for in a pre-route timing report.
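The 20% margin above is simple arithmetic, but it helps to see it as a period budget. A minimal Python sketch, assuming the margin is applied to frequency as described; the function names are illustrative.

```python
def synthesis_target_mhz(spec_mhz: float, margin: float = 0.20) -> float:
    """Pre-route frequency to aim for, given the specified frequency."""
    return spec_mhz * (1.0 + margin)

def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / freq_mhz

def routing_budget_ns(spec_mhz: float, margin: float = 0.20) -> float:
    """Period left over for routing delay if synthesis just meets the target."""
    return period_ns(spec_mhz) - period_ns(synthesis_target_mhz(spec_mhz, margin))

def meets_margin(report_mhz: float, spec_mhz: float, margin: float = 0.20) -> bool:
    """True if the synthesis report frequency reaches the margined target."""
    return report_mhz >= synthesis_target_mhz(spec_mhz, margin)
```

For a 100 MHz specification the target is 120 MHz, which leaves roughly 1.67 ns of the 10 ns period for routing delays that the pre-route report does not see.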
Perform code to netlist translate
Perform netlist to CLB mapping
Remember that a CLB is built from LUTs, multiplexers and flip-flops, and that the flip-flops must have a clock and reset
Perform design place and route
Aid the clocking by using dedicated routing as far as possible
Aid routing by forcing the block to be close to its critical FPGA pads (e.g. an SPI module should be close to its SCLK, SDI and SDO pins)
Aid fast register-register clocking by forcing short routing delays, principally by using physical constraints
Constrain the block to an area roughly 10% larger than its required resources
Use timing constraints so the compiler knows that the system clock is fast and that a clock like an I2C clock is very slow.
If a signal must traverse a large distance, then consider a) clock buffering and b) adding data registers. Latency in terms of clock cycles is preferable to lengthy RCL routing delays.
Use FIFOs, Buffers, re-timing and handshaking to cross clock boundaries
Use FPGA visualizers such as PlanAhead or FPGA Editor to check placement and clock/reset or data routing
Where possible use the FPGAs internal PLLs, clock managers, clock dividers and dedicated clock routing to enable high speed clocking
Do not use clock gating unless absolutely confident on glitch free operation and correct functionality
Use state machines to force sequential operations
Use locked, valid or ready flags to indicate when a sub-block is stable, and use them to control state machine iteration.
If needed, slow the loop time of feedback loops and FSMs to ensure stability.
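The idea above of letting locked and valid flags control state machine iteration can be sketched as a small behavioural model. This is Python rather than HDL, and the state names and the `pll_locked` / `data_valid` flags are hypothetical, picked only to show the gating.

```python
def step(state: str, pll_locked: bool, data_valid: bool) -> str:
    """One clock of a flag-gated FSM; returns the next state."""
    if not pll_locked:
        # Without a stable clock nothing downstream can be trusted.
        return "WAIT_LOCK"
    if state == "WAIT_LOCK":
        return "WAIT_DATA"
    if state == "WAIT_DATA":
        # Only advance when the upstream block says its data is valid.
        return "PROCESS" if data_valid else "WAIT_DATA"
    if state == "PROCESS":
        return "WAIT_DATA"
    # Any undefined state falls back to the safest one.
    return "WAIT_LOCK"

def run_fsm(flags: list, initial: str = "WAIT_LOCK") -> list:
    """Apply a list of (pll_locked, data_valid) pairs, one per clock."""
    states = []
    state = initial
    for pll_locked, data_valid in flags:
        state = step(state, pll_locked, data_valid)
        states.append(state)
    return states
```

The point of the sketch is that the machine holds in a waiting state until the relevant flag is set, and that losing lock at any time drops it back to the start rather than letting it carry on with untrustworthy data.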
This is perhaps quite a brain dump of tips, but over time I will revisit this list and re-organise and formalise it as a teaching aid. Firmware can be very complex at times and particularly difficult to get your head round as a student.
Below is a rather out of date block diagram of the FLITES firmware system, the point being that personal organisation, sticking to a listing/checklist methodology and a modular design will help in the end and will produce good, reusable and configurable results.

That's all for now.
Ed