Mucking about with the PDK!

It's been a few weeks, sorry about that. I have been traveling for work and I've now finally landed in Rome for a month of exploration and work!

The coolest part about this movement of open source silicon is that fabs are realizing there are tons of people who want to build their own silicon but simply can't afford 500k plus in startup costs, so fabs like GlobalFoundries and SkyWater are doing something amazing! They are taking cutting edge process nodes and just opening them up! Well, cutting edge 25 years ago, and that is fine by me! For anything reasonable, meaning made by one or a handful of engineers, there is simply no need for 3nm, 5nm, or even 50 to 100nm! The Pentium 4, remember those days? Intel at the top of the pile, not yet resting on their laurels? It launched on a 180nm process and moved to 130nm with the Northwood revision! So while the open source community can't band together and build a 3nm M5 killer, we can absolutely gain access to some fantastic technology that the pioneers of the silicon industry would have drooled over in 1975.

Today and over the last few nights I have been poking around at learning the PDK (Process Design Kit) and boy, I have a lot to learn! So to dive in, let's start at the beginning.

What is a PDK?

A Process Design Kit is the "bible" sent down to us by the fab! It contains models, design constraints, IP (Intellectual Property, we'll get into this in a moment) and the data files we need to convert our RTL (Register Transfer Level) design into physical layers of polysilicon, silicon dioxide, metal, what have you, in the physical chip. It's our gateway between code and the physical world!

Models

Models are the data for how a component behaves. What's the charge time of a specific capacitor? Here's a SPICE model for it! What's the limit, threshold, and switching characteristic of this transistor? Here's a SPICE model! Basically they give us all the information we need to simulate down to the transistor, capacitor, resistor, or anything else we can lithograph. We can simulate anything we design gate by gate, and ideally our simulation should match 1:1 the physical silicon we receive from the fab. Pretty cool, right? Imagine Faggin or Kilby being able to just chuck their whole design into a computer and know it works before baking! They would have killed for that type of clarity into the design. We shouldn't take this for granted, but we should absolutely use the tools at our disposal. Now onto design constraints.

Design Constraints

Now this is pretty self explanatory, right? The fab knows that if we put two transistors too close together they won't work properly. Just as importantly, the constraints ensure we get good yields. Yield is the big word you hear in silicon manufacturing; the bean counters wake up in a cold sweat over it at night. We should be worried about it too! If you design something that pushes the limits of the constraints, your super cool and affordable chip turns into a mess of failed wafers and thousands in lost product. We don't want that at all! So we follow the constraints to a tee. Silicon isn't the place where you leave a bug: patches can cost millions, and you can't simply push an update through CI/CD and call it a day. So, and you will hear this a lot from me in the coming months, if it fails a DRC (Design Rule Check) we go back to the simulator until it doesn't!
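
To make the idea of a design rule concrete, here is a minimal sketch of one minimum-spacing check between two rectangles on the same layer. The Rect type, the helper names, and the 0.17 micron rule value are all invented for illustration; the real rules come from the PDK and there are far more of them than this.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in microns (hypothetical type for this sketch)."""
    x0: float
    y0: float
    x1: float
    y1: float


def spacing(a: Rect, b: Rect) -> float:
    """Edge-to-edge distance between two rectangles (0.0 if they touch or overlap)."""
    dx = max(b.x0 - a.x1, a.x0 - b.x1, 0.0)
    dy = max(b.y0 - a.y1, a.y0 - b.y1, 0.0)
    return math.hypot(dx, dy)


def violates_min_spacing(a: Rect, b: Rect, min_space: float) -> bool:
    """True when two separate shapes sit closer than the rule allows.

    Touching or overlapping shapes are treated as one merged shape here,
    so they do not count as a spacing violation.
    """
    gap = spacing(a, b)
    return 0.0 < gap < min_space


MIN_SPACE_UM = 0.17  # assumed value for illustration only, not taken from the PDK

wire = Rect(0.0, 0.0, 1.0, 1.0)
too_close = Rect(1.1, 0.0, 2.0, 1.0)   # roughly 0.1 micron gap
far_enough = Rect(1.5, 0.0, 2.0, 1.0)  # 0.5 micron gap
print(violates_min_spacing(wire, too_close, MIN_SPACE_UM))
print(violates_min_spacing(wire, far_enough, MIN_SPACE_UM))
```

A real DRC run applies rules like this across every layer and every pair of nearby shapes, which is a big part of why it takes a while.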

Intellectual Property

Here is the secret sauce. Here's the stuff TSMC will send a hitman to your apartment over if you let it slip (probably not really...); in reality they will send a very well paid team of lawyers, which is arguably worse. We are targeting the SkyWater 130nm process. It's important to remember that while the geometry is the same, the IP varies vastly from Intel's own 130nm node. So IP, what is it? IP is the node's implementation of standard cells. You need an SRAM block? Don't draw that by hand, use the cell! Need a flip flop, a NAND, anything else? Hand laying these things would be tedious and prone to failure, and 9 times out of 10 you wouldn't even get the density and performance of the standard cell. Even worse? It might not work at all! The cells are designed and tested on real silicon on the process they are defined for. You think you have a better SRAM block? You may in theory, but you can't know if it will work properly on the node. There are thousands of steps in the process of developing a cell. You miss one? You're done, the cell is bad, or worse, it may short and kill the chip before it even has a chance to live. So we use cells from the fab to ensure everything is good to go. The reason it's called IP is simple: the designs and their implementations belong to the fab and the node they were designed for. TSMC does this, Intel does this, GlobalFoundries does this, you get the idea.

Great But What Do We Do With It?

Good question, that's where the tools of the trade come in! In the industry the tools are crazy expensive, like millions per seat per year! I don't have that kind of cash lying around, so open source to the rescue! The chain goes like this: OpenLane is our EDA (Electronic Design Automation) flow, think Altium for board design. OpenLane calls upon the underlying tools Yosys, OpenROAD, and Magic to do the heavy lifting of synthesis, routing, layout, DRC, and much more. Let's look into the toolchain, then we can see the output of one of our modules, the risc-4 ALU!

Yosys

Yosys (Yosys Open SYnthesis Suite, gotta love recursive acronyms) is our gateway into RTL synthesis. We write our RTL in Verilog, and Verilog is great at describing a system. We define I/O, logic, storage, etc. in Verilog, and through "magic" that is compiled into, say, a bitstream for an FPGA, or in our case fed into Yosys, which synthesizes the Verilog into actual gate logic. This is a fun process to watch as things stream through the terminal. From there we have a synthesized RTL. What happens next? Well, we have to check timing. The real world is all about timing: if our bits aren't on the bus at the moment a system wants to read, say, a register, that process gets garbage. That garbage is blindly moved through to the next thing, and the next, and a cascade of garbage ruins our run! So a tool called OpenSTA takes over here.
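
As a toy picture of what "synthesizing into gate logic" means, the sketch below describes the same XOR function twice: once the way we would ask for it at the RTL level, and once as a small netlist built only from NAND gates. It then checks every input combination to confirm the two agree. This is an illustration in Python only, it is not how Yosys works internally, and the function names are made up.

```python
from itertools import product


def nand(a: int, b: int) -> int:
    """Two-input NAND, the only gate primitive used in this sketch."""
    return 1 - (a & b)


def xor_behavioral(a: int, b: int) -> int:
    """What we ask for at the RTL level: just 'a XOR b'."""
    return a ^ b


def xor_gate_level(a: int, b: int) -> int:
    """The same function expressed as a netlist of four NAND gates."""
    n1 = nand(a, b)
    n2 = nand(a, n1)
    n3 = nand(b, n1)
    return nand(n2, n3)


def equivalent() -> bool:
    """Exhaustively compare both descriptions over every input combination."""
    return all(
        xor_behavioral(a, b) == xor_gate_level(a, b)
        for a, b in product((0, 1), repeat=2)
    )


print(equivalent())
```

The real flow does this mapping against the standard cells the PDK provides, and for far bigger functions than a single XOR.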

OpenSTA

OpenSTA (Open Static Timing Analysis) goes in and checks the timing for every piece of our synthesized RTL against a desired clock speed, which we define in the config before running OpenLane. For risc-4 we are targeting 100 MHz, a 10 ns clock period, meaning that every operation should be completed within 10 ns. Cool, right? We know well in advance of the heavier computation whether our RTL is good enough or not. Here is the static timing report for the risc-4 ALU:

Corner           | Hold Worst Slack | Hold Reg to Reg Paths | Setup Worst Slack | Setup Reg to Reg Paths
-----------------|------------------|-----------------------|-------------------|-----------------------
Overall          | 4.1575           | N/A                   | 1.3920            | N/A
nom_tt_025C_1v80 | 4.4187           | N/A                   | 3.5641            | N/A
nom_ss_100C_1v60 | 4.9968           | N/A                   | 1.4607            | N/A
nom_ff_n40C_1v95 | 4.1658           | N/A                   | 4.3799            | N/A
min_tt_025C_1v80 | 4.4054           | N/A                   | 3.5954            | N/A
min_ss_100C_1v60 | 4.9711           | N/A                   | 1.5078            | N/A
min_ff_n40C_1v95 | 4.1575           | N/A                   | 4.4060            | N/A
max_tt_025C_1v80 | 4.4322           | N/A                   | 3.5242            | N/A
max_ss_100C_1v60 | 5.0131           | N/A                   | 1.3920            | N/A
max_ff_n40C_1v95 | 4.1742           | N/A                   | 4.3508            | N/A

The remaining columns of the report (Hold TNS, Setup TNS, and the hold, setup, max cap, and max slew violation counts) read zero in every corner.

Let's go over this. The STA runs on something called PVT (Process, Voltage, Temperature): we repeat the analysis under different process, voltage, and temperature conditions to see whether we fail timing at a certain voltage or temperature. Pretty cool, right?

Corners. Corners are what we call the test conditions of a specific run. Here is the layout:

Corner Name | Process | Temp | Voltage | What it Tests (Impact)
------------|---------|------|---------|-----------------------
nom_tt_025C_1v80 | Typical | 25°C | 1.8V | Baseline: Standard operating conditions. Used for general power and timing estimates.
ss_100C_1v60 | Slow-Slow | 100°C | 1.6V | Worst Case Setup (Slow): High heat and low voltage make transistors sluggish. If signals arrive too late here, the chip fails at speed.
ff_n40C_1v95 | Fast-Fast | -40°C | 1.95V | Worst Case Hold (Fast): Freezing temps and high voltage make signals race through. Used to catch "race conditions" where data changes too quickly.
tt_025C_1v80 | Typical | 25°C | 1.8V | Nominal: Represents the average chip coming off the manufacturing line.
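
The corner names pack the test conditions into a single string, so a few lines of Python can pull them apart. This is a minimal sketch that assumes the naming pattern seen in the report above (prefix, process, temperature with n for negative, voltage with v as the decimal point). The Corner type and parse_corner function are made up for this post, and the nom/min/max prefix is kept as an opaque label.

```python
import re
from typing import NamedTuple


class Corner(NamedTuple):
    prefix: str     # nom, min or max, exactly as written in the report
    process: str    # tt, ss or ff
    temp_c: int     # temperature in degrees Celsius
    voltage: float  # supply voltage in volts


# Pattern inferred from names like nom_ss_100C_1v60 and max_ff_n40C_1v95.
_CORNER_RE = re.compile(r"^(nom|min|max)_(tt|ss|ff)_(n?)(\d+)C_(\d+)v(\d+)$")


def parse_corner(name: str) -> Corner:
    """Split a corner name into its prefix, process, temperature and voltage."""
    m = _CORNER_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised corner name: {name!r}")
    prefix, process, neg, temp, volts, frac = m.groups()
    temp_c = -int(temp) if neg else int(temp)
    voltage = float(f"{volts}.{frac}")
    return Corner(prefix, process, temp_c, voltage)


print(parse_corner("nom_ss_100C_1v60"))
print(parse_corner("max_ff_n40C_1v95"))
```

Reading the names this way makes it easier to see which run is the hot, low voltage one and which is the cold, high voltage one.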

From the table you see the various corners and what they mean. Now, looking back at the results from the STA, we see that we have quite a bit of positive slack at our target of 100 MHz. On the setup side that means we could push the chip faster in every scenario if we wanted to! Even better, our total negative slack (TNS) is all zeros! This is great, we don't miss timing in any situation we tested for. Our setup times are looking great as well. Setup is about data arriving in time: did all of our data arrive and settle before we hit our clock edge? If so, then we have made our setup time. Hold time looks good too! Hold time ensures data stays put for the amount of time needed after the clock edge so that it passes on in the proper sequence.

Cool fun fact: the 4004 relied on the inherent capacitance of PMOS in order to reach its hold times, so the original 4004 is impossible to replicate internally with any modern process node (more on that in our series on the 4004!). As you see from the report we have zero violations and zero negative slack, meaning that we passed with flying colors! I can't get cocky yet though, as this was the easy part. Just wait until we start connecting all of these blocks together!
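
To put a number on "we could push the chip faster", here is a back-of-the-envelope sketch. It assumes the slack values in the report are in nanoseconds and are measured against the 10 ns clock, and it simply spends all of the worst setup slack on a shorter period. The function names are made up, and a real answer would need a fresh STA run at the new clock.

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / freq_mhz


def max_freq_mhz(target_period_ns: float, worst_setup_slack_ns: float) -> float:
    """Rough fastest clock if all the setup slack were spent on a shorter period."""
    min_period = target_period_ns - worst_setup_slack_ns
    if min_period <= 0:
        raise ValueError("slack must be smaller than the clock period")
    return 1000.0 / min_period


target = period_ns(100.0)   # 100 MHz target
worst_setup_slack = 1.3920  # Overall setup worst slack from the report, assumed to be ns
estimate = max_freq_mhz(target, worst_setup_slack)
print(f"{estimate:.1f} MHz")
```

With the Overall setup worst slack of 1.3920 this works out to roughly 116 MHz, and it is the slow, hot corner that sets that limit.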

That is Yosys and RTL synthesis in a nutshell. Bet you forgot we have a lot more to go!

Floorplanning with OpenROAD

Now that we have our timing validated synthesized RTL (say that five times fast) we can move on to floorplans! OpenROAD is at the helm of this process (and also kind of the STA, since the same project provides OpenSTA too), and now we can start to think about where all of this goes! First we work out our playing field. How big is the die going to be? That is set up in our config as well. In this case the die is 15mm square, so we lay that out as our constraint: nothing can go outside of this area. Next we figure out where to put our external pins, typically around the perimeter. Let the software determine this unless you are doing your own packaging. In this specific case the Caravel wrapper does all of that for us! Then the tool generates its VDD and GND grid across the die. That grid needs to go in first so that the placer knows how to orient our cells and where to put them. And that is floorplanning done. It's pretty automated; gone are the days of rubylith sheets and hand drawn silicon and metal.

Placement

OpenROAD again! Well, tools inside of it, much like OpenSTA; these ones are called RePlAce and OpenDP. This is where the big algos live. The placer goes through your list of cells, figures out how they connect to each other using the netlist, and then walks through their placement and orientation to come up with the shortest wire lengths it can for the system. This takes a while, and I won't pretend to understand the math behind it, but boy is it cool that someone does! It does this in passes: the rough pass goes through and just kind of slaps the cells in their ideal spots, then the fine pass comes through and aligns them so the gates all have power, ground, everything. I should point out I am using the term gate to refer to the entire transistor, not simply the gate region. The fine pass ensures everything has room and that spacing matches our design rules, and it sets us up nicely for clock tree synthesis, our next step.
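
One common way to put a number on "wire length" before any wires exist is the half-perimeter wirelength (HPWL): for each net, draw a bounding box around the cells it connects and add the box's width and height. The sketch below uses that estimate to compare two placements of a made-up three-cell design. The cell names, net names, and coordinates are all invented, and this shows only the scoring idea, not the optimization the real placer performs.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def hpwl(pins: List[Point]) -> float:
    """Half-perimeter of the bounding box around one net's pins."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets: Dict[str, List[str]], placement: Dict[str, Point]) -> float:
    """Sum the estimate over every net, given a location for each cell."""
    return sum(hpwl([placement[c] for c in cells]) for cells in nets.values())


# Hypothetical three-cell design: names and coordinates are invented.
nets = {"n1": ["u1", "u2"], "n2": ["u2", "u3"]}
spread_out = {"u1": (0.0, 0.0), "u2": (10.0, 10.0), "u3": (0.0, 20.0)}
tightened = {"u1": (0.0, 0.0), "u2": (2.0, 1.0), "u3": (4.0, 0.0)}

print(total_hpwl(nets, spread_out))
print(total_hpwl(nets, tightened))
```

The second placement scores much lower, which is the kind of improvement the placer is hunting for across thousands of cells at once.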

Clocks

What is clock tree synthesis? Well, in the regular world all we need to do is feed a clean clock into a pin, right? But what happens after that pin? That signal needs to reach every single clocked cell, and each of them needs to see the clock pulse at as close to the same time as possible or our beautiful static timing analysis is for naught. More importantly, you would get undefined conditions, races, and seemingly random moments of operation, or maybe the thing doesn't work at all. So enter TritonCTS. Guess where it lives? OpenROAD! It runs over the newly placed cells and, with a lot more complex math, works out where clock buffers are needed based on each cell's location so that everything lines up.
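
The quantity being squeezed here is clock skew, the gap between the earliest and latest arrival of the clock at the cells that need it. Here is a minimal sketch with invented arrival times, where extra delay is added on the early branches to match the slowest one. The names and numbers are hypothetical, and the real tool works on a full tree rather than a flat table like this.

```python
from typing import Dict


def clock_skew(arrivals_ns: Dict[str, float]) -> float:
    """Difference between the latest and earliest clock arrival at the sinks."""
    return max(arrivals_ns.values()) - min(arrivals_ns.values())


# Invented clock arrival times in ns at three flip flops.
unbalanced = {"ff_a": 0.20, "ff_b": 0.55, "ff_c": 0.90}

# Hypothetical extra delay added on each early branch to match the slowest one.
added_delay = {"ff_a": 0.70, "ff_b": 0.35, "ff_c": 0.0}
balanced = {ff: t + added_delay[ff] for ff, t in unbalanced.items()}

print(f"{clock_skew(unbalanced):.2f} ns")
print(f"{clock_skew(balanced):.2f} ns")
```

Slowing down the fast branches sounds backwards, but for the clock what matters is that everyone hears the beat together.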

Routing

This is the second to last step, I promise. Routing is incredibly important, just like every other part of this process. All those gates and cells need to connect to each other, or how do you get the value of your register to the ALU? So we have another multistep process, starting with the global router. Think of this as the autorouter in Altium: it goes through our netlist and figures out what neighborhood each net needs to be in, and if the placer did its job well then the router will have a lovely time. Once the global routing is done, the detailed router goes over everything again, and this is where we get our metal layers, vias, etc. This is heavily driven by the DRC; think of it as hand tuning a PCIe trace in Altium. The tools are again from our favorite project, OpenROAD: global routing is completed by FastRoute, and detailed routing by TritonRoute.
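
For a feel of what finding a path for a net involves, here is a classic textbook maze router: a breadth-first search across a grid where some cells are blocked. This is a deliberately simple stand-in and not the algorithm the actual routers use; the grid, the # obstacle marker, and the route function are all made up for illustration.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


def route(grid: List[str], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """Shortest path of (row, col) cells from start to goal; '#' marks blocked cells."""
    rows, cols = len(grid), len(grid[0])
    came_from: Dict[Cell, Optional[Cell]] = {start: None}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            # Walk the breadcrumbs back to the start, then reverse them.
            path: List[Cell] = []
            step: Optional[Cell] = cur
            while step is not None:
                path.append(step)
                step = came_from[step]
            return path[::-1]
        r, c = cur
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and nxt not in came_from:
                came_from[nxt] = cur
                queue.append(nxt)
    return None  # no route exists


# Hypothetical routing grid with an obstacle in the third column.
grid = [
    "..#.",
    "..#.",
    "....",
]
path = route(grid, (0, 0), (0, 3))
print(path)
```

The path has to dip down to the bottom row to get around the obstacle, which is the same kind of detour a router takes around an occupied track.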

Post Processing

Now we are done. Not really, but we are closer than ever! We just need to complete DRC, LVS, and antenna checks. DRC we covered before, but this time we go through the mountain of design rules that the PDK gives us. This part takes a while and, calling back to what I said at the beginning, if we fail here we rip it all up and try again! DRC is done with Magic and KLayout (you use this to look at your finished chip too!). Then we get LVS (Layout Versus Schematic), which is a sanity check: are all of our nets connected? Did we forget something? This ensures that what is in the RTL is in the chip, and it is done by Netgen. Finally the last step, antenna checks. This uses Magic to verify that none of our metal traces or layers is going to build up a charge during fabrication, acting as an antenna and blowing up our chip!
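
An antenna check usually boils down to a ratio: how much metal is hanging off a transistor gate compared with the area of the gate itself. The sketch below shows that comparison in its simplest form. The limit of 400 and the areas are invented for illustration, the function names are made up, and the real limits and the exact way the areas are measured come from the PDK.

```python
def antenna_ratio(metal_area_um2: float, gate_area_um2: float) -> float:
    """Ratio of metal area connected to a gate to that gate's own area."""
    if gate_area_um2 <= 0:
        raise ValueError("gate area must be positive")
    return metal_area_um2 / gate_area_um2


def antenna_violation(metal_area_um2: float, gate_area_um2: float, max_ratio: float) -> bool:
    """True when the connected metal is too large for the gate it feeds."""
    return antenna_ratio(metal_area_um2, gate_area_um2) > max_ratio


MAX_RATIO = 400.0  # assumed limit for illustration only; the real value comes from the PDK

print(antenna_violation(50.0, 0.25, MAX_RATIO))   # ratio 200
print(antenna_violation(150.0, 0.25, MAX_RATIO))  # ratio 600
```

When a net does fail, the usual fixes are breaking up the long wire or adding a small diode to bleed the charge away.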

We're Done... For Now

Well, I barely scratched the surface here. I didn't even go into the ALU that much; we'll save that for another post. Today we went over the toolchain for the open source silicon movement as it exists right now. Things are changing rapidly, and that is a good thing, but it also means any of this is subject to change. Hopefully you now know how we go from Verilog to files ready to give to a fab! There is a lot of fantastic documentation out there on the web for these tools, and as this series progresses we will continue to go deeper into each step of the process as it pertains to our project! Please share this with the nerd in your life and follow along as we work towards holding a piece of silicon in our hands!
