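This post was last edited by kami1217 on 2015-3-1 18:33
I typed this whole post by hand, so there are bound to be mistakes!
The picture below is a transistor symbol laid out in gravel on the grounds of a university; it makes a fitting opener for this post.
Everyone knows a CPU is just an extremely complex IC packed with a huge number of transistors (has anyone played the game that is also called Transistor? It's quite fun). Take my i7-3930K (an antique by now): it has 2.27 billion transistors. Two kinds of transistor are in common use. One is the BJT (bipolar junction transistor), whose base current is nonzero, so it wastes power. The other is the MOSFET (MOS for short), which is better: its gate current is essentially zero, so more than 90% of the transistors on the market today are MOS.
Below is an NPN (a BJT). It is fine for a speaker amplifier, but it will never show up inside a CPU: even if the base current is tiny, multiply it by 100 million and, well, who was it that said P = VI?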
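MOSFETs generally come in two kinds, NMOS and PMOS. Pull out the periodic table: dopants from the column just before silicon make p-type material (each dopant atom is one electron short, which leaves a hole; boron is the usual choice, so B really does come with a hole), while dopants from the column just after silicon make n-type material (each atom has one electron to spare; phosphorus is the usual choice, and P even looks like a pistol).
When we put a little pressure on the pair (a potential difference), the N side gets excited and wants to shoot a certain magical object, the electron, into the holes on the P side, and it is very unhappy if it can't. We call this phenomenon x, where x = pn-junction or love. Depending on the materials, the act has different outcomes: an ordinary silicon diode that holds the forward voltage at about 0.7 V, or, when things get energetic enough, a glowing child called a photon, at which point the diode is promoted to an LED (light-emitting diode). Oh, and remember Einstein's photoelectric effect? Same idea run in reverse: there light knocks electrons loose, here electrons give off light. Also, three Japanese researchers came up with the blue LED (InGaN LEDs) and won the 2014 Nobel Prize in Physics for it, because the invention can reportedly cut the energy needed to make white light by up to 90%. Hats off to them; they really did change the world.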
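Back to the CPU.
Put a PMOS and an NMOS together and you get CMOS. So what is CMOS for? It works as a switch: when the N is on, the P is off; when the P is on, the N is off.
CMOS cross-section:
Yes, this is what CPUs are built from. It is extremely small and there are many layers, but under a microscope this is what it looks like.
For example, this one: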
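This is the common AND gate. Its core is two NMOS in series and two PMOS in parallel; strictly speaking, that four-transistor core is a NAND, whose output goes to 0 only when A and B are both 1, and an inverter after it gives the AND, whose output is 1 only when A and B are both 1.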
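To make the series/parallel idea concrete, here is a minimal switch-level sketch in VHDL. The entity name cmos_and2 and the internal signal names are my own and do not come from the figure; it models which network conducts, not the actual layout.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity name; a behavioral model, not the original schematic.
entity cmos_and2 is
  port (
    A, B : in  std_logic;
    Y    : out std_logic
  );
end cmos_and2;

architecture sketch of cmos_and2 is
  signal pull_down : boolean;    -- two NMOS in series, path to ground
  signal pull_up   : boolean;    -- two PMOS in parallel, path to VDD
  signal nand_out  : std_logic;
begin
  pull_down <= (A = '1') and (B = '1');  -- series: both must be on
  pull_up   <= (A = '0') or  (B = '0');  -- parallel: either one is enough

  -- For valid inputs exactly one network conducts, so the node never floats.
  nand_out <= '0' when pull_down else
              '1' when pull_up   else
              'X';                       -- undefined inputs

  -- A CMOS inverter (one PMOS, one NMOS) turns the NAND into an AND.
  Y <= not nand_out;
end sketch;
```

With A = B = '1' only the pull-down path conducts, nand_out is '0', and Y is '1'; any other valid input pair gives Y = '0'.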
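So, why do we need a CPU?
Because we need it to do lots of simple, repetitive logic work.
Why integrate hundreds of millions of transistors into it?
Because we want better performance: the more transistors, the more logic gates you can build, and the more complex the function done per clock can be.
Why does everyone love overclocking?
Because when you cannot change the CPU's structure, raising the frequency is the only way to speed it up (T = 1/f): in the same one minute you get to run more clock cycles.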
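A quick worked example of T = 1/f. The two frequencies are picked only for illustration: assume a stock clock of 3.2 GHz and a hypothetical overclock to 4.0 GHz, and count the cycles N that fit in t = 60 s.

```latex
\begin{aligned}
T &= \frac{1}{f}, \qquad N = f \cdot t \\
f = 3.2\ \text{GHz}: \quad T &= \frac{1}{3.2 \times 10^{9}\ \text{Hz}} = 0.3125\ \text{ns},
  \quad N = 3.2 \times 10^{9} \times 60 = 1.92 \times 10^{11}\ \text{cycles} \\
f = 4.0\ \text{GHz}: \quad T &= \frac{1}{4.0 \times 10^{9}\ \text{Hz}} = 0.25\ \text{ns},
  \quad N = 4.0 \times 10^{9} \times 60 = 2.4 \times 10^{11}\ \text{cycles}
\end{aligned}
```

Under these assumed numbers the overclock buys 25% more clock cycles per minute; how much real speedup that gives depends on what each cycle gets done.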
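Why are TSMC and Intel always showing off how advanced their process technology is? How many nanometers now, what the yield is...
It all comes down to shrinking the gate length L, which raises the W/L ratio and with it the performance.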
Two formulas and one figure:
Linear region:
Saturation region:
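The formula images did not survive, so here are the standard long-channel (square-law) textbook forms for an NMOS, which I assume are what was posted. The symbols are the usual ones and are my choice: \mu_n is the electron mobility, C_{ox} the gate-oxide capacitance per unit area, V_{th} the threshold voltage.

```latex
\begin{aligned}
\text{Linear region } (V_{GS} > V_{th},\ V_{DS} < V_{GS} - V_{th}): \quad
I_D &= \mu_n C_{ox} \frac{W}{L} \left[ (V_{GS} - V_{th})\, V_{DS} - \frac{V_{DS}^{2}}{2} \right] \\
\text{Saturation region } (V_{GS} > V_{th},\ V_{DS} \ge V_{GS} - V_{th}): \quad
I_D &= \frac{1}{2}\, \mu_n C_{ox} \frac{W}{L}\, (V_{GS} - V_{th})^{2}
\end{aligned}
```

In both regions the drain current scales with W/L, which is the reason a shorter gate length gives more drive current for the same width.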
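What are the benefits of scaling?
A wafer is only so big, so a smaller die effectively lowers the cost per chip; keep the die size the same instead and performance effectively goes up, because more transistors fit on it. A further benefit is lower power consumption; the reason is a bit complicated, but it again comes back to W/L.
Can we keep shrinking forever?
It's hard, really hard, which is why multi-core appeared! Even so, the slope of Moore's law has flattened; it feels like a last-ditch struggle.
Before getting to the main topic, let me show you what a CPU is made of:
Yes, you read that right: it is made from silica-rich sand, so the main ingredient of a CPU that costs thousands is this stuff.
After processing and purification it turns into this, which looks a bit like, well, you know.
Enough said, typing is tiring... on to the pictures:
This is the preliminary design diagram. I didn't overthink it; it is mainly a starting point, and more blocks can be added later.
This CPU is 16-bit and every bus is 16 bits wide. It has eight storage registers, two temporary registers, one instruction register, one multiplexer, and a main controller that is really just an FSM (finite state machine). A counter was added later to count the clocks each instruction takes; movi, for example, takes one clock.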
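The counter's source is not in this post, so the following is only a sketch of what such a block could look like. The entity name Cycle_Counter, the Count port, and its 4-bit width are assumptions; the Clear input is meant to be fed by the Clear output that the control unit listing declares.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity; the post does not show the real counter.
entity Cycle_Counter is
  port (
    Clock : in  std_logic;
    Clear : in  std_logic;                     -- driven by the control unit
    Count : out std_logic_vector(3 downto 0)   -- width is an assumption
  );
end Cycle_Counter;

architecture sketch of Cycle_Counter is
  signal cnt : unsigned(3 downto 0) := (others => '0');
begin
  process (Clock)
  begin
    if rising_edge(Clock) then
      if Clear = '1' then
        cnt <= (others => '0');   -- instruction finished, start over
      else
        cnt <= cnt + 1;           -- one more clock spent on this instruction
      end if;
    end if;
  end process;

  Count <= std_logic_vector(cnt);
end sketch;
```

A synchronous clear keeps the counter in the same clock domain as the FSM, so the count shown for an instruction is simply the number of rising edges seen since the last Clear.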
Specification (copied straight from my report)
• Register – This system has three inputs, two 1-bit signals, Clock and Enable, and one 16-bit signal, D. Also, this system has one 16-bit output signal, Q.
• Instruction Register – This system has three inputs, two 1-bit signals, Clock and Enable, and one 16-bit signal, Din. Also, this system has one 9-bit output signal, cmd.
• Multiplexer – This system has eleven input signals, one 4-bit Sel signal and ten 16-bit signals, Reg0, Reg1, Reg2, Reg3, Reg4, Reg5, Reg6, Reg7, Din, and AddSub. This system also has one 16-bit output signal, Bus.
• Adder/Subtracter – This system has three input signals, one 1-bit signal Sign, and two 16-bit signals, Rx and Ry. This system has one 16-bit output signal, Output.
• Control Unit – This system has four input signals, three 1-bit signals, Run, Reset, and Clock, and one 9-bit signal IRin. Also, this system has thirteen output signals, one 4-bit Mux signal, and twelve 1-bit signals, Reg0, Reg1, Reg2, Reg3, Reg4, Reg5, Reg6, Reg7, RegA, Done, IRen, and AddSub.
• CPU – This system has four input signals, three 1-bit signals, Run, Reset, and Clock, and one 16-bit signal, Din. Also, this system has four output signals, one 1-bit signal, Done, and three 16-bit signals, BusO, R0out, and R1out.
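Oh, and there are also eight seven-segment displays (the thing that shows the floor number in an elevator), one per register; I seem to have forgotten to write that down.
Simulation tests:
Storage register:
Instruction register:
I forgot to mention this: an instruction is 9 bits, IIIXXXYYY, where III is the opcode, XXX selects Rx, and YYY selects Ry.
There are only four instructions so far: mov, movi, add, and sub (written mv and mvi in the control unit code), with opcodes 001, 010, 011, and 100 respectively. The opcode is three bits wide, so there can be at most 8 opcodes; 000 is reserved as a special case, which leaves 7.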
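The encoding can be written down as a small helper package. This is a sketch, not part of the original project: the package name Instr_Pkg, the constant names, and the encode function are all made up for illustration. The only facts taken from the post are the four opcodes, the IIIXXXYYY layout, and that the instruction register keeps Din(15 downto 7), so the 9 bits sit at the top of the 16-bit word.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical helper package; names are illustrative only.
package Instr_Pkg is
  subtype opcode_t is std_logic_vector(2 downto 0);
  constant OP_MV  : opcode_t := "001";
  constant OP_MVI : opcode_t := "010";
  constant OP_ADD : opcode_t := "011";
  constant OP_SUB : opcode_t := "100";

  -- Build a 16-bit word: III XXX YYY in bits 15..7, bits 6..0 left at zero.
  function encode (op : opcode_t; rx, ry : natural range 0 to 7)
    return std_logic_vector;
end package Instr_Pkg;

package body Instr_Pkg is
  function encode (op : opcode_t; rx, ry : natural range 0 to 7)
    return std_logic_vector is
    variable word : std_logic_vector(15 downto 0) := (others => '0');
  begin
    word(15 downto 13) := op;                                    -- III
    word(12 downto 10) := std_logic_vector(to_unsigned(rx, 3));  -- XXX
    word(9 downto 7)   := std_logic_vector(to_unsigned(ry, 3));  -- YYY
    return word;
  end function encode;
end package body Instr_Pkg;
```

For example, encode(OP_ADD, 0, 1) gives "0110000010000000", the same word the test bench uses for add R0, R1, and encode(OP_MV, 1, 0) gives "0010010000000000" for mov R1, R0.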
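A quiz for you: how many bits is a MIPS opcode? And x86?
Multiplexer:
Add/subtract select:
mv/mvi:
add/sub:
Once everything checked out, the pieces could be put together. The schematic was done in Quartus II 11, with a DE2 board for testing.
Figure 1:
Figure 2:
These are the final test results; they look okay...
Finally, here is the VHDL code. It may be a bit long, sigh.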
Reg_16.vhd
Library ieee;
Use ieee.std_logic_1164.all;
Entity Reg16 is
Port(
Clock : in std_logic;
Enable : in std_logic := '1';
D : in std_logic_vector(15 downto 0) := "0000010000000000";
Q : out std_logic_vector(15 downto 0) := "0000000000000000"
);
End Reg16;
Architecture struct of Reg16 is
Begin
Process (Clock)
Begin
if(rising_edge(Clock) and Enable = '1') then
Q <= D;
end if;
End Process;
End struct;
IReg.vhd
Library ieee;
Use ieee.std_logic_1164.all;
Entity IReg is
Port(
Clock : in std_logic;
Enable : in std_logic := '1';
Din : in std_logic_vector(15 downto 0) := "0000000000000000";
cmd : out std_logic_vector(8 downto 0) := "000000000"
);
End IReg;
Architecture struct of IReg is
Begin
Process (Clock) -- a clocked process only needs Clock in its sensitivity list
Begin
if(rising_edge(Clock) and Enable = '1') then
cmd <= Din(15 downto 7);
end if;
End Process;
End struct;
Multiplexer.vhd
Library ieee;
Use ieee.std_logic_1164.all;
Entity Multiplexer is
Port(
Sel : in std_logic_vector(3 downto 0);
Reg0 : in std_logic_vector(15 downto 0);
Reg1 : in std_logic_vector(15 downto 0);
Reg2 : in std_logic_vector(15 downto 0);
Reg3 : in std_logic_vector(15 downto 0);
Reg4 : in std_logic_vector(15 downto 0);
Reg5 : in std_logic_vector(15 downto 0);
Reg6 : in std_logic_vector(15 downto 0);
Reg7 : in std_logic_vector(15 downto 0);
Din : in std_logic_vector(15 downto 0);
AddSub : in std_logic_vector(15 downto 0);
BusO : out std_logic_vector(15 downto 0)
);
End Multiplexer;
Architecture struct of Multiplexer is
Begin
Process (Sel, Reg0, Reg1, Reg2, Reg3, Reg4, Reg5, Reg6, Reg7, Din, AddSub)
Begin
if(Sel = "0000") then
BusO <= Reg0;
elsif(Sel = "0001") then
BusO <= Reg1;
elsif(Sel = "0010") then
BusO <= Reg2;
elsif(Sel = "0011") then
BusO <= Reg3;
elsif(Sel = "0100") then
BusO <= Reg4;
elsif(Sel = "0101") then
BusO <= Reg5;
elsif(Sel = "0110") then
BusO <= Reg6;
elsif(Sel = "0111") then
BusO <= Reg7;
elsif(Sel = "1000") then
BusO <= Din;
elsif(Sel = "1001") then
BusO <= AddSub;
else
BusO <= (others => '0'); -- unused select codes: drive zeros so no latch is inferred
end if;
End Process;
End struct;
Adder.vhd
Library ieee;
Use ieee.std_logic_1164.all;
Use ieee.numeric_std.all;
Entity Adder is
Port(
sign : in std_logic := '0';
Rx : in std_logic_vector(15 downto 0) := "0000000000000000";
Ry : in std_logic_vector(15 downto 0) := "0000000000000000";
Output : out std_logic_vector(15 downto 0) := "0000000000000000"
);
End Adder;
Architecture struct of Adder is
Begin
Process (sign, Rx, Ry)
Begin
if(sign = '0') then
Output <= std_logic_vector(unsigned(Rx) + unsigned(Ry));
else
Output <= std_logic_vector(unsigned(Rx) - unsigned(Ry));
end if;
End Process;
End struct;
Control_Unit.vhd (this is probably the most important one)
Library ieee;
Use ieee.std_logic_1164.all;
Use ieee.numeric_std.all;
Entity Control_Unit is
Port (
Run : in std_logic := '1';
Reset: in std_logic := '0';
Clock: in std_logic;
IRin : in std_logic_vector(8 downto 0) := "000000000";
Done : out std_logic := '0';
Clear: out std_logic := '0';
IRen : out std_logic := '1';
Mux : out std_logic_vector(3 downto 0) := "0000";
Reg0 : out std_logic := '0';
Reg1 : out std_logic := '0';
Reg2 : out std_logic := '0';
Reg3 : out std_logic := '0';
Reg4 : out std_logic := '0';
Reg5 : out std_logic := '0';
Reg6 : out std_logic := '0';
Reg7 : out std_logic := '0';
RegA : out std_logic := '0';
RegG : out std_logic := '0';
AddSub: out std_logic := '0'
);
End Control_Unit;
Architecture struct of Control_Unit is
Type state_type is (decode, mv, add1, add2);
Signal currS : state_type;
Signal nextS : state_type;
Signal toog : std_logic := '0';
Begin
Process (Clock, Reset)
Begin
if(Reset = '1') then
currS <= decode;
elsif(rising_edge(Clock)) then
currS <= nextS;
toog <= not toog;
end if;
End Process;
-- mv has one next state, so it takes two clock cycles;
-- mvi has no next state, so it takes one clock cycle;
-- add or sub has two next states, so it takes three clock cycles;
Process (toog)
Begin
if(Clock = '1') then
Case currS is
when decode =>
Reg0 <= '0';
Reg1 <= '0';
Reg2 <= '0';
Reg3 <= '0';
Reg4 <= '0';
Reg5 <= '0';
Reg6 <= '0';
Reg7 <= '0';
if(IRin(8 downto 6) = "001") then -- mv Rx, Ry
Mux <= "0" & IRin(2 downto 0); -- Ry
Case IRin(5 downto 3) is
when "000" => Reg0 <= '1';
when "001" => Reg1 <= '1';
when "010" => Reg2 <= '1';
when "011" => Reg3 <= '1';
when "100" => Reg4 <= '1';
when "101" => Reg5 <= '1';
when "110" => Reg6 <= '1';
when "111" => Reg7 <= '1';
when others => null;
End Case;
nextS <= mv;
Done <= '0';
Clear <= '0'; -- Keep counting
IRen <= '0'; -- Hold the current command
elsif(IRin(8 downto 6) = "010") then -- mvi Rx, #D
Mux <= "1000"; -- Din
Case IRin(5 downto 3) is
when "000" => Reg0 <= '1';
when "001" => Reg1 <= '1';
when "010" => Reg2 <= '1';
when "011" => Reg3 <= '1';
when "100" => Reg4 <= '1';
when "101" => Reg5 <= '1';
when "110" => Reg6 <= '1';
when "111" => Reg7 <= '1';
when others => null;
End Case;
nextS <= mv;
Done <= '0';
Clear <= '0'; -- Keep counting
IRen <= '0'; -- Hold the current command
elsif(IRin(8 downto 6) = "011" or IRin(8 downto 6)="100") then -- add or sub Rx, Ry
Mux <= "0" & IRin(5 downto 3); -- Rx
RegA <= '1'; -- Store Rx in A on next cycle
RegG <= '1';
nextS <= add1;
Done <= '0';
Clear <= '0'; -- Keep counting
IRen <= '0'; -- Hold the current command
end if;
when mv =>
nextS <= decode;
Reg0 <= '0';
Reg1 <= '0';
Reg2 <= '0';
Reg3 <= '0';
Reg4 <= '0';
Reg5 <= '0';
Reg6 <= '0';
Reg7 <= '0';
IRen <= '1'; -- Get new command next clock cycle
Done <= '1';
Clear <= '1'; -- Clear the Counter
when add1 =>
RegA <= '0'; -- Disable A from writing
Mux <= "0" & IRin(2 downto 0); -- Put Ry on the bus next cycle
AddSub <= IRin(8); -- Determine add or sub: opcode bit 8 is '0' for add, '1' for sub
nextS <= add2;
when add2 =>
Case IRin(5 downto 3) is
when "000" => Reg0 <= '1';
when "001" => Reg1 <= '1';
when "010" => Reg2 <= '1';
when "011" => Reg3 <= '1';
when "100" => Reg4 <= '1';
when "101" => Reg5 <= '1';
when "110" => Reg6 <= '1';
when "111" => Reg7 <= '1';
when others => null;
End Case;
RegG <= '0';
Mux <= "1001"; -- Put Result from Adder on the BUS
nextS <= decode; -- Reset all reg
Clear <= '1'; -- Clear the Counter
Done <= '1';
IRen <= '1';
when others => null;
End Case;
end if;
End Process;
End struct;
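The eight-way Case that picks a register enable from IRin(5 downto 3) shows up three times in the control unit. One possible way to factor it out is a 3-to-8 decoder function. This is a sketch under my own assumptions: the package Decode_Pkg and the function one_hot are not in the original files, and using it would mean bundling Reg0 to Reg7 into one 8-bit vector.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical package; not part of the original design files.
package Decode_Pkg is
  -- 3-to-8 decoder: bit i of the result is '1' when sel encodes i.
  function one_hot (sel : std_logic_vector(2 downto 0))
    return std_logic_vector;
end package Decode_Pkg;

package body Decode_Pkg is
  function one_hot (sel : std_logic_vector(2 downto 0))
    return std_logic_vector is
    variable en : std_logic_vector(7 downto 0) := (others => '0');
  begin
    -- Skip undefined inputs ('U', 'X', ...) instead of converting them.
    if not is_x(sel) then
      en(to_integer(unsigned(sel))) := '1';
    end if;
    return en;
  end function one_hot;
end package body Decode_Pkg;
```

With an assumed 8-bit enable vector, say RegEn, a single line such as RegEn <= one_hot(IRin(5 downto 3)); would stand in for each Case block, and RegEn <= (others => '0'); would replace the eight separate resets.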
dec_7seg.vhd
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.all;
USE IEEE.STD_LOGIC_ARITH.all;
USE IEEE.STD_LOGIC_UNSIGNED.all;
-- Hexadecimal to 7 Segment Decoder for LED Display
ENTITY dec_7seg IS
PORT( hex_digit_16 : IN STD_LOGIC_VECTOR(15 DOWNTO 0);
segment_a, segment_b, segment_c, segment_d, segment_e, segment_f,
segment_g : OUT std_logic);
END dec_7seg;
ARCHITECTURE a OF dec_7seg IS
SIGNAL segment_data : STD_LOGIC_VECTOR(6 DOWNTO 0);
SIGNAL hex_digit_4 : STD_LOGIC_VECTOR(3 DOWNTO 0);
BEGIN
PROCESS (Hex_digit_16, Hex_digit_4)
-- HEX to 7 Segment Decoder for LED Display
BEGIN -- Hex-digit is the four bit binary value to display in hexadecimal
Hex_digit_4 <= Hex_digit_16(3 DOWNTO 0);
CASE Hex_digit_4 IS
WHEN x"0" =>
segment_data <= "1111110";
WHEN x"1" =>
segment_data <= "0110000";
WHEN x"2" =>
segment_data <= "1101101";
WHEN x"3" =>
segment_data <= "1111001";
WHEN x"4" =>
segment_data <= "0110011";
WHEN x"5" =>
segment_data <= "1011011";
WHEN x"6" =>
segment_data <= "1011111";
WHEN x"7" =>
segment_data <= "1110000";
WHEN x"8" =>
segment_data <= "1111111";
WHEN x"9" =>
segment_data <= "1111011";
WHEN x"A" =>
segment_data <= "1110111";
WHEN x"B" =>
segment_data <= "0011111";
WHEN x"C" =>
segment_data <= "1001110";
WHEN x"D" =>
segment_data <= "0111101";
WHEN x"E" =>
segment_data <= "1001111";
WHEN x"F" =>
segment_data <= "1000111";
WHEN OTHERS =>
segment_data <= "0111110"; -- if you get something else, this is going to show "U"
END CASE;
END PROCESS;
-- extract segment data bits and invert
-- LED driver circuit is inverted
segment_a <= NOT segment_data(6);
segment_b <= NOT segment_data(5);
segment_c <= NOT segment_data(4);
segment_d <= NOT segment_data(3);
segment_e <= NOT segment_data(2);
segment_f <= NOT segment_data(1);
segment_g <= NOT segment_data(0);
END a;
The last one: the overall test.
Test_CPU.vhd
Library ieee;
Use ieee.std_logic_1164.all;
Entity Test_CPU is
End Test_CPU;
Architecture struct of Test_CPU is
Component CPU is
Port(
Run : in std_logic := '1';
Reset : in std_logic := '1';
Clock : in std_logic;
Din : in std_logic_vector(15 downto 0);
BusO : out std_logic_vector(15 downto 0);
Done : out std_logic;
R0out : out std_logic_vector(15 downto 0);
R1out : out std_logic_vector(15 downto 0)
);
End Component;
Signal Clk : std_logic := '0';
Signal Din : std_logic_vector(15 downto 0);
Signal MSB : std_logic_vector(15 downto 0);
Signal Done: std_logic;
Signal r0o : std_logic_vector(15 downto 0);
Signal r1o : std_logic_vector(15 downto 0);
Begin
Sys: CPU Port Map(
Run => '1',
Reset => '1',
Clock => Clk,
Din => Din,
BusO => MSB,
Done => Done,
R0out => r0o,
R1out => r1o
);
Process
Begin
wait for 50 ns;
Din <= "0100000000000000"; -- movi R0, 128
wait for 5 ns;
Clk <= not Clk; -- High
wait for 5 ns;
Clk <= not Clk; --Low
wait for 50 ns;
Din <= "0000000010000000"; -- 128
wait for 5 ns;
Clk <= not Clk; -- High
wait for 5 ns;
Clk <= not Clk; --Low
wait for 50 ns;
Din <= "0010010000000000"; -- mov R1 , R0
wait for 5 ns;
Clk <= not Clk; --High (this edge was missing, which left the clock inverted from here on)
wait for 5 ns;
Clk <= not Clk; --Low
wait for 5 ns;
Clk <= not Clk; --High
wait for 5 ns;
Clk <= not Clk; --Low
wait for 50 ns;
Din <= "0110000010000000"; -- add R0 , R1
wait for 5 ns;
Clk <= not Clk; --High
wait for 5 ns;
Clk <= not Clk; --Low
wait for 5 ns;
Clk <= not Clk; --High
wait for 5 ns;
Clk <= not Clk; --Low
wait for 5 ns;
Clk <= not Clk; --High
wait for 5 ns;
Clk <= not Clk; --Low
wait for 50 ns;
Din <= "0110000010000000"; -- add R0 , R1
wait for 5 ns;
Clk <= not Clk; --High
wait for 5 ns;
Clk <= not Clk; --Low
wait for 5 ns;
Clk <= not Clk; --High
wait for 5 ns;
Clk <= not Clk; --Low
wait for 5 ns;
Clk <= not Clk; --High
wait for 5 ns;
Clk <= not Clk; --Low
wait for 50 ns;
Din <= "1000000010000000"; -- sub R0 , R1
wait for 5 ns;
Clk <= not Clk; -- High
wait for 5 ns;
Clk <= not Clk; --Low
wait for 5 ns;
Clk <= not Clk; --High
wait for 5 ns;
Clk <= not Clk; --Low
wait for 5 ns;
Clk <= not Clk; --High
wait for 5 ns;
Clk <= not Clk; --Low
End Process;
End struct;
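The test bench repeats the same wait/toggle pair for every clock edge, which is easy to get wrong by one line. A small procedure could generate the pulses instead. This is only a sketch: the package Tb_Pkg and the procedure pulse are hypothetical names, and the 5 ns high / 5 ns low timing is copied from the test bench above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper package for the test bench.
package Tb_Pkg is
  -- Drive `cycles` full clock periods on clk (5 ns low, then 5 ns high).
  procedure pulse (signal clk : out std_logic; cycles : in positive);
end package Tb_Pkg;

package body Tb_Pkg is
  procedure pulse (signal clk : out std_logic; cycles : in positive) is
  begin
    for i in 1 to cycles loop
      wait for 5 ns;
      clk <= '1';   -- rising edge: the CPU samples here
      wait for 5 ns;
      clk <= '0';   -- always ends low, so pulses cannot get out of phase
    end loop;
  end procedure pulse;
end package body Tb_Pkg;
```

In the stimulus process an add would then read: Din <= "0110000010000000"; pulse(Clk, 3); with the 3 matching the three clocks the test bench gives add and sub. Because the procedure assigns '1' and '0' explicitly rather than toggling, a dropped line cannot invert the clock for the rest of the run.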
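Since I brought up those Japanese researchers, I should add their picture: these are the three, and they are seriously impressive.
And here is the blue LED that changed the world.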