
Primary chart of opcodes to bytes

I'm trying to take this simple hello world assembly program and translate it by hand into machine language (the ultimate goal is to write an assembler, but for now I just want to understand the hex values needed for this particular program):

section .text
    global _start ;I'm guessing this isn't needed for the machine language, only for the assembler or linker

_start: ;label, I don't know if this is somehow decoded to something in machine language or not
    mov edx,myLength ;"DX" register: the data register, apparently moving a new empty variable "myLength" into it, since it wasn't defined before
    mov ecx,coby ;move a new variable "coby" into the CX or "count register", not sure why this register was used for holding text, as it's usually used as a decrementer, but ok
    mov ebx,1 ;"BX" is the base register, used for indexing things, like system commands I'm guessing, although I don't know what else. In this case though "1" corresponds to stdout
    mov eax,4 ;"AX" the primary register, used to execute a certain command, in this case "4" or "sys_write", which I'm guessing simply is the equivalent of writing to the terminal, somehow, but I don't know what exactly it's writing
    int 0x80 ;interrupt 0x80 (128), which means call whatever is stored in eax, in this case sys_write

    mov eax,1 ;"1" is exit, moving to primary register to perform next
    int 0x80 ;executes the command stored in the primary register "AX", in this case, exit

segment .data ;pretty sure this stuff is just for the assembler or linker, but I could be wrong
coby db 'Yo, world just testing!?!!! 123, BTW', 0xa ;somehow sets the value of the message variable "coby", although I don't know what "db" and 0xa (10) are for
myLength equ $ - coby ;somehow sets the value of myLength, I know in general "-" represents an input, I'm guessing "$" is the length of, but I have absolutely no idea how to translate that to machine language, or if the assembler automatically reads the length and hard-codes it in instead?
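In NASM, `db` ("define byte") emits its operands as raw bytes, and `0xa` is the line-feed character, so `coby` is the 36 visible characters plus one newline byte. `$` is the current assembly position, so `myLength equ $ - coby` is a subtraction the assembler evaluates once and hard-codes as a constant; nothing is computed at run time. A minimal Python sketch of that arithmetic, assuming `.data` ends up at 0x080490A0 (the address visible in this program's dump; the difference does not depend on it):

```python
# Sketch of the arithmetic NASM performs for the .data lines (not NASM's own code).
# Assumption: .data is placed at 0x080490A0, as in this program's dump.

# `coby db '...', 0xa` emits the characters followed by one 0x0A (line feed) byte.
coby_bytes = b"Yo, world just testing!?!!! 123, BTW" + bytes([0x0A])

coby = 0x080490A0                 # address the label `coby` stands for
here = coby + len(coby_bytes)     # value of `$` on the `myLength equ` line

# `myLength equ $ - coby` becomes a plain number, fixed at assembly time.
myLength = here - coby

print(myLength, hex(myLength))    # 37 0x25
```

The label names themselves never reach the machine code; only the resulting numbers do.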

So I know the general idea is that the machine language is just the parts under "_start", but instead of being written out as text, each instruction is represented by numbers (in binary, etc.).

So for example "mov" would be one number, "edx", the data register, would be represented by another number, and "myLength" by another. Likewise, the command "int 0x80" (interrupt 128, which I think means call whatever command is stored in the primary register, AX) would correspond to some number representing the command "int", followed by 0x80 (128), which I'm guessing would just be 128 in binary.
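For the instructions in this program, the register is not a separate number. The Intel manual's MOV page lists `mov r32, imm32` as `B8+ rd id`: the 3-bit register number is added to the opcode byte 0xB8, and the 32-bit immediate follows in little-endian order. The INT n page lists `int imm8` as `CD ib`, so `int 0x80` is just the two bytes CD 80. A small Python sketch of those two encodings (the function names are made up for illustration):

```python
import struct

# 3-bit register numbers used by x86 encodings (EAX=0 ... EDI=7).
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

def mov_r32_imm32(reg: str, imm: int) -> bytes:
    """Encode `mov reg, imm` using the B8+rd id form."""
    opcode = 0xB8 + REG32[reg]                       # register folded into the opcode byte
    return bytes([opcode]) + struct.pack("<I", imm & 0xFFFFFFFF)  # imm32, little-endian

def int_imm8(n: int) -> bytes:
    """Encode `int n` using the CD ib form."""
    return bytes([0xCD, n & 0xFF])

print(mov_r32_imm32("ebx", 1).hex(" "))   # bb 01 00 00 00
print(int_imm8(0x80).hex(" "))            # cd 80
```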

I'm also guessing that all the extra stuff, like section .text and segment .data, is only there for the assembler to find and replace the variables used above, and that the machine language itself doesn't keep any kind of variables or constants, but correct me if I'm wrong.
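The machine code itself indeed has no variables: labels turn into plain addresses. The section directives do leave a trace in the file, though. ELF keeps section headers plus a table of section names (`.shstrtab`), which is the `.shstrtab`, `.text`, `.data` text at bytes 197 to 219 of the dump; label names such as `coby` are gone because `ld -s` strips the symbol table. A sketch that looks names up in that table the way section headers refer to them (by byte offset), using bytes copied from the dump:

```python
# Bytes 197..219 of the dump: the section-name string table (.shstrtab).
shstrtab = b"\x00.shstrtab\x00.text\x00.data\x00"

def name_at(table: bytes, offset: int) -> str:
    """Return the NUL-terminated string that starts at `offset` in the table."""
    end = table.index(b"\x00", offset)
    return table[offset:end].decode("ascii")

# Each section header stores an offset into this table (sh_name), not the name.
# In the dump, the headers at bytes 260, 300 and 340 have sh_name = 11, 17 and 1.
for sh_name in (11, 17, 1):
    print(sh_name, name_at(shstrtab, sh_name))
```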

So theoretically, in machine language I wouldn't need to define a variable "coby" (the message variable) in order to put it into the count register (ECX); instead I would just insert the raw string, or rather its character codes, directly after whatever number represents ECX (although I'm also not sure why the count register was used in this example; I guess it's not only used to store things that decrement?)
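In the encoded program, `mov ecx,coby` does not carry the string; it carries the string's address, a 32-bit number filled in at link time. ECX is used because the Linux `int 0x80` system call interface takes the syscall number in EAX and the arguments in EBX, ECX, EDX, so for `sys_write` EBX is the file descriptor, ECX the buffer pointer and EDX the byte count. Decoding bytes 133 to 137 of the dump shows the address:

```python
import struct

# Bytes 133..137 of the dump: B9 A0 90 04 08
insn = bytes([0xB9, 0xA0, 0x90, 0x04, 0x08])

assert insn[0] == 0xB8 + 1                  # B8+rd with rd = 1, i.e. ECX
(addr,) = struct.unpack("<I", insn[1:5])     # the imm32 is stored little-endian
print(hex(addr))                             # 0x80490a0, the address of `coby`
```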

Anyway, if my assumptions are correct, then I need some kind of chart that tells me exactly which numbers correspond to which opcodes (like mov and int in this case) and which registers (like edx, eax, etc.), and I also need to know exactly what order these bytes are stored in within the binary file. I tried using NASM (with the command nasm -f elf yo.asm && ld -m elf_i386 -s -o yoman yo.o) and viewed the contents as ANSI text. Here's what I got when I printed it from the JavaScript console to an HTML page, with each byte on its own line, labeled with its offset, binary value, decimal char code, and the actual Unicode character (which isn't always visible in the HTML):

000: 01111111, 127: 
001: 01000101, 069: E
002: 01001100, 076: L
003: 01000110, 070: F
004: 00000001, 001: 
005: 00000001, 001: 
006: 00000001, 001: 
007: 00000000, 000:
008: 00000000, 000:
009: 00000000, 000:
010: 00000000, 000:
011: 00000000, 000:
012: 00000000, 000:
013: 00000000, 000:
014: 00000000, 000:
015: 00000000, 000:
016: 00000010, 002: 
017: 00000000, 000:
018: 00000011, 003: 
019: 00000000, 000:
020: 00000001, 001: 
021: 00000000, 000:
022: 00000000, 000:
023: 00000000, 000:
024: 10000000, 128: €
025: 10000000, 128: €
026: 00000100, 004: 
027: 00001000, 008: 
028: 00110100, 052: 4
029: 00000000, 000:
030: 00000000, 000:
031: 00000000, 000:
032: 11011100, 220: Ü
033: 00000000, 000:
034: 00000000, 000:
035: 00000000, 000:
036: 00000000, 000:
037: 00000000, 000:
038: 00000000, 000:
039: 00000000, 000:
040: 00110100, 052: 4
041: 00000000, 000:
042: 00100000, 032:
043: 00000000, 000:
044: 00000010, 002: 
045: 00000000, 000:
046: 00101000, 040: (
047: 00000000, 000:
048: 00000100, 004: 
049: 00000000, 000:
050: 00000011, 003: 
051: 00000000, 000:
052: 00000001, 001: 
053: 00000000, 000:
054: 00000000, 000:
055: 00000000, 000:
056: 00000000, 000:
057: 00000000, 000:
058: 00000000, 000:
059: 00000000, 000:
060: 00000000, 000:
061: 10000000, 128: €
062: 00000100, 004: 
063: 00001000, 008: 
064: 00000000, 000:
065: 10000000, 128: €
066: 00000100, 004: 
067: 00001000, 008: 
068: 10011101, 157: 
069: 00000000, 000:
070: 00000000, 000:
071: 00000000, 000:
072: 10011101, 157: 
073: 00000000, 000:
074: 00000000, 000:
075: 00000000, 000:
076: 00000101, 005: 
077: 00000000, 000:
078: 00000000, 000:
079: 00000000, 000:
080: 00000000, 000:
081: 00010000, 016: 
082: 00000000, 000:
083: 00000000, 000:
084: 00000001, 001: 
085: 00000000, 000:
086: 00000000, 000:
087: 00000000, 000:
088: 10100000, 160:  
089: 00000000, 000:
090: 00000000, 000:
091: 00000000, 000:
092: 10100000, 160:  
093: 10010000, 144: 
094: 00000100, 004: 
095: 00001000, 008: 
096: 10100000, 160:  
097: 10010000, 144: 
098: 00000100, 004: 
099: 00001000, 008: 
100: 00100101, 037: %
101: 00000000, 000:
102: 00000000, 000:
103: 00000000, 000:
104: 00100101, 037: %
105: 00000000, 000:
106: 00000000, 000:
107: 00000000, 000:
108: 00000110, 006: 
109: 00000000, 000:
110: 00000000, 000:
111: 00000000, 000:
112: 00000000, 000:
113: 00010000, 016: 
114: 00000000, 000:
115: 00000000, 000:
116: 00000000, 000:
117: 00000000, 000:
118: 00000000, 000:
119: 00000000, 000:
120: 00000000, 000:
121: 00000000, 000:
122: 00000000, 000:
123: 00000000, 000:
124: 00000000, 000:
125: 00000000, 000:
126: 00000000, 000:
127: 00000000, 000:
128: 10111010, 186: º
129: 00100101, 037: %
130: 00000000, 000:
131: 00000000, 000:
132: 00000000, 000:
133: 10111001, 185: ¹
134: 10100000, 160:  
135: 10010000, 144: 
136: 00000100, 004: 
137: 00001000, 008: 
138: 10111011, 187: »
139: 00000001, 001: 
140: 00000000, 000:
141: 00000000, 000:
142: 00000000, 000:
143: 10111000, 184: ¸
144: 00000100, 004: 
145: 00000000, 000:
146: 00000000, 000:
147: 00000000, 000:
148: 11001101, 205: Í
149: 10000000, 128: €
150: 10111000, 184: ¸
151: 00000001, 001: 
152: 00000000, 000:
153: 00000000, 000:
154: 00000000, 000:
155: 11001101, 205: Í
156: 10000000, 128: €
157: 00000000, 000:
158: 00000000, 000:
159: 00000000, 000:
160: 01011001, 089: Y
161: 01101111, 111: o
162: 00101100, 044: ,
163: 00100000, 032:
164: 01110111, 119: w
165: 01101111, 111: o
166: 01110010, 114: r
167: 01101100, 108: l
168: 01100100, 100: d
169: 00100000, 032:
170: 01101010, 106: j
171: 01110101, 117: u
172: 01110011, 115: s
173: 01110100, 116: t
174: 00100000, 032:
175: 01110100, 116: t
176: 01100101, 101: e
177: 01110011, 115: s
178: 01110100, 116: t
179: 01101001, 105: i
180: 01101110, 110: n
181: 01100111, 103: g
182: 00100001, 033: !
183: 00111111, 063: ?
184: 00100001, 033: !
185: 00100001, 033: !
186: 00100001, 033: !
187: 00100000, 032:
188: 00110001, 049: 1
189: 00110010, 050: 2
190: 00110011, 051: 3
191: 00101100, 044: ,
192: 00100000, 032:
193: 01000010, 066: B
194: 01010100, 084: T
195: 01010111, 087: W
196: 00001010, 010:
197: 00000000, 000:
198: 00101110, 046: .
199: 01110011, 115: s
200: 01101000, 104: h
201: 01110011, 115: s
202: 01110100, 116: t
203: 01110010, 114: r
204: 01110100, 116: t
205: 01100001, 097: a
206: 01100010, 098: b
207: 00000000, 000:
208: 00101110, 046: .
209: 01110100, 116: t
210: 01100101, 101: e
211: 01111000, 120: x
212: 01110100, 116: t
213: 00000000, 000:
214: 00101110, 046: .
215: 01100100, 100: d
216: 01100001, 097: a
217: 01110100, 116: t
218: 01100001, 097: a
219: 00000000, 000:
220: 00000000, 000:
221: 00000000, 000:
222: 00000000, 000:
223: 00000000, 000:
224: 00000000, 000:
225: 00000000, 000:
226: 00000000, 000:
227: 00000000, 000:
228: 00000000, 000:
229: 00000000, 000:
230: 00000000, 000:
231: 00000000, 000:
232: 00000000, 000:
233: 00000000, 000:
234: 00000000, 000:
235: 00000000, 000:
236: 00000000, 000:
237: 00000000, 000:
238: 00000000, 000:
239: 00000000, 000:
240: 00000000, 000:
241: 00000000, 000:
242: 00000000, 000:
243: 00000000, 000:
244: 00000000, 000:
245: 00000000, 000:
246: 00000000, 000:
247: 00000000, 000:
248: 00000000, 000:
249: 00000000, 000:
250: 00000000, 000:
251: 00000000, 000:
252: 00000000, 000:
253: 00000000, 000:
254: 00000000, 000:
255: 00000000, 000:
256: 00000000, 000:
257: 00000000, 000:
258: 00000000, 000:
259: 00000000, 000:
260: 00001011, 011: 
261: 00000000, 000:
262: 00000000, 000:
263: 00000000, 000:
264: 00000001, 001: 
265: 00000000, 000:
266: 00000000, 000:
267: 00000000, 000:
268: 00000110, 006: 
269: 00000000, 000:
270: 00000000, 000:
271: 00000000, 000:
272: 10000000, 128: €
273: 10000000, 128: €
274: 00000100, 004: 
275: 00001000, 008: 
276: 10000000, 128: €
277: 00000000, 000:
278: 00000000, 000:
279: 00000000, 000:
280: 00011101, 029: 
281: 00000000, 000:
282: 00000000, 000:
283: 00000000, 000:
284: 00000000, 000:
285: 00000000, 000:
286: 00000000, 000:
287: 00000000, 000:
288: 00000000, 000:
289: 00000000, 000:
290: 00000000, 000:
291: 00000000, 000:
292: 00010000, 016: 
293: 00000000, 000:
294: 00000000, 000:
295: 00000000, 000:
296: 00000000, 000:
297: 00000000, 000:
298: 00000000, 000:
299: 00000000, 000:
300: 00010001, 017: 
301: 00000000, 000:
302: 00000000, 000:
303: 00000000, 000:
304: 00000001, 001: 
305: 00000000, 000:
306: 00000000, 000:
307: 00000000, 000:
308: 00000011, 003: 
309: 00000000, 000:
310: 00000000, 000:
311: 00000000, 000:
312: 10100000, 160:  
313: 10010000, 144: 
314: 00000100, 004: 
315: 00001000, 008: 
316: 10100000, 160:  
317: 00000000, 000:
318: 00000000, 000:
319: 00000000, 000:
320: 00100101, 037: %
321: 00000000, 000:
322: 00000000, 000:
323: 00000000, 000:
324: 00000000, 000:
325: 00000000, 000:
326: 00000000, 000:
327: 00000000, 000:
328: 00000000, 000:
329: 00000000, 000:
330: 00000000, 000:
331: 00000000, 000:
332: 00000100, 004: 
333: 00000000, 000:
334: 00000000, 000:
335: 00000000, 000:
336: 00000000, 000:
337: 00000000, 000:
338: 00000000, 000:
339: 00000000, 000:
340: 00000001, 001: 
341: 00000000, 000:
342: 00000000, 000:
343: 00000000, 000:
344: 00000011, 003: 
345: 00000000, 000:
346: 00000000, 000:
347: 00000000, 000:
348: 00000000, 000:
349: 00000000, 000:
350: 00000000, 000:
351: 00000000, 000:
352: 00000000, 000:
353: 00000000, 000:
354: 00000000, 000:
355: 00000000, 000:
356: 11000101, 197: Å
357: 00000000, 000:
358: 00000000, 000:
359: 00000000, 000:
360: 00010111, 023: 
361: 00000000, 000:
362: 00000000, 000:
363: 00000000, 000:
364: 00000000, 000:
365: 00000000, 000:
366: 00000000, 000:
367: 00000000, 000:
368: 00000000, 000:
369: 00000000, 000:
370: 00000000, 000:
371: 00000000, 000:
372: 00000001, 001:
373: 00000000, 000:
374: 00000000, 000:
375: 00000000, 000:
376: 00000000, 000:
377: 00000000, 000:
378: 00000000, 000:
379: 00000000, 000:
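The first 52 bytes are the ELF header (the `7F 'E' 'L' 'F'` magic gives it away). A sketch that decodes it with Python's `struct` module, assuming the standard 32-bit little-endian `Elf32_Ehdr` layout from the ELF specification; the byte values are copied from the dump above:

```python
import struct

# First 52 bytes of the dump (offsets 000..051), written as hex.
header = bytes.fromhex(
    "7f454c46" "01" "01" "01" "00" "0000000000000000"   # e_ident (16 bytes)
    "0200" "0300" "01000000"                           # e_type, e_machine, e_version
    "80800408" "34000000" "dc000000" "00000000"        # e_entry, e_phoff, e_shoff, e_flags
    "3400" "2000" "0200" "2800" "0400" "0300"          # sizes and counts
)

FIELDS = ("e_ident e_type e_machine e_version e_entry e_phoff e_shoff e_flags "
          "e_ehsize e_phentsize e_phnum e_shentsize e_shnum e_shstrndx").split()
ehdr = dict(zip(FIELDS, struct.unpack("<16sHHIIIIIHHHHHH", header)))

assert ehdr["e_ident"][:4] == b"\x7fELF"
print("class (1 = 32-bit):", ehdr["e_ident"][4], " data (1 = little-endian):", ehdr["e_ident"][5])
for name in FIELDS[1:]:
    print(f"{name:<12} {ehdr[name]:#x}")
```

Among other things this says: executable (`e_type` 2) for i386 (`e_machine` 3), entry point 0x08048080, two program headers at offset 52, and four section headers at offset 220.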

So the file is huge: 380 bytes, which is kind of surprising. I think it has something to do with a bunch of headers from NASM and that ELF thing, as is evident from the top, so I don't know where the actual assembly program begins, or whether it's possible to reduce the file to just that.
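Most of the 380 bytes are ELF bookkeeping: the 52-byte ELF header, two 32-byte program headers (bytes 52 to 115), padding, then the 29 code bytes (128 to 156), padding, the 37 data bytes (160 to 196), the section-name table (197 to 219) and four 40-byte section headers (220 to 379). The kernel's ELF loader works from the program headers. The first one says file offset 0 is mapped at virtual address 0x08048000, so the entry point 0x08048080 sits at file offset 0x80 = 128, where `BA 25 00 00 00` starts. A sketch of that calculation, with values copied from the dump (the helper name is hypothetical):

```python
import struct

# Program header 0 (bytes 52..83 of the dump), Elf32_Phdr, little-endian.
phdr0 = bytes.fromhex(
    "01000000" "00000000"            # p_type = PT_LOAD, p_offset = 0
    "00800408" "00800408"            # p_vaddr, p_paddr = 0x08048000
    "9d000000" "9d000000"            # p_filesz, p_memsz = 157
    "05000000" "00100000"            # p_flags = R+X, p_align = 0x1000
)
(p_type, p_offset, p_vaddr, p_paddr,
 p_filesz, p_memsz, p_flags, p_align) = struct.unpack("<8I", phdr0)

e_entry = 0x08048080                 # from the ELF header, bytes 24..27

def vaddr_to_offset(vaddr: int) -> int:
    """Map a virtual address inside this segment back to a file offset."""
    assert p_vaddr <= vaddr < p_vaddr + p_filesz, "address not in this segment"
    return vaddr - p_vaddr + p_offset

print(vaddr_to_offset(e_entry))      # 128
```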

Some things stick out right away, most obviously the string that is printed to the console, which appears at line (i.e., byte #) 160 in the output above. The length of the string happens to be 36, which should correspond to the first value stored in the data register (called "myLength" in the assembly program), although I can't find any byte with the value 36, so I'm not sure exactly how the length was stored, whether it was broken up, etc.
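The string is 37 bytes long, not 36: the trailing `0xa` (line feed, byte 196 in the dump) counts too. 37 is 0x25, and it is stored right after the `BA` opcode (`mov edx, imm32`) as a 4-byte little-endian immediate, so bytes 129 to 132 are `25 00 00 00`. Byte 129 prints as `%` because 37 is the ASCII code of `%`. A quick check:

```python
import struct

message = b"Yo, world just testing!?!!! 123, BTW\n"   # '\n' is the 0xa byte
print(len(message))                                # 37

# Bytes 128..132 of the dump: BA 25 00 00 00  ->  mov edx, 37
mov_edx = bytes([0xBA, 0x25, 0x00, 0x00, 0x00])
(length,) = struct.unpack("<I", mov_edx[1:])
print(length, chr(length))                         # 37 %
```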

Another noticeable byte is 128, which corresponds to the kernel call, i.e. the interrupt number from above (0x80). It appears in the output at line (byte #) 156, as well as at line 149. It appears in other places too, but in those two places it's near the string, so I'm guessing it's the value passed to "int", the interrupt command. Each of those bytes is also preceded by the byte 205, which as an ANSI character looks like Í. That may suggest that 205 corresponds to the command "int", especially since it immediately precedes 0x80, but I don't know for sure.
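That guess is right: `CD ib` is the encoding of `int imm8`, so `CD 80` is `int 0x80`. The whole `.text` section is only the 29 bytes from 128 to 156 (its section header at bytes 260 to 299 records offset 0x80 and size 0x1D). A toy decoder that handles just the two opcode forms this program uses (`B8+rd id` and `CD ib`) turns those bytes back into the source; it is a sketch only and rejects every other opcode:

```python
import struct

# Bytes 128..156 of the dump (the .text section).
code = bytes.fromhex(
    "ba25000000" "b9a0900408" "bb01000000" "b804000000" "cd80"
    "b801000000" "cd80"
)
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def disassemble(code: bytes) -> list:
    """Decode only `mov r32, imm32` (B8+rd id) and `int imm8` (CD ib)."""
    out, i = [], 0
    while i < len(code):
        op = code[i]
        if 0xB8 <= op <= 0xBF:                        # register in the low 3 bits
            (imm,) = struct.unpack_from("<I", code, i + 1)
            out.append(f"mov {REGS[op - 0xB8]}, {imm:#x}")
            i += 5
        elif op == 0xCD:                              # one-byte interrupt number follows
            out.append(f"int {code[i + 1]:#x}")
            i += 2
        else:
            raise ValueError(f"opcode {op:#04x} not handled by this sketch")
    return out

for line in disassemble(code):
    print(line)
```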

So I tried looking in the Intel docs as well as a really long PDF file, but I couldn't find anything that covers:

1. exactly which byte values correspond to which opcodes and commands, and

2. exactly what order/format to put them in within the compiled output (without resorting to a third-party assembler like NASM)

I found a third-party reference sheet, but I would prefer to know where in the original documentation its information comes from, and although it shows some of the codes for the registers, I couldn't find out what format to put them in within the generated output. I've also seen these answers to similar questions:

x86 opcode encoding: sib byte

hexadecimal value of opcodes

How to write and execute PURE machine code manually without containers like EXE or ELF?

But all I'm really looking for is a complete, official reference to all of the registers and opcodes.

The canonical sources for this type of information are the Intel and AMD reference manuals.

The Intel manual is at https://software.intel.com/en-us/articles/intel-sdm . Information on instruction encoding is in volume 2, chapter 2. Opcode tables are in volume 2, appendix A.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 