PHO - Pokémon Hackers Online
9th July 2013, 11:37 PM   #1
Full Metal ★
Unstable?
Ex-Staff, PHO VIP
 
Join Date: Oct 2012
Posts: 312
Bits and Bytes and Oh My!



About.
Many new ROM hackers have trouble understanding concepts such as bits, bytes and offsets. The goal of this thread is to provide a reference and explanation that makes them easier to understand. Here is what you need in order to get the most out of this thread:
  • Basic Algebra
  • Will to learn and understand
  • A positive attitude
If you feel something should be added to this thread, don't hesitate to say so, and I will do my best to add it to the first post in a reasonable time.



Number Bases.
A number base is the number of distinct characters (symbols) a single digit can take. For example, base 10 has 10 possible characters in one digit: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. Base 2 has 2 possible values per digit: 0 and 1. And so on. The base of a number is usually assumed, but it can be notated with a subscript.
For example: 101 - you would normally assume base 10 (the number system most people grow up with nowadays).
But if you put a subscript-2 next to it...
$101_2$, it becomes equivalent to $5_{10}$.
Here is a list of common number systems:
  • Decimal - base 10.
  • Binary - base 2.
  • Octal - base 8.
  • Hexadecimal (or hex) - base 16.
*Hex numbers are usually notated with a prefix such as '0x', '&h' or '$' in front of the value, rather than with a subscript indicating the base.
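To experiment with number bases, the standard C library can do the parsing for you. The sketch below is a minimal illustration, not part of the original tutorial: `parse_in_base` is a hypothetical wrapper around the standard `strtol`, and `format_in_base` is a hypothetical helper that writes a value back out in bases 2 to 16 by repeatedly dividing by the base.

```c
#include <stdlib.h>

/* Interpret a string of digits in the given base (2 to 36)
   using the standard library function strtol. */
long parse_in_base(const char *digits, int base)
{
    return strtol(digits, NULL, base);
}

/* Write `value` out in `base` (2 to 16) into `out`, which must
   have room for up to 64 digits plus the terminating '\0'. */
void format_in_base(unsigned long value, unsigned base, char *out)
{
    const char *symbols = "0123456789ABCDEF";
    char reversed[65];
    int n = 0;

    /* Repeated division yields the digits least significant first. */
    do {
        reversed[n++] = symbols[value % base];
        value /= base;
    } while (value != 0);

    /* Copy them back out most significant first. */
    for (int i = 0; i < n; i++)
        out[i] = reversed[n - 1 - i];
    out[n] = '\0';
}
```

For example, `parse_in_base("101", 2)` returns 5 while `parse_in_base("101", 10)` returns 101, matching the subscript example above.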



Bits.
A bit is the smallest possible unit of storage in modern computing, and uses the binary number system (base 2). This means that it has two possible values: 0 and 1. Let's say you want to know the highest possible value held in x bits:
$y = 2^x - 1$
Using that we can figure out that...
2 bits has a maximum value of: 3.
3 bits has a maximum value of: 7.
And so forth.
Now you can adapt that function to fit other number systems.
$y = n^x - 1$
where n is the base (2 for binary, 8 for octal, 10 for decimal, 16 for hexadecimal) and x is the number of digits. For example, three decimal digits can hold at most $10^3 - 1 = 999$.
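The formula $y = n^x - 1$ translates directly into a loop. This is a minimal C sketch; `max_value` is a hypothetical helper name, and it assumes $n^x$ fits in 64 bits.

```c
/* Largest value that fits in `digits` digits of base `base`:
   y = n^x - 1, with n = base and x = digits.
   Assumes base^digits fits in an unsigned long long. */
unsigned long long max_value(unsigned base, unsigned digits)
{
    unsigned long long power = 1;
    for (unsigned i = 0; i < digits; i++)
        power *= base;          /* power = base^digits */
    return power - 1;
}
```

Integer multiplication is used instead of `pow()` so the result is exact rather than a floating-point approximation.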



Bytes.
A byte consists of 8 bits, and (when unsigned) has a maximum value of 0xFF (255).
In programming, a byte (or char, if you must) can be 'signed' or 'unsigned'. A signed byte sacrifices the most significant bit as a 'negative' flag. The most significant bit is the bit with the highest place value (the one furthest away from the bit worth $1_{10}$).
For the sake of simplicity, know that in the rest of this document I will only mark the base of non-decimal numbers, and the most significant bit (bit of significance, or BOS) will be written on the left 'side' of a number: (left)100001000110(right), where the leftmost digit is the BOS.

Sacrificing the BOS means that a signed byte has only 7 bits left to store the magnitude ($y = 2^7 - 1 = 127$), which effectively cuts the maximum value in half. In practice (including on the GBA), negative values are stored in two's complement, so a signed byte ranges from -128 to 127. Unsigned bytes use all 8 bits for the value (0 to 255), but cannot represent negative numbers.
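A quick way to see signed and unsigned bytes side by side, assuming a C99 compiler with `<stdint.h>`: `int8_t` is guaranteed to be two's complement, so copying a raw byte into one with `memcpy` shows how the same bit pattern reads when treated as signed. `byte_as_signed` is a hypothetical helper name.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a raw byte as a signed 8-bit value. int8_t is
   required to be two's complement, so copying the bit pattern
   with memcpy is well-defined. */
int8_t byte_as_signed(uint8_t raw)
{
    int8_t s;
    memcpy(&s, &raw, 1);
    return s;
}
```

With this, 0xFF reads as -1 and 0x80 as -128, while `INT8_MIN`, `INT8_MAX` and `UINT8_MAX` from `<stdint.h>` give the limits -128, 127 and 255.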



Shorts.

Note that GBA Thumb instructions are 16 bits in size. The one exception I am aware of is the long branch with link (BL), which is made of two 16-bit halves.

Shorts (or half-words) are 16 bits long (max value: 0xFFFF). Everything said above about signedness applies to shorts too: an unsigned short ranges from 0 to 65535, a signed short from -32768 to 32767.




Words.

Note that GBA ARM instructions and GBA registers are 32 bits in size. Also note that in x86/Windows terminology a WORD is 16 bits and a DWORD (double word) is 32 bits. The GBA's ARM processor uses different names: a word is 32 bits and a halfword is 16 bits.

Words are 32 bits long (max value: 0xFFFFFFFF). The same signedness rules apply to words as to bytes and shorts: an unsigned word ranges from 0 to 4294967295, a signed word from -2147483648 to 2147483647.
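Many GBA homebrew projects give these sizes short names such as `u8`, `u16` and `u32`, the same style as the `u32`/`u16` used in the Byte Endianness section. The typedefs below are an assumption in that style rather than any specific library's header; `add_u16` and `add_u32` are hypothetical helpers showing that unsigned arithmetic wraps around to 0 once it passes the maximum value.

```c
#include <stdint.h>

/* Short type names in the style common in GBA homebrew code
   (an assumption here, not copied from a specific header). */
typedef uint8_t  u8;   /* byte:  0 .. 0xFF       */
typedef uint16_t u16;  /* short: 0 .. 0xFFFF     */
typedef uint32_t u32;  /* word:  0 .. 0xFFFFFFFF */
typedef int16_t  s16;  /* signed short: -32768 .. 32767 */
typedef int32_t  s32;  /* signed word:  -2147483648 .. 2147483647 */

/* Unsigned arithmetic wraps around: one past the maximum is 0. */
u16 add_u16(u16 a, u16 b) { return (u16)(a + b); }
u32 add_u32(u32 a, u32 b) { return a + b; }
```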




Pointers.

I borrowed the house-address metaphor from C++ for Dummies, 5th edition.

In the world of programming, there is a thing known as 'variables'. Variables are a programmer's way of storing and holding data. A program needs far more variables than the processor has registers (the GBA's ARM processor has only 16, r0 to r15), so you can't just keep every variable in a register. Instead, variables are stored in memory (usually RAM). Now you have the variables in memory, now what? If you want to work with them, you have to know where in memory each variable is.

Think of a city. A city has many houses, apartments, etc. A city also has a mailman, and the mailman carries letters that belong to houses. Each letter has the address of the house it belongs to written on it. Think of the city as your memory, containing all the houses and apartments (variables). The letter carries the address (pointer) of a house (variable) so that the mailman (processor) can get to the house (variable) and deliver the mail (use the data). In ROM hacking, an address is commonly referred to as an offset. The two are closely related: an offset is the position of data within the ROM file, and the GBA sees that same data at the address 0x08000000 + offset (more on this under Byte Endianness).
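In C, the mailman metaphor maps onto `&` (take the address of a house) and `*` (go to the house at that address). A minimal sketch; `deliver` is a hypothetical function named after the metaphor.

```c
/* `house` holds the address of an int (a pointer). Writing through
   *house changes the variable that lives at that address. */
void deliver(int *house, int letter)
{
    *house = letter;
}
```

Calling `deliver(&home, 42)` changes `home` itself, because the function was handed its address rather than a copy of its value.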




Arrays.

A C-style string is an array of chars (bytes), and the end of the string is marked by a null byte (0).

Back to our city metaphor. Houses aren't just randomly scattered around the city (usually); they are grouped into neighborhoods. Each house sits in a neat row, evenly spaced and identical on the outside, but what is inside each house can vary. Think of an array as a neighborhood: it contains many houses (variables), and each variable can hold its own value.
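Because the houses in an array are evenly spaced, the address of element i is the array's start address plus i times the size of one element. The sketch below, with hypothetical helper names, measures that distance and also finds the end of a C-style string by walking to its null byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Address of element i = start address + i * sizeof(element).
   Returns the distance in bytes from element 0 to element i. */
ptrdiff_t element_offset(const uint32_t *array, size_t i)
{
    return (const char *)&array[i] - (const char *)&array[0];
}

/* Count chars up to (not including) the terminating null byte,
   which is how a C-style string marks its end. */
size_t c_string_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```

For an array of 32-bit words, element 3 sits 3 * 4 = 12 bytes past the start.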




Structures.

You'll notice that I explain things using C and C++ terms quite often. I apologize to those who do not program in those languages, but try to bear with me.

On many processors and architectures (the GBA included), registers are 32 bits wide, and the processor can only do arithmetic on values held in registers. So what happens if your variable is larger than 32 bits? That is where a struct comes in. Consider this: a file header has a file signature (which identifies the file type, its version, etc.) followed by a WORD (32-bit integer). We'll assume that the signature is also 32 bits. 32 + 32 = 64, which means our FileHeader variable cannot fit inside a register. So what do we do? We take the variable's pointer and use pointer arithmetic. The first part of our variable (the signature) is 32 bits (4 bytes), so we add 4 to the pointer, and it now points to the WORD contained in the header. You can now load that WORD into a register and work with it.
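Here is the file-header example as a C sketch. `FileHeader`, `signature` and `value` are hypothetical names for the structure described above; `read_value` reaches the WORD by adding the member's offset (4 bytes on typical compilers, since two 32-bit members need no padding) to the structure's address, which is the pointer arithmetic just described.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header layout from the example: a 32-bit signature
   followed by a 32-bit WORD, 64 bits in total. */
typedef struct {
    uint32_t signature;
    uint32_t value;
} FileHeader;

/* Reach the WORD by pointer arithmetic: start at the header's
   address and step past the signature (4 bytes on typical
   compilers, as reported by offsetof). */
uint32_t read_value(const FileHeader *header)
{
    const unsigned char *base = (const unsigned char *)header;
    uint32_t word;
    memcpy(&word, base + offsetof(FileHeader, value), sizeof word);
    return word;
}
```

In everyday C you would simply write `header->value`; the compiler performs the same "address + 4" step for you.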




BitWise Operators.
A byte is 8 bits and has a maximum value of 0xFF. A little shortcut for bitwise operations: each hexadecimal digit corresponds to exactly 4 bits, so a byte is always two hex digits. E.g. $1111_2$ is equal to 0x0F, and 1111 1111 is equal to 0xFF. So if you learn how to count to 0xF in binary, you should be good to go, and doing bitwise operations, as well as converting between number bases, in your head should be a breeze.
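The one-hex-digit-per-4-bits shortcut is easy to check in code. This sketch, with hypothetical helper names, writes a byte as binary with a gap between its two nibbles and extracts each nibble, which is the same as reading off each hex digit.

```c
#include <stdint.h>

/* Each hex digit is exactly one nibble (4 bits). */
uint8_t high_nibble(uint8_t byte) { return (uint8_t)(byte >> 4); }   /* left hex digit  */
uint8_t low_nibble(uint8_t byte)  { return (uint8_t)(byte & 0x0F); } /* right hex digit */

/* Write a byte as 8 binary digits, with a space between the two
   nibbles, e.g. 0xA5 -> "1010 0101". `out` needs 10 chars. */
void byte_to_binary(uint8_t byte, char out[10])
{
    int pos = 0;
    for (int bit = 7; bit >= 0; bit--) {
        out[pos++] = ((byte >> bit) & 1) ? '1' : '0';
        if (bit == 4)
            out[pos++] = ' ';   /* gap between high and low nibble */
    }
    out[pos] = '\0';
}
```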

Bitwise operators are just that: they do things to bits. They move bits, flip bits, set bits, clear bits, etc. Bit shifting does not apply to a single bit; instead, it applies to a group of bits (bytes, shorts, words, etc). To bit shift (BS) a unit, you need to know two things:
The amount of bits to shift, and the direction of the shift.
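If you bit shift towards the BOS (left), the numerical value of the unit increases; shifting right decreases it. Bits that are shifted past either end of the unit are lost.
BS-ing to the left: $X \ll N = X \cdot 2^N$ (exact, as long as no set bits are shifted out)
BS-ing to the right: $X \gg N = \lfloor X / 2^N \rfloor$ (for unsigned values; the remainder is discarded)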
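The two shift rules above can be tried out directly in C. A minimal sketch, assuming unsigned 32-bit values (shifting negative signed numbers has extra rules) and a shift amount below 32, since C leaves larger shifts undefined. The helper names are hypothetical.

```c
#include <stdint.h>

/* Left shift: x * 2^n, exact as long as no set bits fall off the
   top of the 32-bit value. n must be less than 32. */
uint32_t shift_left(uint32_t x, unsigned n)  { return x << n; }

/* Right shift of an unsigned value: x / 2^n with the remainder
   discarded (rounded down). n must be less than 32. */
uint32_t shift_right(uint32_t x, unsigned n) { return x >> n; }
```

For instance, 13 >> 2 is 3 (13 / 4 = 3.25, rounded down), and shifting 0x80000000 left by 1 gives 0 because the only set bit falls off the top.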
AND operator:
AND-ing involves two corresponding bits of two units. If both bits are set (== 1), the resulting bit is also set (X = A AND B, where X is the result, A is unit 1 and B is unit 2). Otherwise, the resulting bit is 0. For single bits, AND behaves like multiplication: A AND B = A × B. In most programming languages (ASM aside), the AND operator is written with the '&' character.
OR operator:
OR-ing also uses two corresponding bits of two units. If bit A, bit B, or both are set, the resulting bit is also set. The only way to get 0 from this operator is for both bits to be 0. OR-ing is represented with the pipe ('|') character.
XOR ( eXclusive OR operator )
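XOR is a bit more complicated than the previous operators, but it can also be described with arithmetic: for single bits, A XOR B = (A + B) mod 2. If BOTH bits are 1, the result is 0. If exactly one bit is 1, the result is 1. If BOTH bits are 0, the result is 0. XOR is also reversible: XOR-ing with the same value twice gets you back where you started.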
X XOR Y = C;
C XOR X = Y;
Y XOR C = X;
XOR-ing is represented with a '^' character.
NOT operator:
NOT-ing a bit simply reverses it: if a bit is set, it becomes unset, and if it is not set, it becomes set. NOT is typically applied to whole units, but it works on a single bit as well. In C and C++, bitwise NOT (which flips every bit of a unit) is written with a tilde ('~', a C++ destructor reference), while the exclamation point ('!') is logical NOT, covered under Logic Operators.
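In practice these operators are mostly used to set, clear, flip or test a single bit inside a larger unit, using a mask built by shifting a 1 into position. A minimal C sketch; the helper names are hypothetical, and `bit` counts from 0 at the least significant bit.

```c
#include <stdint.h>

/* Masks are built by shifting a single 1 into position `bit`
   (0 = least significant bit). */
uint8_t set_bit(uint8_t v, unsigned bit)    { return (uint8_t)(v | (1u << bit)); }  /* OR */
uint8_t clear_bit(uint8_t v, unsigned bit)  { return (uint8_t)(v & ~(1u << bit)); } /* AND with NOT-ed mask */
uint8_t toggle_bit(uint8_t v, unsigned bit) { return (uint8_t)(v ^ (1u << bit)); }  /* XOR */
int     test_bit(uint8_t v, unsigned bit)   { return (v >> bit) & 1; }              /* shift, then AND */
uint8_t not_byte(uint8_t v)                 { return (uint8_t)~v; }                 /* NOT */
```

The XOR identities listed above also hold here: with c = x ^ y, c ^ x gives back y and y ^ c gives back x.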




Logic Operators.
In C++, you declare a destructor with a tilde ('~') followed by the class name. So in a sense, you're saying NOT X, i.e. make X not exist. Very clever, C++. Very clever.

Without logic, computers would be redundant, at best ( see what I did there? )
Fortunately for us, computer logic is easy to understand. There are a few basic operators you need to know.
X == Y - returns true if X = Y
X <= Y - returns true if X is less than, or equal to Y
X >= Y - returns true if X is greater than, or equal to Y
X < Y - returns true if X is less than Y
X > Y - returns true if X is greater than Y
X != Y - returns true if X is NOT equal to Y
X - returns true if X is NOT 0
!X - returns true if X IS 0.
Take the result, and if it is true, do something. In Thumb assembly, this is what that would look like:

cmp rn, ry   @ compares register N with register Y (computes rn - ry) and sets the processor's condition flags (look them up in GBATEK)
beq label    @ if (rn == ry) goto label (Thumb conditional branches jump to a label, not to a register)
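For comparison, here is roughly what the same logic looks like in C, where a comparison evaluates to 1 (true) or 0 (false) and any non-zero value counts as true. This is a sketch; `truth`, `falsy` and `compare_and_branch` are hypothetical names, and the returned strings stand in for the two paths the Thumb code can take.

```c
/* C's logic operators: a comparison evaluates to 1 (true) or 0
   (false), and any non-zero value counts as true. */
int truth(int x) { return x != 0; }  /* "X"  : true if X is not 0 */
int falsy(int x) { return !x; }      /* "!X" : true if X is 0     */

/* Hypothetical C counterpart of the cmp/beq pair above. */
const char *compare_and_branch(int rn, int ry)
{
    if (rn == ry)
        return "branch taken";   /* beq jumps to the label   */
    return "fell through";       /* execution continues after */
}
```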





Byte Endianness.
This is what pointers look like in a hex editor. The GBA is little endian, so if you have a pointer to address 0x00ABCDEF, the bytes in the hex editor are EF CD AB 00. HOWEVER, most pointers that ROM hackers deal with point into the ROM area, which on the GBA starts at 0x08000000 (0x08NNNNNN, or 0x09NNNNNN for the second half of a 32 MB ROM). So when you see a pointer with 0x08 or 0x09 as its highest byte, that's what it means: a pointer to ROM offset 0xABCDEF is the address 0x08ABCDEF, which appears as EF CD AB 08 in a hex editor.
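A sketch of the offset-to-pointer conversion described above, in C. `GBA_ROM_BASE`, `offset_to_pointer_bytes` and `pointer_bytes_to_offset` are hypothetical names; the code assumes the ROM area starts at 0x08000000, so offsets from 0x1000000 upward naturally produce the 0x09 form.

```c
#include <stdint.h>

#define GBA_ROM_BASE 0x08000000u  /* start of the GBA's ROM area */

/* Turn a ROM offset (position in the .gba file) into the four bytes
   a pointer to it occupies in a hex editor: little endian, lowest
   byte first. */
void offset_to_pointer_bytes(uint32_t offset, uint8_t out[4])
{
    uint32_t address = GBA_ROM_BASE + offset;
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(address >> (8 * i));
}

/* Reverse operation: assemble four little-endian bytes into an
   address and subtract the ROM base to get the file offset back. */
uint32_t pointer_bytes_to_offset(const uint8_t in[4])
{
    uint32_t address = (uint32_t)in[0]
                     | (uint32_t)in[1] << 8
                     | (uint32_t)in[2] << 16
                     | (uint32_t)in[3] << 24;
    return address - GBA_ROM_BASE;
}
```

Offset 0xABCDEF comes out as the bytes EF CD AB 08, matching the hex-editor view described above.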

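Byte endianness refers to the order in which the bytes of a multi-byte value (such as a WORD or DWORD) are stored in memory. You write numbers like so: 1234, with the most significant digit first. Storing the most significant byte first (at the lowest address) is known as "Big Endian". In Big Endian, the most significant byte of the DWORD comes first, all the way on the left, e.g.: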
10101010 10101010 10101010 10100101
The alternative is "Little Endian", and the bytes are in an opposite order.
The best way for me to explain this is by example.
Big Endian: 0x(12 34 56 78)
Little Endian: 0x(78 56 34 12)
Correct me if I'm wrong, those who know, but I believe this is the reasoning behind this madness.
This seems a little pointless (and with modern technology, it kind of is), but in the past processors were slow, and the difference between processing 1 byte and 2 bytes could be significant. Say you have a word and you only want its lower 16 bits (you have 0x12345678; you want 0x5678). On a little endian processor such as the GBA's, what you would do is:
u32 value = 0x12345678;
u16* pVal = (u16*)&value; //u16* declares a pointer to a u16; the cast is needed because &value is a u32*.
If you look at *pVal (what pVal points to), you will get 0x5678. Why? Because you took the address of a u32 (0x12345678), which is stored in memory as the bytes 78 56 34 12. Reading the first two bytes as a short gives 0x5678: the low half of the word sits at the same address as the word itself, so no address adjustment is needed. (On a big endian processor, the same code would give you 0x1234.)
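To see both byte orders without depending on which processor you compile on, the sketch below builds each layout with shifts instead of pointer casts. The function names are hypothetical; `first_u16_little_endian` reads the first two bytes of a little-endian buffer the way the `u16*` in the example above does on the GBA.

```c
#include <stdint.h>

/* Lay out a 32-bit word in memory order, most significant byte first. */
void to_big_endian(uint32_t v, uint8_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(v >> (8 * (3 - i)));
}

/* Lay out a 32-bit word in memory order, least significant byte first. */
void to_little_endian(uint32_t v, uint8_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(v >> (8 * i));
}

/* Read a 16-bit value from the first two bytes of a little-endian
   buffer: the low half of the original word. */
uint16_t first_u16_little_endian(const uint8_t bytes[4])
{
    return (uint16_t)(bytes[0] | (bytes[1] << 8));
}
```

For 0x12345678 this yields 12 34 56 78 in big endian and 78 56 34 12 in little endian, and the first two little-endian bytes read back as 0x5678.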


10th July 2013, 01:17 AM   #2
Jisuke
Ex-Staff, PHO VIP
 
Join Date: Mar 2013
Location: Brooooo...
Age: 22
Posts: 193

Boooriiing I actually learnt a lot from this, thank you for making it.
10th July 2013, 06:28 AM   #3
Pirate Ninja
P l a y t h e f i e l d
 
Join Date: Jul 2013
Location: The seas
Age: 23
Posts: 49

Somewhat tricky to grasp, nevertheless a nicely made tut.
10th July 2013, 09:52 AM   #4
Alice
ClariS <3
Style Administrator, Administrator, PHO VIP
 
Join Date: Apr 2010
Location: Azalea Town
Age: 23
Posts: 319

I'm shocked how much of this I understood first time, thanks for the tut. Now to move on to the ASM tut
All forum/site content (unless noted otherwise) and site designs are © 2006-2013 Pokémon Hackers Online (PHO).