Python uses indentation to define control and loop constructs. This contributes to Python's readability; however, it requires the programmer to pay close attention to the use of whitespace. A misconfigured editor can therefore produce code that behaves in unexpected ways.
Python uses the colon symbol (:) and indentation to show where blocks of code begin and end. (If you come from another language, do not confuse this colon with the ternary operator.) That is, blocks in Python, such as functions, loops, if clauses and other constructs, have no ending identifiers. Every block starts with a colon and consists of the indented lines below it.
For example:
def my_function():    # This is a function definition. Note the colon (:)
    a = 2             # This line belongs to the function because it's indented
    return a          # This line also belongs to the same function
print(my_function())  # This line is OUTSIDE the function block
or
if a > b:             # If block starts here
    print(a)          # This is part of the if block
else:                 # else must be at the same level as if
    print(b)          # This line is part of the else block
Blocks that contain exactly one single-line statement may be put on the same line, though this form is generally not considered good style:
if a > b: print(a)
else: print(b)
Attempting to do this with more than a single statement will not work:
if x > y: y = x
    print(y)  # IndentationError: unexpected indent

if x > y: while y != z: y -= 1  # SyntaxError: invalid syntax
An empty block causes an IndentationError. Use pass (a command that does nothing) when you have a block with no content:
def will_be_implemented_later():
    pass

Spaces vs. Tabs
In short: always use 4 spaces for indentation.
Using tabs exclusively is possible but PEP 8, the style guide for Python code, states that spaces are preferred.
Python 3.x Version ≥ 3.0
Python 3 disallows mixing tabs and spaces for indentation. In such a case a compile-time error is generated (TabError: inconsistent use of tabs and spaces in indentation) and the program will not run.
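The check can be observed without writing a file by handing a string to the built-in compile function. The following is a minimal sketch for Python 3; the source string and the name mixed_source are made up for illustration, with one line indented by a tab and the next by eight spaces.

```python
# Hypothetical snippet: line 2 is indented with a tab,
# line 3 with eight spaces.
mixed_source = "if True:\n\tx = 1\n        y = 2\n"

try:
    compile(mixed_source, "<mixed>", "exec")
    outcome = "compiled"
except TabError as exc:
    # TabError is a subclass of IndentationError.
    outcome = "TabError: {}".format(exc.msg)

print(outcome)
```

Because TabError derives from IndentationError, which derives from SyntaxError, the error is raised before any line of the program runs.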
Python 2.x Version ≤ 2.7
Python 2 allows mixing tabs and spaces in indentation, but this is strongly discouraged. A tab character advances the indentation to the next multiple of 8 spaces. Since editors are commonly configured to show tabs as a multiple of 4 spaces, this can cause subtle bugs.
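The rounding rule for tabs can be illustrated with str.expandtabs, which advances each tab to the next multiple of a given tab size. This is only a sketch of the arithmetic, not of the interpreter itself; the helper name indent_width and the sample lines are assumptions made for the example.

```python
def indent_width(line, tab_size):
    """Return the visual width of the leading whitespace of `line`."""
    expanded = line.expandtabs(tab_size)
    return len(expanded) - len(expanded.lstrip())

# Three ways of indenting the same statement (hypothetical sample lines).
samples = ["\tx = 1", "    \tx = 1", "        x = 1"]

as_python2_sees_it = [indent_width(s, 8) for s in samples]
as_editor_shows_it = [indent_width(s, 4) for s in samples]

print(as_python2_sees_it)  # [8, 8, 8]
print(as_editor_shows_it)  # [4, 8, 8]
```

With a tab size of 8 all three lines sit at the same level, while an editor set to 4 shows the first line at a different depth than the other two. That mismatch is the source of the subtle bugs mentioned above.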
Citing PEP 8:
When invoking the Python 2 command line interpreter with the -t option, it issues warnings about code that illegally mixes tabs and spaces. When using -tt these warnings become errors. These options are highly recommended!
Many editors have "tabs to spaces" configuration. When configuring the editor, one should differentiate between the tab character ('\t') and the Tab key.
The tab character should be configured to show 8 spaces, to match the language semantics - at least in cases when (accidental) mixed indentation is possible. Editors can also automatically convert the tab character to spaces. However, it might be helpful to configure the editor so that pressing the Tab key will insert 4 spaces, instead of inserting a tab character.
Python source code written with a mix of tabs and spaces, or with a non-standard number of indentation spaces, can be made PEP 8-conformant using autopep8. (A less powerful alternative comes with most Python installations: reindent.py)

Section 1.4: Datatypes
Built-in Types

Booleans
bool: A boolean value of either True or False. Logical operations like and, or, not can be performed on booleans.
x or y   # if x is False then y otherwise x
x and y  # if x is False then x otherwise y
not x    # if x is True then False, otherwise True
In Python 2.x and in Python 3.x, a boolean is also an int. The bool type is a subclass of the int type and True and False are its only instances:
issubclass(bool, int) # True
isinstance(True, bool)   # True
isinstance(False, bool)  # True
If boolean values are used in arithmetic operations, their integer values (1 and 0 for True and False) will be used to return an integer result:
True + False == 1  # 1 + 0 == 1
True * True == 1   # 1 * 1 == 1
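One practical consequence is that booleans can be summed to count how many items satisfy a condition. A small sketch, with the list values and the threshold chosen arbitrarily for illustration:

```python
values = [1, 5, 8, 2, 9]  # arbitrary sample data

# Each comparison yields True or False, which sum() treats as 1 or 0.
count_above_four = sum(v > 4 for v in values)

print(count_above_four)  # 3
```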
Numbers
int: Integer number
a = 2
b = 100
c = 123456789
d = 38563846326424324
Integers in Python are of arbitrary sizes.
Note: in Python 2, a separate long type existed alongside int. In Python 3 the two have been unified.
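As a quick illustration for Python 3, an integer far larger than a 64-bit machine word is still a plain int:

```python
big = 2 ** 100

print(big)               # 1267650600228229401496703205376
print(type(big))         # <class 'int'>
print(big.bit_length())  # 101
```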
float: Floating point number; precision depends on the implementation and system architecture, for CPython the float datatype corresponds to a C double.
a = 2.0
b = 100.e0
c = 123456789.e1
complex: Complex numbers
a = 2 + 1j
b = 100 + 10j
The <, <=, > and >= operators will raise a TypeError exception when any operand is a complex number.
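A short sketch of this behaviour: ordering comparisons fail, while equality tests and comparisons of magnitudes (which are floats) still work. The variable names are chosen for illustration only.

```python
z1 = 2 + 1j
z2 = 100 + 10j

try:
    z1 < z2
    ordering_supported = True
except TypeError:
    ordering_supported = False

print(ordering_supported)  # False
print(z1 == 2 + 1j)        # True: equality is defined for complex numbers
print(abs(z1) < abs(z2))   # True: abs() returns a float, which can be ordered
```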
Strings

Python 3.x Version ≥ 3.0

str: a unicode string. The type of 'hello'
bytes: a byte string. The type of b'hello'

Python 2.x Version ≤ 2.7

str: a byte string. The type of 'hello'
bytes: synonym for str
unicode: a unicode string. The type of u'hello'
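In Python 3 the two string types are converted into each other explicitly with encode and decode. The sketch below assumes UTF-8 as the encoding; the sample text is arbitrary.

```python
text = 'h\u00e9llo'          # a str containing one non-ASCII character
data = text.encode('utf-8')  # str -> bytes

print(type(text), len(text))  # <class 'str'> 5
print(type(data), len(data))  # <class 'bytes'> 6

round_trip = data.decode('utf-8')  # bytes -> str
print(round_trip == text)          # True
```

The lengths differ because len counts characters for str but bytes for bytes, and the accented character occupies two bytes in UTF-8.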
Sequences and collections
Python differentiates between ordered sequences and unordered collections (such as set and, before Python 3.7, dict).
strings (str, bytes, unicode) are sequences
reversed: the built-in reversed function returns an iterator that yields the items of a sequence, such as a str, in reverse order
a = reversed('hello')
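Note that reversed returns an iterator rather than a new string, so it can be consumed only once. A minimal sketch of how to get a reversed string back:

```python
a = reversed('hello')

first_pass = ''.join(a)   # consumes the iterator
second_pass = ''.join(a)  # the iterator is now exhausted

print(first_pass)     # olleh
print(second_pass)    # (empty string)
print('hello'[::-1])  # olleh, using slicing instead
```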
tuple: An ordered collection of n values of any type (n >= 0).
a = (1, 2, 3)
b = ('a', 1, 'python', (1, 2))
b[2] = 'something else'  # raises a TypeError
Supports indexing; immutable; hashable if all its members are hashable
list: An ordered collection of n values (n >= 0)
a = [1, 2, 3]
b = ['a', 1, 'python', (1, 2), [1, 2]]
b[2] = 'something else'  # allowed
Not hashable; mutable.
set: An unordered collection of unique values. Items must be hashable.
a = {1, 2, 'a'}
dict: A collection of unique key-value pairs; keys must be hashable. Historically unordered; since Python 3.7 a dict preserves insertion order.
a = {1: 'one', 2: 'two'}
b = {'a': [1, 2, 3], 'b': 'a string'}
An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() method). Hashable objects which compare equal must have the same hash value.
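The two requirements can be sketched with a small user-defined class. The class Point and its attributes are hypothetical and exist only for this example; the hash is derived from the same values that __eq__ compares, so equal objects get equal hashes.

```python
class Point:
    """Hypothetical value class; x and y must not be changed after creation."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Hash the same tuple that __eq__ compares.
        return hash((self.x, self.y))


p = Point(1, 2)
q = Point(1, 2)

print(p == q)              # True
print(hash(p) == hash(q))  # True
print(len({p, q}))         # 1: the set treats them as the same element

try:
    hash([1, 2])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False
print(list_is_hashable)    # False: lists are mutable and not hashable
```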
Built-in constants
In conjunction with the built-in datatypes there are a small number of built-in constants in the built-in namespace:
True: The true value of the built-in type bool
False: The false value of the built-in type bool
None: A singleton object used to signal that a value is absent.
Ellipsis or ...: can be used anywhere in core Python 3; in Python 2.7 its use is limited to array notation. numpy and related packages use it as an 'include everything' reference in arrays.
NotImplemented: a singleton used to indicate to Python that a special method doesn't support the specific arguments, and Python will try alternatives if available.
a = None  # No value will be assigned. Any valid datatype can be assigned later

Python 3.x Version ≥ 3.0
None doesn't have any natural ordering. Using ordering comparison operators (<, <=, >=, >) isn't supported anymore and will raise a TypeError.
Python 2.x Version ≤ 2.7
None is always less than any number (None < -32 evaluates to True).
Testing the type of variables
In Python, we can check the datatype of an object using the built-in function type.
a = '123'
print(type(a))
# Out: <class 'str'>
b = 123
print(type(b))
# Out: <class 'int'>
In conditional statements it is possible to test the datatype with isinstance. However, it is usually not encouraged to rely on the type of the variable.
i = 7
if isinstance(i, int):
    i += 1
elif isinstance(i, str):
    i = int(i)
    i += 1
For information on the differences between type() and isinstance() read: Differences between isinstance and type in Python
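One visible difference is that isinstance takes inheritance into account, while comparing the result of type does not. A brief sketch using bool, which is a subclass of int:

```python
flag = True

via_isinstance = isinstance(flag, int)  # True: bool is a subclass of int
via_type = type(flag) is int            # False: the exact type is bool

print(via_isinstance)      # True
print(via_type)            # False
print(type(flag) is bool)  # True
```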
To test if something is of NoneType:
x = None
if x is None:
    print('Not a surprise, I just defined x as None.')

Converting between datatypes
You can perform explicit datatype conversion.
For example, '123' is of str type and it can be converted to an integer using the int function.

a = '123'
b = int(a)

Converting from a float string such as '123.456' can be done using the float function.

a = '123.456'
b = float(a)
c = int(a)    # ValueError: invalid literal for int() with base 10: '123.456'
d = int(b)    # 123
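When a string may hold either an integer or a float, the two conversions can be chained. The helper name to_int below is hypothetical and the function is only a sketch; note that int truncates toward zero rather than rounding.

```python
def to_int(text):
    """Convert a numeric string such as '123' or '123.456' to an int.

    Hypothetical helper for illustration; the fractional part is truncated.
    """
    try:
        return int(text)
    except ValueError:
        # Fall back to parsing as a float first, e.g. for '123.456'.
        return int(float(text))


print(to_int('123'))      # 123
print(to_int('123.456'))  # 123
print(to_int('-7.9'))     # -7
```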
You can also convert sequence or collection types
a = 'hello'
list(a)   # ['h', 'e', 'l', 'l', 'o']
set(a)    # {'o', 'e', 'l', 'h'}
tuple(a)  # ('h', 'e', 'l', 'l', 'o')

Explicit string type at definition of literals
With one letter labels just in front of the quotes you can tell what type of string you want to define.
b'foo bar': results in bytes in Python 3, str in Python 2
u'foo bar': results in str in Python 3, unicode in Python 2
'foo bar': results in str
r'foo bar': results in a so-called raw string, in which backslashes are not treated as escape characters; everything is taken verbatim as typed
normal = 'foo\nbar'    # foo
                       # bar
escaped = 'foo\\nbar'  # foo\nbar
raw = r'foo\nbar'      # foo\nbar

Mutable and Immutable Data Types
An object is called mutable if it can be changed. For example, when you pass a list to some function, the list can be changed:
def f(m):
    m.append(3)  # adds a number to the list. This is a mutation.

x = [1, 2]
f(x)
x == [1, 2]  # False now, since an item was added to the list
An object is called immutable if it cannot be changed in any way. For example, integers and tuples are immutable, since there's no way to change them:

def bar():
    x = (1, 2)
    g(x)
    x == (1, 2)  # Will always be True, since no function can change the object (1, 2)
Note that variables themselves are mutable, so we can reassign the variable x, but this does not change the object that x had previously pointed to. It only made x point to a new object.
Data types whose instances are mutable are called mutable data types, and similarly for immutable objects and datatypes.
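The distinction between mutating an object and reassigning a variable can be made visible with a second name that refers to the same list. This is a small sketch; the name alias is chosen for illustration.

```python
x = [1, 2]
alias = x        # both names refer to the same list object

x.append(3)      # mutation: the shared object changes
print(alias)     # [1, 2, 3]
print(alias is x)  # True

x = x + [4]      # reassignment: x now refers to a newly created list
print(x)         # [1, 2, 3, 4]
print(alias)     # [1, 2, 3]: the original object is untouched
print(alias is x)  # False
```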
Examples of immutable Data Types:
int, long (Python 2 only), float, complex
str
bytes
tuple
frozenset
Examples of mutable Data Types:
bytearray
list
set
dict

Section 1.5: Collection Types
There are a number of collection types in Python. While types such as int and str hold a single value, collection types hold multiple values.
Lists
The list type is probably the most commonly used collection type in Python. Despite its name, a list is more like an array in other languages, such as JavaScript. In Python, a list is merely an ordered collection of valid Python values. A list can be created by enclosing values, separated by commas, in square brackets:
int_list = [1, 2, 3]
string_list = ['abc', 'defghi']
A list can be empty:
empty_list = []
The elements of a list are not restricted to a single data type, which makes sense given that Python is a dynamic language:
mixed_list = [1, 'abc', True, 2.34, None]
A list can contain another list as its element:
nested_list = [['a', 'b', 'c'], [1, 2, 3]]
The elements of a list can be accessed via an index, or numeric representation of their position. Lists in Python are zero-indexed meaning that the first element in the list is at index 0, the second element is at index 1 and so on:
names = ['Alice', 'Bob', 'Craig', 'Diana', 'Eric']
print(names[0])  # Alice
print(names[2])  # Craig
Indices can also be negative which means counting from the end of the list (-1 being the index of the last element). So, using the list from the above example:
print(names[-1])  # Eric
print(names[-4])  # Bob
Lists are mutable, so you can change the values in a list:
names[0] = 'Ann'
print(names)
# Outputs ['Ann', 'Bob', 'Craig', 'Diana', 'Eric']
Besides, it is possible to add and/or remove elements from a list:
Append object to end of list with L.append(object), returns None.
names = ['Alice', 'Bob', 'Craig', 'Diana', 'Eric']
names.append("Sia")
print(names)
# Outputs ['Alice', 'Bob', 'Craig', 'Diana', 'Eric', 'Sia']
Add a new element to the list at a specific index with L.insert(index, object)

names.insert(1, "Nikki")
print(names)
# Outputs ['Alice', 'Nikki', 'Bob', 'Craig', 'Diana', 'Eric', 'Sia']
Remove the first occurrence of a value with L.remove(value), returns None
names.remove("Bob")
print(names)
# Outputs ['Alice', 'Nikki', 'Craig', 'Diana', 'Eric', 'Sia']
A list can also be converted to a tuple, for example to create a tuple with a single member:

one_member_tuple = tuple(['Only member'])
Dictionaries
A dictionary in Python is a collection of key-value pairs. The dictionary is surrounded by curly braces. Each pair is separated by a comma and the key and value are separated by a colon. Here is an example:
state_capitals = {
    'Arkansas': 'Little Rock',
    'Colorado': 'Denver',
    'California': 'Sacramento',
    'Georgia': 'Atlanta'
}
To get a value, refer to it by its key:
ca_capital = state_capitals['California']
You can also get all of the keys in a dictionary and then iterate over them:
for k in state_capitals.keys():
    print('{} is the capital of {}'.format(state_capitals[k], k))
Dictionaries strongly resemble JSON syntax. The native json module in the Python standard library can be used to convert between JSON and dictionaries.
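A minimal round trip with the json module might look like the sketch below. The dictionary reuses two of the state capitals from above; the variable names are otherwise arbitrary. One difference worth noting is that JSON object keys are always strings, so non-string keys are converted on output.

```python
import json

state_capitals = {'Arkansas': 'Little Rock', 'Colorado': 'Denver'}

as_json = json.dumps(state_capitals, sort_keys=True)  # dict -> JSON text
restored = json.loads(as_json)                        # JSON text -> dict

print(as_json)                     # {"Arkansas": "Little Rock", "Colorado": "Denver"}
print(restored == state_capitals)  # True

# Integer keys become strings in JSON.
print(json.dumps({1: 'one'}))      # {"1": "one"}
```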
set
A set is a collection of elements with no repeats and with no guaranteed order: neither insertion order nor sorted order is preserved. Sets are used in situations where it is only important that some things are grouped together, and not in what order they were included. For large groups of data, it is much faster to check whether or not an element is in a set than it is to do the same for a list.
Defining a set is very similar to defining a dictionary:
first_names = {'Adam', 'Beth', 'Charlie'}
Or you can build a set using an existing list:
my_list = [1, 2, 3]
my_set = set(my_list)
Check membership of the set using in:
if name in first_names:
    print(name)
You can iterate over a set exactly like a list, but remember: the values will be in an arbitrary, implementation-defined order.
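When a predictable order is needed, one option is to pass the set to the built-in sorted function, which returns a new list. A short sketch using the first_names set from above:

```python
first_names = {'Charlie', 'Adam', 'Beth'}

# sorted() returns a list; the set itself is left unchanged.
ordered_names = sorted(first_names)

for name in ordered_names:
    print(name)

print(ordered_names)          # ['Adam', 'Beth', 'Charlie']
print('Beth' in first_names)  # True
```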
defaultdict
A defaultdict is a dictionary with a default value for keys, so that keys for which no value has been explicitly defined can be accessed without errors. defaultdict is especially useful when the values in the dictionary are collections (lists, dicts, etc.), because the collection does not need to be initialized each time a new key is used.
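As a sketch of the typical use, the example below groups words by their first letter. The word list and variable names are made up for illustration; defaultdict comes from the standard collections module and takes a factory, here list, that produces the default value.

```python
from collections import defaultdict

words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']  # sample data

by_letter = defaultdict(list)
for word in words:
    # No need to check whether the key exists: a missing key
    # is created with an empty list on first access.
    by_letter[word[0]].append(word)

print(by_letter['a'])  # ['apple', 'avocado']
print(by_letter['b'])  # ['banana', 'blueberry']
print(by_letter['z'])  # []
```

Note that merely reading a missing key, as with 'z' here, inserts that key with the default value.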