3. Data Types

Now that you are able to write basic programs, let’s examine the concept of a data type in more detail. Python has several standard built-in data types to help programmers solve their programming problems. In this course we will work with the most commonly used types: string, number, boolean, list and dictionary. They are enough to solve most programming problems you will encounter in the real world.

3.1. Strings

A string is a sequence of characters that may be a combination of letters, numbers, and special symbols. To define a string in Python, you can enclose the string in matching single or double quotes:

>>> string1 = 'I am enclosed in single quotes.'
>>> string2 = "I am enclosed in double quotes."

If a literal string enclosed in single quotes contains a single quote, you will have to place a backslash ( \ ) before the single quote within the string to “escape” the character. For example:

>>> string3 = 'It doesn\'t look good at all.'
>>> print(string3)
It doesn't look good at all.

You wouldn’t have to use \ if you used double quotes to enclose the string:

>>> string3 = "Doesn't this look better?"
>>> print(string3)
Doesn't this look better?

Similarly, you’ll have to place a backslash before a double quote if your string is enclosed in double quotes:

>>> string4 = "I say: \"You get the same with double quotes.\""
>>> print(string4)
I say: "You get the same with double quotes."

3.1.1. Indexing

When dealing with text, it is often necessary to indicate the position of a character within a string. The technical term for position is index. In Python, indexing starts from 0 (zero). Consequently, the first character of any string has index 0. To illustrate how string indexing works in Python, define the string “Hello World.” in interactive mode:

>>> helloText = "Hello World."

This is how Python indexes variable helloText:

character   H   e   l   l   o       W   o   r   l   d   .
index       0   1   2   3   4   5   6   7   8   9   10  11

As you can see, every character, including special ones like the space, has an index number.

To access the first character of variable helloText, enter the variable name and the index 0 within square brackets like this:

>>> print(helloText[0])
H
>>> print(helloText[11])
.

It is easy to access the first character because you know that its index number is zero. You do not have this advantage when you want to access the last character of the string. In Python, negative index numbers make counting from back to front easier:

character   H    e    l    l    o         W    o    r    l    d    .
index       -12  -11  -10  -9   -8   -7   -6   -5   -4   -3   -2   -1

For example, to access the last and the second-to-last character, you can use:

>>> print(helloText[-1])
.
>>> print(helloText[-2])
d
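The relation between positive and negative indices can be checked with the len() function, which returns the number of characters in a string (it is introduced in section 3.1.3). The following is only a small sketch; the variable names are chosen for illustration:

```python
hello_text = "Hello World."

# The last character sits at index len(hello_text) - 1 ...
last_index = len(hello_text) - 1
last_by_position = hello_text[last_index]

# ... and index -1 refers to that same character.
last_by_negative = hello_text[-1]

print(last_index)         # 11
print(last_by_position)   # .
print(last_by_negative)   # .
```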

3.1.2. Concatenating and Repeating Strings

Besides indexing, you can apply functions and some mathematical operators to strings. Strings can be joined together (concatenated) with the plus (+) operator. Note that no space is inserted between the parts:

>>> helloText = "Hello" + "World!"
>>> print(helloText)
HelloWorld!

You can easily repeat a string with the * operator. Simple examples:

>>> text = "TU" * 3
>>> print(text)
TUTUTU
>>> print(text * 2)
TUTUTUTUTUTU
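Both operators can be combined. The sketch below, with illustrative variable names, adds the missing space explicitly and uses * to build a separator line:

```python
first_word = "Hello"
second_word = "World!"

# + does not insert a space, so add one explicitly.
greeting = first_word + " " + second_word

# * repeats the dash to build a line of 12 characters.
underline = "-" * 12

print(greeting)    # Hello World!
print(underline)   # ------------
```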

3.1.3. Length

You can get the number of characters - a.k.a. the size or length - of a string with the len() function. For example:

>>> lengthOfText = len("Hello?")
>>> print(lengthOfText)
6

3.1.4. Slicing

You can take parts of strings, called substrings, with the so-called slicing notation:

[index1:index2].

Here, index1 marks the start of the substring while index2 indicates the index number of the first character you don’t want to include in the substring. For example:

>>> print( "Psychology"[3:5] )
ch
>>> print( "Psychology"[3:6] )
cho

This works equally well when the string is stored in a variable:

>>> text = "Technology"
>>> print(text[4:9])
nolog

If you want the substring to start from a certain character to the end of the original string, you can just omit the second index:

>>> original = "Python"
>>> sliced = original[2:]
>>> print(sliced)
thon

If you want your substring to start from the first character of the original string, you can omit the first index and write only the index of the first character that should no longer be included in the substring:

>>> original = "Python"
>>> sliced = original[:4]
>>> print(sliced)
Pyth
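A useful consequence of this notation is that the two slices [:n] and [n:] never overlap and together give back the original string. A small sketch (the names head and tail are only illustrative):

```python
word = "Python"

head = word[:2]    # characters with index 0 and 1
tail = word[2:]    # characters from index 2 to the end

print(head)          # Py
print(tail)          # thon
print(head + tail)   # Python

# Negative indices work in slices too: the last three characters.
last_three = word[-3:]
print(last_three)    # hon
```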

3.1.5. Changing the case

If you need text to be in all lower case or all upper case, you can use the lower() and upper() methods.

For example, executing this code:

original = "Eindhoven University of Technology"
lower_case = original.lower()
upper_case = original.upper()
print(original)
print(lower_case)
print(upper_case)

would give this output:

Eindhoven University of Technology
eindhoven university of technology
EINDHOVEN UNIVERSITY OF TECHNOLOGY
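One typical use, shown here as a sketch with made-up input values, is comparing two pieces of text while ignoring differences in case:

```python
answer_1 = "YES"
answer_2 = "Yes"

same_exact = (answer_1 == answer_2)
same_ignoring_case = (answer_1.lower() == answer_2.lower())

print(same_exact)           # False
print(same_ignoring_case)   # True
```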

3.1.6. Justification

Printed strings can be aligned in columns by padding them with spaces at the end until they reach a desired length. The method .ljust(length) will do the trick:

food_1 = "orange"
category_1 = "fruit"
food_2 = "steak"
category_2 = "meat"
food_3 = "tea"
category_3 = "fluid"

print("food".ljust(10), "category")
print("----".ljust(10), "--------")
print(food_1.ljust(10), category_1)
print(food_2.ljust(10), category_2)
print(food_3.ljust(10), category_3)

with printout:

food       category
----       --------
orange     fruit
steak      meat
tea        fluid

As the output shows, this gives the effect of “left justification” in a column, hence the name of the method. In many cases justification to the right is also wanted, for instance with numbers of unequal size. In this case .rjust(length) may come in handy:

item_1 = "1 bread"
price_1 =  1.49
item_2 = "10 kg cheese"
price_2 = 15.55
total = price_1 + price_2

line_1 = item_1.ljust(15) + str(price_1).rjust(5)
line_2 = item_2.ljust(15) + str(price_2).rjust(5)
line_3 = "total".ljust(15) + str(total).rjust(5)

print("receipt")
print()
print(line_1)
print(line_2)
print()
print(line_3)

with the corresponding output:

receipt

1 bread         1.49
10 kg cheese   15.55

total          17.04
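Note that str() writes a price such as 1.5 as "1.5", so the decimal points in a column would no longer line up. One possible way around this, sketched here under the assumption that the built-in format() function (not covered in this text) may be used, is to fix the number of decimals before justifying:

```python
item = "2 milk"
price = 1.5

# str() keeps only the digits that are needed.
plain = item.ljust(15) + str(price).rjust(6)

# format(price, ".2f") always writes two decimals.
fixed = item.ljust(15) + format(price, ".2f").rjust(6)

print(plain)
print(fixed)
```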

3.1.7. Splitting

Strings in Python have a method named split() that returns a list containing the words in the string. This can be handy, for instance when analysing a sentence.

# Create a string with multiple words.
>>> my_string = 'This is a sentence.'
# Split the string.
>>> word_list = my_string.split()
# Print the list of words.
>>> print(word_list)
['This', 'is', 'a', 'sentence.']

By default, the split method uses whitespace as the separator (that is, it returns a list of the words in the string that are separated by spaces, tabs or line breaks). You can specify a different separator by passing it as an argument to the split method. For example, suppose a string contains a date, as shown here:

date_string = '11/26/2014'

If you want to extract the month, day, and year as items in a list, you can call the split method using the '/' character as a separator, as shown here:

date_list = date_string.split('/')

After this statement executes, variable date_list will reference this list:

['11', '26', '2014']
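The items in this list are still strings. If you want to calculate with them, they first have to be converted with int() (see section 3.6). A minimal sketch, with illustrative variable names:

```python
date_string = '11/26/2014'
date_list = date_string.split('/')

# The items are still strings; int() converts them to numbers.
month = int(date_list[0])
day = int(date_list[1])
year = int(date_list[2])

print(month, day, year)   # 11 26 2014
print(year + 1)           # 2015
```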

3.1.8. Joining

The counterpart of .split(…) is .join(…), which concatenates all items of a list of strings and places a specified separator string between them. Here is an example of a list of words ‘glued’ together into a sentence by .join():

>>> words = ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
>>> sentence = ' '.join(words)
>>> sentence
'The quick brown fox jumps over the lazy dog'
>>>
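Because .split() and .join() are counterparts, they can be combined to replace one separator by another. The following sketch reuses the date string from section 3.1.7; the other names are illustrative:

```python
date_string = '11/26/2014'

parts = date_string.split('/')   # ['11', '26', '2014']
dashed = '-'.join(parts)
restored = '/'.join(parts)

print(dashed)                    # 11-26-2014
print(restored == date_string)   # True
```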

3.1.9. Finding characters or substrings

If you need to find the index of a substring you can use the find() and rfind() methods:

>>> long_word = 'Supercalifragilisticexpialidocious' # Mary Poppins
>>> long_word.find('e') # index of first occurrence of 'e'
3
>>> # start index of last occurrence of 'li'
>>> long_word.rfind('li')
25
>>> # index of first occurrence of 'li' between index 5 and 20
>>> long_word.find('li', 5, 20)
7
>>> # start index of last occurrence of 'li' between 5 and 20
>>> long_word.rfind('li', 5, 20)
14
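If the substring does not occur at all, find() and rfind() return -1 instead of an index. A short sketch of how a program can test for this:

```python
long_word = 'Supercalifragilisticexpialidocious'

position = long_word.find('z')
print(position)   # -1

if position == -1:
    print("'z' does not occur in the word")
```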

3.2. Numbers

Python has three built-in numeric data types: integers, floating-point numbers, and complex numbers. In most cases the Python interpreter can automatically decide which type of number you need when running your statements.

Integers

Integers, or int, are whole numbers without a decimal point. They can be positive, negative or zero. As soon as a number is written with a decimal point it becomes a floating-point number, which is a distinct numeric type.
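The difference can be made visible with the type() function, which is discussed in section 3.6.1. A small sketch with illustrative variable names:

```python
whole = 42
negative = -7
with_point = 42.0

print(type(whole))        # <class 'int'>
print(type(negative))     # <class 'int'>
print(type(with_point))   # <class 'float'>
```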

Floating-point numbers

Floating-point numbers, or float, represent real numbers. Floats are written with a decimal point that separates the integer part from the decimal part:

>>> floatVar1 = 625.5
>>> print(floatVar1)
625.5

Alternatively, you can use scientific notation where the ‘E’ or ‘e’ denotes the power of 10:

>>> floatVar2 = 6.255e2
>>> print(floatVar2)
625.5

Complex Numbers

Complex numbers are not part of this course. However, for the interested reader: a complex number consists of a real part and an imaginary part. It takes the form ‘a + bj’, where ‘a’ is a float and the real part of the complex number, ‘b’ is a float and the imaginary part, and j denotes the imaginary unit, the square root of -1. Just one example:

>>> a = 2 + 5j
>>> b = 4 - 2j
>>> c = a + b
>>> print(c)
(6+3j)

3.2.1. Operators (Calculations)

Of course numbers are only useful if we can do calculations with them. Python has most of the simple mathematical operators built in. In the table below you can see some common examples:

operation   result
---------   ------
-x          x negated
x + y       sum of x and y
x - y       difference of x and y
x * y       product of x and y
x / y       quotient of x and y
x ** y      x to the power y
abs(x)      absolute value or magnitude of x
x // y      floored quotient of x and y
x % y       remainder of x / y

Most operators are obvious, but the floored quotient (//) and remainder (%) deserve some attention. They are mostly used in counting operations, like in this simple ‘boxing cookies’ example:

>>> totalCookies = 125
>>> cookiesPerBox = 18
>>> print("Filled cookie boxes:", totalCookies // cookiesPerBox)
Filled cookie boxes: 6
>>> print("Cookies left over:", totalCookies % cookiesPerBox)
Cookies left over: 17

With normal division and rounding such counting operations would be much more complicated.
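The same pair of operators is useful whenever a quantity has to be split into whole units and a remainder. As a further sketch (the numbers are made up), here is a conversion of seconds into minutes and seconds:

```python
total_seconds = 200

minutes = total_seconds // 60   # whole minutes
seconds = total_seconds % 60    # seconds left over

print(minutes, "min", seconds, "s")   # 3 min 20 s

# The two results always add up to the original number again.
print(minutes * 60 + seconds)         # 200
```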

In science and engineering you often want to perform more advanced calculations than the ones shown in the table above. Besides the built-in functions, many more standard mathematical operations become available by importing the modules math or statistics with import math or import statistics. When you type help("math") or help("statistics") in the IDLE console, you can see all the functions that are supported by these modules.
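As a brief illustration of both modules, the sketch below uses math.sqrt() and statistics.mean(). The measurement values are made up, and the list that holds them is a data type explained in section 3.4:

```python
import math
import statistics

# Length of the hypotenuse of a right-angled triangle (Pythagoras).
side_a = 3.0
side_b = 4.0
hypotenuse = math.sqrt(side_a ** 2 + side_b ** 2)
print(hypotenuse)   # 5.0

# Average of a few measurements.
measurements = [2.0, 4.0, 6.0]
average = statistics.mean(measurements)
print(average)      # 4.0
```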

3.2.2. Rounding-off numbers

Sometimes the output of a float contains an inconvenient number of digits. The built-in function round() can limit the number of digits after the decimal point to the desired value. An example with the number π from the math module:

>>> import math
>>> var = math.pi
>>> print(var)
3.141592653589793
>>> print(round(var, 4))
3.1416
>>> print(round(var, 1))
3.1

Warning: in scientific calculations, only round off values that you output to the user, for instance with print(). If you round intermediate results, you will lose accuracy in your calculations.
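The effect of rounding too early can be seen in a small sketch that calculates the area of a circle in two ways (the radius is a made-up value):

```python
import math

radius = 10

# Calculate with full accuracy, round only for output.
exact_area = math.pi * radius ** 2
print(round(exact_area, 1))           # 314.2

# Rounding pi before the calculation gives a visibly different area.
early_rounded_area = round(math.pi, 1) * radius ** 2
print(round(early_rounded_area, 1))   # 310.0
```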

3.3. Booleans

In if or while statements you use comparisons like a == b or c > 21 to make decisions in your program. Such comparisons are called conditions in programming. Sometimes it is useful to store the result of such a comparison in a variable. Such variables are of data type boolean, or bool for short, and they evaluate to one of two values: True or False. A simple example in interactive mode:

>>> varA = 5
>>> varB = 7
>>> varAgreaterThanB = (varA > varB)
>>> print(varAgreaterThanB)
False
>>> varBgreaterThanA = (varB > varA)
>>> print(varBgreaterThanA)
True

3.3.1. Operators for Comparisons

Comparisons are the basis of most decisions in programs, so it is good to know the most common operators. They are listed below with their meaning.

operation   meaning
---------   -------
<           strictly less than
<=          less than or equal
>           strictly greater than
>=          greater than or equal
==          equal
!=          not equal

Almost all data types can be compared. Here are some examples:

aNumber == 0.0          # True if-and-only-if aNumber is 0.0
"cat" != "dog"          # True

3.3.2. Boolean Operations

There are only three boolean operators. They are displayed in the table below with their result.

operator   example   result
--------   -------   ------
or         A or B    if A and B are both False, then False, else True
and        A and B   if A and B are both True, then True, else False
not        not A     if A is False, then True, else False

Some more interactive example code to illustrate the effects:

>>> A = True
>>> B = False
>>> A and B
False
>>> A or B
True
>>> not B
True
>>> A and not B
True

The and, or and not operators can combine comparisons:

status == "single" and mood == "interested"
favourite_color != "yellow" or lucky_number != 7
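To see what these two expressions evaluate to, the variables need values. The values in the sketch below are assumptions chosen only for illustration:

```python
status = "single"
mood = "interested"
favourite_color = "yellow"
lucky_number = 7

# Both comparisons are True, so "and" gives True.
first_result = status == "single" and mood == "interested"
print(first_result)    # True

# Both comparisons are False, so "or" gives False.
second_result = favourite_color != "yellow" or lucky_number != 7
print(second_result)   # False
```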

Some comparisons, like > and <, can be chained to get more compact expressions:

12 < val < 18            # equivalent to: 12 < val and val < 18
"bird" < animal < "dog"  # strings are compared in alphabetical order

3.3.3. either - or

There are situations where you need to evaluate whether exactly one of two conditions is True, but not both. This is called either-or, or exclusive-or (xor for short).

As a simple example, consider organisms in biology. One of their aspects is whether they are alive or dead. A biologist will classify a being as either alive (is_alive == True) or dead (is_dead == True). If we exclude mythical beings like zombies and vampires, Schrödinger’s cat etc., clearly is_alive and is_dead cannot both be True and also cannot both be False.

Python has no keyword for this case next to and, or and not, but a condition can be constructed by combining these three. A straightforward way to do this is:

(is_alive and not is_dead) or (not is_alive and is_dead)

Please verify for yourself that this combination indeed evaluates to either-or. For the interested reader, there is a technique for constructing more complicated conditions like exclusive-or through a so-called ‘truth table’, but that is beyond the scope of this course.
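One way to do this verification is to let Python print the result for all four combinations of is_alive and is_dead. The sketch below uses the for statement (explained in chapter 5) and a list (section 3.4); the names either_or and results are illustrative:

```python
results = []
for is_alive in [False, True]:
    for is_dead in [False, True]:
        either_or = (is_alive and not is_dead) or (not is_alive and is_dead)
        results.append(either_or)
        print(is_alive, is_dead, either_or)

# Printed lines:
# False False False
# False True True
# True False True
# True True False
```

The result is True only in the two middle lines, where exactly one of the two variables is True. For variables that are already True or False, the comparison is_alive != is_dead gives the same four results.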

3.4. Lists

In real world programming problems we usually deal with many similar values, like the measurements in a data file. A list is a data type that can be used to store multiple items of any type. You can define and assign items to a list with the expression:

myList = [item1, item2, item3]

Here is a simple example of a shopping list:

>>> myShoppingList = ['Bread', 'Milk', 'Cookies']
>>> print(myShoppingList)
['Bread', 'Milk', 'Cookies']

Python also allows creation of an empty list that you can fill later:

myList = []

The items do not need to be of the same type:

>>> myWeirdList = [2.7479, 'Some Text', True]
>>> print(myWeirdList)
[2.7479, 'Some Text', True]

Most of the time, however, you will use lists with items of the same data type.

To illustrate some of the possibilities with lists, let’s create a list of colors.

>>> colors = ['red', 'orange', 'yellow', 'green', 'indigo', 'white']

Lists are indexed objects, like strings. Here, the first item in colors has zero as its index. To access the first item in the list, you can print the color with the command:

>>> print(colors[0])
red

To print the name of the fifth color in the list, you can enter:

>>> print(colors[4])
indigo

3.4.1. Getting the number of items

To see how many colors are in the list, you can use the len() function:

>>> len(colors)
6

There are indeed six colors in your list.

3.4.2. Removing items

Now, let’s suppose you want a list of all seven colors of the rainbow. To see if colors is correct you can use print() to see all items:

>>> print(colors)
['red', 'orange', 'yellow', 'green', 'indigo', 'white']

Currently, one item does not belong in the list colors: ‘white’ (not part of the rainbow). To remove an item like ‘white’ from the list, you can use the .remove() method:

>>> colors.remove('white')

You can view the updated list with print() command:

>>> print(colors)
['red', 'orange', 'yellow', 'green', 'indigo']

The .remove() method works if you know the value of the item (e.g. ‘white’). If you only know the index of an item, you can use the del statement on the selected item (in the case of ‘white’ this is colors[5]):

>>> del colors[5]

This has the same effect as .remove('white'). (N.B. be careful not to run both .remove('white') and del colors[5]: you cannot remove the same item twice, so the second attempt will give an error.)
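If you are not sure whether an item is still in the list, one possible safeguard is to test for it first with the in keyword (see section 3.4.6). A minimal sketch:

```python
colors = ['red', 'orange', 'yellow', 'green', 'indigo', 'white']

if 'white' in colors:
    colors.remove('white')

# A second attempt is now skipped instead of causing an error.
if 'white' in colors:
    colors.remove('white')

print(colors)
# ['red', 'orange', 'yellow', 'green', 'indigo']
```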

3.4.3. Adding items

The list is still two colors short – ‘violet’ and ‘blue’. To add ‘violet’ to your colors list, you can use the append() method:

>>> colors.append('violet')
>>> print(colors)
['red', 'orange', 'yellow', 'green', 'indigo', 'violet']

The color ‘violet’ was added to the end of the list.

You only need to add one more color - ‘blue’. Let’s say you want to have ‘blue’ inserted between ‘green’ and ‘indigo’. You can use Python’s insert() method with the syntax: list.insert(index, new_item). The parameters are index and new_item. Parameter index refers to the position where you want the new item to be located. Parameter new_item is the item you want to insert. Applying the syntax to the example above, you’ll have the command:

>>> colors.insert(4, 'blue')

To see the completed list of rainbow colors:

>>> print(colors)
['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']

3.4.4. Slicing lists

You can slice lists in the same way that you can slice strings.

For example, if you only want to display the colors ‘green’, ‘blue’, and ‘indigo’, with indices 3, 4 and 5 respectively, you can use the slicing notation:

>>> # select items 3, 4 and 5
>>> colors[3:6]
['green', 'blue', 'indigo']

>>> # select the whole list (returns a copy)
>>> colors[:]
['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']

>>> # select the odd indices of the whole list
>>> colors[1::2]
['orange', 'green', 'indigo']

>>> # select the even indices of the whole list
>>> colors[::2]
['red', 'yellow', 'blue', 'violet']

3.4.5. Ordering items

Lists can be ordered easily with the .sort() method, which rearranges the items of the list itself (strings are put in alphabetical order):

rainbow = ['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']
rainbow.sort()
print(rainbow)

This will only work if all items in the list can be compared. The following code:

melee = ['red', True, 3.1415]
melee.sort()

would result in an error (a TypeError) because items of different types, such as a string and a number, cannot be ordered relative to each other.
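For the rainbow example, the sketch below shows the resulting order. It also uses two variations that are not discussed in the text above: the built-in function sorted(), which returns a new sorted list and leaves the original untouched, and the reverse=True argument of .sort():

```python
rainbow = ['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']

alphabetical = sorted(rainbow)   # new list, rainbow itself is unchanged
print(alphabetical)
# ['blue', 'green', 'indigo', 'orange', 'red', 'violet', 'yellow']

rainbow.sort(reverse=True)       # sorts the list itself, in reverse order
print(rainbow)
# ['yellow', 'violet', 'red', 'orange', 'indigo', 'green', 'blue']
```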

3.4.6. Testing for membership

Often it is necessary to know whether an item is in a list or not. Python uses the in keyword to return True or False depending on membership. Suppose you want to test whether a nation is a member of the European Union or not. This is easily done with in, as can be seen from the interactive session below:

>>> eu_members = ['Belgium', 'France', 'Germany', 'Italy', 'Luxembourg',
              'Netherlands', 'Denmark', 'Ireland', 'United Kingdom',
              'Greece', 'Portugal', 'Spain', 'Austria', 'Finland',
              'Sweden', 'Cyprus', 'Czech Republic', 'Estonia', 'Hungary',
              'Latvia', 'Lithuania', 'Malta', 'Poland', 'Slovakia',
              'Slovenia', 'Bulgaria', 'Romania', 'Croatia']
>>> 'United Kingdom' in eu_members
True
>>> 'Russia' in eu_members
False

This will work regardless of the type of the item:

>>> 3.1415 in eu_members
False

3.4.7. Finding the index of a list item

Another useful question is what the index of an item is (provided it is in the list). Lists have the .index() method to return the index of the item. From the eu_members example above:

>>> eu_members.index('Netherlands')
5

The .index() method works in a similar way to the .find() method for strings (see section 3.1.9). For example, like with .find() you can also specify a start and end index for the search. There is one important difference worth mentioning. If the item is not in the list, the .index() method raises a ValueError, which stops the program (instead of returning -1 like .find()).

>>> eu_members.index('USA')
Traceback (most recent call last):
  File "<pyshell#22>", line 1, in <module>
    eu_members.index('USA')
ValueError: 'USA' is not in list

This error can be avoided by first testing for membership with the in operator:

>>> if 'USA' in eu_members:
        print(eu_members.index('USA'))
    else:
        print(-1)
-1
>>>

For the interested reader, here is a more advanced way of handling the ValueError:

try:
    print(eu_members.index('Croatia'))
except ValueError:
    print(-1)

try:
    print(eu_members.index('Korea'))
except ValueError:
    print(-1)

which will give the desired result:

27
-1

The example above uses the technique of Exception Handling. It is very powerful, because it can deal with many more types of exceptions than just list membership. The drawback of this way of programming is that the intention of the programmer (testing for membership) may be harder to understand from the code itself compared to the solution with the in test.

3.4.8. Lists of lists

In science and engineering data is often organised in tables (also called matrices in mathematics). Python has no built-in type for tables, but the items can be stored in lists-of-lists, where the ‘inner’ lists are the rows of the table. As a simple example consider the sales records of companies over several years:

name     2015   2016   2017
----     ----   ----   ----
Apple    2000   3000   4000
Google   8000   2000   1000
Tesla    -300   -200   -100

The items in each row can be represented by simple lists: ['name', 2015, 2016, 2017], ['Apple', 2000, 3000, 4000] etc. and combined in a new list-of-lists:

sales =  [ ['name', 2015, 2016, 2017], ['Apple', 2000, 3000, 4000],
           ['Google', 8000, 2000, 1000], ['Tesla', -300, -200, -100] ]

Individual items can be selected by indexing twice: sales[r][c], where c is the index of the item in row r. To select the sales of Tesla in 2016 we first select the Tesla row with sales[3] and then year 2016 by index 2:

print(sales[3][2])

If we want to select all the items in a column, we need to iterate over all the rows (please refer to chapter 5 for an explanation of the for statement).

Here is an example for the output of the sales of all companies in 2016:

# print all sales in 2016
for row in sales:
    print(row[0].ljust(9), row[2])

with the corresponding output:

name      2016
Apple     3000
Google    2000
Tesla     -200
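The same loop can be used to calculate with a column instead of printing it. The sketch below adds up the 2016 sales of all companies; it skips the header row with the slice sales[1:], and the name total_2016 is only illustrative:

```python
sales = [['name', 2015, 2016, 2017], ['Apple', 2000, 3000, 4000],
         ['Google', 8000, 2000, 1000], ['Tesla', -300, -200, -100]]

total_2016 = 0
for row in sales[1:]:          # skip the header row
    total_2016 = total_2016 + row[2]

print("total 2016:", total_2016)   # total 2016: 4800
```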

3.5. Dictionaries

The final built-in data type in this chapter is the so-called dictionary. A dictionary is like a list, but instead of using an index to access values, you use a unique key. A colon separates a key from its value, commas separate the pairs, and all are enclosed in curly braces. Here is an example:

capital_cities = {'Netherlands' : 'Amsterdam', 'USA' : 'Washington DC', 'China' : 'Beijing'}

In capital_cities, ‘Netherlands’ and ‘Amsterdam’ form a so-called key-value pair. A dictionary can be a very useful tool for storing and manipulating key-value pairs such as those used in phone books, directories, menus, or log-in data. You can add, modify, or delete entries within the dictionary.

To see how dictionaries actually work, let’s create a dictionary named menu with pairs of dishes and prices:

menu = {'spam' : 12.50, 'carbonara' : 20, 'salad' : 15 }

To see how many key-value pairs are stored in the dictionary, you can use the len() function:

>>> len(menu)
3

To add another entry in the menu dictionary, you can use this format:

menu['cake'] = 6

To see the updated menu, use the print() command:

>>> print(menu)
{'spam': 12.5, 'carbonara': 20, 'salad': 15, 'cake': 6}
>>> len(menu)
4

Assuming you no longer want ‘spam’ in your menu, you can remove it with the del statement:

>>> del menu['spam']
>>> print(menu)
{'carbonara': 20, 'salad': 15, 'cake': 6}

At some point you might want to change the value that belongs to a key. For instance, suppose you need to change the price of carbonara from 20 to 22. To do that, you just assign a new value to the key with this command:

>>> menu['carbonara'] = 22
>>> print(menu)
{'carbonara': 22, 'salad': 15, 'cake': 6}

If you want to remove all entries in the dictionary, you can use the .clear() method.

To clear all entries in variable menu:

>>> menu.clear()
>>> print(menu)
{}

Finally, you can define an empty dictionary yourself in this way:

d = {}
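The examples above add, change and delete entries. Looking up a value works with the same square-bracket notation. The following sketch starts from a fresh menu dictionary and also uses the in keyword and a for loop (chapter 5), which are assumptions about how you might want to combine these features:

```python
menu = {'carbonara': 22, 'salad': 15, 'cake': 6}

salad_price = menu['salad']
print(salad_price)            # 15

# A missing key raises a KeyError with [], so test with "in" first.
if 'spam' in menu:
    print(menu['spam'])
else:
    print("spam is not on the menu")

# A for loop over a dictionary visits its keys one by one.
for dish in menu:
    print(dish.ljust(10), menu[dish])
```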

3.6. Type Conversions

In most cases Python automatically handles data types, for instance in the calculation of expressions. Occasionally, however, you may have to convert one type into another explicitly, e.g. when a function requires an argument to be of a certain type. You can use the built-in functions str(), float() and int() to convert data to another type.

3.6.1. The type function

You can always manually check the current type of a variable with type():

>>> type(True)
<class 'bool'>
>>> a_bool = False
>>> type(a_bool)
<class 'bool'>

>>> type(3.5)
<class 'float'>
>>> a_number = 3.5
>>> type(a_number)
<class 'float'>

>>> type("Hey there")
<class 'str'>
>>> a_text = 'Hey there'
>>> type(a_text)
<class 'str'>

>>> a_list = ['stuff', a_number, True]
>>> type(a_list)
<class 'list'>

3.6.2. Conversion to string

Types can be easily converted to string format with the str() function. The following interactive code shows that the type of the information has been converted to type string:

>>> float_1 = 7.5
>>> type(float_1)
<class 'float'>
>>> string_1 = str(float_1)
>>> type(string_1)
<class 'str'>
>>> print(string_1)
7.5

A common case in which conversion to a string is useful is when you want to include a number in the prompt of the input() function, since input() accepts only a single string between the round brackets. Here is an example:

cigs = 12
question = "How many years have you smoked " + str(cigs) + " cigarettes per day? "
answer = input(question)
years = int(answer)
print("That is", years, "years too many!")

Here is a sample run:

How many years have you smoked 12 cigarettes per day? 5
That is 5 years too many!

3.6.3. Conversion to float, int and bool

Data can be forced into the type you need with these functions. Here are some simple examples:

>>> variable1 = float("12.5")           # conversion from string to float
>>> variable2 = int(variable1)          # conversion from float to int
>>> variable3 = bool(variable2)         # conversion from int to bool
>>> type(variable1)
<class 'float'>
>>> type(variable2)
<class 'int'>
>>> type(variable3)
<class 'bool'>

You can check for yourself how the values are changed in these conversions. For example, please note that int() does not round off, but truncates: it simply drops the decimal part. For positive numbers this is the same as ‘flooring’ in mathematics.
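The sketch below shows a few of these value changes. It compares int() with round() and with math.floor() from the math module (for negative numbers truncating and flooring differ), and shows how bool() treats numbers:

```python
import math

truncated = int(7.9)
rounded = round(7.9)
print(truncated)   # 7
print(rounded)     # 8

negative_truncated = int(-7.9)
negative_floored = math.floor(-7.9)
print(negative_truncated)   # -7  (truncated towards zero)
print(negative_floored)     # -8  (floored)

# Zero converts to False, any other number to True.
print(bool(0))     # False
print(bool(12))    # True
```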

© Copyright 2022, dr. P. Lambooij

last updated: Sep 19, 2022