
NumPy Arrays


NumPy (Numerical Python) is a library for the Python programming language that adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. The ancestor of NumPy, Numeric, was originally created by Jim Hugunin with contributions from several other developers. In 2005, Travis Oliphant created NumPy by incorporating features of the competing Numarray into Numeric, with extensive modifications. NumPy is open-source software and has many contributors.
To use the numpy package in your program, import it as follows:
import numpy as np
Arrays
A numpy array is a grid of values, all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes. The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension.
We can initialize numpy arrays from nested Python lists and access elements using square brackets. Array indexing starts from 0.
Example:
import numpy as np
a = np.array([1, 2, 3]) # Create a rank 1 array 
print(type(a)) # Prints "<class 'numpy.ndarray'>" 
print(a.shape) # Prints "(3,)" 
print(a[0], a[1], a[2]) # Prints "1 2 3" 
a[0] = 5 # Change an element of the array 
print(a) # Prints "[5 2 3]"
print(a.size) # prints 3 
b = np.array([[1,2,3],[4,5,6]]) # Create a rank 2 array 
print(b.shape) # Prints "(2, 3)" 
print(b.ndim) # Prints 2
print(b[0, 0], b[0, 1], b[1, 0]) # Prints "1 2 4"

Numpy also provides many functions for intrinsic array creation:
import numpy as np 
a = np.zeros((2,2)) # Create an array of all zeros 
print(a) # Prints "[[ 0. 0.]  [ 0. 0.]]" 
b = np.ones((1,2)) # Create an array of all ones
print(b) # Prints "[[ 1. 1.]]" 
c = np.full((2,2), 7) # Create a constant array 
print(c) # Prints "[[7 7]  [7 7]]" (integer dtype, because the fill value 7 is an integer)
d = np.eye(2) # Create a 2x2 identity matrix 
print(d) # Prints "[[ 1. 0.] [ 0. 1.]]"
e = np.random.random((2,2)) # Create an array filled with random values 
print(e) # Might print "[[ 0.91940167    0.08143941] [ 0.68744134   0.87236687]]"
>>> np.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.arange(2, 10, dtype=float)
array([2., 3., 4., 5., 6., 7., 8., 9.])
>>> np.arange(2, 3, 0.1)
array([2. , 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9])
>>> b = np.arange(12).reshape(4,3) # 2d array
>>> print(b)
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
>>> from numpy import pi
>>> np.linspace(0, 2, 9) # 9 numbers from 0 to 2
array([0.  , 0.25, 0.5 , 0.75, 1.  , 1.25, 1.5 , 1.75, 2.  ])
>>> x = np.linspace(0, 2*pi, 100) # useful to evaluate function at lots of points
>>> f = np.sin(x)
Array Indexing, Slicing and Iterating
One-dimensional arrays can be indexed, sliced and iterated over, much like lists and other Python sequences.
>>> a = np.arange(10)**3
>>> a
array([  0,   1,   8,  27,  64, 125, 216, 343, 512, 729])
>>> a[2]
8
>>> a[2:5]
array([ 8, 27, 64])
>>> # equivalent to a[0:6:2] = 1000;
>>> # from start to position 6, exclusive, set every 2nd element to 1000
>>> a[:6:2] = 1000
>>> a
array([1000,    1, 1000,   27, 1000,  125,  216,  343,  512,  729])
>>> a[::-1] # reversed a
array([ 729,  512,  343,  216,  125, 1000,   27, 1000,    1, 1000])
>>> for i in a:
...     print(i**(1/3.))
...
9.999999999999998
1.0
9.999999999999998
3.0
9.999999999999998
4.999999999999999
5.999999999999999
6.999999999999999
7.999999999999999
8.999999999999998
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
# Use slicing to pull out the subarray consisting of the first 2 rows and columns 1 and 2; b is the following array of shape (2, 2):
b = a[:2, 1:3]
print(b)
[[2 3]
 [6 7]]
There are two ways of accessing the data in the middle row of the array. Mixing integer indexing with slices yields an array of lower rank, while using only slices yields an array of the same rank as the original array:
row_r1 = a[1, :] # Rank 1 view of the second row of a
row_r2 = a[1:2, :] # Rank 2 view of the second row of a
print(row_r1, row_r1.shape) # Prints "[5 6 7 8] (4,)"
print(row_r2, row_r2.shape) # Prints "[[5 6 7 8]] (1, 4)"
# We can make the same distinction when accessing columns of an array:
col_r1 = a[:, 1] # Rank 1 view of the second column of a, shape (3,)
col_r2 = a[:, 1:2] # Rank 2 view of the second column of a, shape (3, 1)
One useful trick with integer array indexing is selecting or mutating one element from each row of a matrix:
import numpy as np
# Create a new array from which we will select elements
a = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
print(a) # Prints "[[ 1  2  3]  [ 4  5  6]  [ 7  8  9]  [10 11 12]]"
# Create an array of indices
b = np.array([0, 2, 0, 1])
# Select one element from each row of a using the indices in b
print(a[np.arange(4), b]) # Prints "[ 1  6  7 11]"
# Mutate one element from each row of a using the indices in b
a[np.arange(4), b] += 10
print(a) # Prints "[[11  2  3]  [ 4  5 16]  [17  8  9]  [10 21 12]]"
>>> b = np.array([[ 0,  1,  2,  3], [10, 11, 12, 13], [20, 21, 22, 23], [30, 31, 32, 33], [40, 41, 42, 43]])
>>> b[2,3]
23
>>> b[0:5, 1] # each row in the second column of b
array([ 1, 11, 21, 31, 41])
>>> b[:, 1] # equivalent to the previous example
array([ 1, 11, 21, 31, 41])
>>> b[1:3, :] # each column in the second and third row of b
array([[10, 11, 12, 13],
       [20, 21, 22, 23]])

When fewer indices are provided than the number of axes, the missing indices are considered complete slices:
>>> b[-1] # the last row. Equivalent to b[-1,:]
array([40, 41, 42, 43])
The expression within brackets in b[i] is treated as an i followed by as many instances of : as needed to represent the remaining axes. NumPy also allows you to write this using dots as b[i,...].
The dots (...) represent as many colons as needed to produce a complete indexing tuple. For example, if x is an array with 5 axes, then x[1,2,...] is equivalent to x[1,2,:,:,:]
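The equivalence described above can be checked directly. The following is a small sketch; the array shape is chosen arbitrarily for illustration, and the variable names are not from the original text.

```python
import numpy as np

# A 5-axis array; the shape is an arbitrary choice for this illustration.
x = np.arange(2 * 3 * 4 * 5 * 6).reshape(2, 3, 4, 5, 6)

# x[1, 2, ...] fills in the three remaining axes with full slices.
same_leading = np.array_equal(x[1, 2, ...], x[1, 2, :, :, :])

# The dots can also stand for the leading or the middle axes.
same_trailing = np.array_equal(x[..., 3], x[:, :, :, :, 3])
same_middle = np.array_equal(x[1, ..., 3], x[1, :, :, :, 3])

print(x[1, 2, ...].shape)                        # (4, 5, 6)
print(same_leading, same_trailing, same_middle)  # True True True
```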

Indexing with Arrays of Indices
import numpy as np
a = np.array([10,11,12,13,14,15,16,17,18,19,20])
i=np.array([3,4,5])
print(a[i]) # will print [13 14 15]
j = np.array([[3, 4], [5, 6]]) # a bidimensional array of indices
print(a[j]) # will print [[13 14] [15 16]], the same shape as j
a = np.array([[10,11,12],[13,14,15],[16,17,18],[19,20,21]])
i=np.array([1,2])
print(a[i]) # will print [[13 14 15][16 17 18]]
a = np.array([[10,11,12],[13,14,15],[16,17,18],[19,20,21]])
i=np.array([[1,2],[2,3]])
print(a[i])
#will print
[[[13 14 15]
[16 17 18]]

[[16 17 18]
[19 20 21]]]
We can also give indexes for more than one dimension. The arrays of indices for each dimension must have the same shape.
a = np.array([[10,11,12],[13,14,15],[16,17,18],[19,20,21]])
i=np.array([[1,2],[2,3]])
j=np.array([[1,1],[2,2]])
print(a[i,j])
#will print
[[14 17]
[18 21]]
Another common use of indexing with arrays is the search of the maximum value of time-dependent series:
>>> time = np.linspace(20, 145, 5) # time scale
>>> data = np.sin(np.arange(20)).reshape(5,4) # 4 time-dependent series
>>> time
array([ 20.  ,  51.25,  82.5 , 113.75, 145.  ])
>>> data
array([[ 0.        ,  0.84147098,  0.90929743,  0.14112001],
       [-0.7568025 , -0.95892427, -0.2794155 ,  0.6569866 ],
       [ 0.98935825,  0.41211849, -0.54402111, -0.99999021],
       [-0.53657292,  0.42016704,  0.99060736,  0.65028784],
       [-0.28790332, -0.96139749, -0.75098725,  0.14987721]])
>>> # index of the maxima for each series
>>> ind = data.argmax(axis=0)
>>> ind
array([2, 0, 3, 1])
>>> # times corresponding to the maxima
>>> time_max = time[ind]
>>> data_max = data[ind, range(data.shape[1])] # => data[ind[0],0], data[ind[1],1]...
>>> time_max
array([ 82.5 ,  20.  , 113.75,  51.25])
>>> data_max
array([0.98935825, 0.84147098, 0.99060736, 0.6569866 ])
You can also use indexing with arrays as a target to assign to:
>>> a = np.arange(5) 
>>> a 
array([0, 1, 2, 3, 4]) 
>>> a[[1,3,4]] = 0 
>>> a
array([0, 0, 2, 0, 0])
However, when the list of indices contains repetitions, the assignment is done several times, leaving behind the last value:
>>> a = np.arange(5)
>>> a[[0,0,2]]=[1,2,3] 
>>> a 
array([2, 1, 3, 3, 4])
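A related pitfall appears with in-place operators such as +=. The sketch below, which is an illustration and not part of the original example, shows that a repeated index is only incremented once, and that np.add.at can be used when every repetition should count.

```python
import numpy as np

a = np.arange(5)
a[[0, 0, 2]] += 1            # index 0 appears twice but is incremented once
print(a)                     # [1 1 3 3 4]

b = np.arange(5)
np.add.at(b, [0, 0, 2], 1)   # unbuffered add: index 0 is incremented twice
print(b)                     # [2 1 3 3 4]
```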
Boolean array indexing
Boolean array indexing lets you pick out arbitrary elements of an array. Frequently this type of indexing is used to select the elements of an array that satisfy some condition. Here is an example:
import numpy as np
a = np.array([[1,2], [3, 4], [5, 6]]) 
# Find the elements of a that are bigger than 2;
# this returns a numpy array of Booleans of the same
# shape as a, where each slot of bool_idx tells
# whether that element of a is > 2.
bool_idx = (a > 2)
print(bool_idx) # Prints "[[False False]  [ True  True]  [ True  True]]"
# We use boolean array indexing to construct a rank 1 array
# consisting of the elements of a corresponding to the True values
# of bool_idx
print(a[bool_idx]) # Prints "[3 4 5 6]"
Let's see another example:
>>> a = np.arange(12).reshape(3,4)
>>> b1 = np.array([False,True,True]) # first dim selection
>>> a[b1,:] # selecting rows
array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
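Boolean masks can also be combined and used as the target of an assignment. The following is a minimal sketch; the names mask and clipped are chosen here for illustration only.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Combine conditions with & (and) or | (or); the parentheses are required.
mask = (a > 2) & (a % 2 == 0)
print(a[mask])            # [ 4  6  8 10]

# A boolean mask can also select the elements to overwrite.
clipped = a.copy()
clipped[clipped > 8] = 8  # cap every value at 8
print(clipped)
```

Working on a copy keeps the original array a unchanged, which makes the effect of the masked assignment easy to compare.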
Iterating over multidimensional arrays is done with respect to the first axis:
>>> a = np.array([[1,2], [3, 4], [5, 6]])
>>> for row in a:
...     print(row)
...
[1 2]
[3 4]
[5 6]

However, to perform an operation on each element in the array, use the flat attribute, which is an iterator over all the elements of the array:
>>> for element in a.flat:
...     print(element)
...
1
2
3
4
5
6
Datatypes
Every numpy array is a grid of elements of the same type. Numpy provides a large set of numeric datatypes that you can use to construct arrays. Numpy tries to guess a datatype when you create an array, but functions that construct arrays usually also include an optional argument to explicitly specify the datatype. Here is an example:
import numpy as np
x = np.array([1, 2]) # Let numpy choose the datatype
print(x.dtype) # Prints "int64" (the default integer type is platform dependent)
x = np.array([1.0, 2.0]) # Let numpy choose the datatype
print(x.dtype) # Prints "float64"
x = np.array([1, 2], dtype=np.int64) # Force a particular datatype
print(x.dtype) # Prints "int64"
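An existing array can also be converted to another datatype with astype. The sketch below is an added illustration (the variable names are not from the original) and also shows how mixing integers and floats is resolved.

```python
import numpy as np

x = np.array([1.7, -2.3, 3.9])
as_int = x.astype(np.int64)          # float to int truncates toward zero
print(as_int, as_int.dtype)          # [ 1 -2  3] int64

mixed = np.array([1, 2.5])           # an int and a float are upcast to float64
print(mixed.dtype)                   # float64

small = np.zeros(3, dtype=np.float32)
print(small.dtype, small.itemsize)   # float32 4  (bytes per element)
```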
Basic Operations
Arithmetic operators on arrays apply elementwise. A new array is created and filled with the result.
>>> a = np.array([20,30,40,50])
>>> b = np.arange(4)
>>> b
array([0, 1, 2, 3])
>>> c = a-b
>>> c
array([20, 29, 38, 47])
>>> b**2
array([0, 1, 4, 9])
>>> 10*np.sin(a)
array([ 9.12945251, -9.88031624,  7.4511316 , -2.62374854])
>>> a<35
array([ True,  True, False, False])
Basic Arithmetic
import numpy as np
x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64) 
# Elementwise sum; both produce the array  [[ 6.0 8.0]  [10.0 12.0]] 
print(x + y) 
print(np.add(x, y))
# Element wise difference; both produce the array  [[-4.0 -4.0]  [-4.0 -4.0]]
print(x - y) 
print(np.subtract(x, y))
# Elementwise product; both produce the array [[ 5.0 12.0]  [21.0 32.0]]
print(x * y)
print(np.multiply(x, y))
# Element wise division; both produce the array  [[ 0.2 0.33333333]  [ 0.42857143 0.5 ]] 
print(x / y) 
print(np.divide(x, y))
# Elementwise square root; produces the array  [[ 1. 1.41421356]  [ 1.73205081 2. ]]
print(np.sqrt(x))
Dot product
v = np.array([9,10])
w = np.array([11, 12])
# Inner product of vectors; both produce 219
print(v.dot(w))
print(np.dot(v, w))
x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])
# Matrix / vector product; both produce the rank 1 array [29 67]
print(x.dot(v))
print(np.dot(x, v))
# Matrix / matrix product; all three produce the rank 2 array [[19 22]  [43 50]]
print(x.dot(y))
print(np.dot(x, y))
print(x @ y) # the @ operator requires Python 3.5 or later
Numpy provides many useful functions for performing computations on arrays; one of the most useful is sum:
x = np.array([[1,2],[3,4]]) 
print(np.sum(x)) # Compute sum of all elements; prints "10" 
print(np.sum(x, axis=0)) # Compute sum of each column; prints "[4 6]" 
print(np.sum(x, axis=1)) # Compute sum of each row; prints "[3 7]"
Transpose
print(x.T) # Prints "[[1 3] [2 4]]"
Min and Max
print(x.min()) # Prints 1
print(x.max()) # Prints 4
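Like sum, the min and max methods accept an axis argument. A short sketch, using the same 2x2 array as the sum example above:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])

print(x.min(), x.max())   # 1 4   over the whole array
print(x.max(axis=0))      # [3 4] maximum of each column
print(x.min(axis=1))      # [1 3] minimum of each row
print(x.argmax())         # 3     position of the maximum in the flattened array
```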
Universal Functions
NumPy provides familiar mathematical functions such as sin, cos, and exp. In NumPy, these are called "universal functions" (ufunc). Within NumPy, these functions operate elementwise on an array, producing an array as output.
>>> B = np.arange(3)
>>> B
array([0, 1, 2])
>>> np.exp(B)
array([1.        , 2.71828183, 7.3890561 ])
>>> np.sqrt(B)
array([0.        , 1.        , 1.41421356])
>>> C = np.array([2., -1., 4.])
>>> np.add(B, C)
array([2., 0., 6.])
The ix_() function
The ix_ function can be used to combine different vectors so as to obtain the result for each n-tuple. For example, to compute a+b*c for all the triplets taken from the vectors a, b and c:
>>> a = np.array([2,3,4,5])
>>> b = np.array([8,5,4])
>>> c = np.array([5,4,6,8,3])
>>> ax,bx,cx = np.ix_(a,b,c)
>>> result = ax+bx*cx
>>> result[3,2,4]
17
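To see why this works, it can help to look at the shapes that ix_ returns. The sketch below repeats the example and compares every entry with an explicit triple loop; the check itself is an added illustration.

```python
import numpy as np

a = np.array([2, 3, 4, 5])
b = np.array([8, 5, 4])
c = np.array([5, 4, 6, 8, 3])

# Each vector is reshaped so that it varies along its own axis only.
ax, bx, cx = np.ix_(a, b, c)
print(ax.shape, bx.shape, cx.shape)   # (4, 1, 1) (1, 3, 1) (1, 1, 5)

# Broadcasting then expands the expression to the full (4, 3, 5) grid.
result = ax + bx * cx
print(result.shape)                   # (4, 3, 5)

# Every entry equals a[i] + b[j] * c[k] for its own (i, j, k).
ok = all(
    result[i, j, k] == a[i] + b[j] * c[k]
    for i in range(len(a))
    for j in range(len(b))
    for k in range(len(c))
)
print(ok)                             # True
```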
Structured arrays
Structured arrays are ndarrays whose datatype is a composition of simpler datatypes organized as a sequence of named fields. For example
# creating a structured array
x = np.array([('Rex', 9, 81.0), ('Fido', 3, 27.0)], dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])
# printing the array
print(x)
# printing the second record (structure); indexing starts from 0
print(x[1])
# printing the names in all records
print(x['name'])
# printing the name in the second record
print(x[1]['name'])
Note: Here string indexing (the field name) is used to access the data.
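Fields can be updated and used for sorting as well. The following sketch reuses the same records; the variable names dogs and by_age are chosen here for illustration.

```python
import numpy as np

dogs = np.array(
    [('Rex', 9, 81.0), ('Fido', 3, 27.0)],
    dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')],
)

print(dogs.dtype.names)               # ('name', 'age', 'weight')

dogs['age'] += 1                      # update one field in every record
print(dogs['age'])                    # [10  4]

by_age = np.sort(dogs, order='age')   # sort the records by a named field
print(by_age['name'])                 # ['Fido' 'Rex']
```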
Vector Stacking
How do we construct a 2D array from a list of equally sized row vectors? If x and y are two vectors of the same length, NumPy does this with the functions column_stack, dstack, hstack and vstack, depending on the dimension in which the stacking is to be done. For example:
>>> x = np.arange(0,10,2)
>>> y = np.arange(5)
>>> m = np.vstack([x,y])
>>> m
array([[0, 2, 4, 6, 8],
       [0, 1, 2, 3, 4]])
>>> xy = np.hstack([x,y])
>>> xy
array([0, 2, 4, 6, 8, 0, 1, 2, 3, 4])
>>> a = np.array([4.,2.])
>>> b = np.array([3.,8.])
>>> np.column_stack((a,b)) # returns a 2D array
array([[4., 3.],
       [2., 8.]])
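Comparing the result shapes is a quick way to see how the four functions differ. This sketch uses the same x and y as above and includes dstack, which is mentioned in the text but not shown:

```python
import numpy as np

x = np.arange(0, 10, 2)   # [0 2 4 6 8]
y = np.arange(5)          # [0 1 2 3 4]

print(np.vstack([x, y]).shape)        # (2, 5)    vectors become rows
print(np.hstack([x, y]).shape)        # (10,)     vectors are joined end to end
print(np.column_stack([x, y]).shape)  # (5, 2)    vectors become columns
print(np.dstack([x, y]).shape)        # (1, 5, 2) stacked along a third axis
```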
Splitting one array into smaller ones
Using hsplit, you can split an array along its horizontal axis, either by specifying the number of equally shaped arrays to return, or by specifying the columns after which the division should occur:
import numpy as np
x=np.array([[1,2,3,4],[5,6,7,8]])
y,z=np.hsplit(x,2) #splitting into 2
print(y)
print(z)
[[1 2]
[5 6]]

[[3  4]
[7 8]]
x=np.array([[1,2,3,4],[5,6,7,8]])
y,z,k=np.hsplit(x,(1,3))
print(y)
print(z)
print(k)
[[1]
[5]] 
 
[[2 3]
[6 7]] 
 
[[4]
[8]]
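Splitting along the vertical axis works the same way with vsplit, and array_split can be used when the array does not divide evenly. A brief sketch, with an array chosen here for illustration:

```python
import numpy as np

x = np.arange(12).reshape(4, 3)

top, bottom = np.vsplit(x, 2)         # split the 4 rows into two equal blocks
print(top.shape, bottom.shape)        # (2, 3) (2, 3)

# hsplit(x, 2) would raise an error here, because 3 columns cannot be
# divided into 2 equal parts; array_split allows unequal parts.
parts = np.array_split(x, 2, axis=1)
print([p.shape for p in parts])       # [(4, 2), (4, 1)]
```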
Copies and Views
When operating on and manipulating arrays, their data is sometimes copied into a new array and sometimes not. This is often a source of confusion for beginners. There are three cases:
No copy at all
x=np.array([1,2,3,4])
y=x
print(id(x))
print(id(y))
x[1]=5
print(x,y)
32677040
32677040
[1 5 3 4] [1 5 3 4]
Note that in the above case both x and y refer to the same array object in memory.
View or Shallow Copy
Different array objects can share the same data. The view method creates a new array object that looks at the same data.
x=np.array([1,2,3,4])
y=x.view()
print(id(x))
print(id(y))
x[1]=5
print(x,y)
30054832
29911504
[1 5 3 4] [1 5 3 4]
Note that slicing an array returns a view (shallow copy):
y=x[:]
Deep Copy
The copy method makes a complete copy of the array and its data.
x=np.array([1,2,3,4])
y=x.copy()
print(id(x))
print(id(y))
x[1]=5
print(x,y)
30146912
30196080
[1 5 3 4] [1 2 3 4]
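Instead of comparing id values by eye, the three cases can be told apart programmatically. The sketch below uses the is operator, the base attribute and np.shares_memory; the variable names are chosen here for illustration.

```python
import numpy as np

x = np.array([1, 2, 3, 4])

alias = x          # no copy: a second name for the same object
view = x[1:3]      # view: a new array object that shares x's data
deep = x.copy()    # deep copy: a new object with its own data

print(alias is x)                    # True
print(view.base is x)                # True
print(np.shares_memory(x, view))     # True
print(np.shares_memory(x, deep))     # False

view[0] = 99       # writes through to x, but not to deep
print(x)           # [ 1 99  3  4]
print(deep)        # [1 2 3 4]
```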
Array Broadcasting
Broadcasting is the name given to the method that NumPy uses to allow array arithmetic between arrays with a different shape or size. Although the technique was developed for NumPy, it has also been adopted more broadly in other numerical computational libraries, such as Theano, TensorFlow, and Octave. Broadcasting solves the problem of arithmetic between arrays of differing shapes by in effect replicating the smaller array along the last mismatched dimension.
For example, suppose that we want to add a constant vector to each row of a matrix. We could do it like this:
import numpy as np
# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = np.empty_like(x) # Create an empty matrix with the same shape as x
# Add the vector v to each row of the matrix x with an explicit loop
for i in range(4):
    y[i, :] = x[i, :] + v
print(y)
[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]
This works; however when the matrix x is very large, computing an explicit loop in Python could be slow. Note that adding the vector v to each row of the matrix x is equivalent to forming a matrix vv by stacking multiple copies of v vertically, then performing element wise summation of x and vv. We could implement this approach like this:
import numpy as np
# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
vv = np.tile(v, (4, 1)) # Stack 4 copies of v on top of each other
print(vv)
# Prints "[[1 0 1]  [1 0 1]  [1 0 1]  [1 0 1]]"
y = x + vv # Add x and vv elementwise
print(y) # Prints "[[ 2  2  4]  [ 5  5  7]  [ 8  8 10]  [11 11 13]]"
Numpy broadcasting allows us to perform this computation without actually creating multiple copies of v. Consider this version, using broadcasting:
import numpy as np
# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = x + v # Add v to each row of x using broadcasting
print(y)
[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]
Consider another example, in which y is reshaped into a column so that it is added to each column of x:
import numpy as np
x=np.array([[1,2],[3,4]])
y=np.array([1,0])
print(x+y.reshape(2,1))
[[2 3]
 [3 4]]
Broadcasting typically makes your code more concise and faster, so you should strive to use it where possible.
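Broadcasting only works when the shapes are compatible: NumPy compares the shapes starting from the last axis, and two axes are compatible when they are equal or when one of them is 1. The sketch below illustrates a shape that works, one that fails, and how adding a length-1 axis fixes the failure; the vector w and the other names are chosen here for illustration.

```python
import numpy as np

x = np.arange(12).reshape(4, 3)
v = np.array([1, 0, 1])          # shape (3,)
w = np.array([1, 2, 3, 4])       # shape (4,)

row_sum = x + v                  # (4, 3) + (3,): the last axes match
print(row_sum.shape)             # (4, 3)

try:
    x + w                        # (4, 3) + (4,): last axes 3 and 4 differ
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
print(broadcast_failed)          # True

col_sum = x + w[:, np.newaxis]   # (4, 3) + (4, 1): the length-1 axis stretches
print(col_sum.shape)             # (4, 3)
```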
Simple Linear Algebra Operations
>>> import numpy as np
>>> a = np.array([[1.0, 2.0], [3.0, 4.0]])
>>> print(a)
[[1. 2.]
 [3. 4.]]
>>> a.transpose()
array([[1., 3.],
       [2., 4.]])
>>> np.linalg.inv(a)
array([[-2. ,  1. ],
       [ 1.5, -0.5]])
>>> u = np.eye(2) # unit 2x2 matrix; "eye" represents "I"
>>> u
array([[1., 0.],
       [0., 1.]])
>>> j = np.array([[0.0, -1.0], [1.0, 0.0]])
>>> j @ j        # matrix product
array([[-1.,  0.],
       [ 0., -1.]])
>>> np.trace(u)  # trace
2.0
>>> y = np.array([[5.], [7.]])
>>> np.linalg.solve(a, y)
array([[-3.],
       [ 4.]])
>>> np.linalg.eig(j)
(array([0.+1.j, 0.-1.j]), array([[0.70710678+0.j        , 0.70710678-0.j        ],
       [0.        -0.70710678j, 0.        +0.70710678j]]))
np.linalg.eig returns two arrays: the eigenvalues w, each repeated according to its multiplicity, and the normalized (unit "length") eigenvectors v, such that the column v[:,i] is the eigenvector corresponding to the eigenvalue w[i].
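The relationship between the two returned arrays can be verified numerically. This is a small added check, using the same matrix j as above:

```python
import numpy as np

j = np.array([[0.0, -1.0], [1.0, 0.0]])
w, v = np.linalg.eig(j)

# Each column v[:, i] should satisfy j @ v[:, i] == w[i] * v[:, i].
checks = [np.allclose(j @ v[:, i], w[i] * v[:, i]) for i in range(len(w))]
print(checks)                                  # [True, True]

# The eigenvector columns have unit length.
unit_length = np.allclose(np.linalg.norm(v, axis=0), 1.0)
print(unit_length)                             # True
```

np.allclose is used instead of == because the results are floating-point values and may differ in the last digits.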

