External Files and why you need them
Every program has to deal with data. This data
can come in different amounts and in different types.
- For very small amounts of data, you can just type
the data directly into the program code.
x = 0
y = 5
print(x, y)
If you want to change the data you have to edit the
source code.
- For slightly larger amounts of data it makes sense
to ask the user for it. This is also a better solution
than the "changing the program" solution. Any time you
edit the program source, you may introduce a bug.
x = float(input("Enter a number: "))
y = input("Enter a word: ")
print(x, y)
Every time the program runs, the data can be different.
- For larger amounts of data still, there needs to be
a different approach. You won't ask a user to enter 50
numbers, and you wouldn't find one willing to do it either!
And if they were willing, what are the odds they will
make a typo before they are done and have to start over?
So your program must find a way to handle moderate
to large amounts of data. Usually this is done with
something called an "external file".
The reason for "external" is that the data is in another
file, NOT inside the source code (.py) file.
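The simplest arrangement is to keep the data file in
the same folder as your .py file, so the program can open it
by name alone. A file stored somewhere else has to be opened
with its path.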
Reasons external files are good:
- they can be created by any text editor, so that
you can correct mistakes in your data; you don't
have to type the data all at one time and perfectly.
- they can be HUGE. They are stored on secondary storage
devices like hard drives, so they can even be larger than will
fit in RAM at one time!
- they can be used for input to many different programs,
so that the data can be more useful.
- a program can use different input files at
different runs, to give different results, without having
to change the program at all! The program is independent
of the data file.
- output files can be created by programs. This means
that the output of the program does not (necessarily)
show up in the Shell window; instead it is saved
for later use in a text file. What "later use"?
Perhaps it will be printed, edited, or used as input
by yet another program.
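As a small sketch of the last few points, the program below takes the name of its input file as a parameter, so the same code can process different data files and save its result in an output file. The function name total_file and the file names scores_a.txt, scores_b.txt, and total.txt are made up for illustration; they are not from any particular assignment.

```python
def total_file(in_name, out_name):
    """Add up the numbers in in_name (one per line) and save the sum in out_name."""
    infile = open(in_name, "r")
    total = 0.0
    for line in infile:
        if line.strip():            # skip blank lines
            total += float(line)
    infile.close()

    outfile = open(out_name, "w")
    print("Total:", total, file=outfile)   # goes to the file, not the Shell
    outfile.close()
    return total


# Create two small data files so the sketch runs on its own;
# normally you would make them with a text editor.
for name, numbers in [("scores_a.txt", [1, 2, 3]), ("scores_b.txt", [10, 20])]:
    f = open(name, "w")
    for n in numbers:
        f.write(str(n) + "\n")
    f.close()

print(total_file("scores_a.txt", "total.txt"))   # same program ...
print(total_file("scores_b.txt", "total.txt"))   # ... different data
```

Notice that nothing inside total_file changes between the two runs; only the file name passed to it does.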
When you use external files in a program (in any language!)
you have to do three things:
- open the file
- manipulate the data from (or for) the file
- close the file
- Opening a file
- Associates your file stream variable with the external (disk) name of the file; this tells the operating system which file you want.
- If the input file does not exist on disk, the open is not successful.
- If the output file does not exist on disk, a new file with that name is created.
- If the output file already exists, "w" mode erases its contents!
- With the append mode, if the file does not exist, it is created.
- With the append mode, if the file does exist, the data is added
to the end of the file.
- Manipulating the data
- If the file is for input, then you have to read
the data into memory so it can be used. Python provides
several ways to do that.
- Once the data is in memory, you can use it like the contents of any other variable.
- If the file is for output, then you have to write
the data from memory out to the file.
- Closing the file
- This is an important action!
- Some operating systems
do not tidy things up as they should when a program ends.
- Make sure your files are closed when you are done with them.
- Don't forget the parentheses in the close method call!
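The three steps, and the way the open modes behave, can be sketched like this. The file names notes.txt and no_such_file.txt are hypothetical (the second one is assumed not to exist); "w", "a", and "r" are Python's standard write, append, and read modes.

```python
# Step 1: open for output. "w" creates notes.txt, or erases it if it exists.
outfile = open("notes.txt", "w")
# Step 2: write data from memory out to the file.
outfile.write("first line\n")
# Step 3: close the file (note the parentheses).
outfile.close()

# "a" (append) keeps the existing contents and adds to the end.
outfile = open("notes.txt", "a")
outfile.write("second line\n")
outfile.close()

# Open for input, read the data into memory, then close.
infile = open("notes.txt", "r")
lines = infile.readlines()      # a list of strings, one per line
infile.close()
print(lines)

# Opening a missing file for input fails with FileNotFoundError.
try:
    infile = open("no_such_file.txt", "r")
except FileNotFoundError:
    print("That input file does not exist.")
```

readlines is only one of the ways Python offers to read a file; it is used here because it brings everything into memory in one call.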
What's a buffer?
- Hard drives are slow; the CPU and RAM are fast. The result is a bottleneck.
- If you're getting some data from the hard drive, why not get a good-sized amount of it?
- The OS sets aside some RAM (a buffer) and stores the file data in it until your program asks for it, then provides it as requested.
- If the buffer empties, the OS gets some more from the hard drive.
- The OS maintains a "buffer pointer", which keeps track of which characters have been given to the program and which haven't.
- The pointer tells the OS when the buffer is empty.
- All input and output functions affect (move) the buffer pointer.
File input
- When a file is opened for reading, a buffer-full of data is pulled in from the secondary storage device.
- A buffer pointer is placed in the stream at the beginning of the file contents.
- Input commands issued on the stream move the buffer pointer within the buffer.
- Access is sequential: it starts at the front of the buffer and moves the pointer forward through the data.
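Sequential access is easy to see from inside a program: each read picks up where the previous one stopped. The sketch below assumes a made-up file named letters.txt and creates it first so that it runs on its own.

```python
# Build a small three-line file so the sketch runs on its own.
f = open("letters.txt", "w")
f.write("alpha\nbeta\ngamma\n")
f.close()

infile = open("letters.txt", "r")
first = infile.readline()    # reads "alpha\n"; the position moves past it
second = infile.readline()   # continues from there: "beta\n"
rest = infile.read()         # everything that is left: "gamma\n"
at_end = infile.read()       # nothing left, so this is the empty string ""
infile.close()

print(repr(first), repr(second), repr(rest), repr(at_end))
```

No call here asks for "line 2" by number; the second readline gets it only because the first call already moved past line 1.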
File Output
- Your output file buffer is filled by program commands like print or write.
- Output data is moved from your output file buffer to the disk file when the buffer is full or when the file is closed.
- THAT is why the close command is important!
- Without a close call, a buffer of data can be left
in RAM and not flushed to the storage
device!
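Here is a rough way to watch the output buffer at work. The file name log.txt is made up, and the size printed before the close depends on your system's buffering, so no particular number is promised. The with block at the end goes beyond these notes: it is a standard Python feature that calls close for you.

```python
import os

outfile = open("log.txt", "w")
outfile.write("hello\n")
# The text is probably still sitting in the buffer in RAM, so the file
# on disk may still be empty at this point.
print("size before close:", os.path.getsize("log.txt"))

outfile.close()              # flushes the buffer to disk
print("size after close:", os.path.getsize("log.txt"))

# A with block closes the file automatically when the block ends,
# even if an error occurs inside it.
with open("log.txt", "a") as outfile:
    outfile.write("world\n")
print(outfile.closed)        # True: the with block closed the file
```

If you need the data on disk before you are ready to close the file, calling outfile.flush() empties the buffer without closing.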