External files and why you need them

Every program has to deal with data. This data can come in different amounts and in different types.

For very small amounts of data, you can type the data directly into the program code.
```
x = 0
y = 5
print(x, y)
```
If you want to change the data, you have to edit the source code.
For slightly larger amounts of data, it makes sense to ask the user to type it in while the program runs. This is also better than changing the program every time the data changes, because any time you edit the program source, you may introduce a bug.
```
x = float(input("Enter a number: "))
y = input("Enter a word: ")

print(x, y)
```
Every time the program runs, the data can be different.
For still larger amounts of data, you need a different approach. You wouldn't ask a user to enter 50 numbers, and you wouldn't find one willing to do it anyway! Even if they were willing, what are the odds they would make a typo before finishing and have to start over?
So your program needs a way to handle moderate to large amounts of data. Usually this is done with an "external file". It is called "external" because the data lives in a separate file, NOT inside the source code (.py) file. For now, keep the data file in the same folder as your .py file: a bare file name such as "data.txt" is looked up relative to the current working directory, which is often, but not always, the folder containing your .py file.
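If Python reports that it cannot find your data file, it helps to check where it is actually looking. The short sketch below uses the standard os module to print the current working directory and test whether a file is there. The name "data.txt" is a hypothetical placeholder for your own data file.

```python
import os

# Hypothetical data file name; replace it with the name of your own file.
DATA_FILE = "data.txt"

# A bare file name is looked up relative to the current working directory.
cwd = os.getcwd()
print("Python is looking in:", cwd)

# os.path.join builds the full path that open() would try to use.
full_path = os.path.join(cwd, DATA_FILE)
print("Full path:", full_path)
print("Does the file exist there?", os.path.exists(full_path))
```

If the last line prints False, either move the data file into that folder or give open() the full path to the file.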
Reasons external files are good:
- They can be created with any text editor, so you can correct mistakes in your data; you don't have to type all the data at one time, perfectly.
- They can be HUGE. They are stored on secondary storage devices such as hard drives, so they can even be larger than what fits in RAM at one time (as long as the program processes them a piece at a time)!
- They can be used as input to many different programs, which makes the data more useful.
- A program can use different input files on different runs to give different results, without changing the program at all! The program is independent of the data file.
- Programs can create output files. This means the program's output does not (necessarily) show up in the Shell window; instead it is saved in a text file for later use. What "later use"? Perhaps it will be printed, edited, or used as input by yet another program.
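To see the "program independent of the data file" idea in action, here is a minimal sketch of one program run on two different data files. The function name summarize and the file names week1.txt, week2.txt, report1.txt and report2.txt are all hypothetical. The setup lines just create the input files so the example is self-contained; normally you would make them in a text editor.

```python
def summarize(in_name, out_name):
    """Read one number per line from in_name and write a short report to out_name."""
    infile = open(in_name, "r")
    numbers = [float(line) for line in infile if line.strip()]  # skip blank lines
    infile.close()

    outfile = open(out_name, "w")
    outfile.write("count: " + str(len(numbers)) + "\n")
    outfile.write("total: " + str(sum(numbers)) + "\n")
    outfile.close()
    return numbers

# Setup only: create two small hypothetical data files.
f = open("week1.txt", "w")
f.write("3\n4\n5\n")
f.close()
f = open("week2.txt", "w")
f.write("10\n20\n")
f.close()

# Same program, different data files, different results; no code changes needed.
summarize("week1.txt", "report1.txt")
summarize("week2.txt", "report2.txt")
```

After it runs, report1.txt and report2.txt hold different reports, and neither one appeared in the Shell window.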

When you use external files in a program (in any language!), you have to do three things:

- Open the file
- Manipulate the data from (or for) the file
- Close the file
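The three steps map directly onto Python code. A minimal sketch, assuming a hypothetical input file names.txt with one name per line (the setup lines create it so the example runs on its own):

```python
# Setup only: create a small hypothetical input file.
setup = open("names.txt", "w")
setup.write("Ada\nGrace\nLinus\n")
setup.close()

# Step 1: open the file
infile = open("names.txt", "r")

# Step 2: manipulate the data (read each line and strip its newline)
names = []
for line in infile:
    names.append(line.strip())

# Step 3: close the file
infile.close()

print(names)   # ['Ada', 'Grace', 'Linus']
```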

Opening a file
- Associates your file stream variable (a file object in Python) with the external (disk) name of the file, telling the operating system which file you want
- If an input file ("r" mode) does not exist on disk, the open fails; Python raises a FileNotFoundError
- If an output file ("w" mode) does not exist on disk, a new file with that name is created
- If an output file already exists, "w" mode erases its contents!
- In append ("a") mode, if the file does not exist, it is created.
- In append mode, if the file does exist, the new data is added to the end of the file.
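The open rules above can be tried out directly. This sketch uses a hypothetical file name, log.txt, and the three standard mode strings: "r" (read), "w" (write) and "a" (append). It also shows what happens when an input file is missing; no_such_file.txt is a hypothetical name assumed not to exist.

```python
FILE_NAME = "log.txt"   # hypothetical file name used only for this example

# "w" mode: creates the file, or erases it if it already exists
out = open(FILE_NAME, "w")
out.write("first line\n")
out.close()

# "a" mode: adds new data to the end of the existing file
out = open(FILE_NAME, "a")
out.write("second line\n")
out.close()

# "r" mode: the file must already exist
inp = open(FILE_NAME, "r")
contents = inp.read()
inp.close()
print(contents)            # prints both lines

# Opening the same file with "w" again erases what was there
out = open(FILE_NAME, "w")
out.write("fresh start\n")
out.close()
inp = open(FILE_NAME, "r")
after_rewrite = inp.read()
inp.close()
print(after_rewrite)       # only "fresh start" is left

# Opening a missing file for reading fails with FileNotFoundError
try:
    missing = open("no_such_file.txt", "r")
except FileNotFoundError:
    print("Could not open no_such_file.txt")
```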
Manipulating the data
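- If the file is for input, you have to read the data into memory so it can be used. Python provides several ways to do that, such as read(), readline(), readlines(), or looping over the file with a for loop.
- Once the data is in memory, use it like any other variable. Remember that everything read from a text file is a string, so convert numbers with int() or float() before doing arithmetic.
- If the file is for output, you have to write the data from memory out to the file, using write() (which accepts only strings) or print() with file= set to the file.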
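Here is a small sketch of the read, use, write cycle. Data read from a text file arrives as strings, so the numbers are converted with float() before any arithmetic and converted back with str() before write() will accept them. The file names temps.txt and average.txt are hypothetical.

```python
# Setup only: a hypothetical input file with one temperature per line.
setup = open("temps.txt", "w")
setup.write("21.5\n19.0\n23.5\n")
setup.close()

# Input: every line comes in as a string, newline included.
infile = open("temps.txt", "r")
lines = infile.readlines()        # ['21.5\n', '19.0\n', '23.5\n']
infile.close()

# Once in memory, convert the strings and use them like any other values.
temps = [float(line) for line in lines]
average = sum(temps) / len(temps)

# Output: write() only accepts strings, so convert the number first.
outfile = open("average.txt", "w")
outfile.write("Average: " + str(average) + "\n")
# print() can also send its output to a file instead of the Shell window.
print("Readings:", len(temps), file=outfile)
outfile.close()
```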
Closing the file
- This is an important action!
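- You cannot count on every operating system to tidy up properly when a program ends, especially if the program ends abnormally.
- Make sure your files are closed when you are done with them.
- Don't forget the parentheses in the close method call: infile.close() closes the file, but infile.close without them does nothing.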
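The missing-parentheses mistake is easy to make because Python does not complain about it. This sketch shows the difference using the file object's closed attribute. It also shows Python's with statement, an alternative not covered in these notes, which closes the file automatically at the end of its block even if an error occurs inside it. The file name notes.txt is hypothetical.

```python
f = open("notes.txt", "w")
f.write("remember the parentheses\n")

f.close                  # no parentheses: only looks up the method; nothing happens
still_open = not f.closed
print(still_open)        # True: the file is still open

f.close()                # with parentheses: the method runs and the file closes
print(f.closed)          # True

# The with statement closes the file for you when the block ends.
with open("notes.txt", "r") as g:
    text = g.read()
print(g.closed)          # True
```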
What's a buffer?
- Hard drives are slow, while the CPU and RAM are fast, so disk access is a bottleneck
- If you're getting some data from the hard drive anyway, why not get a good-sized chunk of it at once?
- The OS sets aside some RAM (a buffer) and stores the file data in it until your program asks for it, then hands it over as requested
- If the buffer empties, the OS gets more data from the hard drive.
- The OS maintains a "buffer pointer" that keeps track of which characters have already been given to the program and which haven't
- The pointer tells the OS when the buffer is empty
- All input and output functions affect (move) the buffer pointer
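You can watch the pointer move from a Python program. Strictly speaking, Python's file object keeps its own read position (and its own buffer) on top of the operating system's, but it behaves the way the buffer pointer is described above: each read picks up exactly where the previous one stopped. A sketch, using a hypothetical file letters.txt:

```python
# Setup only: create a hypothetical two-line file.
setup = open("letters.txt", "w")
setup.write("abcdefghij\nsecond line\n")
setup.close()

infile = open("letters.txt", "r")
first = infile.read(3)             # 'abc'  (pointer now sits after 'c')
second = infile.read(3)            # 'def'  (continues where the last read stopped)
rest_of_line = infile.readline()   # 'ghij\n' (readline also starts at the pointer)
next_line = infile.readline()      # 'second line\n'
infile.close()
```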
File input
- When a file is opened for reading, a buffer-full of data is pulled in from the secondary storage device
- A buffer pointer is placed in the stream at the beginning of the file contents
- Each input command issued on the stream moves the buffer pointer forward in the buffer
- Access is sequential: reading starts at the front of the buffer and moves the pointer forward through the data
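Because access is sequential, once the pointer reaches the end of the file there is nothing left to read, and read() returns an empty string. One simple way to start over from the front is to open the file again. A sketch with a hypothetical scores.txt:

```python
# Setup only: create a hypothetical file of scores.
setup = open("scores.txt", "w")
setup.write("90\n85\n77\n")
setup.close()

infile = open("scores.txt", "r")
everything = infile.read()   # pointer moves all the way to the end
again = infile.read()        # nothing left: returns '' (an empty string)
infile.close()

# Reopening gives a fresh pointer at the beginning of the file.
# (File objects also have a seek() method for moving the pointer, not covered here.)
infile = open("scores.txt", "r")
first_line = infile.readline()   # '90\n'
infile.close()
```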
File output
- Your output file buffer is filled by program commands such as write, or print with file= set to the file
- Output data is copied from your output file buffer into the disk file when the buffer is full, when the file is closed, or when the program ends normally
- THAT is why the close command is important!
- Without a close call, a buffer of data can be left in RAM and never flushed to the storage device!
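The effect of the output buffer can be observed by reading the file through a second file object before and after close() is called. This is a sketch; the file name results.txt is hypothetical, and whether the early read comes back empty depends on buffering details, so treat that first result as "may be empty", not a guarantee.

```python
writer = open("results.txt", "w")
writer.write("total: 42\n")

# Read through a separate file object while the writer is still open.
# The text may still be sitting in the writer's buffer in RAM, so this can be empty.
reader = open("results.txt", "r")
before = reader.read()
reader.close()

writer.close()   # closing flushes the buffer out to the disk file

reader = open("results.txt", "r")
after = reader.read()      # 'total: 42\n'
reader.close()
print(repr(before), repr(after))
```

If you need the data written out before you are finished with a file, file objects also have a flush() method that pushes the buffered data out without closing the file.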