File handling with Python is a very important topic for GIS programmers. Text files have long been used as an interchange format between systems. They are simple, cross-platform, and easy to process. Comma- and tab-delimited text files are among the most commonly used text formats, so we’ll take a look at some of the Python tools available for processing them. A common task for GIS programmers is to read comma-delimited text files containing x,y coordinates along with other attribute information. This information is then converted into GIS data formats such as shapefiles or geodatabases.
To use Python’s built-in file processing functions you must first open the file. Once it is open, data within the file is processed using functions provided by Python, and finally the file is closed. Always remember to close the file when you’re done. So, the process is: open the file, read or write its data, then close the file.
Opening Files
In Python the ‘open()’ function accepts a path to the file that you’d like to open along with a mode in which the file will be opened. The most commonly used modes are read, write, and append. This function creates a new file object which can then be used to read or write information.
Python’s open function creates a file object which serves as a link to a file residing on your computer. You must call the open function on a file before reading and/or writing data to a file. The first parameter for the open function is a path to the file you’d like to open. The second parameter of the open function corresponds to a mode which is typically read (‘r’), write (‘w’), or append (‘a’). A value of ‘r’ indicates that you’d like to open the file for read only operations, while a value of ‘w’ indicates you’d like to open the file for write operations. In the event that you open a file that already exists for write operations this will overwrite any data currently in the file so you must be careful with write mode. Append mode (‘a’) will open a file for write operations, but instead of overwriting any existing data it will append data to the end of the file.
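As a rough illustration of the three modes described above, here is a minimal sketch. The file name points.txt and its contents are invented for the example, and the file is created in a temporary directory so nothing on your system is overwritten.

```python
import os
import tempfile

# Hypothetical file, created in a temporary directory for illustration.
path = os.path.join(tempfile.mkdtemp(), "points.txt")

# 'w' creates the file, or empties it if it already exists.
f = open(path, "w")
f.write("id,x,y\n")
f.close()

# 'a' keeps the existing contents and writes at the end of the file.
f = open(path, "a")
f.write("1,-98.5,29.4\n")
f.close()

# 'r' opens the file for reading only.
f = open(path, "r")
contents = f.read()
f.close()

print(contents)
```

Opening the same file with ‘w’ a second time would have discarded the header line, which is why append mode is used for the second write.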
The most commonly used file modes are read, write, and append. However, you can also add a ‘+’ to each of these modes to enable read/write capability. The contents of the file are preserved or deleted depending on the combination you use. For example, ‘w+’ opens a file for read/write but deletes its contents, while ‘r+’ preserves them. Adding a ‘b’ to ‘r’, ‘w’, or ‘a’ opens the file in binary mode. Finally, in Python 2 the ‘U’ character, used with read mode, applies universal newline translation. Python 3 applies this translation by default when reading in text mode, and the ‘U’ flag was removed in Python 3.11.
Universal newline handling is an exceptionally useful tool for creating consistency in newline characters. Operating systems and applications use different characters to indicate a newline: \r, \n, or \r\n. With universal newlines, Python converts all of these to \n as the file is read. The translation applies only when reading, so it is used with ‘r’ and not with ‘w’ or ‘a’. Python 2 scripts that open files from various sources should read them with ‘rU’ to handle the different possible newline characters gracefully; in Python 3, opening the file with ‘r’ is enough because the translation is on by default.
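The difference between ‘r+’ and ‘w+’ is easy to get wrong, so here is a small sketch of how the two behave. The file name modes.txt is a placeholder, and the example assumes Python 3.

```python
import os
import tempfile

# Hypothetical file, created in a temporary directory for illustration.
path = os.path.join(tempfile.mkdtemp(), "modes.txt")

f = open(path, "w")
f.write("original\n")
f.close()

# 'r+' opens for reading and writing and keeps the existing contents.
f = open(path, "r+")
kept = f.read()
f.close()

# 'rb' opens in binary mode, so read() returns bytes instead of str.
f = open(path, "rb")
raw = f.read()
f.close()

# 'w+' also opens for reading and writing, but empties the file first.
f = open(path, "w+")
after_truncate = f.read()
f.close()

print(repr(kept), type(raw).__name__, repr(after_truncate))
```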
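To see newline translation in action, the sketch below writes a file containing all three newline styles and reads it back. It assumes Python 3, where text mode translates newlines by default when reading; the file name is a placeholder.

```python
import os
import tempfile

# Hypothetical file, created in a temporary directory for illustration.
path = os.path.join(tempfile.mkdtemp(), "mixed_newlines.txt")

# Write raw bytes so the three newline styles are stored exactly as given.
f = open(path, "wb")
f.write(b"unix\nwindows\r\nold_mac\rlast")
f.close()

# Python 3 text mode translates \n, \r\n, and \r to \n while reading.
f = open(path, "r")
lines = f.read().split("\n")
f.close()

print(lines)
```

Each of the three newline styles ends up as a single \n, so the split produces four clean strings with no stray \r characters.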
Reading Data from Files
After a file is open, data can be read from it in a number of ways. The most typical scenario is to read data one line at a time through the readline() method, which returns each line as a string. You need to create a looping mechanism in your Python code to read the entire file line by line. We’ll do just that in a code example later in this article.
If you would prefer to read the entire file into a variable you can use the read() method, which reads to the end of the file and returns the contents as a single string. You can also use the readlines() method to read the entire contents of a file, separating each line into an individual string, until the end of the file is reached. This method reads the file into a list where each line occupies a unique index in the list.
In the case of really large files that might consume all the available memory on the computer running the script, the preferred method is to use read() with a set number of bytes. The file is read from beginning to end in chunks of that size until read() returns an empty string, which signals the end of the file.
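A line-by-line loop built on readline() might look like the following minimal sketch. The file name and its three lines are made up for the example.

```python
import os
import tempfile

# Hypothetical file, created in a temporary directory for illustration.
path = os.path.join(tempfile.mkdtemp(), "lines.txt")
f = open(path, "w")
f.write("line one\nline two\nline three\n")
f.close()

lines_read = []
f = open(path, "r")
line = f.readline()
# readline() returns an empty string once the end of the file is reached.
while line != "":
    lines_read.append(line.rstrip("\n"))  # drop the trailing newline
    line = f.readline()
f.close()

print(lines_read)
```

A file object can also be iterated directly with a ‘for’ loop, which reads one line per pass and is usually the shorter way to write the same thing.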
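The two whole-file methods return different types, which the short sketch below illustrates. The sample file is invented for the example.

```python
import os
import tempfile

# Hypothetical file, created in a temporary directory for illustration.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
f = open(path, "w")
f.write("a\nb\nc\n")
f.close()

# read() returns the whole file as one string.
f = open(path, "r")
whole = f.read()
f.close()

# readlines() returns a list with one string per line, newlines included.
f = open(path, "r")
as_list = f.readlines()
f.close()

print(repr(whole))
print(as_list)
```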
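Reading a file in fixed-size pieces could be sketched as follows. The chunk size of 4096 bytes is an arbitrary choice for illustration, and the file here is small and generated on the spot; with a real data set you would process each chunk instead of just counting its bytes.

```python
import os
import tempfile

# Hypothetical file, created in a temporary directory for illustration.
path = os.path.join(tempfile.mkdtemp(), "big.dat")
f = open(path, "wb")
f.write(b"x" * 10000)
f.close()

CHUNK_SIZE = 4096  # arbitrary size chosen for this example

total_bytes = 0
f = open(path, "rb")
while True:
    chunk = f.read(CHUNK_SIZE)  # returns at most CHUNK_SIZE bytes
    if not chunk:  # an empty result means the end of the file
        break
    total_bytes += len(chunk)  # stand-in for real processing
f.close()

print(total_bytes)
```

Only one chunk is held in memory at a time, so memory use stays roughly constant regardless of the file size.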
Handling Delimited Files
There are several ways that you can process delimited data files in Python. In this section we’re going to examine the use of the ‘split()’ string method for processing these types of files. The ‘split()’ method creates a list from a string based on a delimiter, so you must first open a file and either read its entire contents into a variable or process the file one line at a time. The method accepts a delimiter as its argument. By default, if you leave the parameter empty, it splits the string on any run of whitespace. A common scenario is working with a comma-delimited text file in which each line holds the attributes of one point.
We call the ‘split()’ method, passing in a comma. It returns a list in which each comma-separated item is a separate element. In this case we are only interested in retrieving the latitude, longitude, and confidence values from the delimited text file. By accessing the correct index number we can access each of these individual items. Notice that all of this is done inside a ‘for’ loop, so split() is called once for each line in the file and the individual items are pulled out. Presumably we’d do something additional with this data such as creating a new feature class in ArcGIS containing these individual points.
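The approach described above might be sketched like this. The sample data and its column layout (latitude, longitude, brightness, confidence) are assumptions invented for illustration, so adjust the index numbers to match your own file.

```python
import os
import tempfile

# Hypothetical sample data; the column layout is an assumption.
sample = (
    "latitude,longitude,brightness,confidence\n"
    "29.42,-98.49,310.5,85\n"
    "30.27,-97.74,305.1,62\n"
)
path = os.path.join(tempfile.mkdtemp(), "fires.txt")
f = open(path, "w")
f.write(sample)
f.close()

points = []
f = open(path, "r")
next(f)  # skip the header row
for line in f:
    fields = line.strip().split(",")  # one list item per column
    latitude = float(fields[0])
    longitude = float(fields[1])
    confidence = int(fields[3])
    points.append((latitude, longitude, confidence))
f.close()

print(points)
```

Note that split() does not understand quoted fields that contain commas. For files like that, the csv module in Python’s standard library is the safer choice.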
Closing Files
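After you’ve completed your read/write operations on a file you should always close it by using the close() method. Closing the file flushes any buffered writes to disk and releases the file so that other programs can use it.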
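One way to make sure close() always runs, even when an error occurs partway through, is sketched below. The first half uses try/finally with an explicit close(); the second half uses Python’s ‘with’ statement, which closes the file automatically when the block ends. The file name is a placeholder.

```python
import os
import tempfile

# Hypothetical file, created in a temporary directory for illustration.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")

# Explicit close(), placed in 'finally' so it runs even if write() fails.
f = open(path, "w")
try:
    f.write("first line\n")
finally:
    f.close()

# The 'with' statement closes the file when the indented block exits.
with open(path, "r") as f:
    text = f.read()

closed_after_with = f.closed  # True once the 'with' block has ended
print(text, closed_after_with)
```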