
Introduction to Sequential Files
|
Introduction
|
Aims
|
Files are repositories of data that reside on backing
storage (hard disk, magnetic tape or CD-ROM).
Nowadays, files are used to store a variety of different types
of information, such as programs, documents, spreadsheets, videos,
sounds, pictures and record-based data.
Although COBOL can be used to process these other
kinds of data file, it is generally used only to process record-based
files.
In this, and subsequent file-oriented tutorials,
we examine how COBOL may be used to process record-based files.
There are essentially two types of record-based
file organization:
- Serial Files (COBOL calls these Sequential Files)
- Direct Access Files.
In a Serial File, the records are organized and
accessed serially.
In a Direct Access File, the records are organized
in a manner that allows direct access to a particular record without
having to read any of the preceding records.
In this tutorial, you will discover how COBOL may
be used to process serial files.
|
Objectives
|
By the end of this unit you should -
- Understand concepts and terminology like file, record, field
and record buffer.
- Be able to write the file and record declarations for a Sequential
File.
- Understand how the READ verb works.
- Be able to use the READ, WRITE,
OPEN and CLOSE verbs
to process a Sequential File.
|
Prerequisites
|
Introduction to COBOL
Declaring data in COBOL
Basic Procedure Division Commands
Selection Constructs
Iteration Constructs
|
|
Introduction
to record-based files
|
Introduction
|
COBOL is generally used in situations where the
volume of data to be processed is large. These systems are sometimes
referred to as “data intensive” systems. Generally, large volumes
arise not because the data is inherently voluminous but because
the same items of information have been recorded about a great
many instances of the same object. Record-based files are used
to record this information.
|
Files,
Records, Fields
|
- We use the term file to describe a
collection of one or more occurrences (instances) of a record
type (template).
- We use the term record to describe
a collection of fields which record information about an object.
- We use the term field to describe an
item of information recorded about an object (e.g. StudentName,
DateOfBirth).
|
Record
instance vs Record type
|
It is important to distinguish between a record
occurrence (i.e. the values of a record) and the record type or
template (i.e. the structure of the record).
Each record occurrence in a file will have a different
value but every record in the file will have the same structure.
For instance, in the student details file, illustrated
below, the occurrences of the student records are actual values
in the file. The record type/template describes the structure
of each record occurrence.

|
The
record buffer
|
Before a computer can do any processing on a piece of data, the
data must be loaded into main memory (RAM). The CPU can only address
data that is in RAM.
A record-based file may consist of hundreds of thousands, millions
or even tens of millions of records, and may require gigabytes of
storage. Files of this size cannot be processed by loading the whole
file into memory in one go. Instead, files are processed by reading
the records into memory, one at a time.
To store the record read into memory and to allow access to the
individual fields of the record, a programmer must declare the record
structure (see the diagram above) in the program. The computer uses
the programmer's description of the record (the record template)
to set aside sufficient memory to store one instance of the record.
The memory allocated for storing a record is usually called a "record
buffer".
A record buffer is capable of storing the data recorded for only
one instance of the record. To process a file a program must read
the records one at a time into the record buffer. The record buffer
is the only connection between the program and the records in the
file.

|
Some
implications of "buffers"

|
If a program processes more than one file, a record buffer must
be defined for each file.
To process all the records in an INPUT file,
we must ensure that each record instance is copied (read) from the
file, into the record buffer, when required.
To create an OUTPUT file containing data
records, we must ensure that each record is placed in the record
buffer and then transferred (written) to the file.
To transfer a record from an input file to an output file we must
read the record into the input record buffer, transfer it to the
output record buffer and then write the data to the output file
from the output record buffer. This type of data transfer between
‘buffers’ is quite common in COBOL programs.
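The buffer-to-buffer transfer described above might be sketched as follows. This is only an illustration: the program name CopyStudents, the output file StudentCopyFile (STUDCOPY.DAT) and its record StudentCopyRec are hypothetical names, each record is treated as a plain 30-character item, and the source format directive is the one used by the example program at the end of this tutorial. The declarations and verbs it relies on are explained in the sections that follow.

```cobol
$ SET SOURCEFORMAT"FREE"
IDENTIFICATION DIVISION.
PROGRAM-ID. CopyStudents.
*> Sketch only: CopyStudents, StudentCopyFile, STUDCOPY.DAT and
*> StudentCopyRec are hypothetical names chosen for illustration.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT StudentFile ASSIGN TO "STUDENTS.DAT"
        ORGANIZATION IS LINE SEQUENTIAL.
    SELECT StudentCopyFile ASSIGN TO "STUDCOPY.DAT"
        ORGANIZATION IS LINE SEQUENTIAL.
DATA DIVISION.
FILE SECTION.
*> One record buffer is declared for each file.
FD StudentFile.
01 StudentRec             PIC X(30).
   88 EndOfStudentFile    VALUE HIGH-VALUES.
FD StudentCopyFile.
01 StudentCopyRec         PIC X(30).
PROCEDURE DIVISION.
Begin.
    OPEN INPUT StudentFile
    OPEN OUTPUT StudentCopyFile
    READ StudentFile
        AT END SET EndOfStudentFile TO TRUE
    END-READ
    PERFORM UNTIL EndOfStudentFile
*> Input buffer -> output buffer -> output file.
        MOVE StudentRec TO StudentCopyRec
        WRITE StudentCopyRec
        READ StudentFile
            AT END SET EndOfStudentFile TO TRUE
        END-READ
    END-PERFORM
    CLOSE StudentFile
    CLOSE StudentCopyFile
    STOP RUN.
```

Each file keeps its own buffer, so the MOVE between the two buffers is the step that carries the data from the input side to the output side.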
|
|
Declaring
Records and Files
|
Introduction

This is for demonstration only. In
reality we would need to include far more items and some of the
fields would have to be considerably larger.
|
Suppose we want to create a file to hold information about the
students in the University. What kind of information do we need
to store about each student?
One thing we need to store is the student's name. Each student
is assigned an identification number; so we need to store that as
well. We also need to store the date of birth, and the code of the
course the student is taking. Finally, we are going to store the
student's gender. These items are summarized below:
- Student Id
- Student Name
- Date of birth
- Course Code
- Gender
|
Creating
a record
|
To create a record buffer large enough to store one instance of
a record, containing the information described above, we must decide
on the type and size of each of the fields.
- The student identity number is 7 digits in size so we need to
declare the data-item to hold it as PIC 9(7).
- To store the student name, we will assume that we require only
10 characters. So we can declare a data-item to hold it as PIC
X(10).
- The date of birth is 8 digits long so we declare it as PIC 9(8).
- The course code is 4 characters long so we declare it as PIC
X(4).
- Finally, the gender is only one character so we declare it as
PIC X.
The fields described above are individual data items but we must
collect them together into a record structure as follows:
01 StudentRec.
   02 StudentId       PIC 9(7).
   02 StudentName     PIC X(10).
   02 DateOfBirth     PIC 9(8).
   02 CourseCode      PIC X(4).
   02 Gender          PIC X.
|
The record description above is correct as far as it goes. It reserves
the correct amount of storage for the record buffer. But it does
not allow us to access all the individual parts of the record that
we might require.
For instance, the name is actually made up of the student's surname
and initials while the date consists of 4 digits for the year, 2
digits for the month and 2 digits for the day.
To allow us to access these fields individually we need to declare
the record as follows:
01 StudentRec.
   02 StudentId       PIC 9(7).
   02 StudentName.
      03 Surname      PIC X(8).
      03 Initials     PIC XX.
   02 DateOfBirth.
      03 YOBirth      PIC 9(4).
      03 MOBirth      PIC 99.
      03 DOBirth      PIC 99.
   02 CourseCode      PIC X(4).
   02 Gender          PIC X.
|
In this description, StudentName is a group item consisting of
Surname and Initials, and DateOfBirth consists of YOBirth, MOBirth
and DOBirth.
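As a rough illustration of what the group items make possible, the sketch below fills the record with one invented student and then refers to the whole record, a group item and individual elementary items by name. The program name GroupItemDemo and the sample values are assumptions made for this example only, and the record is declared in the WORKING-STORAGE SECTION so that no file is needed.

```cobol
$ SET SOURCEFORMAT"FREE"
IDENTIFICATION DIVISION.
PROGRAM-ID. GroupItemDemo.
*> Sketch only: the program name and the sample data are invented.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 StudentRec.
   02 StudentId       PIC 9(7).
   02 StudentName.
      03 Surname      PIC X(8).
      03 Initials     PIC XX.
   02 DateOfBirth.
      03 YOBirth      PIC 9(4).
      03 MOBirth      PIC 99.
      03 DOBirth      PIC 99.
   02 CourseCode      PIC X(4).
   02 Gender          PIC X.
PROCEDURE DIVISION.
Begin.
*> 30 characters: Id(7) Surname(8) Initials(2) Date(8) Course(4) Gender(1)
    MOVE "1234567SMITH   JR19990215CS01F" TO StudentRec
*> A group item refers to all of its subordinate items at once.
    DISPLAY "Name          : " StudentName
*> Elementary items give access to the individual parts.
    DISPLAY "Surname       : " Surname
    DISPLAY "Year of birth : " YOBirth
    DISPLAY "Month of birth: " MOBirth
    STOP RUN.
```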
|
Declaring
a record buffer in your program

|
The record type/template/buffer of every file used in a program
must be described in the FILE SECTION by
means of an FD (file description) entry.
The FD entry consists of the letters FD
and an internal name that the programmer assigns to the file.
So the full file description for the students file might be:
DATA DIVISION.
FILE SECTION.
FD StudentFile.
01 StudentRec.
   02 StudentId       PIC 9(7).
   02 StudentName.
      03 Surname      PIC X(8).
      03 Initials     PIC XX.
   02 DateOfBirth.
      03 YOBirth      PIC 9(4).
      03 MOBirth      PIC 99.
      03 DOBirth      PIC 99.
   02 CourseCode      PIC X(4).
   02 Gender          PIC X.
|
Note that we have assigned the name StudentFile as the internal
file name. The actual name of the file on disk is Students.Dat.
|
The
SELECT and ASSIGN clause

|
Although the name of the students file on disk is Students.Dat
we are going to refer to it in our program as StudentFile. How can
we connect the name we are going to use internally with the actual
name of the file on disk?
The internal file name used in a file's FD
entry is connected to an external file (on disk, tape or CD-ROM)
by means of the SELECT and ASSIGN
clause. The SELECT and ASSIGN
clause is an entry in the FILE-CONTROL paragraph
in the INPUT-OUTPUT SECTION in the ENVIRONMENT
DIVISION.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT StudentFile
        ASSIGN TO "STUDENTS.DAT".
DATA DIVISION.
FILE SECTION.
FD StudentFile.
01 StudentRec.
   02 StudentId       PIC 9(7).
   02 StudentName.
      03 Surname      PIC X(8).
      03 Initials     PIC XX.
   02 DateOfBirth.
      03 YOBirth      PIC 9(4).
      03 MOBirth      PIC 99.
      03 DOBirth      PIC 99.
   02 CourseCode      PIC X(4).
   02 Gender          PIC X.
|
|
SELECT
and ASSIGN syntax for Sequential files

The Select and Assign clause has far more entries
(even for Sequential files) than those we show here; but we will
examine the other entries in later tutorials.

We are only going to deal with statically assigned
file names for the moment, but it is possible to assign a file name
to a file at run-time.
|

The Micro Focus COBOL compiler recognizes two kinds
of Sequential File organization:
LINE SEQUENTIAL
and
RECORD SEQUENTIAL.
LINE SEQUENTIAL
files are files in which each record is followed by the
carriage return and line feed characters. These are the kind of
files produced by a text editor such as Notepad.
RECORD SEQUENTIAL files
are files that consist of a stream of bytes. Only the
fact that we know the size of each record allows us to retrieve
the records. Files that are not record-based can be processed by defining
them as RECORD SEQUENTIAL.
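The two organizations might be selected with FILE-CONTROL entries along these lines. This is a sketch rather than a complete program: a given file would carry only one of the two ORGANIZATION clauses, and StudentBinFile and STUDENTS.BIN are hypothetical names used to show the second form. RECORD SEQUENTIAL is described above as a Micro Focus organization, so other compilers may not accept that wording.

```cobol
*> Text-style file: each record ends with carriage return / line feed.
    SELECT StudentFile ASSIGN TO "STUDENTS.DAT"
        ORGANIZATION IS LINE SEQUENTIAL.

*> Byte-stream file: records are located purely by their known size.
*> StudentBinFile and STUDENTS.BIN are hypothetical names.
    SELECT StudentBinFile ASSIGN TO "STUDENTS.BIN"
        ORGANIZATION IS RECORD SEQUENTIAL.
```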
The ExternalFileReference can be a simple
file name, or a full, or a partial, file specification. If a simple
file name is used, the drive and directory where the program is
running are assumed, but we may choose to include the full path
to the file. For instance, we could associate the StudentFile
with an actual file using statements like:
SELECT StudentFile
    ASSIGN TO "D:\Cobol\ExampleProgs\Students.Dat"
SELECT StudentFile
    ASSIGN TO "A:\Students.Dat"
|
What
is the purpose of the SELECT and ASSIGN clause?
|
The SELECT and ASSIGN
clause allows us to assign a meaningful name to an actual file
on a storage device. The advantage of this is that it makes our
programs more readable and easier to maintain. If the location
of the file, or the medium on which the file is held, changes
then the only change we need to make to our program, is to change
the entry in the SELECT and ASSIGN
clause.
|
|
COBOL
file handling verbs
|
Introduction
|
Sequential files are uncomplicated. To write programs that process
Sequential Files you only need to know four new verbs: OPEN,
CLOSE, READ and WRITE.
You must ensure that (before terminating) your program closes
all the files it has opened. Failure to do so may result in data
not being written to the file or users being prevented from accessing
the file.
|
The
OPEN verb

Although, as you can see from the
ellipses in the syntax diagram, it is possible to open a number
of files with one OPEN statement, it is not advisable to do so. If an
error is detected on opening a file and you have used only one statement
to open all the files, the system probably won't be able to show
you which particular file is causing the problem. If you open all
the files separately, it will.
|

Before your program can access the data in an input
file or place data in an output file, you must make the file available
to the program by OPENing it.
When you open a file you have to indicate how you intend to use
it (e.g. INPUT, OUTPUT,
EXTEND) so that the system can manage the
file correctly. Opening a file does not transfer any data to the
record buffer; it simply provides access.
OPEN notes
When a file is opened for
INPUT or EXTEND, the
file must exist or the OPEN will fail.
When a file is opened for INPUT, the Next
Record Pointer is positioned at the beginning of the file.
When the file is opened for EXTEND, the
Next Record Pointer is positioned after the last record in the file.
This allows records to be appended to the file.
When a file is opened for OUTPUT, it is
created if it does not exist, and is overwritten, if it already
exists.
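As one illustration of the notes above, the sketch below opens the students file for EXTEND and appends a single record after the existing ones. It assumes that STUDENTS.DAT already exists (otherwise, as noted above, the OPEN will fail). The program name AppendStudent and the sample record are invented, and the record is simplified to a single 30-character item.

```cobol
$ SET SOURCEFORMAT"FREE"
IDENTIFICATION DIVISION.
PROGRAM-ID. AppendStudent.
*> Sketch only: the program name and the sample record are invented.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT StudentFile ASSIGN TO "STUDENTS.DAT"
        ORGANIZATION IS LINE SEQUENTIAL.
DATA DIVISION.
FILE SECTION.
FD StudentFile.
01 StudentRec             PIC X(30).
PROCEDURE DIVISION.
Begin.
*> EXTEND positions the Next Record Pointer after the last record,
*> so the WRITE below adds to the file instead of replacing it.
    OPEN EXTEND StudentFile
    MOVE "7654321JONES   AB20010930CS02M" TO StudentRec
    WRITE StudentRec
    CLOSE StudentFile
    STOP RUN.
```

Changing EXTEND to OUTPUT in this sketch would instead overwrite any existing file, leaving only the one new record.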
|
The
CLOSE verb
|
CLOSE InternalFileName...
You must ensure that, before terminating, your program closes all
the files it has opened. Failure to do so may result in some data
not being written to the file or users being prevented from accessing
the file.
|
The
READ verb
|

Once the system has opened a file and made it available to the
program it is the programmer's responsibility to process it correctly.
To process all the records in the file we have to transfer them,
one record at a time, from the file to the file's record buffer.
The READ verb is provided for this purpose.
The READ copies a record occurrence/instance from the file and
places it in the record buffer.
READ notes
When the READ attempts to read a record
from the file and encounters the end of file marker, the AT
END is triggered and the StatementBlock following
the AT END is executed.
Using the INTO Identifier clause
causes the data to be read into the record buffer and then copied
from there to the Identifier, in one operation. When this
option is used, there will be two copies of the data: one in the
record buffer and one in the Identifier. Using this clause
is the equivalent of executing a READ and
then moving the contents of the record buffer to the Identifier.
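A minimal sketch of the INTO clause, assuming a STUDENTS.DAT file of 30-character records already exists: each READ fills the record buffer and also copies the record to a WORKING-STORAGE item. The names ReadIntoDemo and WorkStudentRec are hypothetical.

```cobol
$ SET SOURCEFORMAT"FREE"
IDENTIFICATION DIVISION.
PROGRAM-ID. ReadIntoDemo.
*> Sketch only: ReadIntoDemo and WorkStudentRec are hypothetical names.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT StudentFile ASSIGN TO "STUDENTS.DAT"
        ORGANIZATION IS LINE SEQUENTIAL.
DATA DIVISION.
FILE SECTION.
FD StudentFile.
01 StudentRec             PIC X(30).
   88 EndOfStudentFile    VALUE HIGH-VALUES.
WORKING-STORAGE SECTION.
01 WorkStudentRec         PIC X(30).
PROCEDURE DIVISION.
Begin.
    OPEN INPUT StudentFile
*> Equivalent to READ StudentFile followed by
*> MOVE StudentRec TO WorkStudentRec.
    READ StudentFile INTO WorkStudentRec
        AT END SET EndOfStudentFile TO TRUE
    END-READ
    PERFORM UNTIL EndOfStudentFile
        DISPLAY WorkStudentRec
        READ StudentFile INTO WorkStudentRec
            AT END SET EndOfStudentFile TO TRUE
        END-READ
    END-PERFORM
    CLOSE StudentFile
    STOP RUN.
```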
|
How
the READ works
|
The animation below demonstrates how the READ
works. When a record is read it is copied from the backing storage
file into the record buffer in RAM. When
an attempt to READ detects the end of file
the AT END is triggered and the condition
name EndOfFile is set to true. Since the condition name is set up
as shown below, setting it to true fills the whole record with HIGH-VALUES.
FD StudentFile.
01 StudentRec.
   88 EndOfFile       VALUE HIGH-VALUES.
   02 StudentId       PIC 9(7).
   etc
|

|
The
WRITE verb

The WRITE format actually contains
a number of other entries but these relate to writing to print files
and will be covered in subsequent tutorials.
|
WRITE RecordName [FROM Identifier]
The WRITE verb is used to copy data from
the record buffer (RAM) to the file on backing storage (Disk, tape
or CD-ROM).
To WRITE data to a file we must move the
data to the record buffer (declared in the FD entry) and then WRITE
the contents of the record buffer to the file.
When the WRITE..FROM is used the data contained
in the Identifier is copied into the record buffer and is
then written to the file. The WRITE..FROM is
the equivalent of a MOVE Identifier
TO RecordBuffer statement followed by
a WRITE RecordBuffer statement.
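The WRITE..FROM form might be used as in the sketch below, where the data is prepared in a WORKING-STORAGE item and written in one statement. The program name WriteFromDemo, the file NEWSTUD.DAT and the item NewStudent are hypothetical. A separate file name is used because opening for OUTPUT overwrites an existing file.

```cobol
$ SET SOURCEFORMAT"FREE"
IDENTIFICATION DIVISION.
PROGRAM-ID. WriteFromDemo.
*> Sketch only: WriteFromDemo, NEWSTUD.DAT and NewStudent are
*> hypothetical names; the sample record is invented.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT StudentFile ASSIGN TO "NEWSTUD.DAT"
        ORGANIZATION IS LINE SEQUENTIAL.
DATA DIVISION.
FILE SECTION.
FD StudentFile.
01 StudentRec             PIC X(30).
WORKING-STORAGE SECTION.
01 NewStudent             PIC X(30)
       VALUE "1234567SMITH   JR19990215CS01F".
PROCEDURE DIVISION.
Begin.
    OPEN OUTPUT StudentFile
*> Equivalent to MOVE NewStudent TO StudentRec
*> followed by WRITE StudentRec.
    WRITE StudentRec FROM NewStudent
    CLOSE StudentFile
    STOP RUN.
```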
|
Read
a file, Write a record
|
If you were paying close attention to the syntax diagrams above
you probably noticed that while we READ a
file, we must WRITE a record.
The reason we read a file but write a record, is that a file can
contain a number of different types of record. For instance, if
we want to update the students file we might have a file of transaction
records that contained Insertion records and Deletion records. While
the Insertion records would contain all the student record fields,
the Deletion only needs the StudentId.
When we read a record from the transaction file we don't know which
of the types will be supplied; so we must READ
Filename. It is the programmer's responsibility to discover what
type of record has been supplied.
When we write a record to a file we have to specify which of
the record types we want to write; so we must WRITE
RecordName.
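One way the transaction file described above might be declared is sketched below, with two record descriptions under a single FD. The text does not say how the record type is discovered; this sketch assumes a one-character code at the start of every record (TransCode, "I" or "D"), which is an invention for illustration, as are the names TransTypes, TransFile and TRANS.DAT and the Ins/Del field names.

```cobol
$ SET SOURCEFORMAT"FREE"
IDENTIFICATION DIVISION.
PROGRAM-ID. TransTypes.
*> Sketch only: all names and the leading type code are assumptions.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT TransFile ASSIGN TO "TRANS.DAT"
        ORGANIZATION IS LINE SEQUENTIAL.
DATA DIVISION.
FILE SECTION.
*> Both record descriptions share the same record buffer.
FD TransFile.
01 InsertionRec.
   88 EndOfTransFile     VALUE HIGH-VALUES.
   02 TransCode          PIC X.
      88 IsInsertion     VALUE "I".
      88 IsDeletion      VALUE "D".
   02 InsStudentId       PIC 9(7).
   02 InsStudentName     PIC X(10).
   02 InsDateOfBirth     PIC 9(8).
   02 InsCourseCode      PIC X(4).
   02 InsGender          PIC X.
01 DeletionRec.
   02 FILLER             PIC X.
   02 DelStudentId       PIC 9(7).
PROCEDURE DIVISION.
Begin.
    OPEN INPUT TransFile
*> We READ the file; the type of record supplied is found afterwards.
    READ TransFile
        AT END SET EndOfTransFile TO TRUE
    END-READ
    PERFORM UNTIL EndOfTransFile
        EVALUATE TRUE
            WHEN IsInsertion
                DISPLAY "Insert " InsStudentId SPACE InsStudentName
            WHEN IsDeletion
                DISPLAY "Delete " DelStudentId
            WHEN OTHER
                DISPLAY "Unknown record type"
        END-EVALUATE
        READ TransFile
            AT END SET EndOfTransFile TO TRUE
        END-READ
    END-PERFORM
    CLOSE TransFile
    STOP RUN.
```

A program that created this file would name the record explicitly, for example WRITE InsertionRec or WRITE DeletionRec, which is why the WRITE verb takes a record name.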
|
Example
Program

|
The example program below demonstrates the items discussed above.
The program gets records from the user and writes them to a file.
It then reads the file and displays part of each record.
$ SET SOURCEFORMAT"FREE"
IDENTIFICATION DIVISION.
PROGRAM-ID. SeqWriteRead.
AUTHOR. Michael Coughlan.
* Example program showing how to create a sequential file
* using the ACCEPT and the WRITE verbs and then read and
* display its records using the READ and DISPLAY.
* Note: In this version of COBOL pressing the Carriage Return (CR)
* without entering any data results in StudentRec
* being filled with spaces.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT StudentFile ASSIGN TO "STUDENTS.DAT"
        ORGANIZATION IS LINE SEQUENTIAL.
DATA DIVISION.
FILE SECTION.
FD StudentFile.
01 StudentRec.
   88 EndOfStudentFile VALUE HIGH-VALUES.
   02 StudentId        PIC 9(7).
   02 StudentName.
      03 Surname       PIC X(8).
      03 Initials      PIC XX.
   02 DateOfBirth.
      03 YOBirth       PIC 9(4).
      03 MOBirth       PIC 9(2).
      03 DOBirth       PIC 9(2).
   02 CourseCode       PIC X(4).
   02 Gender           PIC X.
PROCEDURE DIVISION.
Begin.
    OPEN OUTPUT StudentFile
    DISPLAY "Enter student details using template below."
    DISPLAY "Enter no data to end"
    PERFORM GetStudentRecord
    PERFORM UNTIL StudentRec = SPACES
        WRITE StudentRec
        PERFORM GetStudentRecord
    END-PERFORM
    CLOSE StudentFile
    OPEN INPUT StudentFile
    READ StudentFile
        AT END SET EndOfStudentFile TO TRUE
    END-READ
    PERFORM UNTIL EndOfStudentFile
        DISPLAY StudentId SPACE StudentName SPACE CourseCode
        READ StudentFile
            AT END SET EndOfStudentFile TO TRUE
        END-READ
    END-PERFORM
    CLOSE StudentFile
    STOP RUN.
GetStudentRecord.
    DISPLAY "NNNNNNNSSSSSSSSIIYYYYMMDDCCCCG"
    ACCEPT StudentRec.
|
|
Copyright Notice
These COBOL course materials are the copyright
property of Michael Coughlan.
All rights reserved. No part of these
course materials may be reproduced in any form or by any means
- graphic, electronic, mechanical, photocopying, printing, recording,
taping or stored in an information storage and retrieval system
- without the written permission of the
author.
(c) Michael Coughlan
|