Europrobe Workshops - PANCARDI (PANnonian-CARpathian-DInarides system)
Stará-Lesná, High Tatra Mts., Slovakia;    1995, October 21-25

Maistrello M.(1), DeFranco R.(1), Schettini R.(2), Mazzon L.(2), Floris B.(3), Geyko V.(4)


 
Digitization of photographic records: an example from Carpathian old D.S.S. data

Poster and oral presentation, by M.Maistrello

Abstract :
    The old DSS data collected since the fifties in the former USSR represent a treasure of information that must be preserved and used as input for the new crustal surveys.
    On the occasion of a visit to Milan by Prof. V.Geyko of the Ukrainian Academy of Sciences, we jointly performed a digitization and processing test on some seismic traces recorded on photographic paper along a DSS profile across the Eastern Carpathians during 1966-'67. The entire section (Beregovo-Dolina-Shepetovka) is 250 km long and consists of 650 on-paper seismograms, containing 50 traces each: only a small part of it was examined, in order to explore the possibilities of acquiring and processing such 'images'.
    The phases of the experiment are described, as well as the techniques employed: the output consists of a series of vectorized seismic traces that can be handled by common commercial geophysical software packages for PC (DOS/Windows environment). Treating each seismographic sheet as an image, the poster illustrates the methodology and the hardware and software employed both in the image acquisition phase and in the (much more time-consuming and complex) phase of its processing in different environments. The poster is also documented with some pictures. Finally, a proposal for setting up an ad-hoc processing Centre for this type of data is presented.



(1)    Italian National Research Council (CNR); Research Institute on Seismic Risk (IRRS) - Milan -
(2)    Italian National Research Council (CNR); Research Institute on Multimedia Information Technology - Milan -
(3)    Milan State University: Environmental Sciences Dept. - Milan -
(4)    Ukrainian National Academy of Sciences, Institute of Geophysics - Kiev -

Background:
    From the '60s to the '80s, the Ministry for Geology of the former SU committed vast funding to the investigation of the continental crust and upper mantle by seismic methods (mainly refraction and wide-angle reflection). In some cases the energy sources were nuclear. The former SU is criss-crossed by a network of seismic lines more detailed than in any other part of the world, and an extremely valuable seismic database for future co-operative research could be made available ( photo 1 ). Unfortunately, most of the seismic records exist only in analogue form, and many only on photographic paper ( photo 2 ). So we started to gather documentation on cheap solutions, consulted Colleagues expert in image processing, and looked for a large, free and available scanner to run the test. Finally, we drew up our solution.

Our proposal
    It was clear what the problem was: we could picture it as shown in photo 3. So, taking into account all the elements (the problem itself, the hardware and software available in our Milan research network, the human resources, the prototype character of the test, and the formal correctness of the various processing steps), we propose the following working plan for the retrieval of the paper seismograms:
 

  N.   phase                        software          environment   file type
  1    Image ACQUISITION            I/SCAN®           Unix          RLE/TIF
  2    QUALITY control              TEXTure           Unix          TIF/TGA
  3    VECTORIZATION                GEOSCAN®          DOS           TIF/GSC
  4    Image EDITING                AutoCAD®          DOS           DXF
  5    NUMERICAL PROCESSING         MATLAB®           DOS           ASCII
  6    Digital Seismic Data-Base    ?? in progress    DOS           ASCII

1.    Image ACQUISITION
       Because the original paper records vary in size (from 40x60 up to 40x100 cm), a large-format (A0) scanner is needed. The hardware used for this test (property of the Italian CNR) is shown in photo 4. It is composed of the EAGLE model 4080 scanner, connected to the INTERGRAPH model IA-2020 Unix workstation, which is provided with the necessary software for the acquisition and pre-processing of the data. The main features of the hardware are summarized below:
•    COMPUTER:    InterAct Work-Station, model IA-2020; 10 MIPS, RAM: 64 MB, HD: 800 MB, Graph. Z-Buffer; double 19" colour monitors; main proc.: C300; I/O proc.: 80386; ASAP (Appl. Special Accel. Proc.); bus VME.
•    SCANNER:    EAGLE mod. 4080; doc. Size: 44"xROLL; scan Width: 40"; max. resol.: 1600 DPI; interpol. DPI: 1-1600; CCDs N°: 10; scan Speed: ("/min): 12.8 (200 dpi), 3.2(800), 1.6 (1600); scan accuracy: +/- 0.02"; paper skew: +/-0.05%; data format: CCITT groups 3,4,ANA (LRD), 8-bit gray-scale (0-255), Intergraph's RLE; Illum.: quartz-halogen; MTBF: >5,000 hrs.

        Of course, to obtain the best acquisition of each sheet, a set of well-considered parameters must be selected and set; a few of those used for this test were:

SETTINGS                                      SCAN SET UP                                OUTPUT DEFINITION
Threshold [ 0-255 ]                     210   Resolution              200                Destination
Sharpness [ 0-100 ]                      10   Units                   dpi                File transfer
% Scan speed [ 0-100 ]                   80   Output Raster Format    TIFF uncompressed  Disk-File location (Dir.,Filename)
Minimum Run Length size [ 0-20 pixel ]    2   Orientation
Dynamic Threshold [ ON/OFF ]             ON   Polarity
                                              Mirror
                                              Scan Region definition (Xm,XM,Ym,YM,Units)

Furthermore, a pre-processing of the data was carried out, in order to gain experience with file sizes in bytes, with the local raster-editing facilities ( photo 5 ) for improving image quality, and with a first attempt at vectorizing the seismograms ( photo 6 ) using a dedicated I/SCAN module.

TIME and BYTE considerations for this Test acquisition
        The image UKR02 (scan region size 35x78 cm) needed about 5 min. for choosing and setting up the various parameters (executing a small test acquisition, previewing the image, resetting the parameters, performing a small acquisition again, and so on) and about 6 min. for the final acquisition: a total of 11 min. For the second image, UKR01 (larger and darker), the total time was 13 min. In this way the two test images were acquired on the Intergraph WS, using its own data format (R.L.E.: Run Length Encoded). At the end the two files were saved in uncompressed TIFF format, for easier data exchange. The final sizes (in bytes) of the two images were:

image     TIF uncomp.     R.L.E.
UKR01     10.550.016      2.303.492
UKR02      9.776.210      2.464.898
(the different sizes of the TIF files reflect the greater complexity of the UKR01 raster image; on the other hand, its smaller size in R.L.E. format is due to the lesser encoding applied by the I/SCAN software during the acquisition, because of its greater roughness).
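
The byte counts in the table can be turned into compression ratios with a few lines of arithmetic. The sketch below is illustrative only: it is written in Python (not one of the tools used in this test), and the `run_length_encode` helper is a generic run-length coder written for this example, not Intergraph's actual R.L.E. file format.

```python
# File sizes in bytes, copied from the table above.
sizes = {
    "UKR01": {"tif": 10_550_016, "rle": 2_303_492},
    "UKR02": {"tif": 9_776_210, "rle": 2_464_898},
}


def compression_ratio(tif_bytes, rle_bytes):
    """Uncompressed size divided by run-length-encoded size."""
    return tif_bytes / rle_bytes


def run_length_encode(row):
    """Encode one scanline as a list of (pixel value, run length) pairs.

    Generic illustration only; NOT the Intergraph R.L.E. file layout.
    """
    runs = []
    for pixel in row:
        if runs and runs[-1][0] == pixel:
            runs[-1] = (pixel, runs[-1][1] + 1)
        else:
            runs.append((pixel, 1))
    return runs


ratios = {name: compression_ratio(s["tif"], s["rle"]) for name, s in sizes.items()}
for name, ratio in ratios.items():
    print(f"{name}: TIF/RLE = {ratio:.2f}")

# A toy bilevel scanline: long white runs (0) broken by thin black traces (1).
scanline = [0] * 12 + [1] * 2 + [0] * 9 + [1] * 1 + [0] * 6
encoded = run_length_encode(scanline)
print(encoded)
```

A scanline made of long uniform runs collapses to a handful of pairs, which is why line drawings such as seismograms shrink considerably under this kind of encoding.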

2.    automatic QUALITY control
        In order to check whether the differing complexity of the images (which is related to the reliability of the digitizing system) can be quantified, an automatic Q-control tool was built and tested, using the CNR-ITIM facilities and technologies. In the future this part of the processing could be used as a C.A.S.E. (Computer Aided Seismogram Evaluation) decision-support tool for the analysis and pre-selection of paper records, in order to optimize the whole data acquisition process. The procedure ( photo 7 ) gave the following results:

Image N°   Roughness   Line liken. 1   Line liken. 2   Line liken. 3
    1        0.65          1.67            1.64            1.64
    2        0.76          1.58            1.53            1.55
    3        0.64          1.48            1.42            1.41
    4        0.61          1.39            1.36            1.34
    5        0.74          1.26            1.22            1.20
    6        0.81          1.27            1.25            1.24
    7        0.66          1.46            1.30            1.24
    8        0.79          1.44            1.41            1.41
    9        0.92          1.30            1.28            1.27
   10        1.04          1.32            1.30            1.29

This analysis is at a first stage: more work must be done to define the various features better. In any case, even with the limitations of this first experiment, one can deduce that Line likeness is a better descriptor than Roughness of the complexity of a seismic paper record. In the future, if it is decided to pursue this research further, we could build a good prototype of an Automatic Seismic Evaluator and/or Descriptor to be used as a decision-support tool.

 3.    C.A.V.: Computer Aided Vectorization: GEOSCAN
    This 'critical' point of the whole procedure must be treated carefully. In our approach we were helped by our geologist Colleagues, who use the GEOSCAN® software package daily to vectorize isolines on thematic maps in their research work. So, to extract the single seismograms from the whole raster image (i.e. to vectorize the seismograms) we used this package ( photo 8 ). The scheme of the digitization phase is as follows:

        A - IMPORT .tif file            TIF-GSC
                                           |
        B - VECTORIZATION               GEOSCAN
                                        |     |
        C - EXPORT File          ELATODXF     |
                                        |     |
                                     .DXF     .EIF
          Drawing Interchange File Format     Elan Interface File
                            (for AutoCAD)     (ASCII file for MATLAB)

3.A    Raster TIF-image import
        For the digital acquisition, the GEOSCAN import facility requires some important RUN parameters, both for INPUT (such as TIF file name, scan resolution, scale value) and for OUTPUT (such as GSC file name). At the end a GSC raster file is created.

3.B    Raster to Vector application (GEOSCAN)
        This is the 'heart' of the whole procedure; the software package is described in [5]; a good example is shown in photo 9. Here we recall only the logical structure of Geoscan (defined by its Authors as Interactive Vectorization of Cartographic Drawings):
 
                CONFIGURATION:

Colours combining
Palette definitions
Vectorization's parameters
Coding setting
Options

                TRANSFORMATIONS:

Vector DB orientation
Plot File
E.I.F. File
D.X.F. File

                VECTORIZATION:

Curvilinear solid/dashed line
Full Area: Rectangular, Squared, Irregular, Double axis, Minimum area, Symbol area, Manual acquisition, Options
Primed Area
Broken line
Vector edit: Unit read, Unit search, Vertex displacement, Unit displacement, Vector delete, Vertex delete, Vertex insert
Symbols: Symbols, Elevation point, Place names
 
                UTILITIES:
Print image Vertexes
Print N° of vectors/code
Automatic orientation
Thematic visualization
Hard copy grid creation
Compensation network
Data-Base control & edit
                VISUALIZATION:
Refresh
Window change
Vectors
Magnifying lens
Whole picture
Options
3.C        Export file:
        Before exporting the file, some time-reference lines were drawn, to allow proper time interpretation in the subsequent numerical processing. These lines were drawn on the basis of the time-lines present on the original copies. Finally, GEOSCAN can write output files in two useful formats: EIF (Elan Interface Format, ASCII, for numerical processing) and DXF (Drawing Interchange File, for 'AutoCAD' applications). The first is used to process the numerical data (coordinate values) with the usual commercial math packages, after a little ASCII editing; for this test we successfully used the MATLAB® software, and the whole procedure is described at point 5. The second format permits useful image editing (for instance with the AutoCAD software, see point 4 below).

4.        Image Editing
        For a raw plot and editing of the original raster image, an output DXF file is created by Geoscan. It is the standard format of the well-known AutoCAD® software package. This is an optional step, useful only if there are many corrections to make to the original traces (e.g. wrong 'spatial' (distance) or 'time' positions). It was carried out using AutoCAD rel. 12, which also allowed a faster exchange of data between the GEOSCAN output file and the AutoCAD input. Photo 10 shows some editing phases.

5.        Numerical processing
        The vectorized output files from GEOSCAN (EIF format, i.e. ASCII) were processed with the MATLAB® package. Our final goal was to create a single file including all the traces belonging to a seismic line, corrected in shape and time and correctly positioned in space, so that it can easily be plotted as the so-called Section-film. This final processing followed these steps:
a)    some initial ASCII editing of the input EIF files (to check header values, no extra characters, etc.)
b)    raw data plotting of X/Y co-ordinates and Time tick-marks
c)    reprocessing of X/Y data:  X-values monotonic and equally-spaced, X/Y cubic spline interpolation
d)    X to Time conversion, with the calculation of: T0 (time of the 1st digit), FRQ (re-sampling rate, interactive), TREND removal
e)    new math. processing (filter, spectra, reduced section-film)
        The final photos 11, 12, and 13 show some elements of the procedure used and its results.
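
Steps c) and d) can be sketched in a few lines. What follows is not the MATLAB code used for the test: it is a minimal Python illustration under stated assumptions. All function names are hypothetical, the cubic spline of step c) is replaced by plain linear interpolation to keep the example dependency-free, and the two time-reference marks are assumed to be known as (x position, time) pairs.

```python
def make_monotonic(points):
    """Sort digitized (x, y) vertices by x; keep the first vertex of each repeated x."""
    out = []
    for x, y in sorted(points):
        if not out or x > out[-1][0]:
            out.append((x, y))
    return out


def resample_linear(points, n):
    """Resample onto n equally spaced x values (linear interpolation, not a spline)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    step = (xs[-1] - xs[0]) / (n - 1)
    new_x, new_y, j = [], [], 0
    for i in range(n):
        x = xs[0] + i * step
        # Advance to the segment [xs[j], xs[j + 1]] that contains x.
        while j < len(xs) - 2 and xs[j + 1] < x:
            j += 1
        w = (x - xs[j]) / (xs[j + 1] - xs[j])
        new_x.append(x)
        new_y.append(ys[j] + w * (ys[j + 1] - ys[j]))
    return new_x, new_y


def x_to_time(xs, x_ref1, t_ref1, x_ref2, t_ref2):
    """Map paper x coordinates to seconds using two time-reference marks."""
    scale = (t_ref2 - t_ref1) / (x_ref2 - x_ref1)
    return [t_ref1 + (x - x_ref1) * scale for x in xs]


def remove_trend(ts, ys):
    """Subtract the least-squares straight line from the amplitudes."""
    n = len(ts)
    mt = sum(ts) / n
    my = sum(ys) / n
    slope = (sum((t - mt) * (y - my) for t, y in zip(ts, ys))
             / sum((t - mt) ** 2 for t in ts))
    return [y - (my + slope * (t - mt)) for t, y in zip(ts, ys)]


# Toy vertices in paper units: out of order and with one repeated x.
raw = [(0.0, 1.0), (2.0, 3.0), (2.0, 9.0), (1.0, 2.0), (4.0, 5.0)]
mono = make_monotonic(raw)
xs, ys = resample_linear(mono, 5)
# Assumed reference marks: x = 0 is at 10.0 s and x = 4 is at 11.0 s.
ts = x_to_time(xs, 0.0, 10.0, 4.0, 11.0)
t0 = ts[0]                    # T0: time of the first sample
frq = 1.0 / (ts[1] - ts[0])   # FRQ: resulting sampling rate in Hz
detrended = remove_trend(ts, ys)
print(t0, frq, detrended)
```

In this toy case the trace is a pure ramp, so the detrended amplitudes are all zero; on a real trace the residual after trend removal is the seismic signal itself.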

Conclusive remarks
        We would like to close this presentation with some results, comments and suggestions for the best continuation of this activity.
•    Modern technologies enable us to acquire such a large quantity of paper data by scanner without problems in terms of hardware, software and storage.
•    To process successfully a large amount of such seismic data, which differ significantly in complexity, a preview and selection phase is advisable, in order to choose only those sheets that can be vectorized with little manual intervention by the operator; for this purpose a preliminary automatic texture analysis is suggested.
•    A good-quality vectorization can be achieved with commercial software packages running on PC-Windows machines; on a clear raster picture, and after a good choice of parameters, the operation is carried out mostly automatically.
•    Modern format filters make it easy both to compress/uncompress the files and to re-format the different file types for different platform requirements.
•    The image editing can be carried out with several common packages that can read the TIF or DXF format.
•    The final numerical processing needs a little programming in order to:
            - extract only the pairs of coordinate data, and at least 2 time-reference values (for the re-sampling procedure)
            - convert X/Y coordinates into Time/Distance information
            - correct the Time information and calculate T0 and Freq
            - remove the trend of the traces; execute some linear interpolation; filter the data
            - plot the new corrected seismic section-film
            - write the new corrected digital seismic trace, adding correct Header information
            - create the associated seismic DataBase on fast storage devices.

References

[1]    Autodesk AG: "The graphic system AutoCAD®" - rel. 12, 1995
[2]    Bradley J.: "xv®" - rel. 3.10a, 1994
[3]    Corel Corp.: "Corel Photo Paint ™" - rel. 5.00E2, 1994
[4]    Helbig K., Treitel S.: "Pattern recognition & image processing" Handbook of geophysical exploration, sec. I, vol.20, 1987
[5]    Landser P.: "GEOSCAN®: Computer Aided Vectorization software" - rel. 7.4, 1995
[6]    the Math Works Inc.: "MATLAB®: High performance Numeric computation and visualization software" - rel. 4.2c, 1992
[7]    Mazzon L.: "Indicizzazioni di Texture", graduation thesis, Milan State University, Dept. of Information Science, in progress
[8]    Sollogub V.B., Prosen D., Zounkova M. et al.: "Crustal structure of Central and Southeastern Europe by data of explosion Seismology", in:   Tectonophysics, vol.20 (1-4), 1973, pp. 1-33
[9]    Tamura H., Mori S., Yamawaki T.: "Textural features corresponding to visual perception", in: I.E.E.E. Transactions on Systems, Man and Cybernetics, vol.SMC-8, N°6, 1978
[10]    Tuceryan M., Jain A.K.: "Texture Analysis", in: C.H.Chen, L.F.Pau and P.S.P.Wang (eds.): "Handbook of Pattern Recognition and Computer Vision" (World Scientific Company), 1994
[11]    Ward P.: "SUDS, the Seismic Unified Data System", in: AGU-EOS transaction, vol.74, N°37, 1993.


Photo 1:     A window on former-SU seismic surveys
Some of the seismic profile lines on the territory of Central and Southeastern Europe (based on the International Tectonic Map of Europe, Moscow, 1962);   legend:  1=international DSS profiles, 2=national DSS profiles, 3=recording stations, 4=shot points, 5=boundaries of geologic regions, 6=planned DSS profiles.

Photo 2:    An example of an old seismic line.
A short piece of the Ukrainian profile III (Beregovo-Dolina-Vishnevetz, 1960), assembled from 6 paper sheets, showing the good average quality of the data.

Photo 3:   Our technical Proposal.
Instrumental history of the Seismic data acquisition and processing: as one can see, our task regards the first 'seismic era' of data recording systems.

Photo 4:    Paper-records acquisition Station .
Raster acquisition station for paper records: you can see the large Eagle scanner and the Intergraph workstation.

Photo 5:    Raster-Edit.
First raw digital processing of the first raster image: many editing tools are available in the I/SCAN software. The most important modification to the acquired raster image is to 'separate' (when and where possible) the seismic traces from the many orthogonal lines (time-reference lines, generally one every 0.1 sec.): that means 'cutting' the time-lines (see details in frames 1, 3 and 4).

Photo 6:    Raster to Vector: 1st approach.
In the same acquisition station environment, the I/SCAN software provides a good series of editing and vectorization tools. But if the time-lines have not been carefully 'cut', the vectors produced by the automatic trace-follower may go in wrong directions (see the magnified examples in frames 7 and 8). Frame 6 represents the 'ideal' situation.

Photo 7:    Quality-Control
The 10 selected images are shown. After the best choice of the editing parameters (2nd-up frame) the 10 images were read. In the lower frames you can see: UP, the 10 selected windows; DOWN, the same windows after a special pre-filter process (note the good examples of frames 7 and 10, where most of the time cross-lines were automatically erased).
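
The idea of erasing the time cross-lines automatically can be illustrated with a toy example. This is not the pre-filter used for photo 7, whose details are not given here: it is a hedged Python sketch that assumes a bilevel image stored as a list of rows (1 = black), traces running along the rows, and time-lines that are exactly one pixel wide and span almost the full height of the window. The function names and the `min_fill` parameter are hypothetical.

```python
def find_time_line_columns(image, min_fill=0.9):
    """Return indices of columns that are black in at least min_fill of the rows."""
    n_rows = len(image)
    n_cols = len(image[0])
    cols = []
    for c in range(n_cols):
        filled = sum(image[r][c] for r in range(n_rows))
        if filled / n_rows >= min_fill:
            cols.append(c)
    return cols


def erase_columns(image, cols):
    """Blank the given columns, but keep pixels where a trace crosses the line.

    A pixel is kept when both its left and right neighbours are black,
    which is a crude test that only works for one-pixel-wide time-lines.
    """
    out = [row[:] for row in image]
    for row_in, row_out in zip(image, out):
        for c in cols:
            left = row_in[c - 1] if c > 0 else 0
            right = row_in[c + 1] if c + 1 < len(row_in) else 0
            if not (left and right):
                row_out[c] = 0
    return out


# Toy window: one horizontal trace (row 1) crossed by a vertical time-line (column 2).
window = [
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
]
line_cols = find_time_line_columns(window)
cleaned = erase_columns(window, line_cols)
for row in cleaned:
    print(row)
```

On real scans the time-lines are thicker, skewed and broken, so a practical filter would need a more tolerant line detector; the sketch only shows why separating traces from time-lines is a well-defined raster operation.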

Photo 8:    GEOSCAN at work.
After the image import and a good choice of parameters, the GEOSCAN automatic vectorization tool creates the desired digital seismograms. During the work, however, the operator is often asked to help and/or decide the best way to follow. In these cases the small on-line magnifying glass is very useful (see the two lower left frames).

Photo 9:    Vectorization is an Art.
Here you see an example of a full seismic record (composed of 48 traces, 30" long each) successfully digitized. Some time-reference lines are also drawn (one per second), after choosing their 'marks' from the originals (as in frame 1).

Photo 10:    AutoCAD Editing (optional).
It could be useful to add more graphical information (and/or edit some graphic vectors) to the image under processing.

Photo 11:    The final step: MATLAB - 1
Using this powerful package, you can write the programming lines interactively and see their effects immediately. After a first ASCII-file import and a first raw plot (frame 1), we start to work on a single trace, and after a re-sampling phase we can plot it without any trend line. Up to this point the scale values have no physical meaning.

Photo 12:    MATLAB - 2
After further programming steps, and using the precise time values of the reference lines, the X-scale is expressed in time units (in this case seconds, frame 1), so we can easily compute the FFT of the trace (frame 2). Then we move on to the Y-scale processing: from the original record we obtain the relative distances to include at this step, so that the trace is correctly positioned in time and distance coordinates (travel-time, or 'dromochrone', space). We then apply the same procedure to all the traces in a loop and plot the whole section (frame 3, with the Y-values, now time, in absolute values). The two frames below show two standard representations, with a reduction velocity of 6 km/s (left) and 8 km/s (right).

Photo 13:    MATLAB - 3 the final SECTION-FILMS !
This final picture expands on the last part of the preceding photo: the upper frame is the digital representation of a segment of the corrected section, derived from the UKR02 image; the X-labels are the true distances, calculated from the absolute positions of the geophones with respect to the various shot-points. The lower frame is the same data set represented with a reduction velocity of 8 km/s. For both we used a standard band-pass filter of 2-20 Hz.
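
For readers unfamiliar with reduced sections: the captions give only the reduction velocities (6 and 8 km/s), so the short Python sketch below assumes the usual convention of refraction seismology, in which the plotted time is the recorded time minus offset divided by reduction velocity. The function name and the sample values are illustrative, not taken from the data.

```python
def reduced_time(t_seconds, offset_km, v_red_km_s):
    """Reduced travel time: recorded time minus |offset| / reduction velocity."""
    return t_seconds - abs(offset_km) / v_red_km_s


# Hypothetical arrival recorded 8.0 s after the shot at 48 km offset.
t_red_6 = reduced_time(8.0, 48.0, 6.0)
t_red_8 = reduced_time(8.0, 48.0, 8.0)
print(t_red_6, t_red_8)
```

An arrival travelling at exactly the reduction velocity plots at a constant reduced time at every offset, which is why the same section looks different with 6 km/s and with 8 km/s.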