NetScan

Overview

NetScan is a search service, similar to Internet services like Excite, AltaVista, Infoseek, HotBot, etc. Where those services explore the Internet indexing web pages, NetScan does the same thing for files, printers, workstations, etc. on an AppleTalk network. NetScan indexes the names of everything visible on the AppleTalk network (e.g. printers, workstations, AppleSearch servers, etc.). It also indexes the file names, folder names and text content of any publicly accessible file servers (both AppleShare servers and personal machines using File Sharing). It then lets you search for anything that it has indexed in just a few seconds.

NetScan is useful even if you only have a handful of Macintoshes and a single AppleShare server on your network. However, it becomes indispensable when your network contains dozens of zones and hundreds of servers. Imagine being able to find that press release that talks about the xyz product even when you don't know which of a hundred servers it's actually stored on.

NetScan is actually made up of two applications. The main application indexes the AppleTalk network and answers queries on that index. Your system administrator typically sets up one copy of this application on a server machine for the entire AppleTalk network. The NetScan Client is a small, easy to use application that every user gets a copy of. It lets users type in what they're looking for and then sends the query to the main NetScan server. The NetScan Client gets back a list of matching files, folders, workstations, or whatever and displays them in a convenient list. You can then double-click on one of them to mount the appropriate volume, select the printer, etc. You can use the client to search for documents that contain certain words or are similar to other documents, or to search for a printer with a specific name. In a sense, NetScan flattens the hierarchical organization of an AppleTalk network and lets you focus on the thing you're looking for, rather than where it's currently located in the physical structure of the network.

Requirements

The NetScan server will run on any Macintosh or MacOS computer with a PowerPC processor. Use MacOS 7.6.1 or MacOS 8. If you have a large network, the server should have 20 megabytes or more of RAM. The NetScan Client application that is run by every user will run on any Macintosh with System 7.1 or later and requires less than one megabyte of memory.

NetScan Administrator's Guide

Warning! You will only want to run one NetScan server for your whole organization or AppleTalk network. Normally the NetScan server is set up and operated by the same people that maintain your network and AppleShare servers. If you are thinking of downloading a copy of NetScan for your company or educational institution, make sure that you have permission from your management! When it's indexing your servers, NetScan will generate a fair amount of network traffic, so you should coordinate with everyone else in your organization to make sure someone else isn't already setting up a NetScan server. You should also make sure that everyone in your organization knows that NetScan will be indexing guest accessible server content and that they know how to check for and turn off guest access on their personal Macintoshes.

An operational NetScan server is actually composed of two separate parts: an indexing engine and a query server. The indexing engine scans the network, indexing the names of printers, workstations, etc. and the guest-accessible content of file servers. The query server uses an existing index file created by the indexing engine and answers queries sent via remote AppleEvents. The indexing engine and query server are actually the same application run in different modes on two separate Macintoshes. Every few days, both machines must be shut down to copy the latest index from the indexing machine to the query machine, replacing the previous index. Then the indexing engine can be restarted to continue scanning for new content.

NetScan has very modest hardware requirements: any PowerPC-based Macintosh and enough disk space to store the index for both the indexing engine and the query engine. The current NetScan server that indexes Apple's internal network is running on a Power Macintosh 6100/66 and an 8100/100 without any problems and tends to build an index that is close to two gigabytes (for >1 million files, 100+ gigabytes of content, 500+ zones, etc.). If you have fewer zones to index, you'll need much less disk space.

The Indexing Engine

The first step in setting up a NetScan server is to set up an indexing engine and get it started indexing all of the guest-accessible content on your network. Depending on the size of the network, the indexing could take anywhere from a few hours to a couple of weeks. Once you have the indexing engine up and running, then you can begin to get a second Macintosh ready for the query server.

Setup Instructions

0. Warn Users and Management

Before you start indexing guest-accessible server content, you need to remind everyone in your organization to check that they aren't sharing confidential or personal information. Make sure that everyone knows how to turn off file sharing for their personal Macintoshes and make sure that AppleShare servers are properly configured. If you skip this step and NetScan makes it very easy to access some sensitive confidential or personal information, you might be the one that ends up in trouble! Make sure management knows what you're doing before you get started!

1. Launch NetScan

Double-click the NetScan application to launch it. In the future, once you have a set of index files, you can double-click one of them or select and open the pair of files to launch NetScan. You should make sure that the following three files are also located in the same folder as the NetScan application: EnglishStopwords, EnglishSubstitutions, and english.stop. If you're planning to manually exclude any servers or zones (described later) or use the message of the day feature, those files must also be in the same folder when you start up the NetScan application.

2. Create a new set of index files.

Select New from the file menu. NetScan actually uses a pair of files for indexing, so you will be asked first where you want to store the "Main Index File" using the standard file dialog. After selecting a folder for the main index file, you'll be asked where you want to store the "Content Index File". Typically, both of these files are placed in the same folder, but you can put them on different hard disks if you're short of space on one.

3. Decide whether you want to use sub-index merging.

In its simplest mode of operation, NetScan builds a single content index in the "Main Index File" and you can stop the indexer at any time and immediately use the index files to answer queries. However, if you will be building an index that will exceed about 200 megabytes in size, you'll find that NetScan can index the content much faster by creating many smaller indices and then merging these small indices into the single main index after the indexing is finished. The disadvantage of using the sub-index merging technique is that after stopping the indexing engine, you have to first merge the sub-indices before you can use the index to answer queries or start indexing again. This merging process can take several hours. If your network isn't extremely large, you can probably get by without using merging. You can turn this feature off by unchecking the "Use Sub-Indices & Merge" option in the Edit menu.

4. Decide if you want to restrict indexing to a subset of visible servers.

By default, NetScan will collect a list of all of the visible zones in your AppleTalk network and then proceed through those zones in alphabetical order, indexing every guest-accessible server that it can find in each zone. If you want to index only a specific set of servers, you can create a file that lists these servers and then select Scan Server List... under the File menu. This file should be a plain text file where each line is either a zone name, or a zone name followed by a tab, followed by a server name. NetScan will iterate through the lines in this file and if a zone name appears without a server name, then all of the servers in that zone are indexed, otherwise the given server in the given zone is indexed. (It's also possible to set up NetScan to index all the zones and servers it can find except a list of zones/servers, or to prioritize a list of zones so that they are indexed first. See the section titled "Excluding or Prioritizing Zones and Servers" for more details.)
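The server list format described above can be sketched in a few lines of Python. This is only an illustration of the file layout, not NetScan's own parser: the function name `parse_server_list` is hypothetical, and skipping blank lines is an assumption, since the guide doesn't say how they are handled.

```python
from typing import List, Optional, Tuple


def parse_server_list(text: str) -> List[Tuple[str, Optional[str]]]:
    """Parse lines of 'zone' or 'zone<TAB>server' into (zone, server) pairs.

    A server of None stands for "index every server in this zone".
    """
    targets = []
    # splitlines() also handles the carriage-return line endings
    # used by classic Macintosh text files.
    for line in text.splitlines():
        if not line.strip():
            continue  # assumption: blank lines are skipped
        zone, _, server = line.partition("\t")
        targets.append((zone.strip(), server.strip() or None))
    return targets


# Hypothetical zone and server names, for illustration only.
sample = "Engineering\rMarketing\tPress Releases\r"
for zone, server in parse_server_list(sample):
    if server is None:
        print(f"index all servers in zone {zone!r}")
    else:
        print(f"index server {server!r} in zone {zone!r}")
```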

5. Start Scanning

If you decided to index only a specific list of zones and servers, then NetScan will have started indexing as soon as you selected the file. If not, then select Start Scan from the File menu now to cause NetScan to begin indexing. NetScan is a multi-threaded application, and it constantly creates and destroys threads as it iterates through each zone and server. When it starts scanning, a threads window will appear that shows each of these threads and its current state. This window is informational only, and you can feel free to ignore it. NetScan also displays a Log Window that displays a variety of debugging messages as the scan goes on. Most of the error messages that you see in this window can be ignored: the network is usually an unpredictable place, and NetScan frequently gets errors while accessing hundreds or thousands of servers to read many gigabytes of data. You can also watch the progress of the scan in the Status window where the thermometer indicates how many of the available zones have already been scanned.

6. Stop Scanning.

Ideally, you'll be able to leave NetScan indexing until it has gone through all of the available zones and servers. If this happens, the Log Window will indicate that the scan completed, the thermometer in the Status Window will be finished, and the button in the Status Window will change to say "Start Scan" or "Merge" depending on whether you were using sub-indices. If you can't wait for the full scan to complete (this is always the case with the main Apple NetScan server since it would take a month or more and probably run out of disk space before completing), at any time you can select Stop Scanning from the File menu or click on the Stop Scanning button in the Status Window. It might take NetScan a few minutes to actually stop the scan since it finishes scanning any file that it already has open. On rare occasions, NetScan will access an erratic AppleShare server that will refuse to disconnect from NetScan. If this is the case, you'll notice that after several minutes there is still a server listed in the threads window with a status of "Waiting" that doesn't seem to be changing. The only way to break the connection to a faulty server like this is to select the Break Network Connection command in the Edit menu. This is a fairly drastic step that will disrupt network access for any other applications on the same Macintosh that are also using the network, so only use it if necessary.

7. Merge Sub-Indices.

If you are using sub-indices, then you must merge them into the main index before you can transfer it to the query server or begin indexing again. Before beginning the merge, you might want to quit NetScan and then launch it again. V-Twin's internal memory pool seems to get a little fragmented over time and I've occasionally had merges fail for lack of memory when there was still plenty. Click the Merge button in the Status Window or select the Merge Indices command from the File menu. If you have a large number of indices to be merged in, NetScan will do them 40 at a time to avoid running out of memory. If you want to immediately begin scanning again after the merge completes, you can also select Merge & Start Scanning from the File menu and NetScan will automatically start scanning again after the merge has completed.
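To picture the "40 at a time" behavior, here is a minimal Python sketch of splitting a list of sub-indices into batches. It only illustrates the batching idea; the function name `merge_batches` and the sub-index names are hypothetical, and nothing here reflects how V-Twin actually performs a merge.

```python
from typing import Iterator, List, Sequence

BATCH_SIZE = 40  # the batch size stated in this guide


def merge_batches(sub_indices: Sequence[str],
                  batch_size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield successive groups of at most batch_size sub-indices."""
    for start in range(0, len(sub_indices), batch_size):
        yield list(sub_indices[start:start + batch_size])


# Hypothetical example: 95 sub-indices waiting to be merged.
names = [f"sub-index-{i}" for i in range(95)]
sizes = [len(batch) for batch in merge_batches(names)]
print(sizes)  # [40, 40, 15]
```

Bounding the number of indices handled in one pass keeps memory use predictable, at the cost of several passes over the main index.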

8. Build the Accessors.

Before V-Twin can do lookups in a content index, it has to build an accessor for the index. For large indices this can take a long time (a 1 gigabyte index can take almost an hour). To avoid shutting down the query server for this length of time, you can get NetScan to build the accessors on the indexing machine and save them away for quick loading on the query machine. To do this, select Enable Queries from the File menu. The accessors will be built as part of the process of enabling queries. When the accessor building thread completes and the Log Window indicates that the queries are enabled, you can close the index and quit NetScan. The accessors are now stored away inside the index file.

9. Transfer the Indices to the Query Server.

Once you have a new set of indices ready to go on the indexing machine, you'll need to transfer them over to the machine that is answering queries. Since this will require shutting down the server for a little while, you might want to do it when the server isn't in heavy demand (e.g. early in the morning). For large indices, you'll probably find it worthwhile to shut down both machines and actually move the external hard disk containing the indices to the other machine for the copy (copying a 1.5 gigabyte index can take 20 minutes even when it's direct from one hard disk to another).

10. Start Indexing Again.

Once you've copied the new indices over to the query machine and restarted it, you can start the indexing machine again. If NetScan completed a full pass over the network and the scanning stopped on its own, then when you start indexing again, NetScan will look at every server again and only add new or changed files to the index. Any files, folders or volumes that have disappeared since the last scan will be removed from the index. Since only changes are being put into the index, NetScan is able to scan the network much faster the second time around. If NetScan didn't complete a scan of the whole network and you manually stopped the scan, then when you restart the scan, NetScan will skip over any servers that it completely indexed on the previous scan provided they were indexed within the last two weeks. This gives NetScan a better chance of getting to the servers it didn't even start to index on the previous scan.
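The restart rules above amount to a small per-server decision, sketched here in Python. This is a reading of the prose rather than NetScan's actual logic: the function and parameter names are hypothetical, and treating exactly two weeks as still inside the window is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional

SKIP_WINDOW = timedelta(weeks=2)  # "within the last two weeks"


def should_skip_server(previous_scan_completed: bool,
                       server_fully_indexed: bool,
                       last_indexed: Optional[datetime],
                       now: datetime) -> bool:
    """Return True if a restarted scan would skip this server."""
    if previous_scan_completed:
        # After a full pass, every server is revisited and only new,
        # changed, or vanished items are updated in the index.
        return False
    if not server_fully_indexed or last_indexed is None:
        # Never reached, or only partly indexed, on the previous scan.
        return False
    # Interrupted scan: skip servers that were completely indexed recently.
    return now - last_indexed <= SKIP_WINDOW


now = datetime(1998, 3, 15)
print(should_skip_server(False, True, datetime(1998, 3, 10), now))
print(should_skip_server(False, True, datetime(1998, 2, 1), now))
```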

Excluding or Prioritizing Zones and Servers

Once you have NetScan up and running you might find that there are a few servers that you don't want it to scan. They might be full of information that isn't useful to the average user, or they might belong to users that have asked that their machine be considered off-limits. You can exclude any server, zone, or type of entity from NetScan's indexing by creating a file called "NetScan Entity Exclusions" and placing it in the same folder as the NetScan application that is indexing. Each line in the plain text file should contain a zone name followed by a tab followed by an NBP entity type (e.g. "Laserwriter" for printers, "Workstation" for user machines, "AppleShare" for file servers, etc.), followed by another tab and the name of the entity:

zone name {tab} NBP type {tab} entity name

NetScan will automatically skip over any entity listed in this file. You can leave out any of the values to match anything in that position. For instance, "Test Zone{tab}{tab}" would cause NetScan to skip everything in the "Test Zone" zone. "{tab}{tab}Private Machine" would cause NetScan to skip anything in any zone whose name is "Private Machine". The file shouldn't have any blank lines in it, and you should be careful not to over-generalize, or NetScan might skip more than you intend (e.g. "{tab}{tab}" would match any zone, any type and any name, excluding everything). The contents of this file are read when the NetScan application launches, so if you change it, you must quit and relaunch NetScan to get the changes to take effect.
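The wildcard behavior of the exclusions file can be sketched as follows. This is an illustrative Python model of the matching rule described above, not NetScan's implementation: the function names are hypothetical, and exact, case-sensitive comparison is an assumption (the guide doesn't say how names are compared).

```python
from typing import List, Tuple

# (zone name, NBP type, entity name); an empty field matches anything.
Pattern = Tuple[str, str, str]


def parse_exclusions(text: str) -> List[Pattern]:
    """Parse 'zone<TAB>type<TAB>name' lines into 3-field patterns."""
    patterns = []
    for line in text.splitlines():
        fields = line.split("\t")
        fields += [""] * (3 - len(fields))  # pad missing trailing fields
        zone, nbp_type, name = fields[:3]
        patterns.append((zone, nbp_type, name))
    return patterns


def is_excluded(entity: Pattern, patterns: List[Pattern]) -> bool:
    """True if any pattern matches the entity in every non-empty field."""
    return any(
        all(p == "" or p == e for p, e in zip(pattern, entity))
        for pattern in patterns
    )


rules = parse_exclusions("Test Zone\t\t\n\t\tPrivate Machine\n")
print(is_excluded(("Test Zone", "Laserwriter", "Printer A"), rules))
print(is_excluded(("Lab", "Workstation", "Private Machine"), rules))
print(is_excluded(("Lab", "AppleShare", "Shared Files"), rules))
```

In this sketch a blank line parses as three empty fields and therefore matches every entity, which is the same over-generalization as "{tab}{tab}".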

In very large networks, it's sometimes useful to get NetScan to scan the zones in other than alphabetical order, so that the most important zones will be indexed when you stop the scanning after a week and copy the indices over to the query server. To get NetScan to scan several zones before going on to doing the remaining zones in alphabetical order, create a plain text file called "NetScan Priority Zones" and place it in the same folder as the NetScan application. The file should contain a list of zone names, one per line. If any of the zones listed in the file can't be found, then it will simply be ignored. The contents of this file are read when the NetScan application launches.

Disabling Server Content Scanning

If you have a particularly large network and it takes too long for NetScan to completely scan every zone in the network when it has to work its way through large AppleShare servers in each zone, you can get NetScan to scan the network without looking at the content of the servers. If you hold down the option key while selecting Start Scanning from the File menu, then NetScan will go through all of the zones scanning for the top level entities only (e.g. just the names of the servers, printers, workstations, etc.). Since users often use NetScan to search for printers, etc. this is a good way to ensure that NetScan knows about things like printers from every zone, even if it hasn't had time to index the content of every zone yet.

Typically, for very large networks you would let NetScan index for a week or two, stop the indexing at some convenient time, merge the indices and start another scan while holding down the option key. After that scan completes in an hour or less, enable queries to get the accessors built and cached, copy the indices over to the query machine, and then start all over again. NetScan might never get through all of the zones on your large network, but at least it will have the names of everything in every zone indexed.
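For the "NetScan Priority Zones" file described above, the resulting scan order can be sketched in Python. This is only a model of the stated behavior: the function name `scan_order` is hypothetical, and two details are assumptions, namely that priority zones are scanned in the order they appear in the file and that the alphabetical ordering ignores case.

```python
from typing import Iterable, List


def scan_order(visible_zones: Iterable[str],
               priority_zones: Iterable[str]) -> List[str]:
    """Priority zones first (if they exist), then the rest alphabetically."""
    visible = set(visible_zones)
    first: List[str] = []
    for zone in priority_zones:
        # Listed zones that can't be found on the network are ignored.
        if zone in visible and zone not in first:
            first.append(zone)
    rest = sorted(visible - set(first), key=str.lower)
    return first + rest


# Hypothetical zone names, for illustration only.
order = scan_order(["Alpha", "Beta", "Sales", "Zeta"],
                   ["Sales", "Missing Zone", "Beta"])
print(order)  # ['Sales', 'Beta', 'Alpha', 'Zeta']
```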

The Query Server

Once you have an indexing engine up and running and you're close to having an index that you want to make available, it's time to start setting up another Macintosh to answer queries on the index.

Setup Instructions

1. Give the Machine a Recognizable Name

Since users will occasionally have to browse the network from the NetScan Client looking for the server, it's important that you give it a recognizable name. You might want to call it something like "NetScan Server" or "My Company's NetScan Server". The Sharing Setup control panel lets you set the name of the machine. While you're editing the name, make sure that you set a password for the machine that is known only to the administrator.

2. Turn on Program Linking

Since the NetScan server accepts queries via remote AppleEvents, it must have program linking turned on to work. Since program linking must be turned on for all applications on the Macintosh, not just NetScan, for maximum security you probably shouldn't be running any other scriptable applications on that machine.

3. Set up the Special NetScan User.

When the NetScan Client sends an AppleEvent to the server for a query, it connects as the user "NetScan Query" with password "NetQuery". You'll have to go to the Users and Groups control panel and add this new user. This user should only be allowed to do program linking (file sharing should not be allowed and this user shouldn't be allowed to change the password).

4. Launch NetScan.

You can launch NetScan by double-clicking on one of the two index files (in which case it will ask for the other file), by selecting both of the files and dropping them on the NetScan application or by double-clicking the NetScan application and then opening the index from the File menu.

5. Enable Queries.

Select Enable Queries from the File menu. If you built the accessors before closing the index on the indexing machine, then this should take less than a minute. At this point the server is ready to accept queries.

6. Make Sure the Server is Working.

Before leaving the server running, you should probably go to another machine, launch the NetScan Client and try a quick search to make sure the query server is accessible and running properly.

Message of the Day

You will probably find that you occasionally want to send a message to all of the regular users of your NetScan server, perhaps to warn them that the server will be unavailable for a few days, or that it will be moving to a new zone. NetScan provides a convenient mechanism for getting a message to users of the NetScan Client front-end. If you create a file in the same folder as the NetScan application called "NetScan MOTD", then NetScan will include the contents of that file (which should be a fairly short message) with the reply to any queries sent to the server. The NetScan Client always watches for a message attached to the replies that it gets from the server, and if that message is different than the last message presented to the user (or the user has never been presented with a message), then it is displayed in a dialog box for the user.

Since the user only sees any message once, if you need to rebroadcast a message more than once, you can always make subtle changes to the message like adding another space, etc. The contents of this file are read when the NetScan application launches, so if you change it, you must quit and relaunch NetScan to get the changes to take effect.
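The client-side rule for showing a message of the day can be modeled with a small Python sketch. It illustrates why changing the text by even one space causes the message to appear again. The class name `MotdTracker` is hypothetical, and ignoring an empty message is an assumption.

```python
from typing import Optional


class MotdTracker:
    """Remembers the last message shown and reports when to show a new one."""

    def __init__(self) -> None:
        self.last_shown: Optional[str] = None  # nothing presented yet

    def should_display(self, message: Optional[str]) -> bool:
        if not message:
            return False  # assumption: replies without a message show nothing
        if message == self.last_shown:
            return False  # identical to the last message the user saw
        self.last_shown = message
        return True


tracker = MotdTracker()
print(tracker.should_display("Server down Friday"))   # first time: shown
print(tracker.should_display("Server down Friday"))   # unchanged: not shown
print(tracker.should_display("Server down Friday "))  # extra space: shown again
```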

NetScan Client User's Guide

What is NetScan?

One person in your organization will set up and maintain a NetScan server. Everyone else who just wants to search for things using NetScan should get a copy of the NetScan Client that is described on this page. The NetScan Client is small enough you can just leave it running all of the time. You can use it to search for documents that contain certain words or are similar to other documents, or to search for a printer with a specific name.

Why Should I Care?

Have you ever needed the latest copy of an application that your organization site licenses, but didn't know where to find it? Just type its name into NetScan, hit return, double-click on one of the found copies and you've got it. No need to search for the right zone, server, volume, folder, etc.

Has someone ever asked you to put something in their drop folder, but then neglected to tell you which of dozens of zones their machine is actually in? You can use NetScan to search for the name (or part of the name) of their workstation, double-click on it and then pick one of their shared volumes to automatically mount on your desktop.

Interested in more information from a press release that you're sure your company published recently, but don't know where to find it? Type a few words that you think are in the press release and you'll get back a list of documents that contain those words ranked to place the most relevant ones at the top. Or maybe you need the form for reporting vacation days, but have no idea where electronic forms are kept. Try searching for "vacation form" or "vacation authorization" in NetScan and see what you get.

Looking for some sample source code that uses a particular toolbox manager? Try typing in the names of a few of the functions you expect to call and you'll probably get back pointers to several ".c" or ".cp" files that use the functions you're interested in.

How Do I Use NetScan?

  • Get the latest version of the NetScan Client (it's part of the full NetScan distribution, and the person that set up the NetScan server should have also put it in a convenient location).
  • Put the file "MountVolume" into the folder "System Folder:Extensions:Scripting Additions" (there is no need to reboot your machine). This Scripting Addition just makes it possible for the NetScan Client to mount a server when you find a file that you want to retrieve.
  • The NetScan Client is very easy to use: just select the kind of thing you're looking for and type in one or more words. You'll get back a list of matches. In most cases, you can just double-click on one of the matches and NetScan will mount the appropriate server, open the folder that contains the matched file and then highlight the file for you (note that it doesn't actually open the file since that might trigger a big download over the network that you didn't really want).
  • If you double-click a printer in the list of matches, NetScan will try to choose it as your default printer. If you double-click on a workstation or file server in the list of matches, NetScan will mount the publicly shared volume from that machine. If more than one volume is shared, it will give you a choice of which volume(s) you want to mount. Double-clicking on a folder in the list of matches will mount the server and open that folder for you.
  • NetScan returns a screen-full of matches at a time. Just click the "More Matches" button if you'd like to see more choices. When there are no more possible matches the "More Matches" button will be disabled. If you want to see more matches at once and you have a big monitor, just make the window larger.
  • You can have several search windows open at one time; just use the "New Search" menu item to open another window.
  • You can drag text from a drag-enabled application (e.g. Claris Emailer) directly into the search field and it will automatically begin searching for the text you drop. You can also drop one or more text files on the NetScan Client icon and NetScan will search for files similar to the dropped file(s).
  • The NetScan Client also has Balloon Help if you can't figure something out.
  • The only tricky thing to remember is that for file content and file names, NetScan indexes and searches using WHOLE WORDS. It breaks documents and queries into words at any punctuation or number and it doesn't index any punctuation, numbers or words shorter than three characters. For instance, searching for "RS232" would unfortunately not find references to the Macintosh serial port (which is an RS-232 port) since NetScan discards the number and "RS" is too short to be indexed. NetScan is case insensitive (e.g. "TEST" will match "test").
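The whole-word rule in the last item can be sketched in Python to show why a query like "RS232" finds nothing. This is an approximation of the described behavior, not NetScan's or V-Twin's actual tokenizer: the function name `index_terms` is hypothetical, only unaccented ASCII letters are treated as word characters, and stop-word handling is left out.

```python
import re
from typing import List

MIN_WORD_LENGTH = 3  # words shorter than three characters are not indexed


def index_terms(text: str) -> List[str]:
    """Split at any punctuation, digit, or space; drop short words; lowercase."""
    # Assumption: anything that is not an ASCII letter ends a word.
    words = re.split(r"[^A-Za-z]+", text)
    return [w.lower() for w in words if len(w) >= MIN_WORD_LENGTH]


print(index_terms("RS232"))               # []
print(index_terms("RS-232 serial port"))  # ['serial', 'port']
print(index_terms("TEST"))                # ['test']
```

Because both documents and queries pass through the same splitting, a query only matches when at least one of its surviving whole words also appears in the document or file name.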

See Also