Curling Up to Universal Resource Locators
Curling Up to Universal Resource Locators
version 0.2 (7 January 1994)
by Eric S. Theise/[email protected]

INTRODUCTION

"Yeah, you can get it from sumex."
    --well meaning net.friend

"When anonymous FTP is enabled, there is a special login name called anonymous. If you start ftp, connect to some remote computer, and give anonymous as your login name, ftp will accept any string as your password. It is generally considered good form to use your electronic mail address as the password ..."
    --Ed Krol, *The Whole Internet User's Guide & Catalog*

Both of these statements are intended to direct an Internet user -- you -- to a particular Internet resource.

The first is maddeningly vague. There's no information about the access scheme, the hostname of the Internet computer you're supposed to connect to, the directory path to the file you want, or the name of the file itself. Even if you know what and where sumex is, the sheer number of files at that site can effectively preclude you from finding what you're looking for.

The second statement is wonderful if you're new to the Internet. It's informative, patient, and clear, but once you've worked your way to an intermediate level of experience, you'd like your pointers in more distilled form. Nothing extraneous, nothing omitted. Readable by human or machine.

Simply put, that's the goal of the Uniform Resource Locator (URL), an Internet resource specification currently under development.

The strongest and earliest push for URLs came from the World-Wide Web initiative. Since the Web is chiefly concerned with providing high-level, automated access to a rich range of Internet services, compacting the necessary information into something reliably read by a machine was the driving force.
Those of us arriving on the scene today are motivated by the need for a way to catalogue online resources that will not conflict with emerging standards, yet is usable by humans, often with unsophisticated access to the Internet, e.g., 2400bps dial-up to a text-only, commercial service, such as Netcom, Delphi, Holonet, or The WELL.

It's my hope that this document will evolve into something that will be indispensable for anyone preparing their first URL, and that people having to decode URLs will find much of value here, too.

Areas where I have questions for the URL community are marked off by {Question: ...}.

NOTES ON FORMAT

In the following descriptions of URLs, items separated by the pipe character, |, represent a choice, so that

    void | /path

means that either the field is left blank, or that the file path is specified.

Items in square brackets, [ and ], are exceptional and only specified when necessary, so that

    [:port]

is replaced by a colon and a numeric value when the port of the resource being catalogued is not the standard one.

In all cases, spaces have been included for clarity of exposition and should not be included in the genuine URL. Examples in each section should serve as your guide.

FTP (port 21)

The file transfer protocol, ftp, remains the basic way of delivering text or binary files over the Internet. Since we're primarily interested in cataloging publicly available information, we focus mainly on files available through anonymous ftp, where a user logs in as anonymous (or ftp) and gives their full e-mail address as the password. We also cover servers that allow access using the password guest, and servers that publish the needed login/password combination.

The URL does not currently address the issue of whether or not a file is to be transferred as a binary file, leaving it up to the intelligence of the human or machine to recognize binary file extensions (e.g., .Z, .gz, .gif, .sea) and issue the appropriate command.
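Since the URL leaves the binary-versus-text decision to whoever (or whatever) reads it, here is a minimal sketch of how a machine might make that call from the file extension alone. The function name transfer_mode and the set BINARY_EXTENSIONS are hypothetical names chosen for illustration; the extension list is just the handful mentioned above, and a real client would need a much longer one.

```python
import posixpath
from urllib.parse import urlsplit

# Assumption: only the extensions named in the text are treated as
# binary. Stored in lower case so that .Z and .z are treated alike.
BINARY_EXTENSIONS = {".z", ".gz", ".gif", ".sea"}


def transfer_mode(url):
    """Return "binary" or "ascii", the ftp command to issue before
    fetching the file named by an ftp: URL.

    Hypothetical helper: it looks only at the extension of the last
    path component and never contacts the server.
    """
    path = urlsplit(url).path
    # splitext only considers the final path component, so dots in
    # directory names (news.announce.newusers) are ignored.
    _, extension = posixpath.splitext(path)
    if extension.lower() in BINARY_EXTENSIONS:
        return "binary"
    return "ascii"


if __name__ == "__main__":
    print(transfer_mode("ftp://ftp.ncsa.uiuc.edu/Web/mosaic-papers/url-primer.ps.Z"))
```

The guess is only as good as the extension list; a file with no telltale extension falls through to "ascii", which may well be wrong for some archives.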
The full URL specification for an ftp session is:

    ftp:// [user [:password] @] host [:port] / void | /path

Examples:

You want to catalog the file folklore-faq located in the /pub/usenet-by-group/news.announce.newusers directory of rtfm.mit.edu, available via anonymous ftp. Since this is a standard use of anonymous ftp (port 21, login: anonymous, password: e-mail-address), the URL is:

    ftp://rtfm.mit.edu/pub/usenet-by-group/news.announce.newusers/folklore-faq

You want to catalog the file alien.visitors located in the /pub/et directory of martians.org. Their systems administrators are new to the planet, and don't know about Internet conventions. They run ftp on port 666, and require people to use login: humanoid, password: visitor. The URL is:

    ftp://humanoid:visitor@martians.org:666/pub/et/alien.visitors

Note that this is a completely fictitious example; I have never come across an anonymous ftp site running on an alternative port, or one that wishes its contents to be public while requiring some nonstandard login/password combination.

TELNET (port 23)

Telnet is the primary way of establishing connections to remote computers over the Internet. The standard port for telnet is 23, although a number of specialized servers use different ports for different services. Similarly, many hosts require well-publicized login/password combinations to access special services.

The full URL specification for a telnet session is:

    telnet:// [user [:password] @] host [:port]

Examples:

You want to catalog the InterNIC, a central repository of information about the Internet, offering whois, wais, gopher, x.500 addressing, and other lookup services. Since the InterNIC runs on the standard telnet port and requires no login, the URL is:

    telnet://rs.internic.net

You want to catalog the University of Michigan's Weather Underground. It requires no login, but it does run on the non-standard port 3000.
The URL is:

    telnet://madlab.sprl.umich.edu:3000

You want to catalog the American Type Culture Collection. It runs on the standard telnet port 23, but requires the login: search, password: common, before you can get in. This information is widely known (e.g., it's listed in Scott Yanoff's Special Internet Connections). The URL is:

    telnet://search:[email protected]

{Question: the URL Internet Draft has a Specific Scheme paragraph devoted to telnet, rlogin, and tn3270. I do not see a need to address rlogin, but given the existence of numerous Internet Libraries requiring tn3270 connections, I wonder if there should be a tn3270: URL?}

MAIL/SMTP (port 25)

E-mail is the most universal service provided by the Internet and other systems in the Matrix (e.g., BITNET, FidoNet, uucp, and commercial services such as CompuServe, Prodigy, America OnLine, MCIMail, GEnie, and others).

Because mail is handled by the user's machine and not through a client/server interaction with a remote machine, the // that typically separates the scheme from the address is not included in the mailto URL.

The full URL specification for a mail message is:

    mailto:user@host

Examples:

You've come across this entry in the List of Lists.

    [email protected]

    A moderated list for Intel 80386 topics, including hardware and software questions, reviews, rumors, etc. Open to owners, users, prospective users, and the merely curious.

    Archives are available via an electronic mail server. Details about its use can be obtained by sending a request to [email protected].

    All requests to be added to or deleted from this list, problems, questions, etc., should be sent to [email protected].
    List Maintainer: James Galvin <[email protected]>
    List Moderator: Bill Davidsen <[email protected]>

The URL for subscribing to the list is:

    mailto:[email protected]

The URL for participating in list discussions is:

    mailto:[email protected]

The URL for the list moderator is:

    mailto:[email protected]

and the URL for the list maintainer is:

    mailto:[email protected]

{Question: information is scarce on the mailto: URL. Those of us in the cataloging business would like to see ways to encode common instructions for using mail services into the URL, such as the common SUB LISTNAME-L Firstname Lastname convention for listservs.}

GOPHER (port 70)

Gopher is one of the simplest Internet services to use, but one of the more complicated ones to create URLs for. The good news is that current gopher clients will display a resource's URL for you when you use their "Get Information" command.

    Macintosh clients: command i
    unix clients:      =
    NeXT clients:      command i
    PC clients:        choose "Item Inspector" from the menu

The full URL specification for a gopher session is:

    gopher://host [:port] [/gophertype [selector] ][? search]

The standard gopher port is 70. You have to specify the port only if the gopher is running on a nonstandard port. However, if you cut-and-paste your URLs, there is no need to delete the ":70" port information.

Standard gophertypes include:

    0: text file
    1: directory
    2: CSO/qi phone book server
    3: error
    4: Macintosh .hqx/BinHex file
    5: DOS binary archive file
    6: uuencoded file
    7: index/search server
    8: telnet session; use the telnet URL described above
    9: binary file
    T: tn3270 session

Experimental gophertypes include:

    s: sound file
    g: GIF file
    M: MIME (Multipurpose Internet Mail Extensions) file
    h: html (hypertext markup language) file
    I: image file
    i: inline text type (used by panda; I don't know what this is)

The selector is the string used to give the path to a particular area of a gopher. It isn't needed if you're cataloging an entire gopher.
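For anyone decoding gopher URLs by machine, the layout above can be sketched roughly as follows. This is a minimal illustration, not a reference parser: the names parse_gopher_url and GOPHER_TYPES are hypothetical, the type table is simply the one listed above, and returning None for the gophertype when the URL names a whole gopher is my own assumption rather than anything the specification says.

```python
from urllib.parse import urlsplit

DEFAULT_GOPHER_PORT = 70

# The standard and experimental gophertypes listed in the text.
GOPHER_TYPES = {
    "0": "text file",
    "1": "directory",
    "2": "CSO/qi phone book server",
    "3": "error",
    "4": "Macintosh .hqx/BinHex file",
    "5": "DOS binary archive file",
    "6": "uuencoded file",
    "7": "index/search server",
    "8": "telnet session",
    "9": "binary file",
    "T": "tn3270 session",
    "s": "sound file",
    "g": "GIF file",
    "M": "MIME file",
    "h": "html file",
    "I": "image file",
    "i": "inline text",
}


def parse_gopher_url(url):
    """Split a gopher: URL into host, port, gophertype, selector, search.

    Hypothetical helper: the first character after the host's slash is
    taken as the gophertype and everything after it as the selector.
    """
    parts = urlsplit(url)
    if parts.scheme != "gopher":
        raise ValueError("not a gopher URL: " + url)
    # Drop only the single slash that separates host from gophertype.
    path = parts.path[1:] if parts.path.startswith("/") else parts.path
    gophertype = path[:1] or None  # assumption: None for a whole gopher
    return {
        "host": parts.hostname,
        "port": parts.port or DEFAULT_GOPHER_PORT,
        "gophertype": gophertype,
        "type_name": GOPHER_TYPES.get(gophertype, "unknown") if gophertype else None,
        "selector": path[1:],
        "search": parts.query or None,
    }


if __name__ == "__main__":
    print(parse_gopher_url("gopher://gopher.umanitoba.ca:2347/7"))
```

Because the gophertype is a single character glued to the front of the selector, the sketch peels off exactly one character and never tries to interpret the selector itself.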
Note that the selector and the sequence of menu choices usually bear some resemblance to each other, but typically they are not the same. The selector is the definitive path to the resource. Because the selector includes the gophertype, most gopher: URLs look like they repeat themselves.

Examples:

You want to catalog the University of Minnesota's Mother of all Gophers. You want the whole thing, and it's a standard gopher. The URL is:

    gopher://gopher.micro.umn.edu

The longer

    gopher://gopher.micro.umn.edu:70

is also perfectly acceptable.

You want to catalog the Civic Nets, Community Nets, Free-Nets, and ToasterNets section of the WELLgopher. It's a standard gopher, but you only want one section of it, so the selector is important. The URL is:

    gopher://gopher.well.sf.ca.us/11/Community/communets

Remember that the first '1' in the '11' is the directory gophertype, and the second '1' is actually part of the selector, 1/Community/communets.

You want to catalog the veronica server at the University of Manitoba. It has the search type, 7, and runs on the nonstandard port 2347. The URL is:

    gopher://gopher.umanitoba.ca:2347/7

Here are two examples where a gopher URL is not a gopher URL.

    Type=1
    Name=ANS CO+RE Systems, Inc. (US and Int'l)
    Path=ftp:ftp.ans.net@/pub/info/
    Host=gopher.cic.net
    Port=70
    URL: ftp://ftp.ans.net/pub/info/

The definition of a gopher directory is broad enough that a gopher can point to an ftp site. That's what is happening here, and the URL given by the "Get Information" command is correct as is.

    Type=8
    Name=NYSERNet (NY)
    Path=nysernet
    Host=nysernet.org
    Port=23
    URL: gopher://nysernet.org:23/8nysernet

There are two things to note here. Type=8 indicates this is a standard telnet session, and this is confirmed by the Port=23 line. If you were to try this gopher item, you'd see the telnet indicator, <TEL>, be told that you were leaving gopher, and that you should log in using the name "nysernet", which is taken from the Path=nysernet line.
The correct URL for this entry is:

    telnet://nysernet@nysernet.org

{Question: there is no finger: URL. Because a fair amount of useful information is only available this way, I think there should be one.

FINGER (port 79)

Although finger was originally intended as a way to access information about users (time and date of last login, personal information) at local or remote sites, many novel uses of finger have appeared on the Internet. These include election reports, random quizzes, earthquake and weather information, and the infamous appliance reports, e.g., vending machines. When used without a userid, finger often supplies a list of presently logged-in users.

At present there is no finger: URL. If there were, the full URL specification for a finger request might be:

    finger:// [user] @host [:port]

Example:

You want to illustrate how quickly the Internet community responds in the face of network or personal emergency. When Brendan Kehoe, author of Zen and the Art of the Internet, was critically injured in an auto accident, Cygnus Support made a finger address available for getting up-to-date information on his condition. The URL would be:

    finger://[email protected]
}

HTTP (port 80)

Http stands for HyperText Transfer Protocol. It is an increasingly common Internet service due to the growing popularity of the World-Wide Web, and its Mosaic and Cello clients.

The full URL specification for an http session is:

    http://host [:port] [/path] [? search]

Example:

You want to catalog the FBI's information page for the UNABOM unsolved bombing case. It's a standard html (hypertext markup language) file on a server using the standard port. The URL is:

    http://naic.nasa.gov/fbi/FBI_homepage.html

NEWS/NNTP (port 119)

Network news, aka USENET, is like e-mail in that it passes easily across network boundaries.
Like mail, news is typically accessed from a local rather than a remote server, and for this reason the // that usually separates the scheme from the address is not included in the news URL.

Although the news URL allows you to specify an article's unique message identifier, this is rarely used. Two primary reasons for this are that there is no established protocol for accessing archived articles, and that many of the more important articles, such as FAQs, are reissued periodically.

The full URL specification for news is:

    news: * | newsgroup | message_identifier@host

{Question: is "host" appropriate here? The URL Internet Draft says that "News host names are NOT part of news URLs." (p. 9)}

Example:

You want to catalog the newsgroup, sci.virtual-worlds, which is one of the primary sources of information and discussion about all aspects of virtual reality. The URL is:

    news:sci.virtual-worlds

PROSPERO (port 191)

The full URL specification for a prospero session is:

    prospero://host [:port] /path [% 0 0 version [attributes] ]

{Question: I have no direct experience with prospero; could someone supply me with an illustrative example?}

WAIS/Z39.50 (port 210)

Wais stands for Wide Area Information Servers, and refers to a type of distributed database search that has become popular on the Internet over the past few years.

The full URL specification for a wais search is:

    wais://host [:port] /database [? search]

Example:

You want to catalog the k-12-software archive offered for wais search. The URL is:

    wais://info.curtin.edu.au/k-12-software.src

{Question: I always use wais via telnet to wais.com; can somebody verify the format of this URL?}

{Question: I have left the x500: and whois: URLs out of this version since the Internet Draft highlights them as subjects for future study. Does anyone have more current information?}

REFERENCES

Marc Andreessen, "A Beginner's Guide to URL's". <ftp://ftp.ncsa.uiuc.edu/Web/mosaic-papers/url-primer.ps.Z>.
Tim Berners-Lee, "Uniform Resource Locators: a unifying syntax for the expression of names and addresses of objects on the network", Internet Draft Version 7 (14 October 1993). <gopher://rusmv1.rus.uni-stuttgart.de/00/software/ftp_server/stgt/org/ietf/uri/draft-ietf-uri-url-07.ps>

DISCLAIMER

This document will hopefully undergo rapid change in the first few weeks of its existence. Even in its present, rickety form, I'd appreciate it if you keep it intact.

This document has benefited from discussions with Dirk Herr-Hoyman, David Robison, and Larry Masinter. Errors and omissions are mine. Let me know when you find them, and please suggest ways to make this document more useful.

The definitive source for this document is

    gopher://gopher.well.sf.ca.us/00/matrix/internet/curling.up.02

--
Eric S. Theise <[email protected]>
P.O. Box 460177, San Francisco, CA 94146.0177
Internet Domain Editor, Millennium Whole Earth Catalog
The WELL: internet, matrix, & news conference host + gophermeister