Dialog

Script:
Dialog at a Glance module:

Understanding Word- and Phrase-Indexing on Dialog

Title slide (Slide 1)
Welcome to "Understanding Word- and Phrase-Indexing on Dialog." This short module will explain how to identify fields that require word-by-word searching with proximity connectors, and those that require full phrases, complete with punctuation.

This session assumes that you are familiar with conducting a basic search on Dialog, and it may also serve as a refresher on using proximity connectors and on using prefix fields. It will clarify what is meant by the Basic and Additional Indexes, and why some fields appear as suffix qualifiers and some fields are prefixes.

Slide 2 (Agenda)
By the end of this brief session you will understand the differences between the basic and additional indexes.

You will know how to use the Bluesheets to identify fields that are word-indexed, those that are phrase-indexed, and those that are both word- and phrase-indexed.
You will be able to differentiate between searching the Basic Index and the Additional Indexes.

You will learn or review why the EXPAND command is essential to comprehensive and effective retrieval.

Slide 3
What do the terms word and phrase indexing mean? Once you know the difference it will make searching that much easier.

Word indexed terms are entered word by word because each word (with the exception of stop words) is indexed individually. Therefore, you must consider which proximity connectors to use between terms. For example, SELECT RENEWABLE (W) ENERGY retrieves all the records in the database that have the two words next to each other in that order. You search word by word in a data pool that is "word indexed."
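To picture what "word indexed" means, here is a minimal Python sketch, not Dialog's actual implementation. It assumes a simple positional index and an illustrative stop-word list; the names build_word_index and adjacent are made up for this example. It shows why a (W) search matches only when the two words sit next to each other in the stated order.

```python
from collections import defaultdict

# Illustrative stop-word list (an assumption; check Dialog's documentation).
STOP_WORDS = {"an", "and", "by", "for", "from", "of", "the", "to", "with"}

def build_word_index(records):
    """Map each non-stop word to the (record id, position) pairs where it occurs."""
    index = defaultdict(set)
    for rec_id, text in records.items():
        for pos, word in enumerate(text.lower().split()):
            if word not in STOP_WORDS:
                index[word].add((rec_id, pos))
    return index

def adjacent(index, first, second):
    """Record ids where `first` is immediately followed by `second`, like (W)."""
    second_hits = index[second.lower()]
    return {rec for rec, pos in index[first.lower()] if (rec, pos + 1) in second_hits}

# Hypothetical sample records.
records = {
    1: "Renewable energy policy",
    2: "Energy from renewable sources",
    3: "Renewable wind energy",
}
index = build_word_index(records)
print(adjacent(index, "renewable", "energy"))
```

Only record 1 qualifies: records 2 and 3 contain both words, but not side by side in that order.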

In addition, suffix codes are used to restrict retrieval to specific fields, for example, SELECT ENERGY/TI. Since these suffix fields are word-indexed, proximity operators and truncation can be used in the SELECT statement.

Are there times you can and should enter a phrase? Yes, indeed! All Dialog databases contain fields that are "word indexed" and fields that are "phrase indexed". What does "phrase indexed" mean? It means that the search terms in some fields are created as phrases, complete with punctuation: hyphens, periods, apostrophes, commas, etc. Additional Indexes include every field that is indexed using two-letter prefix codes, and these fields are generally phrase-indexed. When searching phrase-indexed fields, truncation can be used.

To retrieve records when searching in the phrase-indexed fields, you must search the complete phrase or use truncation, e.g., SELECT CO=SCHERING-PLOUGH CORP? (the question mark is the truncation symbol). Usually the phrase resides in a prefix field, such as the Company Name (CO=) field. Note the use of truncation. That ensures you retrieve not only CO=SCHERING-PLOUGH CORP, but also SCHERING-PLOUGH CORP., SCHERING-PLOUGH CORPORATION, and all the variations following "CORP".
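A phrase-indexed field can be pictured as a set of whole phrases. The following is a rough Python sketch under that assumption, with a hypothetical select_phrase function and sample index entries. It is only meant to show why a partial phrase finds nothing unless you truncate.

```python
def select_phrase(phrase_index, term):
    """Return matching index entries; a trailing '?' means open truncation."""
    if term.endswith("?"):
        stem = term[:-1]
        return sorted(entry for entry in phrase_index if entry.startswith(stem))
    # Without truncation, the whole phrase must match exactly.
    return sorted(entry for entry in phrase_index if entry == term)

# Hypothetical entries in a Company Name (CO=) index.
company_index = {
    "SCHERING AG",
    "SCHERING-PLOUGH CORP",
    "SCHERING-PLOUGH CORP.",
    "SCHERING-PLOUGH CORPORATION",
}

print(select_phrase(company_index, "SCHERING-PLOUGH CORP?"))
print(select_phrase(company_index, "SCHERING-PLOUGH"))
```

The truncated term gathers all three SCHERING-PLOUGH variations, while the untruncated partial phrase returns an empty list.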

Slide 4
The Basic and Additional Indexes go together with word and phrase indexes. The Basic Index includes all the words from fields that express subject information, such as Title, Abstract, Descriptors and Text. In a business literature database, it's the article itself, including the title or headline, the text and the descriptors. Search for terms word by word and use the proximity connectors to connect different terms. In the Basic Index, you can qualify your search by using field suffixes, such as Title (/TI), Descriptor (/DE), Lead Paragraph (/LP).

Additional Indexes include all non-subject fields, for example author, company, journal and many others. Each Additional Index field has a prefix, such as AU for author, CO for company and JN for journal. For searching, prefixes are followed by an = sign and the name listed as a phrase, for example, AU=DAGIT, L.

It is important to check the Bluesheets to see how terms are entered in the database. We will do that next.

Slide 5:
This is a screenshot of the Basic Index for EMBASE (File 73). All of the fields are listed in column 3 under Field Name. You can restrict to one or more fields in your search by using a slash followed by the two-letter suffix field code in column 1. Notice in column 4 that EMBASE has made most of these fields both word- and phrase-indexed. That is particular to EMBASE. Each database may contain different fields in the Basic Index, so check the Bluesheet before starting a search, or enter HELP FIELD followed by the database number.

Slide 6:
Here is the Additional Index for File 73. Notice in column 4, the Author name is phrase-indexed. Look at the Example column on the Bluesheet which tells you how to search the Author Name in this database. In this case, you would enter AU=last name, a space and at least the first initial. We advise people to use truncation after the first initial, or, better yet, EXPAND AU= followed by the last name, and browse the index. We'll show you this a bit later or you can review the short module on EXPAND for more details.

The date fields are always phrase-indexed, as is the Journal Name. A phrase-indexed field such as the Journal Name must be entered exactly as it appears in the index to obtain results.

Slide 7: Proximity Connectors
We now know the differences between word and phrase indexing and also between the Basic and Additional Indexes. So, let's take a look at searching the Basic Index.
When searching the Basic Index, you are searching everything that is not in the Additional Indexes. As mentioned before, the Basic Index contains all the subject fields, such as Title, Abstract, Text, Descriptors, etc. For example, SELECT DIAGNOSTIC(3W)IMAGING(S)CANCER? finds articles in which the words diagnostic and imaging are within three words of each other, in that order, and imaging and a word beginning with cancer are in the same paragraph or subfield.

If you free-text search a keyword, you are searching the Basic Index, but not the Additional Indexes. Use proximity connectors as shown here to search terms in the Basic Index: (W) or () to retrieve words next to each other in exact order; (N) for words next to each other in either order; (#W) or (#N), where # is a number, for terms with intervening words; and (S) for words in the same paragraph or subfield.
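The difference between the ordered and unordered connectors can be sketched in a few lines of Python. This is an illustration only: the function within is hypothetical, it assumes that the number in (#W) and (#N) is the maximum count of intervening words, and it does not model (S).

```python
def positions(tokens, word):
    """All positions at which `word` occurs in the token list."""
    return [i for i, token in enumerate(tokens) if token == word]

def within(tokens, a, b, n=0, ordered=True):
    """True if a and b occur with at most n intervening words.

    ordered=True models (nW): a must come before b.
    ordered=False models (nN): either order is accepted.
    n=0 corresponds to plain (W) or (N).
    """
    for i in positions(tokens, a):
        for j in positions(tokens, b):
            gap = (j - i - 1) if ordered else (abs(j - i) - 1)
            if 0 <= gap <= n:
                return True
    return False

# Hypothetical title, already split into lowercase words.
tokens = "diagnostic and molecular imaging of cancer".split()

print(within(tokens, "diagnostic", "imaging", n=3))                 # like (3W)
print(within(tokens, "imaging", "diagnostic", n=3))                 # wrong order for (3W)
print(within(tokens, "imaging", "diagnostic", n=3, ordered=False))  # like (3N)
print(within(tokens, "diagnostic", "imaging"))                      # like (W)
```

Reversing the terms fails under the ordered connector but succeeds under the unordered one, and the plain (W) form fails here because two words intervene.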

Slide 8:
When searching the Basic Index, you can gain greater precision by qualifying your search terms to particular fields, such as the Title or the Descriptor or the Lead Paragraph. If a topic is in the title of an article, the topic is probably a major focus of the article. So, for example, you can retrieve articles really focused on solar power, rather than articles that are really about soybeans and mention solar power only in passing.

In fulltext databases, the Lead Paragraph is a great field to use. Writers try to get who, what, where, why and when into the lead paragraph. You can be pretty sure the article is going to be about a topic if the terms are in the lead paragraph.

The Product Name field (PN) is often in both the Basic and Additional Index. In the Basic Index, you have the leeway of not having to know the full phrase, but using terms and proximity connectors to see what you can find.

Another field in some scientific databases is the Identifier field. This is often a supplemental field of author keywords in non-controlled vocabulary, and it helps to locate articles about particular concepts.

Qualifying to the Abstracts field allows you to check for the presence of an abstract. Certain fulltext files, such as ABI/INFORM, provide abstracts in addition to the fulltext. Here, if records come up based on an abstract search, you can be pretty sure the article is about that topic.

Slide 9:
You can qualify your search terms to more than one suffix. The comma acts as OR.
In this example, we want to find all records that contain the words economic and stimulus within three words of each other, as long as these terms appear in the Title, the Descriptor or the Lead Paragraph.
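The "comma acts as OR" rule can be sketched in Python as follows. This is a simplified model, not Dialog's behavior in full: the function names and sample records are hypothetical, records are assumed to be dictionaries keyed by field code, and TX stands in for an unqualified text field.

```python
def words_within(text, first, second, n):
    """True if `first` precedes `second` with at most n intervening words."""
    tokens = text.lower().split()
    first_pos = [i for i, t in enumerate(tokens) if t == first]
    second_pos = [i for i, t in enumerate(tokens) if t == second]
    return any(0 <= j - i - 1 <= n for i in first_pos for j in second_pos)

def select_qualified(records, first, second, n, suffixes):
    """Ids of records where the proximity match occurs in ANY listed field."""
    codes = [code.strip().upper() for code in suffixes.split(",")]
    return [
        rec_id
        for rec_id, fields in records.items()
        if any(words_within(fields.get(code, ""), first, second, n) for code in codes)
    ]

# Hypothetical records keyed by field code.
records = {
    1: {"TI": "Economic stimulus plan debated", "LP": "Lawmakers met on Tuesday"},
    2: {"TI": "Budget talks continue", "LP": "The economic recovery and stimulus package stalled"},
    3: {"TI": "Budget talks continue", "TX": "An economic stimulus is unlikely"},
}

print(select_qualified(records, "economic", "stimulus", 3, "TI,DE,LP"))
```

Record 3 is left out because its match sits in a field that is not among the listed suffixes.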

Slide 10:
To search in the Additional Indexes, you're going to be working with prefix fields followed by the equal sign and phrases.

As mentioned before, the Author name, Journal Name, Company Name, Publication Year and Date are the most common prefix fields. Your best route to conduct the most comprehensive search in the Additional Indexes is to use EXPAND. Browse the index and SELECT the appropriate E Reference Numbers.

Slide 11:
The principle of phrase-indexing is that you must enter the entire phrase. Truncation comes into play here. Imagine that for the Company Name, you have words like "Co.," "Inc.," "Corp.", etc. following the company name. Using truncation after the root company name will gather all those variations.

The same goes for Author Names and other fields. Using truncation after the AU= structure of last name, space, and at least the first initial will pull in variations, such as a second initial or middle name. Check the Bluesheet for the database you are searching to see if a comma appears after the last name. Punctuation can vary from database to database.

EXPAND reveals all the variations after the root stem and shows you how to search on company or author phrases.
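Browsing an index with EXPAND can be thought of as opening an alphabetical list near your term. Here is a rough Python sketch of that idea, using the standard bisect module. The expand function, its before and size parameters, and the author entries other than DAGIT, L. are invented for illustration and do not reproduce Dialog's exact display.

```python
import bisect

def expand(sorted_index, term, before=2, size=12):
    """Return a window of index entries around where `term` would sort.

    Each entry is paired with an E reference number (E1, E2, ...).
    """
    pos = bisect.bisect_left(sorted_index, term)
    start = max(pos - before, 0)
    window = sorted_index[start:start + size]
    return [(f"E{n}", entry) for n, entry in enumerate(window, start=1)]

# Hypothetical entries in an Author (AU=) index, kept in sorted order.
author_index = sorted([
    "DAGHER, A.",
    "DAGIT, K.",
    "DAGIT, L.",
    "DAGIT, L. M.",
    "DAGIT, LAURA",
    "DAHL, R.",
])

for ref, entry in expand(author_index, "DAGIT", before=1, size=5):
    print(ref, entry)
```

Seeing the neighboring entries side by side is what lets you spot variations, such as a second initial or a spelled-out first name, before you SELECT.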

Slide 12:
Several databases have thesauri, and the EXPAND command guides you to both the phrases with linked terms, and related terms as well. For more information on this topic, take the Dialog at a Glance short module on EXPAND.

Slide 13:
As you saw in the EMBASE Bluesheet, some fields are both word- and phrase-indexed. Here's another example of how the EXPAND command can help, even in the Basic Index. By using EXPAND you can quickly pick up descriptor phrases you hadn't thought of that may be right on target for your search.

The Corporate Source field is often word-indexed, which means that when you search on the prefix CS=, you must use proximity connectors and enclose the terms in parentheses following the CS= as in cs=(kyoto(W)univ?).

Check the Bluesheet of the file you wish to search to see if the Corporate Source field is also phrase-indexed.

Another example of a field that is both word- and phrase-indexed is the Patent Assignee field. Again, check the Bluesheets and use EXPAND to browse the index.

Slide 14:
Other fields are available in the Basic Index as word-indexed suffix fields, and also in the Additional Indexes as phrase-indexed prefix fields.

Company Name is an example. You can see why it might be easier to search Procter & Gamble with proximity connectors to work around the question: is it indexed with the ampersand or with the word "and"? Of course, you would use EXPAND to check, but the (1W) operator pulls both possibilities in.

Other fields that are often word- and phrase-indexed include the Industry Name, the Drug or Chemical Name, the Product Name, and several others.

Slide 15:
Here are a few more examples of fields that offer both Basic Index and Additional Index options: geographic names and brand names.

Slide 16:
The take-home lesson is to always use EXPAND.

In the Basic Index, you can EXPAND and pick up valuable descriptor terms that will target your search in just the right way.

It goes without saying to EXPAND on fields in the Additional Index, especially companies, authors and journal names.

A word about the word "and" in a phrase term. When you want to SELECT a phrase that has the word "and" in it, put the phrase in quotation marks, and use truncation. Otherwise the system treats "and" as the Boolean operator. Also put quotes around phrases that contain apostrophes, slashes or colons.

Note that you do not need quotes around phrases that contain ampersands.
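The quoting rule just described can be written down as a small Python check. This sketch covers only the cases named above (the word "and", apostrophes, slashes and colons). The function names and sample phrases are hypothetical, and where the truncation symbol goes relative to the quotes is deliberately not modeled.

```python
import re

# The word AND as a whole word, or an apostrophe, slash or colon.
NEEDS_QUOTES = re.compile(r"\bAND\b|['/:]", re.IGNORECASE)

def needs_quotes(phrase):
    """True if the phrase should be wrapped in quotation marks."""
    return bool(NEEDS_QUOTES.search(phrase))

def quote_if_needed(phrase):
    """Wrap the phrase in double quotes only when the rule calls for it."""
    return f'"{phrase}"' if needs_quotes(phrase) else phrase

# Hypothetical phrases.
for phrase in ["JOHNSON AND JOHNSON", "PROCTER & GAMBLE", "MCDONALD'S CORP", "ANDERSEN CONSULTING"]:
    print(quote_if_needed(phrase))
```

The whole-word test matters: a name that merely starts with the letters "AND" is left unquoted, as is a phrase containing an ampersand.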

Slide 17:
We hope this module has given you new understanding of how and when to search with proximity connectors or phrases.

Here are things to think about when you search the Basic or Additional Indexes.
First: Use Bluesheets as your guide. They will tell you what fields the database contains, which fields are in the Basic Index, and which fields are prefix fields in the Additional Indexes.

Second: From the Basic and Additional Index displays you learn which fields are word-indexed; which fields are phrase-indexed; and which fields provide both options. With word-indexed fields, remember to use proximity connectors between words.
With phrase-indexed fields, especially in the Additional Indexes, use truncation at the end of the phrase to ensure you retrieve all the ending variations, such as periods, "Co.," "Inc.", etc. Also with phrase-indexed fields, you must use the two-letter prefix followed by the = sign. Remember, phrase-index fields require the entire phrase.

Slide 18:

Slide 19:
Thank you for your interest in Dialog. Please let us know which other short modules would be helpful to you.

