Specific THL Efforts At Web Services

Contributor(s): David Germano, Than Grove.

JSON Output for Translation Tool

Ed and Andres have created a JSON output format for the online translation tool. Based on that, Than created a translation plug-in for a web page. It was relatively easy, as Ed and Andres had already written the code for retrieving Tibetan translation tool output as a JSON object and Jed had already built his test highlight-and-translate page. So I just put those two together, slightly modified, and came up with the following.

http://staging.thdl.org/testing-area/than/ajax-translation-test.php

You highlight a selection of Wylie and press Ctrl + Shift + ?, and a box with the translation for each word should appear next to the highlighted phrase. It works in Firefox and IE 7 on the PC (for me at least).
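The request behind such a plug-in can be sketched roughly as follows. This is only an illustration in Python, not the jQuery used on the test page; the endpoint URL, the `wylie` parameter name, and the shape of the JSON response are all assumptions and do not describe the actual interface of the translation tool.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint; the real translation tool URL is not given here.
TRANSLATION_URL = "http://example.org/translate"


def build_lookup_url(selection, base_url=TRANSLATION_URL):
    """Build a GET URL that carries the highlighted Wylie text."""
    # urlencode escapes spaces as "+" and percent-encodes any
    # non-ASCII characters as UTF-8 bytes.
    return base_url + "?" + urlencode({"wylie": selection.strip()})


def parse_translation(payload):
    """Turn a JSON response into (word, definition) pairs.

    Assumes a response shaped like
    {"words": [{"wylie": "...", "definition": "..."}, ...]}.
    """
    data = json.loads(payload)
    return [(w["wylie"], w["definition"]) for w in data.get("words", [])]
```

The pairs returned by `parse_translation` are what a page script would render in the box next to the highlighted phrase.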

I tried at first to use Unicode Tibetan, but that gets URL-encoded in the GET call and the translation tool won't handle it. Does anyone know a workaround for that? As far as I know, the getJSON function in jQuery has to be a GET, and it doesn't let you set the contentType for a GET.

Doug Cooper's Suggestion about Interoperability

Regarding your on-line dictionaries, my original intention was simply to 'reframe' your page: submit the query from my page, then direct the result from your server to a new frame in the user's browser.

The way that your pages are implemented makes this a little difficult. However, the direction that Andres took is perfectly consistent with his overall plan for the page, and makes sense within that context. It's also exactly what his first note to me said:

1) when searching the string and language are not passed as parameters within the URL, but sent as a POST through a form, and

2) the search happens in AJAX, so the search results that are returned are not meant to be displayed as a document in itself, but as a section of a bigger search interface.

My goal for the long run, though, is to start the process of developing standards for on-line dictionary access and interoperability for projects. I think that the simplest sort of Web API protocols (i.e. http query + arguments, or SRU, or whatever you want to call it) will be sufficient for our environment (as opposed to SOAP etc.).

The real issue will be establishing just what sorts of functionality should be provided in abstract terms. We all have dictionaries, and we all have different, but largely overlapping, sets of bells and whistles for restricting or extending lookup and return, but it's going to take work, and a lot of playing around with existing resources, to evolve a standard (that folks can implement any way they want to) that makes sense to all interested parties.

I was wondering if your server implementation might be amenable to allowing Web API style queries sooner rather than later. Allowable queries would be the same (but be GET arguments in the URL), while the return would be simpler in one way (it can be one long page, rather than n items + a link to the next n items) but a little more complicated in another: it should probably include a few extra lines that establish it as a document, i.e. head, body, and any CSS includes.

Note that this is not intended to be the ultimate solution. Rather, if it's easy for you to implement, it would be a convenient way to expose the essential functionality of your dictionaries in a way that lets us assess the collective capabilities of _all_ of our dictionaries.

Please let me know if this a) makes sense, b) seems like something you'd be interested in doing, and c) if a && b, what kind of time-frame we're looking at (yes, I'm trying to avoid having to hack your current AJAX return!).
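The Web API style of return proposed above can be illustrated with a rough sketch: GET arguments come in, and one complete HTML document (head, body, CSS include) goes out with every matching entry on a single page. The sketch is in Python purely for illustration; the function name, the `q` argument, the entry fields, and the stylesheet path are assumptions and do not describe the existing dictionary server.

```python
from html import escape
from urllib.parse import parse_qs


def render_results_page(query_string, entries, css_href="dictionary.css"):
    """Return one complete HTML document listing every matching entry.

    query_string: the GET arguments, e.g. "q=rdo+rje" (hypothetical name).
    entries: list of (headword, definition) pairs already looked up.
    """
    args = parse_qs(query_string)
    term = args.get("q", [""])[0]
    # All entries go on one long page: no paging links are generated.
    items = "\n".join(
        f"    <dt>{escape(head)}</dt><dd>{escape(defn)}</dd>"
        for head, defn in entries
    )
    return (
        "<html>\n"
        "  <head>\n"
        f"    <title>Results for {escape(term)}</title>\n"
        f'    <link rel="stylesheet" href="{escape(css_href)}" />\n'
        "  </head>\n"
        "  <body>\n"
        "  <dl>\n"
        f"{items}\n"
        "  </dl>\n"
        "  </body>\n"
        "</html>"
    )
```

Because the result is a self-contained document, another site can direct it straight into a frame without unpacking an AJAX fragment.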

Jed Verity's Outline of Web Services

Contributor(s): Jed Verity

Phase 1: Geotours Search

As of 8/7/07, the earliest stages of web services development are happening on the GeoTourism site.

Phase 1-a: Search Forms (8/7/07: in progress)

XML

In this phase, basic search forms are created in XML with the necessary elements and attributes to function with whatever application they provide an interface for, as well as some descriptive information. Eventually, the "action" parameter of these forms should be consistent across all applications, looking to a web services url that will be the main hub of all searching. The forms will then contain a hidden input that informs the web services superbrain where to perform the search(es). See a very basic example below:

<?xml version="1.0" encoding="UTF-8"?>

<search_template>
	<description></description>
	<help>If you would like to browse the entire database of over 35,000 images, just click "Search" below, without changing the form. You can also browse by Photographer or Collection, but note that you probably want to select EITHER a Photographer OR a Collection. If there are values in multiple fields below, the result is an "AND" search.</help>
	<form action="WEB_SERVICES_URL" method="get">
		<fields>
			<input type="hidden" name="search_context" value="images" />
			<input type="hidden" name="output" value="xml" />
			<select label="Collection Title" name="CollectionTitle" id="collectionTitle">
				<option>…</option>
				<lookup src="WEB_SERVICES_URL" fetch="collections" format="option" />
			</select>
			<select label="Photographer" name="Photographer" id="photographer">
				<option>…</option>
				<lookup src="WEB_SERVICES_URL" fetch="photographers" format="option" />
			</select>
			<input label="Caption" size="25" type="text" id="caption" name="Caption" />
			<input label="File Name" size="25" type="text" id="fileName" name="FileName" />
			<input label="Search all fields for" size="25" type="text" id="searchTerms" name="searchTerms" />
			<input type="reset" value="Clear Form" />
			<input type="submit" value="Search" />
		</fields>
	</form>
</search_template>

The result very closely resembles XHTML so that developers across applications can write relatively simple XSLTs to present the forms.

A few things to note:

  • WEB_SERVICES_URL: This variable will be retrieved from a centralized location, yet to be determined. The advantages of this are: 1) it makes it more difficult for prying eyes to find, and more importantly 2) if the web services superbrain is ever moved or changed, all the XML and XSLT documents will continue to function normally. For now, this variable is set in a “conf.xsl” stylesheet that should be imported into any stylesheet working with web services xml.
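The substitution described in this note can be sketched as follows. This is a Python illustration only, not the “conf.xsl” mechanism itself: the function name is hypothetical, and the URL is just the example value used elsewhere on this page, since the real centralized location is yet to be determined.

```python
import xml.etree.ElementTree as ET

# Assumed value for illustration; the real location is yet to be determined.
WEB_SERVICES_URL = "http://thdl.org/ws/"


def resolve_placeholders(template_xml, url=WEB_SERVICES_URL):
    """Replace the WEB_SERVICES_URL placeholder in every attribute value."""
    root = ET.fromstring(template_xml)
    for element in root.iter():
        # Copy the items first so the attributes can be updated safely.
        for name, value in list(element.attrib.items()):
            if value == "WEB_SERVICES_URL":
                element.set(name, url)
    return root
```

Keeping the value in one place means that moving the web services hub requires changing a single setting rather than every form template.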

XSL

A few things to note, in reference to the XML above:

  • <lookup>: This tag should be transformed by XSLT to provide the dynamic list of options for searching. This spec calls for a branch of the web services superbrain to provide such form data. Though not strictly necessary, offloading such form options to web services will greatly enhance the portability of the applications. Also note that the search_context input should be included with the <lookup> information, and that the XSLT will thus piece together a URL looking something like this (assuming http://thdl.org/ws/ is the WEB_SERVICES_URL): http://thdl.org/ws/?search_context=images&fetch=photographers&format=option
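The way such a URL is pieced together can be sketched in Python (the spec itself calls for XSLT, so this is only an illustration of the logic). The function name is hypothetical, and the default base URL is the assumed value from the note above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def lookup_urls(template_xml, web_services_url="http://thdl.org/ws/"):
    """Return one option-list URL per <lookup> element in a search template."""
    root = ET.fromstring(template_xml)
    # The hidden search_context input tells web services where to search.
    context = root.find(".//input[@name='search_context']").get("value")
    urls = []
    for lookup in root.iter("lookup"):
        params = {
            "search_context": context,
            "fetch": lookup.get("fetch"),
            "format": lookup.get("format"),
        }
        urls.append(web_services_url + "?" + urlencode(params))
    return urls
```

For the images template, the photographers lookup yields `http://thdl.org/ws/?search_context=images&fetch=photographers&format=option`.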

Alternative Approaches

It might make sense to have the XML documents generated dynamically so that form options could be generated immediately as part of the document instead of pulled out during or after transformation. This would eliminate a couple of steps and would allow for greater dynamism within the XML, but it also reduces the platform agnosticism of an otherwise manifest XML document that includes urls for the retrieval of dynamic content. In a nutshell, is portability or performance more important? The motivation for the construction of this functionality was portability, and thus this remains the priority for now.

In somewhat the same vein, we should potentially include all available search fields from the databases in the XML documents. This will require careful indexing and an audit once the databases are more complete (many of them are about to change from one platform/language to another).

It seems that XForms could be helpful for this project, but browser support is too spotty at the time of this writing to seriously consider implementing it.

Phase 1-b: Search Results (future)

Description of Functionality

Though it will be important to keep an eye turned towards internal cross-application development and the implementation of an API that will allow for non-THDL sites systematically to integrate THDL data and functionality into their applications, the immediate phase calls for a basic xml wrapper that the web services superbrain calls after extracting data from existing data sources and applications.

This will allow for the standardization of search results across the THDL site, preventing both the UI re-orientation that users currently have to undergo when searching for different elements of the site and, for developers, the multi-platform isolation of different applications providing results in different formats via different access points.

Of course, working with different data requires different interfaces and functionalities, and such distinctions will be present to an extent in search results lists, but the idea is to confine the majority of such functionalities to discrete specialized areas. For example, if a user does a site-wide search for “rdo rje”, the search results list might show hits from the dictionary, audio-video, and people. The results list will show the source of the hit, a few relevant details for each (e.g. short or summarized definition, thumbnail and summarized description, and short biographical detail, respectively), and some metadata about the record information, with the page title of the result hyperlinked in Google fashion. When users click a page title, they are taken to a detail view of their selection, which includes maximum functionality and shifts their overall working context to that of their selection (e.g. dictionary, audio-video, or people).

Once in the new context, the search results are still standardized, but a more sophisticated layer of functionality can be included with results, if desired.
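A standardized results wrapper of the kind described here might look something like the sketch below. The element and attribute names (`search_results`, `result`, `source`, `title`, `summary`) are purely hypothetical, since no DTD exists yet, and Python is used only for illustration.

```python
import xml.etree.ElementTree as ET


def wrap_results(query, hits):
    """Wrap hits from different applications in one common XML format.

    hits: list of dicts with "source", "title", "url" and "summary" keys.
    """
    root = ET.Element("search_results", {"query": query, "count": str(len(hits))})
    for hit in hits:
        # Each result records which application it came from.
        result = ET.SubElement(root, "result", {"source": hit["source"]})
        title = ET.SubElement(result, "title", {"href": hit["url"]})
        title.text = hit["title"]
        ET.SubElement(result, "summary").text = hit["summary"]
    return ET.tostring(root, encoding="unicode")
```

Because every hit carries the same fields regardless of origin, a single XSLT can render dictionary, audio-video, and people results in one list.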

Technical Details

There is no question that a DTD will eventually be necessary to ensure the smooth operation and comprehensive standardization that this application aspires to, but if one developer is primarily responsible for this application’s construction, it may make sense for him/her to create a couple of xml wrappers and then create the DTD from them. This isn’t always advisable but in this case could speed up development and better inform the DTD’s contents than a top-down theoretical approach would.

Given that the search application will be querying different THDL databases and relaying to other applications, there are two options for ensuring access to desired resources:

  1. The ideal and unlikely scenario: Developers of other applications provide web services-like urls, abiding by our standards, that can be used independent of platform to access data.
  2. The likely scenario: Databases are directly queryable by whatever language the web services application has access to. As of 8/13/07, nascent scripts have been written in PHP, but this is not necessarily the best language to use. It may be less resource-intensive for this kind of application than a framework like Ruby on Rails, but the latter’s emphasis on convention would be a good match for a project where the main intent is to improve data conformity.

If possible, there should be a flexible, modular architecture to the results xml generation to allow for the easy integration of new data sources and application relays (this is perhaps another recommendation for Rails). For example, if the THDL develops a database of yak products, a web services developer should not have to go into the source code, add hooks to a yak-app object, and then modify the infrastructure to accommodate this new source. Instead, there should be an instruction repository containing documents similar to but not exactly like WSDLs for both the search forms and search results (NOTE: the dynamic aspects of the search form prototype currently do not abide by this but can be easily modified). These instruction documents will describe what data is available for an application and how to access it, and thus could reference external web services as well as internal ones (if the TBRC were to offer web services, for example, we could integrate their data into our search results). The web services superbrain then will be responsible for iterating through these instruction documents. This information could also be stored in a database instead of in a folder as documents, but the latter greatly improves portability and scalability. Again, the issue is performance vs. portability. In either case, the web services would run within a session so that all the data sources/applications to be queried would be loaded on the first request and not again for that particular user and session.
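The folder-of-instructions idea, together with per-session loading, can be sketched as follows. Everything specific here is an assumption made for illustration: the instruction document layout (a `source` element with `endpoint` and `format` children), the function names, and the use of a plain dict to stand in for the session.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def load_instructions(directory):
    """Read every instruction document in a folder, keyed by source name."""
    sources = {}
    for path in sorted(Path(directory).glob("*.xml")):
        root = ET.parse(path).getroot()
        sources[root.get("name")] = {
            "endpoint": root.findtext("endpoint"),
            "format": root.findtext("format"),
        }
    return sources


def get_sources(session, directory):
    """Load the instruction documents once per session, then reuse them."""
    if "sources" not in session:
        session["sources"] = load_instructions(directory)
    return session["sources"]
```

Under this arrangement, adding a yak products source means dropping one more document into the folder, with no change to the main program.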

NOTE: Another alternative to the wsdl functionality is something like PHP5’s auto-loading of classes. Objects could be designed for each different data source/application and then dropped in the class directory. This is less desirable, however, because it will require more genericized (and probably eval’ed) code in the main web services program, and greatly limits the portability of the individual web service instructions.

The XSLTs for search results should be relatively straightforward. They should:

  • Create a tidy results list.
  • Include icons and/or text descriptions for the provenance of result items (unless the user has specifically targeted one area of the site, in which case the descriptor is simply at the top of the results list).
  • For the different types of data, make sure the unique features of each are displayed and functional (e.g. thumbnails for images, biographical sketch for people, etc.).

Ajax

It might make good sense to have results piped in via Ajax and to handle the navigation of results pages similarly. This would be a relatively straightforward project, filling the content div with the basic results list, and would allow for the ability to search from any given page and get very fast results, with the option then to go to more complete search pages, with more options, instructions, etc.

Implementation

In terms of rolling out the different pieces of this phase, it might go as follows, according to priority and logical flow:

  1. Existing data/application audit to determine a phasing of their inclusion in web services. For example, images, gaz, and dictionary might be the first applications to include because of their utility and accessibility, alongside a general site search. (Purely hypothetical - I haven’t investigated them thoroughly.)
  2. App-by-app evaluation of query parameters - e.g. for images, you should be able to search by photographer, collection, caption, file name, and keyword in all fields.
  3. Creation of forms and iframe search results from existing applications as a temporary solution.
  4. (This item can also follow the XML wrapper step below.) DTD creation, describing all possible search results items and their elements/attributes.
  5. Creation/Modification of web services scripts that access data from identified applications.
  6. XML wrapper for results.
  7. XSLTs.
  8. Conversion/Extension of web service scripts to WSDL-like documents that describe how to access data (where and in what format), and a method in the web services scripts that iterates through this repository of instructions. Ideally, this would happen earlier, but it’s likely that it would be deemed a later phase.
  9. Session implementation so that instructions immediately above are not read with every request.

Note that a few of these steps will require collaboration with developers of the other applications. Andres Montano, developer of many THDL applications, has already been contacted, but only in an introductory way.

Phase 2: Cross-Application Implementation (future)

This stage is marked mostly by a mature DTD, as much of the work needed for cross-application functionality will have been done in Phase 1. It will be crucial here to define all possible elements of search results xml so that developers of new applications can reference and include the DTD when building a web services hook, thus saving other developers the work of investigating the new data/app and building xml wrappers for it.

Phase 3: External API (future)

The primary difference between this and the previous phase is the need for authentication (if desired), the delineation of public and private data, and the construction of a public script that plugs into the web services superbrain to provide easy and efficient query hooks.

For how to create JSON files for Data Table lists such as TAR Monasteries and the like, see the section on creating JSON for Data Tables on the Place Dictionary API page.

Provided for unrestricted use by the Tibetan and Himalayan Library