<?xml version="1.0" encoding="utf-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: More on GTxA the Show</title>
	<atom:link href="http://grandtextauto.org/2007/10/18/more-on-the-gtxa-the-show/feed/" rel="self" type="application/rss+xml" />
	<link>http://grandtextauto.org/2007/10/18/more-on-the-gtxa-the-show/</link>
	<description>A group blog about computer narrative, games, poetry, and art.</description>
	<lastBuildDate>Wed, 19 May 2010 01:52:58 -0700</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.5</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Steven Dow</title>
		<link>http://grandtextauto.org/2007/10/18/more-on-the-gtxa-the-show/comment-page-1/#comment-134939</link>
		<dc:creator>Steven Dow</dc:creator>
		<pubDate>Fri, 26 Oct 2007 19:24:59 +0000</pubDate>
		<guid isPermaLink="false">http://grandtextauto.org/?p=1626#comment-134939</guid>
		<description>I&#039;m glad you enjoyed &lt;i&gt;AR Façade&lt;/i&gt;, Andrew.  And yes, thanks, Michael; readers out there should indeed hire me.

Speech interfaces for interactive drama remain an interesting challenge. There are at least three reasons speech recognition is not feasible (at least in Façade): a) recognition is difficult in uncontrolled, noisy environments; b) Façade limits the user to 8 words, and there is no way to effectively truncate the user in the middle of a speech act; and c) Façade allows unconstrained language, whereas most speech recognition software relies on limited vocabularies.  To maintain some illusion of conversation, we must employ wizard methods.  The question is how we should structure that interaction.  As Michael suggests, we could give the player a handheld device with two buttons: one to enter the text when it finally appears, and another to clear the text if the characters have already moved on to a different topic. Our earlier study points out that &quot;natural&quot; interfaces may not ultimately be what the player needs to have proper hooks into the game interaction, so I&#039;m not against adding buttons.  Timing is key.  In our studies of desktop Façade, we noticed that many players typed in words only to erase them later; this happened a lot. The explicit act of &quot;entering&quot; the text has no equivalent in the delayed speech interface.

I am actually investigating another approach. What if we tear out the NLP completely and task our wizard with selecting high-level discourse acts?  So, instead of typing &quot;Hi Trip, good to see you&quot;, the wizard would select the built-in Façade construct called &quot;GREET&quot;.  Façade has about 30 such discourse acts with parameters. This will force the wizard to be an expert with the interface and to know the story well, but it&#039;s possible that this approach could lower the chance of communication breakdowns.  In most cases it will be faster than typing, and it will likely eliminate some of the NLU breakdowns (the type 1 breakdowns Andrew mentioned above). We have implemented this as a separate page in the wizard interface, and it&#039;s currently an option available to the docents at the Beall, so we will see.  My initial observation is that some time delay still exists and that the player still needs some sort of feedback as to when the system &quot;hears&quot; their statement. I will report on what I find later.

This approach certainly does diminish the AI appeal of the piece to some extent. We tore out the natural language parser, but most of the AI engine remains untouched. In my view, we are still presented with an opportunity to riff on a player&#039;s expressive input. Only now, we are using an intermediary, the wizard, who not only interprets the meaning of the player&#039;s speech act but can also read into the player&#039;s gestures and emotions.  If we did, in fact, enable a wider range of expressive possibilities for the player, how would this change the design of the behaviors for Trip and Grace?</description>
		<content:encoded><![CDATA[<p>I&#8217;m glad you enjoyed <i>AR Façade</i>, Andrew.  And yes, thanks, Michael; readers out there should indeed hire me.</p>
<p>Speech interfaces for interactive drama remain an interesting challenge. There are at least three reasons speech recognition is not feasible (at least in Façade): a) recognition is difficult in uncontrolled, noisy environments; b) Façade limits the user to 8 words, and there is no way to effectively truncate the user in the middle of a speech act; and c) Façade allows unconstrained language, whereas most speech recognition software relies on limited vocabularies.  To maintain some illusion of conversation, we must employ wizard methods.  The question is how we should structure that interaction.  As Michael suggests, we could give the player a handheld device with two buttons: one to enter the text when it finally appears, and another to clear the text if the characters have already moved on to a different topic. Our earlier study points out that &#8220;natural&#8221; interfaces may not ultimately be what the player needs to have proper hooks into the game interaction, so I&#8217;m not against adding buttons.  Timing is key.  In our studies of desktop Façade, we noticed that many players typed in words only to erase them later; this happened a lot. The explicit act of &#8220;entering&#8221; the text has no equivalent in the delayed speech interface.</p>
<p>I am actually investigating another approach. What if we tear out the NLP completely and task our wizard with selecting high-level discourse acts?  So, instead of typing &#8220;Hi Trip, good to see you&#8221;, the wizard would select the built-in Façade construct called &#8220;GREET&#8221;.  Façade has about 30 such discourse acts with parameters. This will force the wizard to be an expert with the interface and to know the story well, but it&#8217;s possible that this approach could lower the chance of communication breakdowns.  In most cases it will be faster than typing, and it will likely eliminate some of the NLU breakdowns (the type 1 breakdowns Andrew mentioned above). We have implemented this as a separate page in the wizard interface, and it&#8217;s currently an option available to the docents at the Beall, so we will see.  My initial observation is that some time delay still exists and that the player still needs some sort of feedback as to when the system &#8220;hears&#8221; their statement. I will report on what I find later.</p>
<p>This approach certainly does diminish the AI appeal of the piece to some extent. We tore out the natural language parser, but most of the AI engine remains untouched. In my view, we are still presented with an opportunity to riff on a player&#8217;s expressive input. Only now, we are using an intermediary, the wizard, who not only interprets the meaning of the player&#8217;s speech act but can also read into the player&#8217;s gestures and emotions.  If we did, in fact, enable a wider range of expressive possibilities for the player, how would this change the design of the behaviors for Trip and Grace?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: michael</title>
		<link>http://grandtextauto.org/2007/10/18/more-on-the-gtxa-the-show/comment-page-1/#comment-134600</link>
		<dc:creator>michael</dc:creator>
		<pubDate>Sun, 21 Oct 2007 03:46:28 +0000</pubDate>
		<guid isPermaLink="false">http://grandtextauto.org/?p=1626#comment-134600</guid>
		<description>Early in the design of &lt;i&gt;AR Façade&lt;/i&gt;, we had a number of discussions about how the design of &lt;i&gt;Façade&lt;/i&gt; nicely fit many of the design constraints for AR experiences (while also allowing us to sidestep many of the technical difficulties that continue to plague AR, such as the need for precise alignment of virtual objects with the real world). For those not able to make it to the Beall show, there&#039;s decent documentation of the project on the &lt;a href=&quot;http://www.gvu.gatech.edu/arfacade/&quot; rel=&quot;nofollow&quot;&gt;&lt;i&gt;AR Façade&lt;/i&gt; website&lt;/a&gt;. Our &lt;a href=&quot;http://www.gvu.gatech.edu/arfacade/files/AEL-ARFacade-ACE06.pdf&quot; rel=&quot;nofollow&quot;&gt;ACE 2006 paper&lt;/a&gt; describes the technical design of &lt;i&gt;AR Façade&lt;/i&gt; and discusses both the helpful and challenging aspects of &lt;i&gt;Façade&#039;s&lt;/i&gt; design from the point of view of doing an AR adaptation.

We&#039;re very aware that the wizard interface introduces another layer of noise and delay into the interaction, and that the fact that the constraints imposed by &lt;i&gt;Façade&#039;s&lt;/i&gt; NLU are not made explicit in the (wizarded) voice interface is particularly problematic. However, it&#039;s interesting to note that even if we&#039;d somehow magically had a robust speech recognition solution, this would not have solved the problem. Speech recognizers don&#039;t recognize words one by one (they use language models containing statistics of word co-occurrences to compute the most probable textual representation of a speech signal), so the &quot;display words as they are recognized and show when you hit a boundary&quot; interface wouldn&#039;t work. We talked about creating a speech recognition interface in which, after your speech has been &quot;recognized&quot; (in our case, by the wizard), the text would appear in a buffer at the bottom of the screen, with a truncation indication if it hit the buffer boundary; the player would then explicitly commit the recognized text, the speech equivalent of hitting the enter key in the text interface. But we decided not to implement this (at least for the first interface approach), because it requires giving the player an additional interface element to operate (such as a clicker they hold in their hand, or meta speech commands that the characters do not hear). This certainly begins to interfere with the &quot;naturalness&quot; of the speech interface; it becomes something you have to start training on (like typing). All this is to say that speech doesn&#039;t magically make interface issues go away; if anything, it makes the interface design issues more pressing, because a speech interface is supposed to be all about doing away with interface (you know, &quot;natural&quot;, &quot;transparent&quot;).

Our &lt;a href=&quot;http://www.gvu.gatech.edu/arfacade/files/AEL-PresEngage-CHI07.pdf&quot; rel=&quot;nofollow&quot;&gt;CHI 2007 paper&lt;/a&gt; presents a detailed comparison of players&#039; reactions to traditional desktop &lt;i&gt;Façade&lt;/i&gt;, a wizarded speech recognition version of desktop &lt;i&gt;Façade&lt;/i&gt; (just like the original, except you talk to it instead of typing to it), and &lt;i&gt;AR Façade&lt;/i&gt;. The big result is that an increased sense of presence does not necessarily yield increased engagement. In the HCI community, there is a long-held belief that the more present you can make someone feel in an experience (and this is tied to notions of naturalness and transparency in the interface), the better (meaning more effective, more engaging) the experience will be. In our study we found that this is not the case, and that this was not an effect of immature technology (if the underlying technologies were perfect, you&#039;d still see the effect). It&#039;s no surprise to media artists that mediation has positive value, and that you should explicitly design your mediation instead of wishing it away. But within the discourse of HCI, this has rarely been discussed.

We also have an &lt;a href=&quot;http://www.gvu.gatech.edu/arfacade/files/AEL-ARFacade-AAMAS07.pdf&quot; rel=&quot;nofollow&quot;&gt;AAMAS 2007 paper&lt;/a&gt; on the effectiveness of &lt;i&gt;Façade&#039;s&lt;/i&gt; NLP, using the same AI traces and interviews as the CHI study. Our approach was to note places in the retrospective protocols where players noted breakdowns in the conversation, and to correlate these with what was happening in the AI. There&#039;s not a simple one-to-one relationship between failures in AI processing and perceived breakdowns in the experience. Sometimes the NLP fails (for example, it fails to understand an utterance) and the player notices nothing unusual in the conversation; at other times the NLP does exactly what it&#039;s supposed to do, yet the player is confused and feels like they are experiencing a bug (so this is a conversation design bug rather than an NLP failure). We wanted to understand this relationship in detail, and the paper reports some interesting results.

I also just want to point out that as a member of the &lt;i&gt;AR Façade&lt;/i&gt; team and a GVU faculty member during the time the project was done, I was fully involved in the development of &lt;i&gt;AR Façade&lt;/i&gt;. However, Steven Dow, a Ph.D. student at Tech, and someone who will be on the job market soon (you should hire him), was responsible for bringing the project to fruition. He served as the unofficial project manager, bringing all the pieces together, making sure that all the technology, experience design and physical design needed to pull off this project melded together into a coherent unity.</description>
		<content:encoded><![CDATA[<p>Early in the design of <i>AR Façade</i>, we had a number of discussions about how the design of <i>Façade</i> nicely fit many of the design constraints for AR experiences (while also allowing us to sidestep many of the technical difficulties that continue to plague AR, such as the need for precise alignment of virtual objects with the real world). For those not able to make it to the Beall show, there&#8217;s decent documentation of the project on the <a href="http://www.gvu.gatech.edu/arfacade/" rel="nofollow"><i>AR Façade</i> website</a>. Our <a href="http://www.gvu.gatech.edu/arfacade/files/AEL-ARFacade-ACE06.pdf" rel="nofollow">ACE 2006 paper</a> describes the technical design of <i>AR Façade</i> and discusses both the helpful and challenging aspects of <i>Façade&#8217;s</i> design from the point of view of doing an AR adaptation.</p>
<p>We&#8217;re very aware that the wizard interface introduces another layer of noise and delay into the interaction, and that the fact that the constraints imposed by <i>Façade&#8217;s</i> NLU are not made explicit in the (wizarded) voice interface is particularly problematic. However, it&#8217;s interesting to note that even if we&#8217;d somehow magically had a robust speech recognition solution, this would not have solved the problem. Speech recognizers don&#8217;t recognize words one by one (they use language models containing statistics of word co-occurrences to compute the most probable textual representation of a speech signal), so the &#8220;display words as they are recognized and show when you hit a boundary&#8221; interface wouldn&#8217;t work. We talked about creating a speech recognition interface in which, after your speech has been &#8220;recognized&#8221; (in our case, by the wizard), the text would appear in a buffer at the bottom of the screen, with a truncation indication if it hit the buffer boundary; the player would then explicitly commit the recognized text, the speech equivalent of hitting the enter key in the text interface. But we decided not to implement this (at least for the first interface approach), because it requires giving the player an additional interface element to operate (such as a clicker they hold in their hand, or meta speech commands that the characters do not hear). This certainly begins to interfere with the &#8220;naturalness&#8221; of the speech interface; it becomes something you have to start training on (like typing). All this is to say that speech doesn&#8217;t magically make interface issues go away; if anything, it makes the interface design issues more pressing, because a speech interface is supposed to be all about doing away with interface (you know, &#8220;natural&#8221;, &#8220;transparent&#8221;).</p>
<p>Our <a href="http://www.gvu.gatech.edu/arfacade/files/AEL-PresEngage-CHI07.pdf" rel="nofollow">CHI 2007 paper</a> presents a detailed comparison of players&#8217; reactions to traditional desktop <i>Façade</i>, a wizarded speech recognition version of desktop <i>Façade</i> (just like the original, except you talk to it instead of typing to it), and <i>AR Façade</i>. The big result is that an increased sense of presence does not necessarily yield increased engagement. In the HCI community, there is a long-held belief that the more present you can make someone feel in an experience (and this is tied to notions of naturalness and transparency in the interface), the better (meaning more effective, more engaging) the experience will be. In our study we found that this is not the case, and that this was not an effect of immature technology (if the underlying technologies were perfect, you&#8217;d still see the effect). It&#8217;s no surprise to media artists that mediation has positive value, and that you should explicitly design your mediation instead of wishing it away. But within the discourse of HCI, this has rarely been discussed.</p>
<p>We also have an <a href="http://www.gvu.gatech.edu/arfacade/files/AEL-ARFacade-AAMAS07.pdf" rel="nofollow">AAMAS 2007 paper</a> on the effectiveness of <i>Façade&#8217;s</i> NLP, using the same AI traces and interviews as the CHI study. Our approach was to note places in the retrospective protocols where players noted breakdowns in the conversation, and to correlate these with what was happening in the AI. There&#8217;s not a simple one-to-one relationship between failures in AI processing and perceived breakdowns in the experience. Sometimes the NLP fails (for example, it fails to understand an utterance) and the player notices nothing unusual in the conversation; at other times the NLP does exactly what it&#8217;s supposed to do, yet the player is confused and feels like they are experiencing a bug (so this is a conversation design bug rather than an NLP failure). We wanted to understand this relationship in detail, and the paper reports some interesting results.</p>
<p>I also just want to point out that as a member of the <i>AR Façade</i> team and a GVU faculty member during the time the project was done, I was fully involved in the development of <i>AR Façade</i>. However, Steven Dow, a Ph.D. student at Tech, and someone who will be on the job market soon (you should hire him), was responsible for bringing the project to fruition. He served as the unofficial project manager, bringing all the pieces together, making sure that all the technology, experience design and physical design needed to pull off this project melded together into a coherent unity.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
