Clean a specified Context of HTML

Ever wanted to get rid of the html from a specific context? Have you needed to create a clean MDD and get rid of the annoying characters that are messing up your Reports for survey tables? If the answer is yes to either one or both of these questions then you should read this article. In it we explain how to open up your MDD  file in your dms script and clean the metadata of all those unwanted characters before you export the data.

The first thing we need to do is to create our basic dms script and check that it works. In this example we  will take some DDF data and export it to a SAV. To create the base DMS, create a new dms file and then hold down the [ctrl] key and press the [M] key, then  press [I] followed by the [Tab] key and you should end up with the following code.

InputDatasource(Input, "")
    ConnectionString = ""
    SelectQuery = "SELECT * FROM VDATA"
End InputDatasource

Notice where the cursor key is positioned , right between the speech marks of the ConnectionString. Don’t  move it but click on the tools menu and chose the ConnectionString builder option. Change the Metadata  location to specify the mdd that we created and the Case  Data Type to DDF and Specify the case data location. Click here to find out more about the connection string builder.

What you should end up with is a connection string like this one, assuming that the data is in the same  directory as the script we are creating.

InputDatasource(Input, "")
    ConnectionString = _
       "Provider=mrOleDB.Provider.2;Data Source=mrDataFileDsc;Location=" _
       + sFile + ".ddf;Initial Catalog=" + sFile + ".mdd"
    SelectQuery = "SELECT * FROM VDATA"
End InputDatasource

You will notice that we have changed it slightly , we have modified the last bit of the string to use a  variable. We have declared out variable up the top of the script like so.

#define sFile "./ShortDrinks"

We have done this so that it is easier to manage out script. If we needed to set it up for another ddf file  we have just one place to change the file name. next we create the output string. Hold down the [ctrl] key and press the [M] key, then press [O] followed by the [Tab] key and you should end  up with the following code.

OutputDatasource(Output, "")
    ConnectionString = ""
    MetaDataOutputName = "???.mdd"
End OutputDatasource

Again notice where the cursor is positioned and then with the connection string builder set the output to be  an sav file. You should end up with this, don’t forget to add in our variable sFile.

OutputDatasource(Output, "")
    ConnectionString = _
      "Provider=mrOleDB.Provider.2;Data Source=mrSavDsc;Location=" _
      + sFile + "_out.sav"
    MetaDataOutputName = sFile + "_OUT.mdd"
End OutputDatasource

Next we need to think about another event to clean our metadata in. When we write DMS scripts we have access  to lots of events , one of these events is the “OnBeforeJobStart” event. This is the event we will use, but  if you would like to read about all the other available events then open up your copy of the DDl and run a  search on “Example TimeLine”. To create the event hold down the [ctrl] key and press the [M] key, then press�
the down arrow until you get to [EBJS] then press the  [Tab] key and you should end up with the following  code.

Event(OnBeforeJobStart, "")
End Event

Next type in the following inside the event.

' Create the MDM object and open the Short Drinks .mdd file in read-write mode
Set oMDM = CreateObject("MDM.Document")
oMDM.Open(sFile + ".mdd", ,2)
oMDM.Contexts.Current = "Analysis"

The code that we have just typed in takes the MDD file that we are editing and opens it up in read-write  mode. If you did not know this, in Dimensions it is possible to have different words displayed in the survey  as opposed to the words displayed in reporting. This functionality is managed by Contexts and by default when you create your survey the text you type goes into the question context. in the background automatically if we  don’t specify anything the Analysis context takes on all the text from the Question context. So because we  want to clean up our reports we need to deal with the Analysis context and take out all the unwanted  characters and HMTL. So the last bit of code we typed in sets the Current context to be the Analysis context. Next type the following in.

 Dim oVar,oElement
'Using the StipHTML function
For Each oVar in oMDM.Variables
 if oVar.IsSystem = False then
  oVar.Label = StripHTML ( oVar.Label )
     For each oElement in oVar.Elements.Elements
      oElement.Label = StripHTML ( oElement.Label )
     Next
 end if
Next

The above bit of code loops through the questions in our MDD file and runs the function StripHTML on all  question labels and the response labels for questions that are not System variables. The StripHTML function  then returns the text and stores it back into the relevant lable element. Next type the following,

'Using the StipOut function
For Each oVar in oMDM.Variables
 if oVar.IsSystem = False then
  oVar.Label = StripOut ( oVar.Label )
     For each oElement in oVar.Elements.Elements
      oElement.Label = StripOut ( oElement.Label )
     Next
 end if
Next

Again we loop the variables but this time we run the StripOut function. We will see later what the stripout  function does. Once the above is typed in , add the following

oMDM.Save(sFile + ".mdd")
Set oMDM = null

This code saves the MDD file with all the changes in place, next we need to add the two functions StripHMTL  and StripOut, first we will enter the StripHTML

Function StripHTML ( sText )
  Dim objRegExp, strOutput
  Set objRegExp = CreateObject("VBScript.RegExp")
  objRegExp.IgnoreCase = True
  objRegExp.Global = True
  objRegExp.Pattern = "<(.|n)+?>"
  'Replace all HTML tag matches with the empty string
  strOutput = objRegExp.Replace(sText, "")
  'Replace all < and > with < and >
  strOutput = Replace(strOutput, "<", "<")
  strOutput = Replace(strOutput, ">", ">")
  StripHTML = strOutput    'Return the value of strOutput
  Set objRegExp = null
End Function

This function is using two concepts, the first is something called Regular Expressions. If you search the web for this word you will find a hundred and one sites dedicated to this sort of stuff, basically regular  expressions allow us to do some very advanced pattern matching and by finding patterns we can so do some quite  complex searching and replacing. So in this example the key line really is the following,

 objRegExp.Pattern = "<(.|n)+?>"

which basically says find an open html tag which is a < and then find a close html tag which in the case is the  > When this is found we just blank it. As well as the regular expression we use a search and replace to find  any single occurrences of < and replace them with < and the same with > we replace them with > Whilst this function is not 100% accurate , you will find that it will take out most of the html code you have  in your context. Next we type the StripOut function

Function StripOut ( sText )
 Dim sAnswer
  sAnswer = sText
  sAnswer = Replace(sAnswer,"[WHAT_ARE_YOU_LOOKING FOR]","")                                                                �
 StripOut = sAnswer
End Function

This function allows you to search for specific pattern that you may have entered and replace it with nothing.  You could have multiple lines like this one

sAnswer = Replace(sAnswer,"[WHAT_ARE_YOU_LOOKING FOR]","") 

if you have multiple things to take out. And there you have it your script once debugged and working will  take out all the HTML from your analysis context and any other unwanted characters.

The full code for this article can be found in :
http://CodeCorner.SmaterDimensions.Com @ Strip out html from metadata context

3 thoughts on “Clean a specified Context of HTML”

  1. I am having troubles getting this to run. I sent it to SPSS support and their response was “I don’t know if we’ll be able to debug code? I did run your dms and got the error, but not sure where the problem is at? I’ll check with some of my co-workers to see if they may know how to fix it, if not, not sure how to help since not all VB code runs with our stuff.”

    Could I send my dms to you to check? I’m fairly new with SPSS.

    Thanks.

Leave a Comment

%d bloggers like this: