Ever wanted to strip the HTML out of a specific context? Have you needed to create a clean MDD and get rid of the annoying characters that are messing up your Reports for survey tables? If the answer to either of these questions is yes, then you should read this article. In it we explain how to open your MDD file in a DMS script and clean the metadata of all those unwanted characters before you export the data.
The first thing we need to do is create our basic DMS script and check that it works. In this example we will take some DDF data and export it to a SAV file. To create the base DMS, create a new DMS file, then hold down the [ctrl] key and press the [M] key, then press [I] followed by the [Tab] key, and you should end up with the following code.
InputDatasource(Input, "")
    ConnectionString = ""
    SelectQuery = "SELECT * FROM VDATA"
End InputDatasource
Notice where the cursor is positioned: right between the quotation marks of the ConnectionString. Don’t move it; instead click on the Tools menu and choose the Connection String Builder option. Change the Metadata location to point to the MDD that we created, set the Case Data Type to DDF, and specify the case data location. Click here to find out more about the connection string builder.
What you should end up with is a connection string like this one, assuming that the data is in the same directory as the script we are creating.
InputDatasource(Input, "")
    ConnectionString = _
        "Provider=mrOleDB.Provider.2;Data Source=mrDataFileDsc;Location=" _
        + sFile + ".ddf;Initial Catalog=" + sFile + ".mdd"
    SelectQuery = "SELECT * FROM VDATA"
End InputDatasource
You will notice that we have changed it slightly: we have modified the last part of the string to use a variable. We have declared our variable at the top of the script like so.
#define sFile "./ShortDrinks"
We have done this so that it is easier to manage our script: if we need to set it up for another DDF file, we have just one place to change the file name. Next we create the output string. Hold down the [ctrl] key and press the [M] key, then press [O] followed by the [Tab] key, and you should end up with the following code.
OutputDatasource(Output, "")
    ConnectionString = ""
    MetaDataOutputName = "???.mdd"
End OutputDatasource
Again, notice where the cursor is positioned, and then use the connection string builder to set the output to be a SAV file. You should end up with this; don’t forget to add in our variable sFile.
OutputDatasource(Output, "")
    ConnectionString = _
        "Provider=mrOleDB.Provider.2;Data Source=mrSavDsc;Location=" _
        + sFile + "_out.sav"
    MetaDataOutputName = sFile + "_OUT.mdd"
End OutputDatasource
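As a quick check of what these strings look like once sFile has been substituted, the concatenation can be reproduced outside Dimensions. The following is only an illustrative Python sketch, not part of the DMS script; the name s_file is a stand-in for the sFile definition at the top of the script.

```python
# Illustrative only: mirrors the string concatenation in the DMS script.
# s_file stands in for the sFile definition ("./ShortDrinks").
s_file = "./ShortDrinks"

input_connection = (
    "Provider=mrOleDB.Provider.2;Data Source=mrDataFileDsc;Location="
    + s_file + ".ddf;Initial Catalog=" + s_file + ".mdd"
)
output_connection = (
    "Provider=mrOleDB.Provider.2;Data Source=mrSavDsc;Location="
    + s_file + "_out.sav"
)
metadata_output_name = s_file + "_OUT.mdd"

# Print the expanded values so they can be compared with what the
# connection string builder produced.
print(input_connection)
print(output_connection)
print(metadata_output_name)
```

Changing the single s_file value changes all three strings, which is the reason for keeping the file name in one place.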
Next we need to think about an event in which to clean our metadata. When we write DMS scripts we have access to lots of events; one of these is the “OnBeforeJobStart” event. This is the event we will use, but if you would like to read about all the other available events then open up your copy of the DDL and run a search on “Example TimeLine”. To create the event, hold down the [ctrl] key and press the [M] key, then press the down arrow until you get to [EBJS], then press the [Tab] key, and you should end up with the following code.
Event(OnBeforeJobStart, "")
End Event
Next type in the following inside the event.
' Create the MDM object and open the Short Drinks .mdd file in read-write mode
Set oMDM = CreateObject("MDM.Document")
oMDM.Open(sFile + ".mdd", , 2)
oMDM.Contexts.Current = "Analysis"
The code that we have just typed in takes the MDD file that we are editing and opens it in read-write mode. In case you did not know, in Dimensions it is possible to have different text displayed in the survey from the text displayed in reporting. This functionality is managed by contexts; by default, when you create your survey, the text you type goes into the Question context. In the background, if we don’t specify anything, the Analysis context automatically takes on all the text from the Question context. So, because we want to clean up our reports, we need to work on the Analysis context and take out all the unwanted characters and HTML. That is why the last line of code we typed sets the current context to the Analysis context. Next, type in the following.
Dim oVar, oElement

' Using the StripHTML function
For Each oVar In oMDM.Variables
    If oVar.IsSystem = False Then
        oVar.Label = StripHTML(oVar.Label)
        For Each oElement In oVar.Elements.Elements
            oElement.Label = StripHTML(oElement.Label)
        Next
    End If
Next
The above bit of code loops through the questions in our MDD file and, for every question that is not a system variable, runs the StripHTML function on the question label and on each response label. The StripHTML function returns the cleaned text, which is stored back into the relevant label. Next, type the following.
' Using the StripOut function
For Each oVar In oMDM.Variables
    If oVar.IsSystem = False Then
        oVar.Label = StripOut(oVar.Label)
        For Each oElement In oVar.Elements.Elements
            oElement.Label = StripOut(oElement.Label)
        Next
    End If
Next
Again we loop through the variables, but this time we run the StripOut function. We will see later what the StripOut function does. Once the above is typed in, add the following.
oMDM.Save(sFile + ".mdd")
Set oMDM = null
This code saves the MDD file with all the changes in place. Next we need to add the two functions, StripHTML and StripOut. First we will enter StripHTML.
Function StripHTML(sText)
    Dim objRegExp, strOutput
    Set objRegExp = CreateObject("VBScript.RegExp")
    objRegExp.IgnoreCase = True
    objRegExp.Global = True
    objRegExp.Pattern = "<(.|\n)+?>"

    ' Replace all HTML tag matches with the empty string
    strOutput = objRegExp.Replace(sText, "")

    ' Replace all remaining < and > with &lt; and &gt;
    strOutput = Replace(strOutput, "<", "&lt;")
    strOutput = Replace(strOutput, ">", "&gt;")

    StripHTML = strOutput ' Return the value of strOutput
    Set objRegExp = null
End Function
This function uses two concepts. The first is something called regular expressions. If you search the web for this term you will find a hundred and one sites dedicated to the subject; basically, regular expressions allow us to do some very advanced pattern matching, and by finding patterns we can do some quite complex searching and replacing. In this example the key line is the following,
objRegExp.Pattern = "<(.|\n)+?>"
which basically says: find an opening <, then one or more of any character (line breaks included), up to the nearest closing >. When this is found we replace it with nothing. As well as the regular expression, we use a plain search and replace to find any remaining single occurrences of < and replace them with &lt;, and likewise we replace any remaining > with &gt;. Whilst this function is not 100% accurate, you will find that it will take out most of the HTML code you have in your context. Next we type the StripOut function.
Function StripOut(sText)
    Dim sAnswer
    sAnswer = sText
    sAnswer = Replace(sAnswer, "[WHAT_ARE_YOU_LOOKING FOR]", "")
    StripOut = sAnswer
End Function
This function allows you to search for a specific piece of text that you may have entered and replace it with nothing. You could have multiple lines like this one
sAnswer = Replace(sAnswer,"[WHAT_ARE_YOU_LOOKING FOR]","")
if you have multiple things to take out. And there you have it: your script, once debugged and working, will take out all the HTML from your Analysis context, along with any other unwanted characters.
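To get a feel for what the two helper functions do to a label before running the job, their logic can be tried outside Dimensions. The sketch below is a Python approximation, not mrScriptBasic: strip_html and strip_out are hypothetical stand-ins for StripHTML and StripOut, the sample labels and the UNWANTED list are made up for illustration, and the pattern assumes the line-break alternative in the regular expression is written as \n.

```python
import re

# Same idea as the StripHTML pattern: "<", then one or more characters
# (line breaks included), matched lazily up to the nearest ">".
TAG_PATTERN = re.compile(r"<(.|\n)+?>", re.IGNORECASE)

# Hypothetical unwanted strings; replace these with your own.
UNWANTED = ["[INSERT_BRAND]", "&nbsp;"]


def strip_html(text):
    """Stand-in for StripHTML: drop tags, then escape stray brackets."""
    output = TAG_PATTERN.sub("", text)
    output = output.replace("<", "&lt;")
    output = output.replace(">", "&gt;")
    return output


def strip_out(text, unwanted=UNWANTED):
    """Stand-in for StripOut: one replace call per unwanted string."""
    answer = text
    for token in unwanted:
        answer = answer.replace(token, "")
    return answer


if __name__ == "__main__":
    label = "<b>Which</b> drinks do you buy<br/> if the price < 5?"
    print(strip_html(label))
    print(strip_out("How often do you buy&nbsp;[INSERT_BRAND]?"))
    # Everything between a stray "<" and a later ">" is treated as a
    # tag, which is one reason the approach is not 100% accurate.
    print(strip_html("3 < 5 and 7 > 6"))
```

The last call removes the text between the two comparison signs, so labels that contain such signs on purpose are worth checking by hand after the job has run.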
The full code for this article can be found at:
http://CodeCorner.SmaterDimensions.Com @ Strip out html from metadata context
I am having troubles getting this to run. I sent it to SPSS support and their response was “I don’t know if we’ll be able to debug code? I did run your dms and got the error, but not sure where the problem is at? I’ll check with some of my co-workers to see if they may know how to fix it, if not, not sure how to help since not all VB code runs with our stuff.”
Could I send my dms to you to check? I’m fairly new with SPSS.
Thanks.
Strange answer from IBM Support. Anyway, please send the DMS file to smarterdimensions@live.com.au and I will take a look for you.