So you need to buy some hardware to host a Data Collection setup, but you have no idea what you need. Where do you start? This series of articles will help you make some of those important decisions and get you thinking about what your final solution will be. When creating a Data Collection system, it really boils down to how fault tolerant you want the system to be and how much money you have. Knowing how many session engines you require is a starting point, but it does not answer all your questions.
When we talk about fault tolerance, we mean how our system will cope with a hardware failure. Will it simply die, leaving us unable to do anything until it is fixed, or will it be able to continue capturing data, at least while things are being resolved?
If you are a small-scale user with a limited budget, you may be forced to have everything on one machine, as shown in the following image.
This is known as a “Single Machine Install”. Whilst it is more than capable of handling considerable load, you have to consider what would happen if one piece of hardware failed, say a hard disk crash. How long would your server be down, and what would happen to all the live jobs you are running?
To run a Data Collection system you must have certain components in place, and in a “Single Machine Install” all of these components are on the same machine. Let’s explain them:
Operating System : This is what drives the hardware. At the time of writing, Windows Server 2003 SP2 Standard or Enterprise Edition, 32-bit or 64-bit, is supported. If you want a single machine with more than 8 GB of memory, you will need the Enterprise Edition.
Internet Information Services : This is the web server that ships with your server operating system, and it is what allows you to run Data Collection Interviewer Server. Microsoft Internet Information Services (IIS) 6.0 or later is supported.
Microsoft SQL Server : This is the Microsoft product used to store your data, that is, the respondents’ answers. Microsoft SQL Server™ 2008, SQL Server 2005 and Microsoft SQL Server 2005 Express Edition are the supported versions. One thing to note here: if you expect a lot of load, don’t use the Express edition, because it is not built for heavily loaded systems.
.NET Framework : This is another Microsoft product that Data Collection relies on. Parts of Data Collection were built on the .NET Framework, so Data Collection will not run without it. Microsoft .NET Framework 2.0 is the supported version.
Data Collection Server Products : These are the products you can purchase from SPSS, and they include all the different server-based capture modes. In a later article we will talk about each product and some of the things you need to consider when setting up a system, but for now we will concentrate on the tiers they can be installed to. Data Collection uses three tiers:
Web Tier : This tier hosts the Players, the Web Service and the Image Cache. The Players are the components that serve up the HTML pages to the respondent. A few players are provided out of the box (WEB, XML and CATI), and it is also possible to create your own. The Web Service helps control what is happening on the site, including the session engines discussed in previous articles. Finally, the Image Cache is the application that helps you present your surveys with the look and feel you require. For more information on any of the Web Tier components, search the DDL (Data Collection Developer Library) for “Web Tier”.
Interviewer Service Tier : This tier manages the connections between the Web Tier and the Database Tier. Amongst other things, this is where the session engines do their work and present the questions as required. For more information on the Interviewer Service Tier, search the DDL for “Interviewer Service Tier”.
Database Tier : This tier hosts the case data, sample management and project management databases. No Data Collection components are actually installed on this tier; everything is controlled from the Interviewer Service Tier. For more information on the Database Tier, search the DDL for “Database Tier”.
For more information on the overall design, search the DDL for “Interviewer Server Architecture”.
How fault tolerant is a Single Machine Setup?
Now that we understand the different components, let’s think about how fault tolerant this system would be. To be honest, it’s simple: if any one of the above components fails, your system will be down until you can get it fixed. There is no redundancy in this system at all.
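To make the “no redundancy” point concrete, here is a minimal Python sketch. It assumes a simple model that is not part of Data Collection itself: each component from the list above maps to the set of hosts running an instance of it, and the system only works while every component still has at least one healthy instance. The host name `server1` and the functions `system_is_up` and `single_points_of_failure` are hypothetical illustrations.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical model (not a Data Collection API): component -> hosts running it.
Layout = Dict[str, Set[str]]

# In a Single Machine Install every component lives on the same host.
SINGLE_MACHINE: Layout = {
    "Operating System": {"server1"},
    "Internet Information Services": {"server1"},
    "SQL Server": {"server1"},
    ".NET Framework": {"server1"},
    "Web Tier": {"server1"},
    "Interviewer Service Tier": {"server1"},
    "Database Tier": {"server1"},
}


def system_is_up(layout: Layout,
                 failed_hosts: Iterable[str] = (),
                 failed_instances: Iterable[Tuple[str, str]] = ()) -> bool:
    """True only if every component still has an instance on a healthy host."""
    dead_hosts = set(failed_hosts)
    dead_instances = set(failed_instances)
    for component, hosts in layout.items():
        healthy = [h for h in hosts
                   if h not in dead_hosts and (component, h) not in dead_instances]
        if not healthy:
            return False  # one missing component is enough to stop everything
    return True


def single_points_of_failure(layout: Layout) -> List[Tuple[str, str]]:
    """Each (component, host) instance whose failure alone stops the system."""
    return [(component, host)
            for component, hosts in layout.items()
            for host in sorted(hosts)
            if not system_is_up(layout, failed_instances=[(component, host)])]


if __name__ == "__main__":
    print(system_is_up(SINGLE_MACHINE))                            # True
    print(system_is_up(SINGLE_MACHINE, failed_hosts=["server1"]))  # False
    print(len(single_points_of_failure(SINGLE_MACHINE)))           # 7
```

Every one of the seven instances shows up as a single point of failure, which is exactly the situation described above. In this model, only giving a component a second host removes it from that list.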
What Hardware Is Required for a Single Machine Setup?
If you have decided that a single machine install is the way to go, there are a few things to consider when it comes to hardware. In the best-case scenario you would need the following:
RAM : RAM, or random access memory, is what the session engines and the rest of your system need to run; with too little RAM, things will not function the way they should. At a minimum we recommend 3 GB, which allows around 1 GB for the session engine, 1 GB for SQL Server and 1 GB for the operating system. If you require additional session engines, start at 2 GB and add 1 GB for each engine. Note that the Standard Edition of Windows Server 2003 will only handle 8 GB of memory; anything above that requires Windows Server 2003 Enterprise Edition.
CPU : Get the best and fastest processor you can afford. A dual-core processor is the minimum, but the more grunt you have in this area, the faster things will happen. If you have a large number of session engines, say eight, seriously consider a quad-core machine.
Hard disk : A single machine will normally come with one hard disk, but in an ideal world we would recommend that you purchase additional ones. Let’s explain why.
OS Disk : The disk that comes with the machine should be allocated to the OS. The OS does its own reading from and writing to the disk, and it is best to keep that separate from other activity. For the OS disk we would recommend 80 GB.
Data Disk : Normally all your programs are installed on the OS disk, but the data they create should be written to a separate disk. Data Collection has some files it cannot do without while surveys are running and collecting data. These files are stored in a Windows share called FMROOT, and if we lose it we are in big trouble. Putting this folder on its own disk makes it easy to back up everything on the drive, and if the OS disk fails, you don’t lose your data disk. This disk should be at least 80 GB.
SQL Disk : For the best SQL Server performance we would recommend that you get two disks. When SQL Server saves data it uses two files. The first is the “.mdf” file, which holds the actual data; the second is the “.ldf” file, the transaction log, which is a step-by-step record of changes that SQL Server can use to rebuild the data in the “.mdf” file if needed. Imagine a record player with the record spinning around and the needle starting at the outside and working its way into the middle: this is what the hard disk head is doing when it writes the “.ldf” file, one entry after another. Now imagine the needle jumping in and out across the record at random: this is like the disk head when it writes the “.mdf” file. Putting the two files on separate disks is therefore faster than keeping them on the same disk, because the log disk’s head no longer has to keep jumping away to service random data writes. In an ideal world each of these disks would be at least 200 GB.
RAID : In simple terms, RAID lets your server keep running when one of its hard disks fails, so that you can pull out the failed disk and replace it while the system stays up. There are different levels of RAID, and what happens depends on which level you choose. With mirroring, for example, if you have decided on a server with four physical hard disks and you want them protected by RAID, you would have to buy another four disks, for a total of eight. With the help of RAID, these extra four disks mirror what is happening on the original disks, so that either disk in a pair can be pulled out in an emergency if required.
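The sizing rules in this section reduce to simple arithmetic, sketched below in Python. The constants come from the recommendations above: a 2 GB base plus 1 GB per session engine, the 8 GB Standard Edition ceiling quoted above, and the four suggested disks at their minimum sizes. The class and function names are hypothetical, and the disk count only models mirroring, where every disk gets a copy.

```python
from dataclasses import dataclass
from typing import List

# Figures taken from the recommendations above; treat them as starting points.
BASE_RAM_GB = 2               # operating system (1 GB) + SQL Server (1 GB)
RAM_PER_ENGINE_GB = 1         # add 1 GB for each session engine
STANDARD_EDITION_MAX_GB = 8   # above this, the article recommends Enterprise Edition


@dataclass
class Disk:
    role: str
    size_gb: int


# The four-disk layout suggested above, at the minimum recommended sizes.
RECOMMENDED_DISKS: List[Disk] = [
    Disk("OS", 80),
    Disk("Data (FMROOT share)", 80),
    Disk("SQL data (.mdf)", 200),
    Disk("SQL log (.ldf)", 200),
]


def ram_gb(session_engines: int) -> int:
    """Recommended RAM in GB for a given number of session engines."""
    if session_engines < 1:
        raise ValueError("session_engines must be at least 1")
    return BASE_RAM_GB + RAM_PER_ENGINE_GB * session_engines


def needs_enterprise_edition(session_engines: int) -> bool:
    """True when the recommended RAM exceeds the Standard Edition ceiling."""
    return ram_gb(session_engines) > STANDARD_EDITION_MAX_GB


def physical_disks(disks: List[Disk], mirrored: bool) -> int:
    """Number of drives to buy; mirroring pairs every disk with a copy."""
    return len(disks) * (2 if mirrored else 1)


def usable_capacity_gb(disks: List[Disk]) -> int:
    """Mirrors hold copies of the same data, so they add no usable space."""
    return sum(d.size_gb for d in disks)


if __name__ == "__main__":
    for engines in (1, 4, 6, 7):
        print(engines, ram_gb(engines), needs_enterprise_edition(engines))
    print(physical_disks(RECOMMENDED_DISKS, mirrored=True))   # 8
    print(usable_capacity_gb(RECOMMENDED_DISKS))              # 560
```

With one session engine this gives the 3 GB minimum, and a seventh engine pushes the total to 9 GB, past the 8 GB ceiling. With mirroring, the four recommended disks become eight drives, while usable space stays at 560 GB.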
So now you know a fair bit about the hardware you will need to purchase for a Single Machine Setup. To be honest, ask yourself one question: can I afford to lose the data and have to collect it again? That should help you decide what you need in this setup.
In our next article we will begin to expand on this setup, look at what the next logical topology would be, and discuss its pros and cons.