Wednesday, May 5, 2021

A Tale of Two Analysis Programs: Excel and SPSS


Excel and SPSS (statistical packages for the social sciences) each have their strong points and weaknesses when it comes to data analysis and presentation.  In this example, I am going to use Excel and SPSS to help me demonstrate how crime increases and decreases through various months in Washington, D.C.  I am going to use the 2013 Crime Incidents csv file from the Washington, D.C. Office of the City Administrator available at: http://data.dc.gov/ .   After downloading the file, I open it with MS Excel and perform a “save-as” in order to save the csv file as a spreadsheet.  This will allow me to use formulas and better manage the data.  The first thing to realize about this dataset is that it contains 35152 records!  That is a lot of data to manage!
 

You can see that the report and time variable in column B is in the format mm/dd/yyyy  hh:mm.  Ideally, I would like a way to convert these dates to a three letter month abbreviation for ease of analysis.    Fortunately, Excel’s rich function library will allow me to perform the task.  I insert a column  to the right of column B, the REPORTDATETIME column.  I call the new column MONTHS.  In cell C2,  I use the following formula to convert the date/time format into the three letter date abbreviation:  =TEXT(B2, "mmm") where TEXT is the name of the function, B2 is the location of the data to be converted, and “mmm” represents the three letter abbreviation.  In a smaller dataset, I would simply copy the formula down using my mouse.  However with 35000 plus rows, the better solution is to highlight the cell containing the formula, move the mouse to the lower right hand side, and double click.  Doing so populates the entire column with the formula and results. 



My goal is to demonstrate how crime increases and decreases through the months.  While the spreadsheet contains a list of the months and a list of the reported offenses, it does not give me a count of reported offenses nor does it categorize the counts of crimes according to each month.  Excel does have PivotCharts for this purpose, but the process with this much data is cumbersome at best.  The easier solution, as we will see, is to import the data we need to analyze into SPSS.

SPSS allows users to import entire spreadsheets (both Excel and csv formats), text files, and MS Access files to name a few.  But for the purpose of this example, we only need to import two columns of data, the months column and the offense column.  The first thing I want to do is make sure I already have SPSS open and ready to use.  In Excel, I highlight each column, copy them, and paste the columns into SPSS.



I need to perform a little maintenance on the data I just pasted.  Row 1, containing MONTHS and OFFENSE must be cleared since these are labels, not variables.  In the SPSS Variable View window, I will use months and offense as labels to identify the variables.   The data is now ready for analysis.
 

SPSS allows a user to simply count variables, called frequency analysis in SPSS.  SPSS can be used to provide the counts or frequencies of the individual crimes and individual months as shown below.

      
However, one of SPSS strongest tools is crosstab analysis.  Crosstabing creates a table of summaries that shows how data interacts.  When we crosstab Months with Offenses, we can count the number of crimes committed in each month.  We can count individual categories of crimes for each month.  We can also count each category of crime for the year, all while viewing one table of data.  For example, in the month of February, there were 2234 total crimes reported.  Of these, 184 were burglaries.  And there were 3346 burglaries for the year.  

 
 As useful as SPSS is for performing numeric analysis and visualizing this analysis in organized tables, I find SPSS graphing functions to be a bit tedious when compared to Excel. Excel also produces better quality graphs that are more easily edited and enhanced for presentation.   Importing data from SPSS into Excel is a relatively easy process. I simply right click over the MONTH/OFFENSE crosstab and select copy.  I copy this data into a blank spreadsheet in Excel, making sure I use the paste special option so that the table maintains data integrity.





Creating a graph that demonstrates rate of crime through the months  is a matter of selecting the correct data and proper chart, in this case, months and total crimes for our data and a line chart as our display graph.   A couple of formatting edits of the chart and I am ready to discuss the data, which seems to show that crime increased from February on, and peaked in September.   


I can easily create an additional graph comparing numbers of the types of crime committed by selecting the crime labels, the totals for each label, and an appropriate pie chart. 




The charts can easily be used in a PowerPoint presentation or on a web site.  The ability to work with different software and move data from one tool to another is an easy and profitable skill to develop, and all it takes is a little practice.