Statistics – Bar Graphs – Stacked (and Stacked Percent)

Creating a Stacked Bar Graph in STATA

This set of instructions is based on the protocol developed by Mr Manoharan ANDIAPPAN, Statistician at King’s Dental Institute December 2014

It takes advantage of the Data Management facilities within Microsoft Excel by preparing the data array in such a way that it may be exported to
the STATA Data Editor.  The STATA command line is then straight forward.

1.   The Reference Data Set (RDS) for the specific ethnic or ancestral group stored in Microsoft Excel is opened.

2.   Extract from the specific RDS, in this example the UK-Caucasian RDS, all the data for a given tooth.

3.   In this example the Tooth Develoment Stages for the female Lower Left Third Permanent Molar (LL8*f [FDI 38] will be used.

4.   This is opened and a new Excel Worksheet is opened a the end of the 240+ Worksheets already in the Workbook.

5.   Name this workshhet XYZABC  ie any name that appropriately identifies all the Chrononological Ages (CA)
An example, as used here,  is LL8m-STACKED  which is for the Lower Left Third Molar males

6.   Go to Worksheet for LL8Am and select all the Ages at Attainment for this Tooth Development Stage.
This data is in column F.  Copy this column of data into column A of LL8m-STACKED Worksheet.
The starting cell for this copy procedure is A2.
The reason for this is that there is no name for the variable at the top of column F in te LL8Am worksheet
Once the data is copied (starting at A2) go to A1 and name ir ll8maaa [Lower Left 8 male Age at Attainment]
The variable name used is ll8maaa  It is in lowercase as STATA reads and creates variables in lowercase.
The variable comprises ll8m to identify the tooth and the gender.  The aaa is to indicate it is the Age at Attainment.

8.   The variable name ll8maaa is entered into A1 of the LL8m-STACKED Worksheet.

9.   Go To column B and in B1 enter the variable name ll8mstage  The importance of this is that the first half of the variable denotes ll8m
The second part of the variable name indicates which of the Tooth Development Stages (TDS)  the data of AaA refere to in column A.

The Demirjian Tooth Development Stages are given the Codes A, B, through to H.  Some computer software handles these alphabetic
names poorly so for the purposes of this procedure the letters may need to be changd to numbers.  1 = A, 2 = B, 3 = C, 4 = D,
5 = E, 6 = F, 7 = G, and 8 = H.   However, for STATA the designation A, B, C, D, E, F, G, H works satisfactorily.

10.  In row two of column B i.e. B2 enter 1 to indicate that the AaA in A2 is one of the A categories.

11.  Repeat this procedure for all the AaA’s in column A so that a A appears in column B adjacent to all the AaA values of column A that are linked
or related to Stage A.

12.  Repeat this process for each of Stage B, C, D, E, F, G, & H.

13.  In column C create a new variable  lb (for lower border) using the ‘intervals’ that you have chosen.
In this case the intervals chosen are 0.5 of  year and the youngest age is 6.5.
Thus the intervals go 6.5, 7, 7.5, 8, 8.5, through to 22.5 years.  Both these limits (lower and upper) will need to selected to match the whole data array.

14.  In Column D a new variable ub (for upper border) using the ‘intervals’ that you have chosen.  Each step is 0.5 years (as in the previous column)
and each entry into column C is 0.5 of a year greater than the matching entry in column B

15.  In column E create a new variable ci (Class Interval).  It is the Column C and D combined e.g. 5.5 – 6, 6 – 6.5
This case goes up to 22.5 – 23

16.  This next procedure is part of the FREQUENCY FUNCTION in Microsoft Excel.

The FREQUENCY function calculates how often values occurr within a specific range of values.  As with  The Probability Disribution Function
you need to decide how wide the range of values should be.  In statistics these are called BINS.  In the example of DAE we are concerend, for example,
with the number of subjects in an array of data for the AaA of  LL8Am who fall between 15.500 and 15.999.  This age range is an example of a bin.
Thus for the counts of ages of Stage A within the 15.500 to 15.999 age bin, them the 16.000 to 16.499 age bin, the 16.500 to 16.999 bin and
so on up to 16.500 years the function is  = FREQUENCY(A2: A25,D2:D37)

17.  To achieve this it is necessary to go to column F of the LL8 data array and select F2 down to F37.  Then in F2 type in the formula
=FREQUENCY(A2:A47,D2:D37)
When you SAVE this formula it is activated and because you have selcted the column down to D37, the cells D2 to D37 are populated
with the same formula. BUT SEE 18 BELOW FOR SAVE ROUTINE.
The immediate result is that the counts of Stage A are then made in each of the cells.

18. The SAVE for the FREQUENCY function should be carried out after each column is populated.  THIS IS IMPORTANT.
The SAVE is achieved by pressing Ctrl-Shift and whilst holding the two keys press ENTER.

19.  The process is repeated in each of columns G, H, I, J, K, L, and M for each of the Tooth Development Stages B, C, D, E, F, G, and H respectively.

20.  This file is saved as  LL8m-STACKED.  This saves the file with all the functional elements of the worksheet.

21.  To enable it for STATA, resave the worksheet in a separate worksheet as Comma Separated Values  worksheet (.csv).
Use the same name for ease of continuity.

22.  Copy and Paste the whole of the .csv file into STATA.

23.  The command for the Stacked Bar Graph is      graph bar (asis) a-h, over(ub, label(angle(270))) stack

24.  If the Percent Stacked bar graph is required, then add to the command the option     percent
the command will now look like this         graph bar (asis) a-h, over(ub, label(angle(270))) stack percent

25.  Once the graph HAS been created it can be edited and finessed with the STATA graph editor.

~~~~~~~~~~ ||||| ~~~~~~~~~~