Hello! Today's blog is about SAS graphs on SAS OnDemand for Academics.
Graphs using the GPLOT procedure:
Code:
- PROC GPLOT draws simple scatter plots.
- The plot statement takes the form: plot y-axis variable * x-axis variable
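A minimal sketch of such a scatter plot. It assumes the built-in sashelp.class dataset as a stand-in, since it ships with SAS; it is not the data used later in this blog.

```sas
/* Minimal sketch: simple scatter plot with PROC GPLOT.            */
/* sashelp.class is a stand-in dataset, not the one from the blog. */
proc gplot data=sashelp.class;
  plot weight*height;   /* y-axis variable * x-axis variable */
run;
quit;
```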
[Image: Output]
Code:
- The symbol statement controls how the data points are drawn.
- It is a global statement, just like the title and footnote statements.
- The i=join option turns the scatter plot into a line plot.
- The value option changes the shape of the data points.
- The colour/c option changes the colour of the data points.
- The goptions statement with a reset option resets the symbol statement, so in the next step of the code you can apply new symbols rather than reusing the same symbol for another graph.
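The options listed above could be combined roughly like this. This is only a sketch: sashelp.class is again a stand-in dataset, and class_sorted is a hypothetical name for the sorted copy.

```sas
/* Sketch only: sashelp.class is a stand-in, class_sorted is a made-up name. */
goptions reset=symbol;                        /* clear earlier symbol definitions */
symbol1 interpol=join value=dot color=blue;   /* long forms of i=join, v=dot, c=blue */

proc sort data=sashelp.class out=class_sorted;
  by height;                                  /* sort by x so the joined line runs left to right */
run;

proc gplot data=class_sorted;
  plot weight*height;                         /* points are now connected by a line */
run;
quit;
```

Sorting by the x-axis variable first is a choice made here so that the joined line does not zigzag back and forth across the plot.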
[Image: Code for symbol statement]
[Image: Output when we mention symbol statement]
Example dataset: You can find the same example in the book referenced at the end of this blog.
Univariate procedure and scatter plot dataset:
[Image: Glimpse of example dataset]
[Image: Code and tasks in comment section]
Code:
We have two questions to answer. The first is about the relationship between mortality and hardness, which we examine with a scatter plot. The second is a hypothesis: do mortality and hardness of water differ between cities located in the north and cities located in the south of the country?
- The plan is to start with a simple scatter plot.
- Then generate basic statistics and histograms to see the distribution of these variables.
- Get probability plots (probplot).
- From the univariate procedure, the histogram shows that the mortality variable approximately follows a normal distribution.
- Hardness of water follows a more skewed distribution, so to answer the second question we either use a non-parametric test or apply a transformation, such as a log transformation.
- At the start, a conditional statement creates a geofactor variable that records which side each city is located on, based on the location attribute present in the raw data.
- Finally, just a simple scatter plot.
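The steps above can be sketched as follows. Everything named here is an assumption made for illustration: the input dataset water_raw, the variable names location, mortality and hardness, and the 'N' coding for northern cities are all hypothetical and may differ from the original code.

```sas
/* Assumed input: dataset water_raw with variables location, mortality, hardness. */
/* All names and the 'N' coding are hypothetical.                                 */
data water;
  set water_raw;
  length geofactor $5;
  if location = 'N' then geofactor = 'north';
  else geofactor = 'south';
run;

/* Basic statistics, histograms and probability plots for both variables */
proc univariate data=water;
  var mortality hardness;
  histogram mortality hardness / normal;   /* overlay a fitted normal curve */
  probplot mortality hardness;             /* normal probability plots (the default) */
run;
```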
[Image: Code for simple scatter plot of mortality versus hardness of water]
[Image: Output graph of mortality versus hardness of water]
- Plot statement: plot y-axis variable * x-axis variable = categorical variable. The third variable draws each category with its own symbol.
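A sketch of this three-variable form, assuming a dataset named water with variables mortality, hardness and geofactor (hypothetical names chosen for illustration):

```sas
/* Assumed: dataset water with mortality, hardness, geofactor (hypothetical names). */
goptions reset=symbol;
symbol1 value=dot    color=blue;   /* used for the first geofactor level  */
symbol2 value=circle color=red;    /* used for the second geofactor level */

proc gplot data=water;
  plot mortality*hardness=geofactor;   /* y * x = categorical variable */
run;
quit;
```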
Sorting and correlation of variables:
- Sort the data by the north-south geofactor variable.
- Then use the correlation procedure to compute the correlation between the variables separately for the southern and northern cities.
The code below computes the correlation between the variables for both southern and northern cities.
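In text form, the two steps might look like this. The dataset and variable names (water, mortality, hardness, geofactor) are assumptions, not necessarily the ones used in the original code.

```sas
/* Assumed: dataset water with mortality, hardness, geofactor (hypothetical names). */
proc sort data=water;
  by geofactor;              /* BY-group processing needs sorted data */
run;

proc corr data=water pearson spearman;
  var mortality hardness;
  by geofactor;              /* one correlation table per side of the country */
run;
```

Requesting both pearson and spearman is a choice made here; the Spearman coefficient is the one quoted in the interpretation, and it is the safer one for the skewed hardness variable.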
[Image: Correlation procedure output for northern cities]
[Image: Correlation procedure output for southern cities]
Interpretation of correlation procedure:
In the northern cities, mortality and hardness have a negative relationship, with a Spearman coefficient of -0.40.
In the southern cities, mortality and hardness have a negative relationship, with a Spearman coefficient of -0.59.
Transformation of hardness variable for normal approximation:
- Transform the hardness variable by creating a new variable.
- Use the univariate procedure to look at the distribution of the new log-transformed variable.
[Image: Code for log transformation of variable and univariate procedure]
- The transformed variable approximately follows a normal distribution.
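A sketch of the transformation step. The dataset name water and the new variable name lhardness are hypothetical, chosen only for this illustration.

```sas
/* Assumed: dataset water with variable hardness; lhardness is a made-up name. */
data water;
  set water;
  lhardness = log(hardness);   /* natural log; result is missing if hardness <= 0 */
run;

proc univariate data=water;
  var lhardness;
  histogram lhardness / normal;   /* compare against a fitted normal curve */
  probplot lhardness;
run;
```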
Now we move on to our hypothesis.
[Image: Code for T test and Wilcoxon test]
[Image: Output for T test procedure - 1]
[Image: Distribution plot in output for mortality variable]
[Image: Output of T test procedure for log hardness (transformed) variable - 2]
[Image: Distribution of log transformed hardness variable]
Interpretation of T test procedure:
From the outputs and plots above, the mean mortality differs significantly between southern and northern cities (p < 0.001): the northern cities have a higher mortality rate than the southern cities.
For the log-transformed hardness variable, the difference between the mean hardness values is also significant (p = 0.0003). The distribution plot shows that the log-transformed hardness value is higher in the southern cities than in the northern cities.
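A two-sample t test like the one interpreted above could be requested as follows. The names water, geofactor, mortality and lhardness are assumptions made for this sketch.

```sas
/* Assumed: dataset water with geofactor, mortality, lhardness (hypothetical names). */
proc ttest data=water;
  class geofactor;            /* two groups: north and south */
  var mortality lhardness;    /* one t test per analysis variable */
run;
```

The output includes both the pooled and the Satterthwaite versions of the test, along with a test for equality of variances that helps decide which one to read.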
Wilcoxon test: a non-parametric test for the untransformed hardness of water variable
[Image: Code for Wilcoxon test]
[Image: Output for Wilcoxon test for hardness of water - 1]
[Image: Plot for untransformed hardness variable]
Interpretation of Wilcoxon test:
From the table and plot, we can see that there is a significant difference in the hardness value between southern and northern cities.
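A sketch of the Wilcoxon rank-sum test on the untransformed variable, again assuming a dataset water with variables geofactor and hardness (hypothetical names):

```sas
/* Assumed: dataset water with geofactor and hardness (hypothetical names). */
proc npar1way data=water wilcoxon;
  class geofactor;   /* grouping variable: north versus south */
  var hardness;      /* untransformed hardness of water */
run;
```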
Reference
- A Handbook of Statistical Analyses using SAS, Second Edition, by Geoff Der and Brian S. Everitt. Chapter Two, Section 2.1.