Sunday, November 24, 2019

Statistics Coursework Authorship Essay Example

My aim is to investigate whether it is possible to gain information about the authorship of a text by using statistical measures. I will investigate the authorship of an adult text and a child text. I will calculate the mean of the distribution for both populations. From this, I will calculate the standard deviation and variance. I will use the unbiased estimator for both populations. I will calculate the standard error and confidence intervals for both populations. I will represent my data using frequency distribution tables and plot my results on a frequency distribution graph. For the confidence intervals, I will use normal distribution diagrams.

Hypothesis

I predict that there will be more letters per word in Great Expectations by Charles Dickens and fewer in Charlie and the Great Glass Elevator by Roald Dahl. Therefore, the mean word length in Great Expectations will also be larger. I expect Great Expectations to have a larger standard deviation because it uses a larger vocabulary.

Population

I will randomly select 50 pages from each book by using the RAND function in Microsoft Excel. Once I have 50 random pages for each book, I will select a random line on each page. Finally, I will select a random word from each line. I got my random numbers from the RAND function by the following process, e.g.

248 × RAND (248 = number of pages in book)
36 × RAND (36 = number of lines on page)
13 × RAND (13 = number of words on line)

I will count the number of lines on each page and multiply this by the RAND function so that the random number falls in the correct range each time. I will use the same process to decide which word to select on each line.

Sampling

Sampling is the selection of individual members from a population.
The advantage of taking a sample is that it is cheaper and quicker than a census, and the results are easier to analyse. The disadvantage is that the results may include natural variation or bias, so they may not be accurate or representative of the whole population.

There are rules that must be followed when choosing a sample. The sample size must be large enough for the results to be accurate, because a very small sample may not represent the rest of the population. So that the data I collect is representative of the population as a whole, I am going to take a sample of 50 words from each book.

The sample should also be taken at random. If a random sample is not taken, my results may be biased. If I chose for myself which page, line and word to use, I would end up with unrepresentative data. To get a representative set of data, I used the RAND function in Microsoft Excel to generate the random page number, line number and word number.

Method

For this investigation, I am finding out whether it is possible to gain information about the authorship of a text. I will be using an adult text and a child text. The adult text is Great Expectations by Charles Dickens. This book consists of 484 pages. The child text is Charlie and the Great Glass Elevator by Roald Dahl. This book consists of 190 pages, less the eight pages at the beginning of the text. I will select 50 random pages from each book. I will then select a random line and a random word on each of these pages.

Assumptions

The distribution of the parent population is normal. We have to assume that the distribution of the sample is also normal to have accurate results. I have assumed that a sample size of 50 is large enough for the sample mean to be approximately normally distributed.
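The random selection of a page, line and word described above can be sketched in Python. This is only an illustration, not the spreadsheet actually used: `random.Random.random()` stands in for Excel's RAND, the function names `pick_index` and `pick_word_position` are hypothetical, and rounding down and adding 1 is an assumption, because the essay does not say how the scaled number was rounded to a whole number. The line and word counts (36 and 13) are the example figures quoted earlier and would in practice be counted on each page.

```python
import math
import random


def pick_index(count, rng):
    """Scale a uniform number in [0, 1) to a whole number from 1 to count."""
    # rng.random() plays the role of Excel's RAND().
    # Assumption: round down and add 1 so every index is equally likely.
    return math.floor(rng.random() * count) + 1


def pick_word_position(pages_in_book, lines_on_page, words_on_line, rng):
    """Pick a random page, then a random line on it, then a random word."""
    page = pick_index(pages_in_book, rng)
    # lines_on_page and words_on_line are callables because the counts
    # differ from page to page and have to be counted by hand.
    line = pick_index(lines_on_page(page), rng)
    word = pick_index(words_on_line(page, line), rng)
    return page, line, word


# Example with the figures from the text: 248 pages, 36 lines, 13 words.
rng = random.Random(2019)  # fixed seed so the sketch is repeatable
positions = [
    pick_word_position(248, lambda page: 36, lambda page, line: 13, rng)
    for _ in range(50)
]
print(positions[:3])
```

Note that this sketch can pick the same page more than once, which is one of the details the essay leaves open.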
Statistical Theory

After collecting the data, I will set the results out in a tally chart because it is easier to understand and analyse. I will use the central limit theorem because it lets me make predictions about the distribution of the sample mean even if the distribution of the parent population is not known. I will draw a frequency distribution graph to show the distribution of the data for both books. I will also work out the mean as a measure of the average word length. The variance and standard deviation will help me measure the spread of the data. The standard error will tell me how precise my estimate of the population mean is. In addition, I am going to use unbiased estimation because this will let me estimate the variance of the parent population from the sample.
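The measures listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the calculation carried out in the coursework: the function name `summarise` is hypothetical, the 95% confidence level is an assumption (the essay does not state one), and the example word lengths come from a sentence of this essay, not from either book.

```python
import math
from collections import Counter
from statistics import NormalDist


def summarise(word_lengths, confidence=0.95):
    """Mean, unbiased variance, standard error and a normal confidence interval."""
    n = len(word_lengths)
    mean = sum(word_lengths) / n
    # Unbiased estimator of the population variance: divide by n - 1, not n.
    variance = sum((x - mean) ** 2 for x in word_lengths) / (n - 1)
    sd = math.sqrt(variance)
    # Standard error of the sample mean.
    se = sd / math.sqrt(n)
    # Two-sided z-value from the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return {
        "n": n,
        "mean": mean,
        "variance": variance,
        "sd": sd,
        "se": se,
        "interval": (mean - z * se, mean + z * se),
        # Frequency distribution table: word length -> number of words.
        "frequencies": dict(sorted(Counter(word_lengths).items())),
    }


# Illustrative data only: word lengths from a sentence of this essay.
sentence = ("My aim is to investigate whether it is possible to gain "
            "information about authorship of a text")
lengths = [len(word) for word in sentence.split()]
summary = summarise(lengths)
print(summary["frequencies"])
print(round(summary["mean"], 2), summary["interval"])
```

Running `summarise` once for each book's 50 word lengths would give two means and two intervals to compare against the hypothesis.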
