Social Web Analytics
The Group Project provides you with a chance to analyse the Social Web using knowledge obtained
from this unit and a computer-based statistical package. For this project, we will focus on identifying
a chosen public figure’s Twitter image.
To complete this project:
1. Read through this specification.
2. Form a group and register your group using the Project Groups
section of vUWS.
3. Choose a Public Figure who is active on Twitter and check that they are not already on the list of Group
Project Twitter Handles (https://vuws.westernsydney.edu.au/webapps/blogsjournals/
Then submit the Twitter handle of the Public Figure using the same link. Note that a given Public
Figure cannot be allocated to more than one group. If duplicate names are found on the list, the
group with the later timestamp will be asked to find a new Public Figure.
4. Complete the data analysis required by the specification.
5. Write up your analysis using your favourite word processing/typesetting program (Word, RMarkdown,
etc.) and convert it to a PDF file, making sure that all of the working-out is shown
and presented well.
6. Include the student declaration text on the front page of your report. Please make sure that the
names and student numbers of each group member are displayed on the front page. If a group
member did not contribute to any part of the project, state their contribution as
0% (no contribution means 0 marks).
7. Submit the report as a PDF file by the due date using the Submit Group Project link (link will be available
later). All code and its output must be shown in the report. Also include comments in the code
to explain what you tried to do. Any submission other than a PDF file will not be marked.
3 Group Size and Organisation
Students in groups of size 4 or 5 are to work together to complete this project. One project report is to
be submitted per group.
The group must be formed by signing up to a group within the Project section of 300958 in vUWS.
Zero marks will be awarded to lone submissions.
Groups must be formed by week 8. Once the group is formed, any person within the group can
submit the report. You can submit your report as many times as you want; only the last submission will be marked.
4 Due date and Submission
The project report is due by 11:59 p.m. on Friday of week 13 (16th of October). The report must be
submitted as a PDF file using the assignment submission facilities in the Group Project Section of
300958 in vUWS. Only one student from each group needs to submit the assignment.
5 Report Format
Once the required analysis is performed by the group, the members of the group are to write up the
analysis as a report. Remember that the assessor will only see the group’s report and will be marking
the group’s analysis based on your report (the R file should not be submitted; however, a screenshot of the
code is required in the PDF file). Therefore, the report should contain a clear and concise description
of the procedures carried out, comments on the code, explanations of what you tried to do, the
analysis of results and any conclusions reached from the analysis. Include all the R code along
with its output in your assignment. Output without the code, or code without the output, will
result in zero marks for the relevant section.
The required analysis in this specification covers the material presented in lectures and labs.
Students should use the computer software R to carry out the required analysis and then present the
results from the analysis in the report.
This project is worth 30% of your final grade, and so the project will be marked out of 30. The project
consists of four investigations and will be marked using the following criteria:
Marks Criteria Satisfied
There are also two marks allocated to the presentation (based on the report formatting, style,
grammar, clarity and mathematical notation). If the report looks like something that would be
submitted to an employer, then the full two marks will be awarded.
The following declaration must be included in a clearly visible and readable place on the first page
of the report.
“Names and Student IDs of all group members who contributed to the project”
Student Name Student Number Contribution(%)
By including this statement, we the authors of this work, verify that:
· We hold a copy of this assignment that we can produce if the original is lost or damaged.
· We hereby certify that no part of this assignment/product has been copied from any other
student’s work or from any other source except where due acknowledgement is made in the assignment.
· No part of this assignment/product has been written/produced for us by another person except
where such collaboration has been authorised by the subject lecturer/tutor concerned.
· We are aware that this work may be reproduced and submitted to plagiarism detection software
programs to detect possible plagiarism (which may retain a copy on its database for
future plagiarism checking).
· We hereby certify that we have read and understood what the School of Computing and
Mathematics defines as minor and substantial breaches of misconduct as outlined in the learning
guide for this unit.
Note: An examiner or lecturer/tutor has the right not to mark this project report if the
above declaration has not been added to the cover of the report.
8 Project Description
A well-known Public Figure is investigating their public image and has approached your team to
identify what the public associates with their name. They want the four pieces of analysis to be
8.1 Analysis of Twitter language about the Public Figure
The Public Figure wants you to examine the language used in tweets about them.
They would like to have a general idea about what people are talking about.
Use the rtweet package to download tweets.
1. Use the search_tweets function from the rtweet package to search for 2000 tweets about the person
you selected. Save the result as “tweets”.
2. Pre-process your data and construct a document-term matrix of the tweets by using TF-IDF
3. Construct a word cloud of the words in your document-term matrix.
4. Comment on your findings.
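The pre-processing and weighting in steps 2 and 3 can be sketched in base R on a toy corpus. The three example tweets, the stop-word list and all variable names are assumptions made for illustration. The TF-IDF variant shown (term frequency multiplied by log(N/df)) is one common choice; the unit's labs may use a different variant or a text-mining package.

```r
# Toy stand-in for the text of downloaded tweets (hypothetical data).
tweet_text <- c("Great speech by the senator today!",
                "The senator's new policy is great news",
                "Traffic was terrible today http://t.co/xyz")

# Pre-process: lower-case, drop URLs, keep letters only, split on whitespace.
clean <- tolower(tweet_text)
clean <- gsub("http\\S+", " ", clean, perl = TRUE)
clean <- gsub("[^a-z ]", " ", clean)
tokens <- strsplit(trimws(clean), "\\s+")

# Remove a small, illustrative stop-word list.
stop_words <- c("the", "by", "is", "was", "s")
tokens <- lapply(tokens, function(w) w[!(w %in% stop_words)])

# Term-frequency matrix: one row per tweet, one column per term.
vocab <- sort(unique(unlist(tokens)))
tf <- t(sapply(tokens, function(w) table(factor(w, levels = vocab))))

# TF-IDF weighting: tf * log(N / df), where df is the number of
# tweets that contain the term.
N <- nrow(tf)
df <- colSums(tf > 0)
tfidf <- sweep(tf, 2, log(N / df), "*")

# Total weight per term; these totals could serve as word-cloud sizes.
term_weight <- sort(colSums(tfidf), decreasing = TRUE)
print(round(term_weight, 3))
```

With real data, `tweet_text` would be replaced by the text column of the saved tweets, and the stop-word list would need to be far more complete.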
8.2 Clustering the tweets
The Public Figure wants you to identify the topics in the tweets. They want to be
aware of at least 2 topics. We do not want to present all tweets to them, so we must identify whether there is
a set of common themes among the tweets.
Using the pre-processed document-term matrix that you generated in Section 8.1, compute the
5. Find the most appropriate number of clusters using the elbow method for the tweets by using
6. Cluster the tweets using k-means clustering.
7. Identify the number of tweets in each cluster. Which cluster is the largest?
8. Visualize your clustering in 2-dimensional vector space to present it. Show each cluster in a
9. Create the dendrograms of the words in the most populated two clusters only. You should build
two separate dendrograms by using complete linkage clustering for these clusters. You do not need
to visualize all words in your dendrograms; set appropriate boundaries to improve your
visualization. Make sure your visualizations are readable!
10. Comment on your findings.
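Steps 5 to 8 can be sketched as follows on simulated data. The matrix `X` is a random stand-in for the weighted document-term matrix, the range k = 1 to 8 and the final choice k = 2 are assumptions, and the 2-dimensional view uses principal components, which is only one possible projection. Note that `kmeans` works with Euclidean distance on the rows as given, so any other distance would require transforming the data first.

```r
set.seed(42)  # k-means uses random starts; fix the seed for reproducibility

# Hypothetical stand-in for a weighted document-term matrix:
# 60 "tweets" in 5 dimensions, drawn around two different centres.
X <- rbind(matrix(rnorm(30 * 5, mean = 0), ncol = 5),
           matrix(rnorm(30 * 5, mean = 4), ncol = 5))

# Elbow method: total within-cluster sum of squares for k = 1..8.
ks <- 1:8
wss <- sapply(ks, function(k) kmeans(X, centers = k, nstart = 20)$tot.withinss)
plot(ks, wss, type = "b", xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")

# Cluster with the k suggested by the elbow (k = 2 is assumed here).
fit <- kmeans(X, centers = 2, nstart = 20)

# Number of tweets per cluster, and the label of the largest cluster.
sizes <- table(fit$cluster)
print(sizes)
largest <- names(sizes)[which.max(sizes)]

# Project to two dimensions with PCA and colour the points by cluster.
pcs <- prcomp(X)$x[, 1:2]
plot(pcs, col = fit$cluster, pch = 19, xlab = "PC1", ylab = "PC2")
```

The elbow is read off the first plot as the value of k after which the curve flattens; with real tweets it is often less sharp than in this simulated example.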
8.3 Who to follow
The Public Figure wants to understand how they can increase their number of
followers. We believe that it is best to be active on Twitter for this aim, but we are unsure if this is true.
To examine this, we want to test if there is a relationship between the number of followers and the
number of tweets that a user posted. To perform this:
11. Find the 100 most retweeted tweets among your tweets.
12. Identify the users of these tweets.
13. Get the follower count and the statuses count (number of tweets they posted) of these Twitter
14. Apply the appropriate statistical test to test the relationship between follower and statuses
15. Comment on your result.
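The specification does not name the test in step 14. One candidate is a correlation test, sketched below with `cor.test` on simulated counts. The data frame `users`, its column names and the choice of Spearman's rank correlation are all assumptions; whether a rank-based or a Pearson test is more appropriate should be justified from the distribution of your own data.

```r
set.seed(1)

# Simulated stand-in for 100 users (hypothetical values, not real Twitter data).
users <- data.frame(
  followers_count = round(exp(rnorm(100, mean = 8, sd = 2))),
  statuses_count  = round(exp(rnorm(100, mean = 9, sd = 1.5)))
)

# Counts like these are usually heavily skewed, so a rank-based (Spearman)
# test is used here. H0: no monotonic association between the two counts.
res <- cor.test(users$followers_count, users$statuses_count,
                method = "spearman", exact = FALSE)

print(res$estimate)  # Spearman's rho
print(res$p.value)   # reject H0 at the 5% level if this is below 0.05
```

Because the two simulated columns are generated independently, this example only demonstrates the mechanics of the test, not a result to expect from real users.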
8.4 Building Networks
In this section, we want to create an overview of the Public Figure’s Twitter
network. To perform this (you can use the twitteR package in this section):
16. Find the 10 most popular friends of the chosen Twitter handle.
17. Obtain a 1.5-degree egocentric graph centred at the chosen Twitter handle and plot the graph.
The egocentric graph should contain the 10 most popular friends of the chosen Twitter handle
18. Compute the popularity of each vertex in your graph using the PageRank method. List the top 3
most popular people in your graph according to the PageRank scores.
19. Comment on your result.
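To make the PageRank computation in step 18 concrete, here is a minimal base R sketch using power iteration on a tiny hand-made graph. The edge list, the vertex names and the damping factor of 0.85 are assumptions for illustration. In the project itself a graph library would normally do this work (for example, igraph's `page_rank` function), and a real egocentric graph may contain vertices with no outgoing edges, which this sketch does not handle.

```r
# Hypothetical edge list: "ego" follows three friends, and some friends
# follow each other (the extra edges that make the graph 1.5-degree).
edges <- matrix(c("ego", "a",
                  "ego", "b",
                  "ego", "c",
                  "a",   "b",
                  "b",   "c",
                  "c",   "a",
                  "a",   "ego"), ncol = 2, byrow = TRUE)

nodes <- sort(unique(as.vector(edges)))
n <- length(nodes)

# Adjacency matrix: A[i, j] = 1 if i follows j.
A <- matrix(0, n, n, dimnames = list(nodes, nodes))
A[edges] <- 1

# Row-normalise to get transition probabilities.
# This assumes every vertex has at least one outgoing edge.
P <- A / rowSums(A)

# Power iteration with damping factor 0.85 (a conventional choice).
d <- 0.85
pr <- rep(1 / n, n)
for (i in 1:100) {
  pr <- (1 - d) / n + d * as.vector(pr %*% P)
}
names(pr) <- nodes

# Top 3 vertices by PageRank score.
print(head(sort(pr, decreasing = TRUE), 3))
```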
It is strongly recommended that you save your tweets once you have downloaded them (e.g. to an
RData file). Otherwise, you will get different tweets each time you run your script and you will need
to change your clustering.
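A minimal sketch of saving and reloading with base R's `save` and `load` is given below. The small data frame stands in for the downloaded tweets, and the file location is an assumption: a temporary directory is used so that the example runs anywhere, whereas in the project a fixed file name in your working directory would be chosen.

```r
# Stand-in for the data frame returned by the download step (hypothetical data).
tweets <- data.frame(text = c("first tweet", "second tweet"),
                     retweet_count = c(3, 10))

# Save once, straight after downloading.
path <- file.path(tempdir(), "tweets.RData")
save(tweets, file = path)

# In later sessions, load the saved copy instead of downloading again.
rm(tweets)
load(path)
print(nrow(tweets))
```

Since `load` restores objects under the names they were saved with, the rest of the script can keep referring to `tweets`.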
Note that in Section 8.4, depending on the friends of the chosen Twitter handle, you may
reach the rate limit of the Twitter API. I strongly recommend that you save your objects as an RData
file once you download friends, so you can continue downloading friends the following day or with a
different authentication key. For more information on how to save your objects see:
See https://developer.twitter.com/en/docs/basics/rate-limiting.html for more information on the rate limit.
The Public Figure wants the above four-part analysis to be written up as a professional report. Each
part should have its own section of the report and all questions should have thoughtful answers. Include
all the R code along with its output in your assignment. Output without the code, or code without the
output, will result in zero marks for the relevant section.