
TCS3393 DATA MINING: Assignment Help with Python/Java Programming

Date: 2024-03-24  Source: 合肥網 (hfw.cc)  Author: hfw.cc



FACULTY OF ENGINEERING, BUILT-ENVIRONMENT, AND INFORMATION
TECHNOLOGY (FOEBEIT)
BACHELOR OF INFORMATION TECHNOLOGY (HONS)
JANUARY-MAY 2024 INTAKE
TCS3393 DATA MINING
GROUP ASSIGNMENT [2-3 members per group]
This assignment is worth 25% of the overall marks available for this module. This assignment
aims to help the student explore and analyse a set of data and reconstruct it into meaningful
representations for decision-making.
The online landscape is ever-evolving, with websites serving as crucial assets for businesses,
organizations, and individuals. As the internet continues to grow, the need for accurate and
efficient website classification becomes paramount. Understanding the nature of websites, their
content, and the user experience they provide is vital for various purposes, including online
security, marketing strategies, and content filtering.
Embarking on a data science project, you collaborate with a cybersecurity firm dedicated to
enhancing web security measures. The firm provides you with a rich dataset encompassing
various attributes of websites, including their URLs, user comments, and assigned categories.
Your objective is to develop a classification model capable of accurately categorizing websites
based on these variables.
The dataset includes information on the URLs of different websites, user comments associated
with those websites, and pre-existing categories assigned to them. The challenge lies in creating
a model that not only accurately classifies websites but also adapts to the dynamic nature of the
online environment, where new types of websites constantly emerge.
Introduction
Your goal is to implement advanced data analysis techniques to train a model that enhances the
efficiency of web classification.
Techniques
The data exploration, manipulation, transformation, and visualization techniques needed to
explore the dataset are covered in the course. As an additional feature, you must explore
further concepts that can improve the results. The dataset provided for this assignment
relates to website classification.
Dataset
This dataset contains information on 1,407 website URLs. It includes 3 variables that describe
various categories of websites. The dataset will be analyzed using subsets of these variables for
descriptive and quantitative analyses, depending on the specific models used.
Objective:
Develop a classification model to categorize websites using advanced data science techniques. The
model should robustly classify the website based on comments stated in the dataset.
Tasks:
1. Data Exploration:
• Conduct an initial exploration of the dataset to understand its structure, size, and
variables.
• Examine the distribution of website categories to identify any imbalances in the
dataset.
• Explore the distribution of URL lengths and user comment lengths to gain insights into the
data.
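The exploration steps above can be sketched with the standard library alone. This is a minimal illustration, not the required method: the rows are invented toy data, the three-field layout (URL, text, category) is an assumption about the dataset, and `category_distribution` and `imbalance_ratio` are hypothetical helper names.

```python
from collections import Counter

# Toy rows standing in for the real dataset (all values invented).
# Assumed layout: (url, cleaned_website_text, Category).
rows = [
    ("https://news.example.com", "daily headlines and world politics", "News"),
    ("https://shop.example.com", "buy shoes online free delivery", "E-Commerce"),
    ("https://store.example.org", "discount electronics and gadgets", "E-Commerce"),
    ("https://learn.example.edu", "online courses and tutorials", "Education"),
    ("https://deals.example.net", "coupons and cheap offers", "E-Commerce"),
]


def category_distribution(rows):
    """Return {category: (count, share)} ordered from most to least common."""
    counts = Counter(category for _, _, category in rows)
    total = sum(counts.values())
    return {cat: (n, n / total) for cat, n in counts.most_common()}


def imbalance_ratio(rows):
    """Largest class size divided by smallest (1.0 means perfectly balanced)."""
    counts = Counter(category for _, _, category in rows)
    return max(counts.values()) / min(counts.values())


dist = category_distribution(rows)
for cat, (n, share) in dist.items():
    print(f"{cat:<12} {n:>3} {share:6.1%}")
print("imbalance ratio:", imbalance_ratio(rows))
```

A ratio well above 1 suggests that accuracy alone may be misleading and that stratified splitting or per-class metrics deserve attention.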
Assignment Task: Websites Classification
2. Descriptive Analysis:
A. Basic Exploration:
• Describe the structure of the dataset. How many observations and variables
does it contain?
• What are the data types of the variables in the dataset?
B. Statistical Summary:
• Provide a statistical summary of the 'Category' variable. What are the most
common website categories?
• Calculate basic descriptive statistics (mean, median, standard deviation) for
relevant numeric variables.
C. URL Analysis:
• Analyze the distribution of website URLs. Are there any patterns or
commonalities?
• Are there any outlier URLs that need special attention?
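One possible way to approach the descriptive statistics and URL analysis is sketched below. The URLs are invented examples, URL length is chosen here as the "relevant numeric variable" purely for illustration, and Tukey's 1.5 × IQR rule is one common outlier convention rather than anything the brief prescribes.

```python
import statistics
from collections import Counter
from urllib.parse import urlparse

# Invented example URLs; the real URL column is assumed to hold similar strings.
urls = [
    "https://www.example.com/",
    "https://news.example.org/world",
    "http://shop.example.com/items",
    "https://example.edu",
    "https://docs.example.io/guide",
    "https://mail.example.com/inbox",
    "https://blog.example.net/2024/03/a-very-long-post-title-about-data-mining-and-text-classification",
]

lengths = [len(u) for u in urls]
summary = {
    "mean": statistics.mean(lengths),
    "median": statistics.median(lengths),
    "stdev": statistics.stdev(lengths),  # sample standard deviation
}

# Common patterns: how often each scheme and top-level domain occurs.
schemes = Counter(urlparse(u).scheme for u in urls)
tlds = Counter(urlparse(u).hostname.rsplit(".", 1)[-1] for u in urls)


def iqr_outliers(values):
    """Return values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


outlier_lengths = set(iqr_outliers(lengths))
outlier_urls = [u for u in urls if len(u) in outlier_lengths]

print(summary)
print(schemes.most_common(), tlds.most_common())
print("outlier URLs:", outlier_urls)
```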
3. Data Preprocessing:
A. Cleaning Text Data:
• Explore the 'cleaned_website_text' variable. What preprocessing steps would
you take to clean text data for analysis?
• Implement text cleaning techniques and explain their importance in preparing
data for text-based analysis.
B. Handling Missing Values:
• Identify if there are any missing values in the dataset. Propose strategies for
handling missing values, specifically in the 'cleaned_website_text' column.
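The preprocessing questions above could be explored with a sketch like the following. The stop-word list is a deliberately tiny placeholder, the sample records are invented, and dropping blank rows is only one of several possible missing-value strategies (imputing a placeholder token is another). `clean_text` and `handle_missing` are hypothetical helper names.

```python
import re

# A tiny illustrative stop-word list; a real project would likely use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is", "on"}


def clean_text(text):
    """Lowercase, strip HTML tags, links and non-letters, then drop stop words."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)       # HTML tags
    text = re.sub(r"https?://\S+", " ", text)  # embedded links
    text = re.sub(r"[^a-z\s]", " ", text)      # digits and punctuation
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)


def handle_missing(records, text_key="cleaned_website_text"):
    """Drop records whose text is None or blank; return (kept, n_dropped)."""
    kept = [r for r in records if (r.get(text_key) or "").strip()]
    return kept, len(records) - len(kept)


# Invented records, including one None and one whitespace-only text.
records = [
    {"cleaned_website_text": "<p>The BEST deals on TVs in 2024!</p> https://shop.example.com",
     "Category": "E-Commerce"},
    {"cleaned_website_text": None, "Category": "News"},
    {"cleaned_website_text": "   ", "Category": "Education"},
]

kept, n_dropped = handle_missing(records)
cleaned = [clean_text(r["cleaned_website_text"]) for r in kept]
print(cleaned, "dropped:", n_dropped)
```

Normalizing case and removing markup keeps the vocabulary small, so that "Deals", "deals" and "deals!" are counted as one term by any later vectorizer.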
4. Visualization:
A. Category Distribution Visualization:
• Create a bar chart or pie chart to visually represent the distribution of website
categories.
• How does the visualization help in understanding the balance or imbalance of
the dataset?
B. Text Data Visualization:
• Generate word clouds or frequency plots for the 'cleaned_website_text'
variable. What insights can be gained from these visualizations?
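Before drawing a word cloud or frequency plot, it can help to compute the underlying counts directly. The sketch below uses invented (text, category) pairs and renders a plain-text bar chart so that it runs without any plotting library; the same counts could be passed to a charting tool of your choice. `top_words_by_category` and `text_bar_chart` are hypothetical helper names.

```python
from collections import Counter, defaultdict

# Invented pairs standing in for 'cleaned_website_text' and 'Category'.
samples = [
    ("shop cheap shoes shop online", "E-Commerce"),
    ("cheap flights hotel booking", "Travel"),
    ("online shop discount shoes", "E-Commerce"),
    ("hotel reviews travel guide hotel", "Travel"),
]


def top_words_by_category(samples, n=3):
    """Return {category: [(word, count), ...]} with the n most frequent words."""
    counters = defaultdict(Counter)
    for text, category in samples:
        counters[category].update(text.split())
    return {cat: c.most_common(n) for cat, c in counters.items()}


def text_bar_chart(pairs, width=20):
    """Render (word, count) pairs as a horizontal bar chart made of '#' marks."""
    peak = max(count for _, count in pairs)
    lines = [f"{w:<10} {'#' * round(width * c / peak)} {c}" for w, c in pairs]
    return "\n".join(lines)


top = top_words_by_category(samples)
for cat, pairs in top.items():
    print(cat)
    print(text_bar_chart(pairs))
```

Words that rank highly in several categories at once (here "cheap") are weak discriminators, which is one insight such a plot can surface.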
5. Model Development
A. Data Mining Analysis:
• Split the dataset into training and testing sets for model evaluation.
• Implement various machine learning algorithms for classification, such as logistic
regression, decision trees, or random forests.
B. Training and Evaluation
• Evaluate the performance of each model using metrics like accuracy, precision, recall,
and F**score.
• Discuss the challenges and considerations specific to evaluating a model for website
classification.
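A minimal sketch of the split, train and evaluate workflow is shown below, assuming scikit-learn is installed. TF-IDF features with logistic regression are one reasonable baseline among the algorithms listed above, not a prescribed choice. The corpus is synthetic and trivially separable, so its scores say nothing about performance on the real dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic vocabulary per class (invented for illustration only).
shop_vocab = ["shop", "cart", "discount", "price", "delivery"]
news_vocab = ["headline", "election", "reporter", "breaking", "politics"]


def make_docs(vocab, n=10):
    """Build n short documents, each using 3 cyclically chosen words."""
    return [" ".join(vocab[(i + k) % len(vocab)] for k in range(3)) for i in range(n)]


texts = make_docs(shop_vocab) + make_docs(news_vocab)
labels = ["E-Commerce"] * 10 + ["News"] * 10

# Stratify so both classes appear in the training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

# Vectorize the text and fit the classifier in a single pipeline.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="macro", zero_division=0
)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

Fitting the vectorizer inside the pipeline means its vocabulary is learned from the training split only, which avoids leaking test-set information into the features.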
6. Advanced Techniques:
i. Feature Engineering:
• Propose additional features that could enhance the model's performance.
How might these features capture more nuanced information about websites?
ii. Dynamic Nature of Websites:
• Given the dynamic nature of the online environment, how could the model
adapt to newly emerging website types? Discuss strategies for model
adaptation.
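As one illustration of feature engineering, the sketch below derives a few structural features from a URL string. The feature set and the helper name `url_features` are hypothetical suggestions, and the subdomain count is deliberately naive: it assumes the last two host labels form the registered domain, which is wrong for suffixes such as `co.uk`.

```python
from urllib.parse import urlparse


def url_features(url):
    """Derive simple structural features from a URL string."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "uses_https": parsed.scheme == "https",
        # Naive: treats the last two labels as the registered domain.
        "num_subdomains": max(host.count(".") - 1, 0),
        "path_depth": len([p for p in parsed.path.split("/") if p]),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_query": bool(parsed.query),
        "tld": host.rsplit(".", 1)[-1] if "." in host else "",
    }


# Invented example URL.
features = url_features("https://blog.shop.example.com/2024/sale?ref=home")
print(features)
```

Features like these describe how a site is organized rather than what it says, so they may complement text-based features when the text is short or missing.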
7. Create Dashboard, Report and Conclusions:
• Summarize the findings, including insights gained from exploratory data analysis and
the performance of the classification model.
• How interpretable is the chosen model? Can you explain the decision-making process
of the model in the context of website classification?
• Provide recommendations for further improvements or considerations in the dynamic
landscape of web classification.
• Reflect on the challenges encountered during the analysis. What potential
improvements or future work would you recommend to enhance the model's
performance?
This assignment allows students to apply knowledge of data exploration, preprocessing, data
modelling, and model building to solve a real-world problem in the business domain. It also
encourages them to explore additional concepts for improving model performance.
• The complete Python program (source code (ipynb)) and report must be submitted to
Blackboard.
• Python Script (Program Code):
o Name the file under your name and SUKD number.
o Start the first two lines in your program by typing your name and SUKD
number. For example:
# Nor Anis Sulaiman
#SUKD20231234
o For each question, give an ID and explain what you want to discover. For example:
a. Explore the distribution of website categories in the dataset. Are there any specific
categories that are more prevalent than others?
b. Visualize the distribution of URL lengths and user comments lengths. Are there patterns
or outliers that could be informative for the classification model?
c. What steps would you take to clean and preprocess the URLs and user comments for
effective analysis?
d. How might you handle any missing values in the dataset, and what impact could they
have on the classification model?
e. Provide descriptive statistics for key variables such as URL lengths and user comments
lengths. What insights can be derived from these statistics?
f. Explore potential additional features that could enhance the model's ability to classify
websites accurately.
g. How might the inclusion of features derived from URLs or user comments contribute
to the overall model performance?
h. Choose a classification algorithm suitable for website classification. Explain your
choice.
i. Implement the chosen algorithm using Python and relevant libraries. What
considerations should be taken into account during the model implementation phase?
j. Split the dataset into training and testing sets. How would you assess the performance
of the model using metrics like accuracy, precision, recall, and F1-score?
k. Discuss potential challenges in evaluating the model's effectiveness and generalization
to new websites.
l. Create visualizations to interpret the model's predictions and showcase its classification
performance.
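For question (j), it may help to see how the metrics are defined before relying on a library. The sketch below computes per-class precision, recall and F1 from scratch on invented labels; `per_class_metrics`, `accuracy` and `macro_f1` are hypothetical helper names, and macro averaging is only one of several averaging conventions.

```python
def per_class_metrics(y_true, y_pred):
    """Return {label: {'precision', 'recall', 'f1'}} using one-vs-rest counts."""
    results = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results[label] = {"precision": precision, "recall": recall, "f1": f1}
    return results


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(metrics):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    return sum(m["f1"] for m in metrics.values()) / len(metrics)


# Invented labels for illustration.
y_true = ["News", "News", "Shop", "Shop", "Shop", "Edu"]
y_pred = ["News", "Shop", "Shop", "Shop", "News", "Edu"]

metrics = per_class_metrics(y_true, y_pred)
print(metrics)
print("accuracy:", accuracy(y_true, y_pred), "macro F1:", macro_f1(metrics))
```

On an imbalanced dataset, macro F1 can drop sharply while accuracy stays high, which is why reporting both is usually more informative.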
Deliverables
As part of the assessment, you must submit the project report in printed and softcopy form,
which should have the following format:
A) Cover Page:
All reports must be prepared with a front cover. A protective transparent plastic sheet can be
placed in front of the report to protect the front cover. The front cover should be presented with
the following details:
o Module
o Coursework Title
o Intake
o Student name and ID
o Date Assigned (the date the report was handed out).
o Date Completed (the date the report is due to be handed in).
B) Contents:
• Introduction and assumptions (if any)
• Data import / Cleaning / pre-processing / transformation
• Each question must start on a separate page and contain:
o Analysis Techniques - data exploration / manipulation / visualization
o Screenshot of source code with the explanation.
o Screenshot of output/plot with the explanation.
o Outline the findings based on the results obtained.
• The extra feature explanation must be on a separate page and contain:
Documents: Coursework Report
o Screenshot of source code with the explanation.
o Screenshot of output/plot with the explanation.
o Explain how adding this extra feature can improve the results.
C) Conclusion
• Depth and breadth of analysis
• Quality and depth of feedback on the analysis process
• Reflection on learning and areas for improvement
D) References
• The font size used in the report must be 12pt, and the font is Times New Roman. Full
source code is not allowed to be included in the report. The report must be typed and
clearly printed.
• You may source algorithms and information from the Internet or books. Proper
referencing of the resources should be evident in the document.
• All references must be made using the APA (American Psychological Association)
referencing style as shown below:
o The theory was first propounded in 1970 (Larsen, A.E. 1971), but since then has
been refuted; M.K. Larsen (1983) is among those most energetic in their
opposition……….
o /**Following source code obtained from (Danang, S.N. 2002)*/
int noshape=2;
noshape=GetShape();
• A list of references at the end of your document or source code must be specified in the
following format:
Larsen, A.E. 1971, A Guide to the Aquatic Science Literature, McGraw-Hill, London.
Larsen, M.K. 1983, British Medical Journal [Online], Available from
http://libinfor.ume.maine.edu/acquatic.htm (Accessed 19 November 1995)
Danang, S.N., 2002, Finding Similar Images [Online], The Code Project, Available
from http://www.codeproject.com/bitmap/cbir.asp, [Accessed 14th September 2006]
Further information on other types of citation is available in Petrie, A., 2003, UWE
Library Services Study Skills: How to reference [online], England, University of