
        MLDS 421: Data Mining

        Date: 2024-02-21  Source: 合肥網 hfw.cc  Author: hfw.cc


        Individual Assignment (100 points)

        Instructions:

        • Submit the paper review as a word or pdf file.

        • Submit code as a Python notebook (.ipynb) file along with the HTML version.

        • Write elegant code with substantial comments. If you have referred to or reused code from a website, add the links as references.

        1. Paper Review – Following the guidelines, review any one of the technical papers from Group 2. (20)

        2. Generate random multidimensional (n=1000, D >= 15) data using sklearn. (20)

        • Build a K-means function from scratch (without using sklearn) and make assumptions to simplify the code as needed.

        • Use the elbow method to find an appropriate value for k

        • Use the silhouette plot to evaluate your clusters

        • Re-cluster the data to see if you can improve your results

        • Perform PCA on the original dataset and retain the most important PCs.

        • Run K-means on the PCA output, compare results with respect to cluster quality and time taken
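A minimal sketch of how the steps in item 2 might fit together, assuming `make_blobs` as the sklearn generator, four true centers, random-point initialization, and a 95% explained-variance cutoff for PCA. None of these choices come from the assignment, and the function name `kmeans` and its parameters are illustrative.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA


def kmeans(X, k, n_iter=100, tol=1e-6, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels, inertia)."""
    rng = np.random.default_rng(seed)
    # Simplifying assumption: start from k distinct randomly chosen data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid, shape (n, k).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centroid to the mean of its points; keep it if the cluster is empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break
    # Final assignment and within-cluster sum of squares (inertia).
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = d2[np.arange(len(X)), labels].sum()
    return centroids, labels, inertia


# n=1000, D=15 as required; centers=4 is an assumption for illustration.
X, _ = make_blobs(n_samples=1000, n_features=15, centers=4, random_state=42)

# Inertia per k: plot these values against k and look for the "elbow".
inertias = {k: kmeans(X, k)[2] for k in range(1, 9)}

# Keep enough principal components to explain 95% of the variance (assumed cutoff).
X_pca = PCA(n_components=0.95).fit_transform(X)
centroids_pca, labels_pca, inertia_pca = kmeans(X_pca, 4)
```

The labels returned here can be passed to `sklearn.metrics.silhouette_score` for the silhouette evaluation, and wrapping each `kmeans` call with `time.perf_counter()` is one way to compare run time before and after PCA.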

        3. Using the data from 2, perform hyperparameter optimization for the following clustering algorithms. (20)

        • Agglomerative hierarchical clustering (number of clusters, linkage criterion)

        • Density-based clustering (DBSCAN) (eps, minPts)

        • Model-based clustering (GMM) (number of clusters)
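One possible way to tune the three algorithms above is a small grid search. The sketch below assumes silhouette score as the selection criterion for agglomerative clustering and DBSCAN, and BIC for the GMM. The grids, the scaling step, and the scoring choices are assumptions, not part of the assignment.

```python
from itertools import product

from sklearn.cluster import AgglomerativeClustering, DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Same kind of data as in item 2 (centers=4 is an assumption).
X, _ = make_blobs(n_samples=1000, n_features=15, centers=4, random_state=42)
X = StandardScaler().fit_transform(X)

# Agglomerative: number of clusters x linkage criterion, scored by silhouette.
agg_scores = {}
for k, linkage in product(range(2, 8), ["ward", "complete", "average", "single"]):
    labels = AgglomerativeClustering(n_clusters=k, linkage=linkage).fit_predict(X)
    agg_scores[(k, linkage)] = silhouette_score(X, labels)
best_agg = max(agg_scores, key=agg_scores.get)

# DBSCAN: eps x min_samples (minPts). Noise points (label -1) are excluded
# before scoring, and settings that yield fewer than two clusters are skipped.
db_scores = {}
for eps, min_samples in product([0.5, 1.0, 1.5, 2.0, 3.0], [5, 10, 20]):
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    mask = labels != -1
    n_clusters = len(set(labels[mask]))
    if 2 <= n_clusters < mask.sum():
        db_scores[(eps, min_samples)] = silhouette_score(X[mask], labels[mask])
best_db = max(db_scores, key=db_scores.get) if db_scores else None

# GMM: number of components chosen by the lowest BIC.
bic = {
    k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
    for k in range(1, 8)
}
best_gmm = min(bic, key=bic.get)
```

Scoring DBSCAN only on non-noise points can favor settings that discard many points, so it may be worth reporting the noise fraction alongside the score.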

        4. Data mining and Cluster analysis of the following dataset (40)

        https://data.cdc.gov/NCHS/NCHS-Injury-Mortality-United-States/vc9m-u7tv/about_data

        The dataset contains the number of injury deaths per year in the US, by injury intent, from 1999 to 2016. The data are grouped by age group, gender, race, and injury intent.

        As a data science consultant, your goal is to mine the dataset and extract meaningful insights for your clients in the health care industry. The course of action is as follows:

        • Review and understand the structure of the data.

        o Columns are year, sex, age group, race, injury mechanism, injury intent, deaths, population, age specific rate, and the statistics of age specific rate

        • Data Transformation

        o For each year, group by age group, sex, or race and summarize data as needed for subsequent analysis.

        • Exploratory Data Analysis (10)

        o Create statistical summaries.

        o Create boxplots, correlation/pairwise plots.

        o Perform basic outlier analysis.
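The transformation and outlier steps above could look roughly like the following pandas sketch. The rows are made-up toy values, not CDC data, and the column names are assumptions that may differ from the headers in the downloaded file. The 1.5 x IQR rule is one common choice for basic outlier analysis, not something the assignment prescribes.

```python
import pandas as pd

# Toy rows with made-up numbers; column names are assumed, not the CDC schema.
df = pd.DataFrame({
    "year": [1999, 1999, 1999, 1999, 2000, 2000, 2000, 2000],
    "sex": ["Male", "Female"] * 4,
    "age_group": ["15-24", "15-24", "25-44", "25-44"] * 2,
    "deaths": [120, 40, 200, 70, 130, 45, 210, 900],
    "population": [10000, 10000, 20000, 20000, 10000, 10000, 20000, 20000],
})

# For each year, group by sex and sum deaths and population across age groups.
summary = df.groupby(["year", "sex"], as_index=False)[["deaths", "population"]].sum()
summary["rate_per_100k"] = summary["deaths"] / summary["population"] * 1e5

# Statistical summary of the numeric columns.
stats = df[["deaths", "population"]].describe()

# Basic outlier check: flag rows outside 1.5 * IQR of the death counts.
q1, q3 = df["deaths"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["deaths"] < q1 - 1.5 * iqr) | (df["deaths"] > q3 + 1.5 * iqr)]
```

`df.boxplot(column="deaths", by="sex")` and `pd.plotting.scatter_matrix(df[["deaths", "population"]])` are possible starting points for the requested plots.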

        • Clustering (15)

        o In a few lines, create a plan describing 3-4 questions that are suitable for cluster analysis.

        o List the clustering algorithm(s) you'd use and why (e.g., K-means, K-medians, K-modes, hierarchical methods, DBSCAN).

        o Apply the above algorithms to the filtered dataset based on your plan.

        o Report on the quality of the clusters, pros/cons, and summarize your findings.

        • Bias/Fairness Questions (15)

        Data

        o In the dataset under study, from a bias/fairness (b/f) perspective, there are 2 sensitive features: race and gender.

        o Analyze the data by combinations of two features (sensitive and other). Example features to include in the analysis: location (county, state) and other features you consider relevant. Though these features may not be considered sensitive, they can act as proxies for sensitive features.

        o Determine feature groupings that are relevant for your analysis and explain your choices.

        o Do you detect bias in the data?

        o Present the results visually to show salient insights with respect to bias.

        o Based on the EDA and your project objective, develop a hypothesis about where b/f issues could arise in the modeling (cluster analysis).

        Modeling

        o Based on your hypothesis, assess the fairness of your model/analysis by applying the fairness-related metrics that are available in any of the following tools: Python Fairlearn package, R Fairness/Fairmodels package, or other similar tools.

        o Explain the reasoning for the groups that you selected for the fairness metrics.

        o Compare the fairness metrics for the different groups.

        o If you developed multiple models, compare the fairness metrics for the models.

        o Comment on the results.

        o Suggest how the bias/fairness issues could be mitigated.

        o Present the results visually to show salient insights.
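As a rough illustration of comparing a fairness metric across groups, the sketch below computes per-group selection rates and their gap by hand with pandas. It assumes that membership in one cluster (a hypothetical "high-rate" cluster labeled 1) is treated as the outcome of interest. The data are made up, and the column names are illustrative.

```python
import pandas as pd

# Toy data: made-up cluster assignments alongside one sensitive feature.
df = pd.DataFrame({
    "sex": ["Male"] * 5 + ["Female"] * 5,
    "cluster": [1, 1, 1, 0, 1, 0, 0, 1, 0, 0],
})

# Share of each group that falls in cluster 1 (the "selection rate").
selection_rate = df.groupby("sex")["cluster"].mean()

# Demographic parity difference: gap between the highest and lowest group rates.
dp_difference = selection_rate.max() - selection_rate.min()

# Full distribution of clusters within each group, with rows summing to 1.
cluster_share = pd.crosstab(df["sex"], df["cluster"], normalize="index")
```

Fairlearn's `demographic_parity_difference` reports a comparable gap. Computing it by hand here only keeps the sketch free of extra dependencies and makes the definition explicit.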

        Note: In the Fall Quarter you attended lectures on Bias/Fairness. Additionally, the following is a useful resource for analyzing b/f in data and modeling: Fairness & Bias Metrics