        Date: 2023-12-09  Source: 合肥網 hfw.cc  Author: hfw.cc



        DAT 560M – Big Data and Cloud Computing 2023 – Homework #4
        DAT 560M: Big Data and Cloud Computing
        Fall 2023, Mini B
        Homework #4
        INSTRUCTIONS
        1. This is an individual assignment. You may not discuss your approach to solving these
        questions with anyone, other than the instructor or TA.
        2. Please include only your Student ID on the submission.
        3. The only allowed material is:
        a. Class notes
        b. Content posted on Canvas
        c. Use ONLY the code we practiced in class. Anything beyond that will not earn any points!
        4. You are not permitted to use other online resources.
        5. The physical submission is due by the next lab.
        6. There will be TA office hours. See the schedule on Canvas.
        ASSIGNMENT
        In this assignment, we are going to practice Spark on a file named loans.csv, which is located
        in our database. In case you don’t have the file, you can get it from the dataset folder on the server:
        http://server-ip/dataset/loans.csv
        This dataset has information about loans distributed to poor and financially excluded people
        around the world by a company called Kiva. The dataset has only a few columns,
        and we would like to analyze it with PySpark. Please answer each question and provide
        a screenshot.
        Part 1- Initialize Spark (5 pts)
        1- Start the PySpark engine and load the file. This homework is a little bit complex, and it’s
        better that we assign more resources. So, when initializing your engine, you can assign
        all available CPU cores on your machine to Spark so that it runs faster. To do that,
        simply put local[*] instead of local (look at the following screenshot). If it crashes or
        doesn’t work properly, you are more than welcome to go back to the normal initialization
        process. (2 pts)
        2- Get to know the dataset and do a preliminary examination (for example, column types,
        summary, …) (2 pts)
        3- Here, we have two identifiers for the country of the loan receiver, country and
        country_code, and one is enough. So please drop country_code. (1 pt)
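The local[*] initialization with a fallback to local, described above, can be sketched as follows. This is only an illustration, assuming PySpark is installed; the helper names `start_with_fallback` and `make_spark_session` and the app name `hw4` are hypothetical and not taken from the class material.

```python
def start_with_fallback(make_session, masters=("local[*]", "local")):
    """Try each master URL in order and return the first session that starts.

    make_session is any callable that takes a master URL and returns a session.
    """
    last_error = None
    for master in masters:
        try:
            return make_session(master)
        except Exception as exc:  # e.g. the engine crashes with all cores
            last_error = exc
    raise RuntimeError("could not start Spark with any master URL") from last_error


def make_spark_session(master, app_name="hw4"):
    """Build a SparkSession for the given master URL (app name is an assumption)."""
    # Imported lazily so the fallback helper above works without PySpark.
    from pyspark.sql import SparkSession

    return SparkSession.builder.master(master).appName(app_name).getOrCreate()
```

With PySpark available, `spark = start_with_fallback(make_spark_session)` would first try all cores and then fall back to a single core.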
        Part 2- Data analysis (50 pts)
        4- Find the three sectors that were awarded the most loans when the loan amount is larger than 1000. (5 pts)
        5- For the top sector you found in Q4, list the 6 most used activities. (5 pts)
        6- Find the number of loans given per year. For that, use the year from posted_time. You
        may add a new column called “year”. (5 pts)
        7- Using SQL syntax, list the number of loans per sector in decreasing order where the
        countries are the top 3 countries in terms of the number of received loans. (10 pts)
        8- Find the top 20 countries in terms of the total loan amount they have received where the
        use of the loan includes the word “stock”. You may use SQL. (5 pts)
        9- Do a word count on the “use” column. For that, convert all text to lower case. If you can, it’s
        great to remove stopwords and then do the word count. It’s OK if you don’t know how to
        do so. (10 pts)
        10- Group the loans into 5 categories. If the loan amount is equal to or larger than 50000, call it
        “super large”. If less but larger than or equal to 10000, call it “large”. If less but larger than or
        equal to 5000, call it “medium”. If less but larger than or equal to 1000, call it “small”. If less,
        call it “tiny”. Then, find the number of loans given in each category per gender. For
        gender, only consider “male” or “female”. (10 pts)
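The five loan-size thresholds in question 10 can be checked with a small plain-Python sketch. It only illustrates the bucketing rule and is not a PySpark solution; the function names are hypothetical, and it assumes that "only consider male or female" means keeping rows whose gender value is exactly one of those two strings.

```python
from collections import Counter


def loan_category(amount):
    """Map a loan amount to one of the five categories from question 10."""
    if amount >= 50000:
        return "super large"
    if amount >= 10000:
        return "large"
    if amount >= 5000:
        return "medium"
    if amount >= 1000:
        return "small"
    return "tiny"


def count_by_category_and_gender(loans):
    """Count loans per (category, gender) from (loan_amount, gender) pairs.

    Assumption: rows whose gender is not exactly "male" or "female" are skipped.
    """
    counts = Counter()
    for amount, gender in loans:
        if gender in ("male", "female"):
            counts[(loan_category(amount), gender)] += 1
    return counts
```

The thresholds are tested from largest to smallest, so each amount falls into the first category whose lower bound it meets.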
        Part 3- Feature engineering (10 pts)
        11- Let’s find how many people are involved in each loan application. To find out, look at
        the gender column. You can see that sometimes it has one value, and sometimes more than one.
        Count the values for each loan and add the count to the dataframe. (10 pts)
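The counting rule for the gender column can be sketched in plain Python. This is a hedged illustration only: it assumes the values in the column are separated by commas (for example "female, female, male"), which the assignment text does not state, and the function name `borrower_count` is hypothetical.

```python
def borrower_count(gender_field):
    """Return how many gender values appear in one loan's gender field.

    Assumption: values are comma-separated; a missing or empty field counts as 0.
    """
    if not gender_field:
        return 0
    # Ignore empty pieces that come from stray or trailing commas.
    return sum(1 for part in gender_field.split(",") if part.strip())
```

If the separator in the real file turns out to be different, only the `split` argument would need to change.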
        Part 4- Machine learning (35 pts)
        12- Now let’s focus only on the Retail, Agriculture, and Food sectors and remove the rest of the
        rows. (10 pts)
        13- We would like to predict the loan_amount based on sector, country, term_in_months, year, and
        the new attribute you added in Q11. Drop the rest of the columns. (5 pts)
        14- Prepare your data to do a prediction task. We are interested in predicting the loan amount
        based on the rest of the features. (10 pts)
        15- Perform a regression task and find the Mean Squared Error and R-squared of the model
        (80% training, 20% testing). (10 pts)
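The two metrics named in question 15 can be written out in plain Python to show what they measure. This is a sketch of the standard definitions, not the method practiced in class; the function names are hypothetical. In PySpark itself, `RegressionEvaluator` from `pyspark.ml.evaluation` accepts the metric names "mse" and "r2".

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot, the share of variance explained by the predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot
```

Both metrics would be computed on the 20% test split, so that they describe how the model performs on loans it was not trained on.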