<?xml version="1.0"?>
<rss version="2.0">
   <channel>
      <title>Assignment Forum by Elsa Phung</title>
      <link>https://padlet.com/elsa_phung/Assign1Forum</link>
      <description>** ETW3482 Data Mining **</description>
      <language>en-us</language>
      <pubDate>2016-03-17 01:48:54 UTC</pubDate>
      <lastBuildDate>2024-11-16 22:02:55 UTC</lastBuildDate>
      <webMaster>hello@padlet.com</webMaster>
      <image>
         <url>https://padlet-assets.storage.googleapis.com/portrait/sticky.jpg</url>
      </image>
      <item>
         <title>1. Assignment Groups</title>
         <author>elsa_phung</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/101955742</link>
         <description><![CDATA[<div>Now that each of you has been assigned to a group, you should start planning Assignment 1. A well-planned distribution of tasks among members will give you enough time to run and test more models, which means more insights and discoveries, better reporting and analysis, and a better grade. Cheers and all the best! <strong><em>Elsa.</em></strong></div>]]></description>
         <enclosure url="" />
         <pubDate>2016-03-21 07:32:01 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/101955742</guid>
      </item>
      <item>
         <title>2. Assignment 1</title>
         <author>elsa_phung</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/102344140</link>
         <description><![CDATA[<div>Use this space for questions regarding Assignment 1. I will be moderating the discussions. <strong><em>Elsa.</em></strong></div>]]></description>
         <enclosure url="" />
         <pubDate>2016-03-23 03:30:01 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/102344140</guid>
      </item>
      <item>
         <title>3. Feedback: Quizzes</title>
         <author>elsa_phung</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/102359430</link>
         <description><![CDATA[<div>Feedback on Lab02 Week03 Quiz is out. <strong><em>Elsa.</em></strong></div>]]></description>
         <enclosure url="" />
         <pubDate>2016-03-23 08:00:08 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/102359430</guid>
      </item>
      <item>
         <title>4. Gp9:</title>
         <author>mamen1</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/102818163</link>
         <description><![CDATA[<div>How do you conduct bivariate and univariate analysis on the 'adult977' dataset?</div>]]></description>
         <enclosure url="" />
         <pubDate>2016-03-27 14:06:01 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/102818163</guid>
      </item>
      <item>
         <title>5. 28 </title>
         <author>elsa_phung</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/102854913</link>
         <description><![CDATA[<div><strong><em>Univariate </em></strong>analysis explores variables (attributes) one at a time. Variables can be either <em>categorical</em> or <em>numerical</em>. <br><br><strong><em>Bivariate</em></strong> analysis is the simultaneous analysis of two variables (attributes). It explores the relationship between them: whether an association exists and how strong it is, or whether there are differences between the two variables and how significant those differences are.<br><br><strong><em>How</em></strong> to conduct it? <br>Univariate: explore each attribute individually; you can visualize them using Edit/View in WEKA. Bivariate: do the same for pairs of attributes, looking at the "worth" of attributes or any correlation between them. <br><br>You have learnt these during lab sessions.<br><br><strong><em>Elsa</em></strong>.</div>]]></description>
         <enclosure url="" />
         <pubDate>2016-03-28 08:56:29 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/102854913</guid>
      </item>
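      <item>
         <title>Example: univariate and bivariate summaries (Python sketch)</title>
         <description><![CDATA[<div>A minimal Python sketch of the univariate and bivariate checks described in the reply above, offered as an illustration outside WEKA. The records, the attribute names (age, hours, sex, income) and the helper function names are hypothetical toy values, not taken from adult977. Univariate: min, max, mean and standard deviation for a numeric attribute, and label counts for a categorical one. Bivariate: Pearson correlation between two numeric attributes, and a contingency table of a categorical attribute against the class.</div>

```python
from collections import Counter
import statistics

# Hypothetical toy records (not taken from the adult977 file).
records = [
    {"age": 25, "hours": 40, "sex": "Male", "income": "<=50K"},
    {"age": 38, "hours": 50, "sex": "Male", "income": ">50K"},
    {"age": 28, "hours": 40, "sex": "Female", "income": "<=50K"},
    {"age": 44, "hours": 60, "sex": "Male", "income": ">50K"},
    {"age": 18, "hours": 30, "sex": "Female", "income": "<=50K"},
]

def univariate_numeric(rows, attr):
    """Summarise one numeric attribute on its own."""
    values = [r[attr] for r in rows]
    return {"min": min(values), "max": max(values),
            "mean": statistics.mean(values), "stdev": statistics.stdev(values)}

def univariate_nominal(rows, attr):
    """Count how often each label of one categorical attribute occurs."""
    return Counter(r[attr] for r in rows)

def pearson(rows, x, y):
    """Pearson correlation between two numeric attributes."""
    xs = [r[x] for r in rows]
    ys = [r[y] for r in rows]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy)

def crosstab(rows, attr, cls):
    """Contingency table: count of each (attribute label, class label) pair."""
    return Counter((r[attr], r[cls]) for r in rows)
```

<div>For these toy rows, pearson(records, "age", "hours") is close to +1, and crosstab(records, "sex", "income") shows how each sex splits across the two income classes.</div>]]></description>
      </item>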
      <item>
         <title>6. Gp4</title>
         <author>nzha87</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/102983781</link>
         <description><![CDATA[<div>Do we have to present a theoretical explanation of the relationship between a particular attribute and the class?<br><br>It depends. If you use that attribute in your final model, then yes, explain it in your report. <strong><em>Elsa.</em></strong></div>]]></description>
         <enclosure url="" />
         <pubDate>2016-03-29 06:44:08 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/102983781</guid>
      </item>
      <item>
         <title>7. Gp4</title>
         <author>nzha87</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/102992565</link>
         <description><![CDATA[<div>So at the beginning we just preprocess the dataset and explain, through descriptive analysis, which attributes seem to influence the others, right? <br><br>That's right! <strong><em>Elsa.</em></strong></div>]]></description>
         <enclosure url="" />
         <pubDate>2016-03-29 07:59:39 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/102992565</guid>
      </item>
      <item>
         <title>8. Gp-4</title>
         <author>nzha87</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/102993093</link>
         <description><![CDATA[<div>The attribute "Marital status" in the assignment question has 7 categories, but the dataset has only 6: "Married-AF-spouse" is missing.<br><br>Not every category has to be used. Just as, if the numbers 1 to 10 are given, the number 5 may not be used at all. <strong><em>Elsa.</em></strong></div>]]></description>
         <enclosure url="" />
         <pubDate>2016-03-29 08:06:37 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/102993093</guid>
      </item>
      <item>
         <title>9. Gp-4</title>
         <author>nzha87</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/102993663</link>
         <description><![CDATA[<div>What does the attribute "education-num" mean? Is it the total number of years of education?<br><br>Yes. <strong><em>Elsa.</em></strong></div>]]></description>
         <enclosure url="" />
         <pubDate>2016-03-29 08:13:33 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/102993663</guid>
      </item>
      <item>
         <title>10. Gp-4: 29</title>
         <author>nzha87</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/103000683</link>
         <description><![CDATA[<div>What does the attribute "Finalweight" mean?<br><br>"<strong>Final Weight</strong>" is a measure used by the Census Bureau as "weighted tallies" to represent estimated population totals for specified socio-economic characteristics of the population. For example, people with similar demographic characteristics should have similar weights. <strong><em>Elsa.</em></strong></div>]]></description>
         <enclosure url="" />
         <pubDate>2016-03-29 09:08:08 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/103000683</guid>
      </item>
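      <item>
         <title>Example: what a weighted tally computes (Python sketch)</title>
         <description><![CDATA[<div>A small illustration of the "weighted tallies" idea in the reply above, assuming each record carries a final-weight value. The attribute name fnlwgt, the records and the weights are made up for illustration. Summing the weights instead of counting rows gives an estimated population total for each label.</div>

```python
from collections import defaultdict

def weighted_tally(rows, attr, weight_attr="fnlwgt"):
    """Estimate a population total per label by summing each record's weight."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[attr]] += row[weight_attr]
    return dict(totals)

# Hypothetical records; the weights are made-up numbers for illustration.
sample = [
    {"sex": "Male", "fnlwgt": 120000},
    {"sex": "Female", "fnlwgt": 200000},
    {"sex": "Male", "fnlwgt": 80000},
]
```

<div>Here "Male" appears twice and "Female" once, yet weighted_tally(sample, "sex") gives both an estimated total of 200000, because each row stands for a different number of people.</div>]]></description>
      </item>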
      <item>
         <title>11. Gp9 Q1 </title>
         <author>mamen1</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/103236827</link>
         <description><![CDATA[<div>Is the file 'adult977' the training set and the file 'adult976' the test set? What do we use 'adult977' for?<br><br>Maybe someone can answer this question? <strong><em>Elsa</em></strong>.</div>]]></description>
         <enclosure url="" />
         <pubDate>2016-03-30 13:31:24 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/103236827</guid>
      </item>
      <item>
         <title>12. Gp9 Q2 </title>
         <author>mamen1</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/103241335</link>
         <description><![CDATA[<div>Can we use percentage split?<br><br>It is up to you which methods you use, but keep in mind that you have to explain why you chose this method instead of another one. <strong><em>Elsa.</em></strong></div>]]></description>
         <enclosure url="" />
         <pubDate>2016-03-30 13:48:32 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/103241335</guid>
      </item>
      <item>
         <title>13. 31 Mar: To ALL Students</title>
         <author>elsa_phung</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/103368896</link>
         <description><![CDATA[<div>Dear ALL, this is supposed to be a discussion forum, so please discuss and share the knowledge you discover along the way. From now on I will not answer every question asked, especially those you are supposed to know from lectures or from the practice we have done in the labs. Anyone can answer questions about problems they have come across and solved while doing the assignment. Let's create a healthy learning environment. To encourage you to do that, I will give <strong><em>BONUS</em></strong> marks to those participating in discussions. <strong><em>Elsa.</em></strong></div>]]></description>
         <enclosure url="" />
         <pubDate>2016-03-31 03:17:12 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/103368896</guid>
      </item>
      <item>
         <title>14. Reply to Gp9 Q1</title>
         <author>sumei</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/103443953</link>
         <description><![CDATA[<div>"Adult 977" is the training set and is used to build the models. Once they are built, you have to use the "Adult 976" test set as the supplied test set to evaluate them. <br><br>Well said. <strong><em>Elsa</em></strong>.</div>]]></description>
         <enclosure url="" />
         <pubDate>2016-03-31 13:43:13 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/103443953</guid>
      </item>
      <item>
         <title>15. Group 5 Q1</title>
         <author>hzsoh1</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/103469997</link>
         <description><![CDATA[<div>Should we use only one type of preprocessing when we apply the (at least three) classification methods to the training set?</div>]]></description>
         <enclosure url="" />
         <pubDate>2016-03-31 15:24:28 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/103469997</guid>
      </item>
      <item>
         <title>16. Group 5 Q2 </title>
         <author>hzsoh1</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/103472066</link>
         <description><![CDATA[<div>Is it possible for us to group two numeric attributes into one? Does Weka accept negative values?<br><br>1. Yes, you can. <br>2. Yes, it is possible to have negative values. <strong><em>Elsa.</em></strong></div>]]></description>
         <enclosure url="" />
         <pubDate>2016-03-31 15:32:48 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/103472066</guid>
      </item>
      <item>
         <title>17. Gp1</title>
         <author>sumei</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/103472265</link>
         <description><![CDATA[<div>For our different models, must we use the same preprocessing techniques (discretization, standardization, etc.)? Or can different models use different combinations?<br><br>E.g. Model 1 - J48 using standardization and attribute removal <br><br>Model 2 - Naive Bayes using discretization and attribute removal<br><br><strong><em>Elsa</em></strong>: If you use what you mentioned in your email: <br>Model 1 - Classifier 1 using Std &amp; Removal of Attributes<br>Model 2 - Classifier 2 using the same as above<br>Model 3 - Classifier 3 using the same as above<br><br>you can make comparisons and test which classifier gives you the best results.<br><br>For Option 2, you can only identify which is the best "independent model", i.e., Classifier 1 using A1, C2 using A2 or C3 using A3. You cannot conclude which classifier works best for the problem.<strong><em> Elsa.</em></strong></div>]]></description>
         <enclosure url="" />
         <pubDate>2016-03-31 15:33:39 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/103472265</guid>
      </item>
      <item>
         <title>18. Reply to Gp1 </title>
         <author>mamen1</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/103563934</link>
         <description><![CDATA[<div>You don't HAVE to use different preprocessing techniques. It is up to you to set the parameters for your model. If you use different preprocessing techniques, that's fine. The main thing is that we must include them in the report so that it is clear how you got your results. Setting new preprocessing techniques already gives you a new model.<br><br>haha. You learn fast! <strong><em>Elsa</em></strong>.</div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-01 03:01:00 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/103563934</guid>
      </item>
      <item>
         <title>19. Reply to Group 5 Q1</title>
         <author>mamen1</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/103564283</link>
         <description><![CDATA[<div>It is up to you. If you think that will give you the best model, then go ahead. The point of trying different algorithms and preprocessing techniques is to find the model that gives the best result or accuracy.<br><br>(y) <strong><em>Elsa.</em></strong></div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-01 03:06:59 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/103564283</guid>
      </item>
      <item>
         <title>20. Gp9 </title>
         <author>mamen1</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/103719649</link>
         <description><![CDATA[<div>What file names did you use to save your training set and test set?</div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-02 06:41:46 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/103719649</guid>
      </item>
      <item>
         <title>21. Reply to Gp9-</title>
         <author>sumei</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/103733102</link>
         <description><![CDATA[<div>You can use any name. It doesn't matter (as long as you know which file is which).</div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-02 14:49:33 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/103733102</guid>
      </item>
      <item>
         <title>22. Group 2 </title>
         <author>kssew1</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/104133500</link>
         <description><![CDATA[<div>Do we have to be concerned about outliers in our sample?<br><br>You have to test and confirm:<br>1. whether they really are outliers<br>2. whether they affect your results. <strong>Elsa.</strong></div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-05 14:13:27 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/104133500</guid>
      </item>
      <item>
         <title>23. Group 5</title>
         <author>hzsoh1</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/104282512</link>
         <description><![CDATA[<div>Can we discretize the training set and then standardize it? Or are we supposed to apply only one of these filters, so that we discretize but cannot standardize after discretizing?<br><br>Put on your thinking cap. If you discretize a second time, or standardize after discretization, are you changing the "origin" of the data? <strong><em>Elsa</em></strong>. </div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-06 05:47:41 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/104282512</guid>
      </item>
      <item>
         <title>24. Group 5</title>
         <author>hzsoh1</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/104282623</link>
         <description><![CDATA[<div>Must we remove the missing values using a filter? Can we just leave them and run the model without replacing the missing values and outliers?<br><br>This has been discussed and explained MANY times! <strong><em>Elsa</em></strong>.</div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-06 05:49:48 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/104282623</guid>
      </item>
      <item>
         <title>25. Group 5 </title>
         <author>ctlum1</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/104282797</link>
         <description><![CDATA[<div>Is it possible to combine the training and test sets in Excel and then use a percentage split in Weka to solve the compatibility issues? (I believe the model we create this way should be a good model.)<br><br>No. For the purpose of this assignment, you are not allowed to join these 2 datasets. They are given as separate sets for a reason. <strong><em>Elsa</em></strong>. </div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-06 05:52:41 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/104282797</guid>
      </item>
      <item>
         <title>26. Counter to Group 2</title>
         <author>kssew1</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/104283305</link>
         <description><![CDATA[<div>I'm planning to use regression as one of my methods. The capital gain attribute violates the normality assumption. This might be due to outliers or extremely skewed data, which would probably produce biased estimates. <br><br>Can anyone answer this question? <strong><em>Elsa</em></strong>.</div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-06 06:01:37 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/104283305</guid>
      </item>
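      <item>
         <title>Example: measuring skew before choosing a remedy (Python sketch)</title>
         <description><![CDATA[<div>One way to check the skew mentioned in the question above is to compute the sample skewness: a positive value means a long right tail, a negative value a long left tail. This sketch uses the moment coefficient of skewness and, as one common option (not an answer given in this forum), a log(1 + x) transform, which keeps zeros at zero. The values in gains are hypothetical, chosen to be mostly zero with a few large amounts.</div>

```python
import math

def skewness(values):
    """Moment coefficient of skewness (population form): m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical capital-gain-like values: mostly zero, a few very large.
gains = [0, 0, 0, 0, 0, 0, 0, 2174, 5178, 14084, 99999]

# log1p(x) = log(1 + x), so zero gains stay at zero after the transform.
logged = [math.log1p(g) for g in gains]
```

<div>For these toy values skewness(gains) is strongly positive, and skewness(logged) is much smaller, so the transformed attribute is far less dominated by the few extreme values.</div>]]></description>
      </item>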
      <item>
         <title>27. Group 5 </title>
         <author>ctlum1</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/104296298</link>
         <description><![CDATA[<div>So does that mean we can't use other test options, since we can't combine the sets into one file? If that's the case, I think Group 9 is not aware of it (please have a look at Gp9 Q2). Thanks for clarifying.</div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-06 07:45:30 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/104296298</guid>
      </item>
      <item>
         <title>28. Reply to Group 5 Q1 </title>
         <author>sumei</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/104312667</link>
         <description><![CDATA[<div>We can use different types of preprocessing for the different models. But if the preprocessing is not the same across the models, then we cannot compare which classifier is the best (refer to Gp1).</div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-06 09:46:53 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/104312667</guid>
      </item>
      <item>
         <title>29. Reply Group 5</title>
         <author>sumei</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/104313979</link>
         <description><![CDATA[<div>My thoughts: we can choose to use CV or % split, as both methods are used to validate our models. But as Ms Elsa said, % split is usually used when we have large datasets, whilst CV is used when we don't have enough data (small datasets).</div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-06 09:58:25 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/104313979</guid>
      </item>
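      <item>
         <title>Example: percentage split vs k-fold cross-validation (Python sketch)</title>
         <description><![CDATA[<div>To make the difference between the two test options in the reply above concrete, here is a sketch of how instance indices are divided in each case. The 66% split and 10 folds mirror the WEKA Explorer's default settings; the function names and the seed are hypothetical, and unlike WEKA's cross-validation this sketch does not stratify the folds by class. With a percentage split, only the held-out part is ever tested; with k-fold CV, every instance is tested exactly once, which makes better use of a small dataset.</div>

```python
import random

def percentage_split(n, train_pct=66, seed=1):
    """Shuffle instance indices once and cut them into train and test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(n * train_pct / 100)
    return idx[:cut], idx[cut:]

def k_fold(n, k=10, seed=1):
    """Yield (train, test) index lists; each instance lands in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test
```

<div>A model would be trained and evaluated once for a percentage split, but k times for cross-validation, with the k results averaged.</div>]]></description>
      </item>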
      <item>
         <title>30. Reply to Gp5 </title>
         <author>bvkok1</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/104316051</link>
         <description><![CDATA[<div>Some classifiers are sensitive to missing values, e.g. 1R.<br>In this case, the missing values are treated as a separate value of the attribute. Unless this is done deliberately, they ought to be replaced.<br>Other classifiers, e.g. Naive Bayes, are not sensitive, so it doesn't really matter.<br><br>However, I believe deleting the instances with missing values would make the dataset lose some meaning and thus cause the model to have lower accuracy.</div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-06 10:19:39 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/104316051</guid>
      </item>
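      <item>
         <title>Example: replacing vs deleting missing values (Python sketch)</title>
         <description><![CDATA[<div>A sketch of the two options discussed in the reply above: replacing missing values, in the spirit of WEKA's ReplaceMissingValues filter (which fills numeric attributes with the mean and nominal attributes with the mode of the training data), versus deleting incomplete instances. The None marker standing in for WEKA's "?", the toy rows and the function names are assumptions for illustration.</div>

```python
from collections import Counter

MISSING = None  # stands in for WEKA's "?" marker

def replace_missing(rows, numeric, nominal):
    """Fill gaps with the mean (numeric) or mode (nominal) of the observed values."""
    fills = {}
    for attr in numeric:
        seen = [r[attr] for r in rows if r[attr] is not MISSING]
        fills[attr] = sum(seen) / len(seen)
    for attr in nominal:
        seen = [r[attr] for r in rows if r[attr] is not MISSING]
        fills[attr] = Counter(seen).most_common(1)[0][0]
    return [{a: (fills[a] if v is MISSING and a in fills else v) for a, v in r.items()}
            for r in rows]

def drop_missing(rows):
    """Keep only instances with no missing value at all."""
    return [r for r in rows if all(v is not MISSING for v in r.values())]
```

<div>With a small table where two of four rows have one gap each, drop_missing keeps only half the instances, while replace_missing keeps all of them; this is the loss of data the reply above is worried about.</div>]]></description>
      </item>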
      <item>
         <title>31. Counter to Group</title>
         <author>kssew1</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/104326474</link>
         <description><![CDATA[<div>Hi everyone, I think I have found the answer for dealing with attributes that violate the normality assumption: use LOGISTIC REGRESSION instead of LINEAR REGRESSION. Logistic regression is a logit model, estimated through the log-likelihood function, and there is no normality assumption in the logit model. <strong><em>kenneth</em></strong></div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-06 11:45:20 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/104326474</guid>
      </item>
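      <item>
         <title>Example: the logit log-likelihood (Python sketch)</title>
         <description><![CDATA[<div>To make the reply above more concrete: a logistic (logit) model for a 0/1 class is fitted by maximising the Bernoulli log-likelihood below, and that formula contains no normally distributed error term, which is why the normality assumption of linear regression does not apply. This is a minimal sketch with hypothetical inputs, not WEKA's Logistic classifier (which, for example, adds a ridge estimator).</div>

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_likelihood(weights, bias, X, y):
    """Sum of y*log(p) + (1 - y)*log(1 - p) over instances; y values are 0 or 1."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(bias + sum(w * x for w, x in zip(weights, xi)))
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total
```

<div>On the two toy instances X = [[1.0], [2.0]] with classes y = [0, 1], weights [2.0] with bias -3.0 give a higher (less negative) log-likelihood than weights [0.0] with bias 0.0, so a fitting procedure would prefer them.</div>]]></description>
      </item>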
      <item>
         <title>32. Regards to test set not compatible</title>
         <author>kssew1</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/104405695</link>
         <description><![CDATA[<div><a href="https://groups.google.com/forum/#!topic/wekamooc-general/3Sb7gbeclpM">https://groups.google.com/forum/#!topic/wekamooc-general/3Sb7gbeclpM</a><br>Here's a link that might be useful for you all.</div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-06 16:30:54 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/104405695</guid>
      </item>
      <item>
         <title>33. ELSA: </title>
         <author>elsa_phung</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/104490308</link>
         <description><![CDATA[<div><strong>Now this is getting more and more interesting. Keep up the good work!</strong></div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-07 00:38:01 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/104490308</guid>
      </item>
      <item>
         <title>34. Group </title>
         <author>kssew1</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/104550390</link>
         <description><![CDATA[<div>Does the search method cover all possible subset combinations?</div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-07 10:46:54 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/104550390</guid>
      </item>
      <item>
         <title>35. Reply To Group 2:</title>
         <author>ajgan6</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/104721534</link>
         <description><![CDATA[<div>Yes, I think it does. I think this reading may help.</div>]]></description>
         <enclosure url="http://padletuploads.blob.core.windows.net/aws/fallback_link.png" />
         <pubDate>2016-04-08 01:00:06 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/104721534</guid>
      </item>
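      <item>
         <title>Example: how many subsets an exhaustive search evaluates (Python sketch)</title>
         <description><![CDATA[<div>Whether every subset is covered depends on the search method. An exhaustive search evaluates all 2^n - 1 non-empty subsets of n attributes, while greedy and best-first searches (e.g. WEKA's BestFirst, the default in the Select attributes tab) explore only part of that space. This sketch enumerates subsets with itertools; the merit function is a hypothetical placeholder for a subset evaluator such as CfsSubsetEval.</div>

```python
from itertools import combinations

def all_subsets(attributes):
    """Enumerate every non-empty attribute subset, as an exhaustive search would."""
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            yield subset

def exhaustive_search(attributes, merit):
    """Return the subset with the highest score under a user-supplied merit function."""
    return max(all_subsets(attributes), key=merit)
```

<div>For 14 attributes there are already 2**14 - 1 = 16383 subsets, which is why exhaustive search quickly becomes slow as attributes are added and why faster heuristic searches are often used instead.</div>]]></description>
      </item>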
      <item>
         <title>36. Group 8:, Training and test sets are not compatible</title>
         <author>jjlau4</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/104875565</link>
         <description><![CDATA[<div>Does anyone know how to solve the compatibility problem? How to handle different structures of the training and test sets?</div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-09 11:07:23 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/104875565</guid>
      </item>
      <item>
         <title>37. Reply to Group 8 </title>
         <author>sumei</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/104877249</link>
         <description><![CDATA[<div>1. Preprocess your data (training) file. </div><div>2. Once you are happy with it, do the same for your test file. </div><div>3. Save both as .arff and open them in WordPad. </div><div>4. Check that the headers of the data and test files are exactly the same, then save them. <br>NOTE: both files must have the same number of labels for each attribute, in the same positions. </div><div>5. Run the model on the supplied test set. </div><div><br>**NOTE: if you discretize your data file, you SHOULD NOT discretize your test file separately in Weka, because the bin boundaries would then be computed from the test data and would not match the training bins. Instead use “vlookup” in Excel to assign the training bins. Once done, save it as a CSV and use Weka to convert it into an .arff file. Then continue with step 3 onwards. </div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-09 12:15:43 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/104877249</guid>
      </item>
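      <item>
         <title>Example: checking that two ARFF headers match (Python sketch)</title>
         <description><![CDATA[<div>Step 4 of the reply above (checking that the two headers match) can also be scripted. This is a rough sketch, not a full ARFF parser: it reads only the @attribute lines before @data, treats a {...} specification as an ordered list of nominal labels, and does not handle quoted labels that themselves contain commas. The function names and sample headers are hypothetical.</div>

```python
import re

# @attribute <name> <spec>; the name may be quoted with ' or ".
ATTR = re.compile(r"@attribute\s+('[^']*'|\"[^\"]*\"|\S+)\s+(.*)", re.IGNORECASE)

def arff_header(text):
    """Return [(name, spec)] for each @attribute line before @data."""
    header = []
    for line in text.splitlines():
        line = line.strip()
        if line.lower().startswith("@data"):
            break
        m = ATTR.match(line)
        if m:
            name = m.group(1).strip("'\"")
            spec = m.group(2).strip()
            if spec.startswith("{"):
                # Nominal attribute: keep the labels in their declared order.
                spec = [label.strip().strip("'\"") for label in spec.strip("{}").split(",")]
            header.append((name, spec))
    return header

def header_mismatches(train_text, test_text):
    """List the attribute positions whose name or labels differ between two files."""
    train, test = arff_header(train_text), arff_header(test_text)
    problems = []
    if len(train) != len(test):
        problems.append(f"attribute count differs: {len(train)} vs {len(test)}")
    for i, (a, b) in enumerate(zip(train, test)):
        if a != b:
            problems.append(f"position {i}: {a[0]} differs")
    return problems
```

<div>Swapping only the order of the labels of one nominal attribute is enough to be reported, which matches the note that labels must be in the same positions.</div>]]></description>
      </item>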
      <item>
         <title>38. Group 3</title>
         <author>hlpan4</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/104882052</link>
         <description><![CDATA[<div>How many combinations of filters can we have?</div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-09 14:23:50 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/104882052</guid>
      </item>
      <item>
         <title>39. Reply to Group 5</title>
         <author>fsmoh7</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/104909188</link>
         <description><![CDATA[<div>You can use any number of filter combinations to find the highest accuracy. The more the merrier. However, time constraints exist.</div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-10 08:03:36 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/104909188</guid>
      </item>
      <item>
         <title>40. Group 8 - Compatible Issue</title>
         <author>wpche8</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/104935556</link>
         <description><![CDATA[<div>Hi guys,<br><br>James from my group discovered that the issue arises from a difference in the number of labels in the nominal attributes of the training and test sets.<br><br>Further research shows that Weka requires not only the same number of attributes in both datasets, but also the same number of labels in each nominal attribute, in the same order. Note that some labels in the test set do not exist in the training set, and a model built on the training set cannot accommodate information it has never seen, since these are categorical rather than numerical data; hence the issue. <br><br>For example, if a decision tree is built from a training set in which one nominal attribute has only 3 labels, there will be only 3 separate branches at that level of the tree. When the test set has a fourth label for that attribute, the model cannot deal with it, because it was built for only 3 labels.<br><br>Converting the nominal attribute into binary attributes won't work, as it would just turn an uneven number of labels into an uneven number of attributes. <br><br>A possible solution is to drop any nominal attribute that appears to have labels missing from the training set. For instance, if an attribute like country contains only 25 or 14 country labels, it is safe to assume it is incomplete, as the world has more than 100 countries. An attribute like gender, with female and male labels, can safely be assumed complete, since it has only those two labels. <br><br>This is the method I am aware of so far; if anyone has a better idea for dealing with this issue, please advise. <strong><em>WP</em></strong></div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-10 18:32:25 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/104935556</guid>
      </item>
      <item>
         <title>41. Reply to group 8 by group 4</title>
         <author>nzha87</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/104972783</link>
         <description><![CDATA[<div>There is a faster way to solve the compatibility problem. Open the training and test sets at the same time, so that each is open in a separate Weka window. Check each attribute in both sets: you have to make sure that every nominal attribute has exactly the same number and type of categories. For example, in the Native Country attribute the training set has Laos, Puerto-Rico and other countries that the test set doesn't have. For this, use the filter called "AddValues" in unsupervised.attribute to add the categories. After that, you have to ensure that the categories of each nominal attribute are in the same order in both the test and training sets. For this, use the filter called "SortLabels". Then save each set as a new arff file, close the test set, supply the test set to the classifier and run it. If you then modify your training set, your test set should be adjusted accordingly. Hope it works! <strong>BTW, the mentioned filters are available in 3.7.13; in previous versions the sort filter is not available. NZ</strong></div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-11 04:07:33 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/104972783</guid>
      </item>
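      <item>
         <title>Example: aligning nominal labels across training and test headers (Python sketch)</title>
         <description><![CDATA[<div>The idea behind the AddValues and SortLabels steps in the reply above, sketched in Python: give each nominal attribute the union of its training and test labels, in one alphabetical order, in both headers. Headers are represented as (name, spec) pairs, where spec is a list of labels for nominal attributes and a type string such as "numeric" otherwise; this representation and the function names are assumptions, and the sketch does not edit the files themselves. As the follow-up replies point out, a label added this way has no training instances behind it.</div>

```python
def aligned_labels(train_labels, test_labels):
    """Union of both label sets in one fixed (alphabetical) order."""
    return sorted(set(train_labels) | set(test_labels))

def align_headers(train_header, test_header):
    """Give every nominal attribute the same labels, in the same order, in both headers."""
    if [n for n, _ in train_header] != [n for n, _ in test_header]:
        raise ValueError("attribute names or order differ; fix those first")
    train_out, test_out = [], []
    for (name, a), (_, b) in zip(train_header, test_header):
        if isinstance(a, list) and isinstance(b, list):
            a = b = aligned_labels(a, b)  # nominal: merge and sort the labels
        train_out.append((name, a))
        test_out.append((name, b))
    return train_out, test_out
```

<div>Numeric attributes pass through unchanged; only nominal label lists are merged, so after alignment both headers declare exactly the same labels in the same positions.</div>]]></description>
      </item>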
      <item>
         <title>42. Group 4 </title>
         <author>nzha87</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/104973969</link>
         <description><![CDATA[<div>How should we deal with the class imbalance problem? We tried some methods, but the accuracy decreased with the undersampling method.</div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-11 04:22:58 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/104973969</guid>
      </item>
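      <item>
         <title>Example: random undersampling and oversampling (Python sketch)</title>
         <description><![CDATA[<div>Two common resampling remedies for the class imbalance asked about above, sketched in Python: random undersampling of the larger classes and random oversampling (duplication) of the smaller ones. WEKA offers comparable filters (e.g. SpreadSubsample and Resample under supervised.instance). The function names, the class attribute name and the toy data are hypothetical. Resample only the training set, never the test set. A drop in overall accuracy after balancing does not necessarily mean a worse model, because accuracy on imbalanced data rewards predicting the majority class, so per-class measures such as recall or the confusion matrix are worth comparing too.</div>

```python
import random
from collections import defaultdict

def _group_by_class(rows, cls):
    """Bucket the rows by their class label."""
    by_class = defaultdict(list)
    for r in rows:
        by_class[r[cls]].append(r)
    return by_class

def random_undersample(rows, cls, seed=1):
    """Keep a random subset of each class, as large as the smallest class."""
    by_class = _group_by_class(rows, cls)
    smallest = min(len(v) for v in by_class.values())
    rng = random.Random(seed)
    out = []
    for label in sorted(by_class):
        out.extend(rng.sample(by_class[label], smallest))
    return out

def random_oversample(rows, cls, seed=1):
    """Duplicate random rows of smaller classes until every class matches the largest."""
    by_class = _group_by_class(rows, cls)
    largest = max(len(v) for v in by_class.values())
    rng = random.Random(seed)
    out = []
    for label in sorted(by_class):
        group = by_class[label]
        out.extend(group)
        out.extend(rng.choices(group, k=largest - len(group)))
    return out
```

<div>Undersampling discards majority-class instances (and information), while oversampling keeps everything but repeats minority instances, which can encourage overfitting; trying both is one way to see which suits the data.</div>]]></description>
      </item>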
      <item>
         <title>43. Reply to group 4 by group 8</title>
         <author>wpche8</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/104974000</link>
         <description><![CDATA[<div>Isn't it more parsimonious to just drop the attribute that doesn't have the same number of labels to fix the compatibility issue, since it does not represent the whole population in the sample? Even adding the filter won't help increase the accuracy of the model. I assume the filter would just let the model know there is a missing label, but without any samples of the missing label (e.g. a country), that won't teach the model anything. Might as well drop the attribute, which is not helpful in the first place? <strong><em>WP</em></strong></div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-11 04:23:20 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/104974000</guid>
      </item>
      <item>
         <title>44. Reply to group 8 by group 4 </title>
         <author>nzha87</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/104974290</link>
         <description><![CDATA[<div>My reply to you was about compatibility. I don't really understand your problem. It is not parsimonious, since different filters and different classifiers can give different levels of accuracy. For now you don't know what awaits you, since you first need to preprocess the training data. You could do the preprocessing first and then, if you delete any attribute, adjust your test set accordingly so that the number of attributes, the categories and their order are exactly the same. <strong>NZ</strong></div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-11 04:26:35 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/104974290</guid>
      </item>
      <item>
         <title>45. </title>
         <author>elsa_phung</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/104974373</link>
         <description><![CDATA[<div>Dear ALL, please put in the date when posting, so that it is easier for other students to refer back to posts. <strong><em>Elsa</em></strong>.</div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-11 04:27:30 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/104974373</guid>
      </item>
      <item>
         <title>46. </title>
         <author>wpche8</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/104974693</link>
         <description><![CDATA[<div><br>First of all, the compatibility issue arises because the number of labels differs between the training set and the test set. That is why I am concerned about using an incomplete attribute. Even if we add all the missing categories to the nominal attribute in the training set, that assumes we know all the missing categories, and that knowledge comes from the test set. However, if such a model were implemented in the real world, we would find that there are actually more countries missing from the list. This means our model would not work in an environment with categories that we did not include when building it from the training and test sets. We might as well drop the attributes whose categories we expect to be incomplete, so that our model can keep running in the long run. After all, part of our model is supposed to be built for long-term use, not readjusted every time a new category arises.<br><br>I am not sure if I am correct; it is just my personal opinion. Thanks for the feedback. <strong><em>WP</em></strong></div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-11 04:31:46 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/104974693</guid>
      </item>
      <item>
         <title>47. </title>
         <author>nzha87</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/104975784</link>
         <description><![CDATA[<div>OK, I get what you mean. The model will not consider the category added to the training set, since that additional category has a value of 0 in prediction. Weka requires the missing label to be added for the purpose of compatibility. <strong>NZ</strong></div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-11 04:45:16 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/104975784</guid>
      </item>
      <item>
         <title>48. </title>
         <author>elsa_phung</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/105003540</link>
         <description><![CDATA[<div>I have changed the format to streaming, so the latest post will be on top. Just keep on posting, scroll down for previous posts, and remember to put in the date and time when posting or replying.  <strong><em>ELSA.</em></strong></div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-11 08:32:04 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/105003540</guid>
      </item>
      <item>
         <title>49. </title>
         <author>mkob16</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/105272198</link>
         <description><![CDATA[<div>What is the final weight?</div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-12 10:25:33 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/105272198</guid>
      </item>
      <item>
         <title>50. </title>
         <author>sumei</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/105297025</link>
         <description><![CDATA[<div>*Reposting a previous answer by Ms Elsa<br><br>"<strong>Final Weight</strong>" is simply a weight used by the Census Bureau to produce "weighted tallies", which represent estimated population totals for any specified socio-economic characteristics of the population. For example, people with similar demographic characteristics should have similar weights.</div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-12 12:47:43 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/105297025</guid>
      </item>
      <item>
         <title>51. </title>
         <author>nwshi1</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/105467882</link>
         <description><![CDATA[<div>Is Part B included in the 10-page report, and if not, how many pages should we allocate to it? <strong>NS</strong><br><br><strong><em>Elsa</em>:</strong> Keep it straightforward and simple. A maximum of 2 pages (Part B only) should be enough, though up to 3 pages is acceptable.<br><br>Thanks! <strong>NS</strong></div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-13 03:49:06 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/105467882</guid>
      </item>
      <item>
         <title>52. </title>
         <author>sumei</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/105475604</link>
         <description><![CDATA[<div>No, it is not included. However, I'm not sure how many pages it should be.</div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-13 05:30:26 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/105475604</guid>
      </item>
      <item>
         <title>53. </title>
         <author>wklou3</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/105568226</link>
         <description><![CDATA[<div>How do we check whether the model is overfitting or underfitting?</div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-13 14:31:29 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/105568226</guid>
      </item>
      <item>
         <title>54. </title>
         <author>mamen1</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/105751104</link>
         <description><![CDATA[<div>What filter is needed to run logistic regression on the data? How do you interpret the output?</div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-14 10:29:45 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/105751104</guid>
      </item>
      <item>
         <title>55. </title>
         <author>wjang7</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/105971892</link>
         <description><![CDATA[<div>What does 'never be used' mean in Question 2, part (c)?</div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-15 09:45:17 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/105971892</guid>
      </item>
      <item>
         <title>56. Reply to &#39;55. &#39;</title>
         <author>mamen1</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/106120415</link>
         <description><![CDATA[<div>It just asks which one of the three test options ('use training set', 'cross-validation', and 'percentage split') must never be used for evaluating a model. You have to select one.</div>]]></description>
         <enclosure url="" />
         <pubDate>2016-04-16 08:32:14 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/106120415</guid>
      </item>
      <item>
         <title>57. Reply to &#39;53. &#39;</title>
         <author>mamen1</author>
         <link>https://padlet.com/elsa_phung/Assign1Forum/wish/106164036</link>
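         <description><![CDATA[<div>I am not sure how to check, but I found out that if you use the 'use training set' option to evaluate your model, the result will definitely be affected by overfitting: the model is tested on the same data it was trained on, so its accuracy looks better than it would on unseen data.</div>]]></description>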
         <enclosure url="" />
         <pubDate>2016-04-17 09:52:00 UTC</pubDate>
         <guid>https://padlet.com/elsa_phung/Assign1Forum/wish/106164036</guid>
      </item>
   </channel>
</rss>
