
TOPIC: Email classification project for Rapidminer Group

Email classification project for Rapidminer Group 9 years 11 months ago #364

  • Adipta Datta
  • Offline
  • SigmaWay Novice
  • Posts: 39
  • Thank you received: 6
  • Karma: 1
Hi all Rapidminers!

Please check what accuracy and output you get when you analyze the subject and body separately, without concatenating them. You will have to apply weights to the subject and body in various combinations, with the two weights totaling 1.0. For example, if you apply a weight of 0.6 to the body, you must apply 0.4 to the subject. Note the accuracy you get with the number of validations set to 10, see how the test emails are classified, and compare the results. Please discuss the outputs here.
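To make the weighting idea concrete, here is a minimal Python sketch of one way to read it: each email's subject and body are turned into relative term frequencies and then blended, with the body weight always equal to 1.0 minus the subject weight. This is only an illustration and not necessarily how RapidMiner combines the attributes internally; the function names `term_frequencies`, `combine` and `weight_grid` are made up for this example.

```python
from collections import Counter


def term_frequencies(text):
    """Lowercased whitespace tokens mapped to relative term frequencies."""
    tokens = text.lower().split()
    if not tokens:
        return {}
    counts = Counter(tokens)
    return {tok: n / len(tokens) for tok, n in counts.items()}


def combine(subject, body, subject_weight):
    """Blend subject and body term frequencies; the two weights total 1.0."""
    if not 0.0 <= subject_weight <= 1.0:
        raise ValueError("subject_weight must be between 0 and 1")
    body_weight = 1.0 - subject_weight
    combined = {}
    for tok, value in term_frequencies(subject).items():
        combined[tok] = combined.get(tok, 0.0) + subject_weight * value
    for tok, value in term_frequencies(body).items():
        combined[tok] = combined.get(tok, 0.0) + body_weight * value
    return combined


def weight_grid(steps=10):
    """All (subject_weight, body_weight) pairs on an even grid that total 1.0."""
    return [(i / steps, (steps - i) / steps) for i in range(steps + 1)]


if __name__ == "__main__":
    # Hypothetical email, used only to show the call pattern.
    for subject_weight, body_weight in weight_grid():
        vector = combine("server down", "the server is down again", subject_weight)
        print(subject_weight, body_weight, vector)
```

Each pair from `weight_grid` corresponds to one run to try; the accuracy of each run would then be compared as described above.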
Last Edit: 9 years 11 months ago by Adipta Datta.
The administrator has disabled public write access.

Email classification project for Rapidminer Group 9 years 11 months ago #373

  • Adipta Datta
  • Offline
  • SigmaWay Novice
  • Posts: 39
  • Thank you received: 6
  • Karma: 1
Hi,

Please refer to the picture for the output I got. The best output I got was under the following conditions:

1. Validations: 10
2. Subject weight: 0.9
3. Body weight: 0.1
4. Operators used under 'Process Documents from Data': Tokenize, Filter Stopwords, Filter Tokens by Length (lower limit: 3, upper limit: 999), Filter Tokens by Content (string filtered: 'www', with the 'inverse' condition).

Please post your outputs so we can discuss them and decide on the best combination.
Under these conditions, 2 texts were correctly predicted as Davison.
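For anyone who wants to see what the operator chain in point 4 does to a piece of text, here is a rough Python equivalent. It is a sketch under assumptions: tokens are split on non-letter characters, the stopword list is a tiny made-up one (RapidMiner uses its own, much larger list), stopwords are matched case-insensitively, and the inverted content filter is read as "drop every token containing 'www'". The name `preprocess` is hypothetical.

```python
import re

# Tiny illustrative stopword list (an assumption, not RapidMiner's list).
STOPWORDS = {"the", "a", "an", "is", "and", "or", "to", "of", "in", "for", "on", "at"}


def preprocess(text, min_len=3, max_len=999, banned="www"):
    # Tokenize: split on anything that is not a letter.
    tokens = [t for t in re.split(r"[^A-Za-z]+", text) if t]
    # Filter Stopwords: drop common words (case-insensitive here).
    tokens = [t for t in tokens if t.lower() not in STOPWORDS]
    # Filter Tokens by Length: keep tokens with min_len..max_len characters.
    tokens = [t for t in tokens if min_len <= len(t) <= max_len]
    # Filter Tokens by Content, inverted: drop tokens that contain 'www'.
    tokens = [t for t in tokens if banned not in t.lower()]
    return tokens


if __name__ == "__main__":
    # Hypothetical email text, for illustration only.
    print(preprocess("Visit www.davison.com for the network update at 5pm"))
```

Note that there is no stemming step in this chain, matching the operator list above.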
Attachments:
Last Edit: 9 years 11 months ago by Adipta Datta.

Email classification project for Rapidminer Group 9 years 11 months ago #376

  • Soutrik Kumar
  • Offline
  • SigmaWay Newbie
  • Posts: 12
  • Thank you received: 6
  • Karma: 0
The results change under two different conditions:
1. When 'Filter Stopwords' and 'Stem (Porter)' are used, emails in Davison and other categories are misclassified.
2. When 'Filter Tokens by Length' and 'Filter Tokens by Content' are used, the emails are classified more accurately.

Email classification project for Rapidminer Group 9 years 11 months ago #377

  • SOHAM SRIMANI
  • Offline
  • SigmaWay Expert
  • Posts: 520
  • Thank you received: 67
  • Karma: 11
Stem (Porter) is clearly affecting the result and, funnily enough, in a negative way.

Email classification project for Rapidminer Group 9 years 11 months ago #378

  • SOHAM SRIMANI
  • Offline
  • SigmaWay Expert
  • Posts: 520
  • Thank you received: 67
  • Karma: 11
Davison: 4
Phone & Network: 4
Web: 3
Others: 9
Weight to Subject: 0.9
Weight to Body: 0.1
Model Accuracy: 71.71% +/- 8.44% (mikro: 71.63%)
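
A short note on reading the accuracy line: as far as I understand it, the first figure is the average of the per-fold accuracies with their spread, while the "mikro" figure pools the predictions from all folds before computing accuracy. The Python sketch below shows the difference using made-up fold counts (not the counts from this run); `fold_summary` is a hypothetical helper, and whether RapidMiner reports a population or sample standard deviation has not been checked here.

```python
from statistics import mean, pstdev


def fold_summary(folds):
    """folds is a list of (correct, total) pairs, one per validation fold."""
    per_fold = [correct / total for correct, total in folds]
    average = mean(per_fold)       # average of the per-fold accuracies
    spread = pstdev(per_fold)      # population standard deviation (assumption)
    pooled = sum(c for c, _ in folds) / sum(t for _, t in folds)  # "mikro"
    return average, spread, pooled


if __name__ == "__main__":
    # Made-up fold results, for illustration only.
    average, spread, pooled = fold_summary([(8, 10), (7, 10), (6, 9)])
    print(f"{average:.2%} +/- {spread:.2%} (mikro: {pooled:.2%})")
```

The two figures differ slightly whenever the folds are not all the same size, which would explain why the two percentages above are close but not identical.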

Attachment Screenshot2014-05-0117.17.04.png not found


Email classification project for Rapidminer Group 9 years 11 months ago #663

  • Adipta Datta
  • Offline
  • SigmaWay Novice
  • Posts: 39
  • Thank you received: 6
  • Karma: 1
Hi,

The model seems to be working and is automated, though some emails are still misclassified. This is the best result I am getting. See the XML file.

Attachment trainingdata_validationscheduled_unscheduled_xml.txt not found



Attachment test_output_xml.txt not found

Last Edit: 9 years 11 months ago by Adipta Datta.