TOPIC: Filtering Email Data in Rapid Miner

Filtering Email Data in Rapid Miner 9 years 11 months ago #365

  • SOHAM SRIMANI
  • SigmaWay Expert
  • Posts: 520
  • Thank you received: 67
  • Karma: 11
Post your process for filtering the email data, and any related questions, here.
The administrator has disabled public write access.
The following user(s) said Thank You: sunny

Filtering Email Data in Rapid Miner 9 years 11 months ago #366

  • SOHAM SRIMANI
For now we need to stick to the Unscheduled and Original data, so the Scheduled and Reply data must be omitted from the email data set. The main operator in this regard is "Filter Documents (by Content)", which is used inside the Process Documents operator after the data has been read with the "Read Excel" operator.
We use the Filter Documents (by Content) operator four times to remove the Scheduled and Reply data. The strings I used are: RE:, Maintenance, Scheduled, Routines. I am leaving the "planned" string out of the filter operators for now.
Does anyone have a different opinion on this?
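
For readers who want to check the logic outside RapidMiner, here is a minimal Python sketch of the same keyword filter. It is not RapidMiner code. The function name, the sample emails, and the case-insensitive matching are assumptions made for illustration; only the four strings come from the post.

```python
# Strings from the post above; a match on any of them drops the email.
EXCLUDE_STRINGS = ["RE:", "Maintenance", "Scheduled", "Routines"]


def keep_email(text, exclude_strings=EXCLUDE_STRINGS):
    """Return True if none of the exclude strings occurs in the text.

    Matching is a case-insensitive substring test (an assumption).
    """
    lowered = text.lower()
    return not any(s.lower() in lowered for s in exclude_strings)


# Hypothetical sample emails, invented for illustration only.
emails = [
    "RE: Server outage in Mumbai",
    "Scheduled maintenance on Sunday",
    "Unexpected link failure at site 12",
]

unscheduled_originals = [e for e in emails if keep_email(e)]
print(unscheduled_originals)
```

Because this is a plain substring test, a string such as "RE:" also matches anywhere inside a longer text, not only at the start of a subject line.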
Last Edit: 9 years 11 months ago by SOHAM SRIMANI.
The following user(s) said Thank You: sunny

Filtering Email Data in Rapid Miner 9 years 11 months ago #370

  • SOHAM SRIMANI
When you feed both the email body and the email subject to the Read Excel operator, how are you processing them? Are you filtering on the subject only, and then adding the body to the filtered subjects to build the training set with categories?

Filtering Email Data in Rapid Miner 9 years 11 months ago #371

When I filter out the scheduled emails and replies by giving various keywords to the Filter Documents (by Content) operator, some unscheduled and original emails are removed as well, because the keyword "scheduled" is mentioned in some of the email bodies. As a result, I am losing some data points. Any suggestions?
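
The problem can be reproduced with a small Python sketch. It is an illustration only, not RapidMiner code: the sample email and the helper name are hypothetical. It also shows one possible mitigation, matching the keywords against the subject alone, which is an option raised earlier in this thread rather than a confirmed fix.

```python
KEYWORDS = ["re:", "scheduled", "maintenance", "routines"]


def contains_keyword(text, keywords=KEYWORDS):
    """Case-insensitive substring test against every keyword."""
    lowered = text.lower()
    return any(k in lowered for k in keywords)


# Hypothetical email, invented for illustration: an unscheduled, original
# report whose body happens to mention the word "scheduled".
email = {
    "subject": "Power failure at data centre",
    "body": "The outage hit before the scheduled backup could run.",
}

# Filtering on subject plus body drops the email (a false positive).
dropped_on_full_text = contains_keyword(email["subject"] + " " + email["body"])

# Filtering on the subject alone keeps it.
dropped_on_subject = contains_keyword(email["subject"])

print(dropped_on_full_text, dropped_on_subject)
```

Subject-only matching trades one error for another: a scheduled email with an uninformative subject would then slip through.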

Filtering Email Data in Rapid Miner 9 years 11 months ago #372

  • SOHAM SRIMANI
That is the problem. I have not used "planned"; instead I have used "routine".

Filtering Email Data in Rapid Miner 9 years 11 months ago #374

  • Adipta Datta
  • SigmaWay Novice
  • Posts: 39
  • Thank you received: 6
  • Karma: 1
Hi Tapasree,

Your sentences confused me. :P

So here are the steps to follow:

1. Select the Read Excel operator. While selecting the checkboxes, select both subject and body. Click Finish.
2. Select the Process Documents from Data operator.
3. Inside the Process Documents from Data operator, add one Filter Documents (by Content) operator for each string. Enter one of the following in the string field of each: re:, scheduled, planned, maintenance. Be sure that case sensitive is not selected and that inverse condition is selected.
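
The steps above can be sketched in Python as a chain of filters. This only imitates the behaviour described in the steps and is not RapidMiner's API. The function name, its parameters, and the sample rows are hypothetical, and joining subject and body into one document is an assumption.

```python
def filter_documents_by_content(docs, string, case_sensitive=False,
                                invert_condition=True):
    """Keep documents that contain `string`.

    With invert_condition=True, keep the documents that do NOT contain it.
    """
    kept = []
    for doc in docs:
        haystack = doc if case_sensitive else doc.lower()
        needle = string if case_sensitive else string.lower()
        contains = needle in haystack
        # Inverted: keep when the string is absent. Normal: keep when present.
        if contains != invert_condition:
            kept.append(doc)
    return kept


# Hypothetical rows standing in for the subject and body columns
# selected in step 1.
rows = [
    {"subject": "RE: Router down", "body": "Still down this morning."},
    {"subject": "Planned upgrade", "body": "Firmware upgrade on Friday."},
    {"subject": "Router down", "body": "No traffic since 09:00."},
]

# Assumption: subject and body are joined into a single document.
docs = [row["subject"] + " " + row["body"] for row in rows]

# One filter per string, applied one after another as in step 3.
for s in ["re:", "scheduled", "planned", "maintenance"]:
    docs = filter_documents_by_content(docs, s)

print(docs)
```

Each pass removes the documents that contain one string, so the order of the four filters does not change the final result.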



Let me know if you face problems. :)
Last Edit: 9 years 11 months ago by Adipta Datta.
The following user(s) said Thank You: Sayak Dutta