How to parse a recipe's ingredient line using ColdFusion?


I am using jsoup (http://jsoup.org/) to parse an HTML page and extract data from it. From the page I extract recipe details such as cooking time, instructions, and ingredients. The extracted data is saved in an archive table named recipeimport.

Before inserting the valid recipes into their final table, I have to parse the ingredients, because each ingredient is stored in the recipe_ingredient table based on three different master tables, namely recipeamount, recipeunittype, and recipeingredient.

Consider a simple ingredient line, "1 cup white sugar". I have to separate the amount (1), the unit type (cup), and the ingredient (sugar), match them against the master tables (recipeamount, recipeunittype, and recipeingredient), and insert the ingredient into the recipe_ingredient table with the reference IDs.
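The split described above can be sketched with one regular expression that uses named groups. This is only an illustration in Python, not ColdFusion; the `UNITS` list, the `AMOUNT` pattern, and the function name `split_ingredient_line` are assumptions made for the example, and in practice the unit list would presumably be loaded from the recipeunittype table.

```python
import re

# Hypothetical unit list; assumed to come from the recipeunittype table.
UNITS = ["cup", "tbs", "tsp", "gram"]

# Amount alternatives, longest form first:
# mixed number ("1 1/2"), fraction ("1/2"), range ("2-3"), integer ("1").
AMOUNT = r"\d+\s\d+/\d+|\d+/\d+|\d+-\d+|\d+"

LINE_RE = re.compile(
    r"^(?P<amount>" + AMOUNT + r")\s+"
    r"(?P<unit>" + "|".join(re.escape(u) for u in UNITS) + r")s?\b\s+"
    r"(?P<rest>.+)$",
    re.IGNORECASE,
)

def split_ingredient_line(line):
    """Return (amount, unit, rest) or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return m.group("amount"), m.group("unit").lower(), m.group("rest").strip()

print(split_ingredient_line("1 cup white sugar"))
```

Lines that do not start with a recognised amount and unit return `None`, so they can be set aside for manual review instead of being inserted.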

recipeamount table

id   amounttype   amounttypevalue
1    1/2          0.5
2    1            1
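The amounttypevalue column holds the decimal form of the amounttype text. A minimal sketch of that conversion, in Python for illustration only, is shown here; the function name `amount_value` is hypothetical, and ranges such as "2-3" are deliberately not handled.

```python
from fractions import Fraction

def amount_value(amount_text):
    """Convert "1", "1/2" or "1 1/2" to a decimal value.

    Each whitespace-separated part is parsed as a fraction and the
    parts are added, so "1 1/2" becomes 1 + 1/2.
    """
    total = sum(Fraction(part) for part in amount_text.split())
    return float(total)

print(amount_value("1/2"))
```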

recipeunittype table

id   unittype
1    cup
2    tbs
3    tsp
4    gram

recipeingredient table

id   ingredientname
1    sugar
2    salt
3    honey

Finally, I have to save the ingredient like this.

recipe_ingredient table

id   amountid   unittypeid   ingredientid   line_text     ingredient_line
1    2          1            1              white sugar   1 cup white sugar
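One way to get from the separated parts to a row like the one above is to look each part up in the master tables. The sketch below is in Python for illustration and uses in-memory dictionaries that copy the sample rows; the names `AMOUNTS`, `UNIT_TYPES`, `INGREDIENTS`, `find_ingredient_id`, and `build_row` are all hypothetical, and matching the ingredient as a whole word inside line_text is an assumption about how "white sugar" should resolve to "sugar".

```python
import re

# In-memory copies of the sample master tables (text -> id).
AMOUNTS = {"1/2": 1, "1": 2}
UNIT_TYPES = {"cup": 1, "tbs": 2, "tsp": 3, "gram": 4}
INGREDIENTS = {"sugar": 1, "salt": 2, "honey": 3}

def find_ingredient_id(text):
    """Return the id of the first known ingredient found as a whole word."""
    # Longest names first, so a more specific name wins over a shorter one.
    for name in sorted(INGREDIENTS, key=len, reverse=True):
        if re.search(r"\b" + re.escape(name) + r"\b", text, re.IGNORECASE):
            return INGREDIENTS[name]
    return None

def build_row(amount, unit, line_text, ingredient_line):
    """Build a recipe_ingredient row, or None if any lookup fails."""
    amount_id = AMOUNTS.get(amount)
    unit_id = UNIT_TYPES.get(unit.lower())
    ingredient_id = find_ingredient_id(line_text)
    if None in (amount_id, unit_id, ingredient_id):
        return None
    return {
        "amountid": amount_id,
        "unittypeid": unit_id,
        "ingredientid": ingredient_id,
        "line_text": line_text,
        "ingredient_line": ingredient_line,
    }

print(build_row("1", "cup", "white sugar", "1 cup white sugar"))
```

Returning `None` on a failed lookup keeps unmatched lines out of recipe_ingredient, so they can stay in the archive table until the missing master row is added.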

So far I have used regular expressions to check whether an ingredient line is valid.

regex1 = "^((\d+)|(\d+\/\d+)|(\d+)\s(\d+\/\d+)|(\d+-\d+))\s((dash|pinch|tsp|tbs|fl oz|cup|pt|qt|gal|oz|lb|cl|can)|(dash|pinch|teaspoon|tablespoon|fluid ounce|cup|pint|quart|gallon|ounce|pound|fresh|clove|small|medium|large|slice|hand|of|turnip))(s)?\b\s[a-zA-Z0-9(,|\-|&|:|!|" & "'|" & '"' & ")\s]+[a-zA-Z(,|\-|&|:|!|" & "'|" & '"' & ")\s]+$";

regex2 = "^((\d+)|(\d+.\d+))\s((kg|g|lb|cl)|(kilo gram|gram|pound))(s)?\b\s[a-zA-Z0-9(,|\-|&|:|!|" & "'|" & '"' & ")\s]+[a-zA-Z(,|\-|&|:|!|" & "'|" & '"' & ")\s]+$";

regex3 = "^((a|an|extra))\s[a-zA-Z0-9(,|\-|&|:|!|" & "'|" & '"' & ")\s]+[a-zA-Z(,|\-|&|:|!|" & "'|" & '"' & ")\s]+$";

