How to parse recipe ingredient lines using ColdFusion?
I am using jsoup (http://jsoup.org/) to parse an HTML page and extract data from it. Specifically, I extract recipe details such as cooking time, instructions and ingredients, and save that data in an archive table named recipeimport.
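jsoup itself is a Java library, where this extraction step is typically a CSS selector query such as `doc.select("li.ingredient")` followed by `text()` on each matched element. As an illustration only, here is a rough sketch of the same step using Python's standard-library `html.parser` instead of jsoup. It assumes, hypothetically, that the page marks each ingredient as `<li class="ingredient">`; real recipe sites use their own markup, so the selector would need adjusting.

```python
from html.parser import HTMLParser


class IngredientExtractor(HTMLParser):
    """Collects the text of every <li class="ingredient"> element.

    The class name "ingredient" is a hypothetical selector, not taken
    from any particular recipe site.
    """

    def __init__(self):
        super().__init__()
        self.in_ingredient = False
        self.current = []
        self.ingredients = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and "ingredient" in classes:
            self.in_ingredient = True
            self.current = []

    def handle_endtag(self, tag):
        if tag == "li" and self.in_ingredient:
            # Collapse internal whitespace so the stored line is clean.
            self.ingredients.append(" ".join("".join(self.current).split()))
            self.in_ingredient = False

    def handle_data(self, data):
        # Text from nested tags (e.g. <b>) is collected as well.
        if self.in_ingredient:
            self.current.append(data)


def extract_ingredients(html):
    parser = IngredientExtractor()
    parser.feed(html)
    parser.close()
    return parser.ingredients


page = """
<ul>
  <li class="ingredient">1 cup white sugar</li>
  <li class="ingredient">2 tbs <b>honey</b></li>
  <li class="note">Serves 4</li>
</ul>
"""
print(extract_ingredients(page))  # ['1 cup white sugar', '2 tbs honey']
```

Each extracted line would then be stored in recipeimport as-is, before any parsing.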
Before inserting these recipes into the valid recipes table, I have to parse the ingredients, because each ingredient has to be stored in the recipe_ingredient table with references to 3 different master tables, namely recipeamount, recipeunittype and recipeingredient.
Let me consider a simple ingredient, "1 cup white sugar". I have to separate the amount (1), unit type (cup) and ingredient (sugar), match them against the master tables (recipeamount, recipeunittype and recipeingredient), and then insert the ingredient into the recipe_ingredient table with the referenced IDs.
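The split described above can be sketched with a single regular expression that captures the parts instead of only validating the line. This is a Python sketch under assumptions: the `UNITS` list is a hypothetical hard-coded copy of recipeunittype, and the named-group syntax `(?P<name>...)` is Python-specific. In ColdFusion, `REFind` with its `returnsubexpressions` argument set to true returns the positions and lengths of numbered groups, which should serve the same purpose.

```python
import re

# Hypothetical unit list; in the real application it would be loaded
# from the recipeunittype master table.
UNITS = ["cup", "tbs", "tsp", "gram"]

# amount: whole number, fraction, mixed number or decimal ("1", "1/2", "1 1/2", "1.5")
# unit:   one of the known units, optionally plural ("cup" / "cups")
# rest:   everything after the unit, e.g. "white sugar"
LINE_RE = re.compile(
    r"^(?P<amount>\d+\s+\d+/\d+|\d+/\d+|\d+(?:\.\d+)?)\s+"
    r"(?P<unit>" + "|".join(map(re.escape, UNITS)) + r")s?\b\s+"
    r"(?P<rest>.+)$",
    re.IGNORECASE,
)


def split_ingredient_line(line):
    """Return (amount, unit, rest), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return m.group("amount"), m.group("unit").lower(), m.group("rest").strip()


print(split_ingredient_line("1 cup white sugar"))  # ('1', 'cup', 'white sugar')
print(split_ingredient_line("1 1/2 cups honey"))   # ('1 1/2', 'cup', 'honey')
print(split_ingredient_line("a pinch of salt"))    # None
```

Note that "white sugar" comes back as the remainder; deciding that it refers to the master ingredient "sugar" is a separate lookup step.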
recipeamount table
| id | amounttype | amounttypevalue |
|----|------------|-----------------|
| 1  | 1/2        | 0.5             |
| 2  | 1          | 1               |
recipeunittype table
| id | unittype |
|----|----------|
| 1  | cup      |
| 2  | tbs      |
| 3  | tsp      |
| 4  | gram     |
recipeingredient table
| id | ingredientname |
|----|----------------|
| 1  | sugar          |
| 2  | salt           |
| 3  | honey          |
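Given the master-table rows above, the ID lookup can be sketched with in-memory dictionaries standing in for the three tables (only the example rows are included; in ColdFusion these would be query results). Matching the ingredient by looking for a master name as whole words inside the remaining text, so that "white sugar" maps to "sugar", is an assumption about the desired behaviour; plurals and punctuation are not handled.

```python
# Hypothetical in-memory copies of the example master tables.
RECIPE_AMOUNT = {1: ("1/2", 0.5), 2: ("1", 1.0)}  # id -> (amounttype, amounttypevalue)
RECIPE_UNIT_TYPE = {1: "cup", 2: "tbs", 3: "tsp", 4: "gram"}  # id -> unittype
RECIPE_INGREDIENT = {1: "sugar", 2: "salt", 3: "honey"}  # id -> ingredientname


def find_amount_id(amount_text):
    """Return the recipeamount id whose amounttype equals the parsed text."""
    for row_id, (amount_type, _value) in RECIPE_AMOUNT.items():
        if amount_type == amount_text.strip():
            return row_id
    return None


def find_unit_id(unit):
    """Return the recipeunittype id for a unit such as "cup" (case-insensitive)."""
    for row_id, name in RECIPE_UNIT_TYPE.items():
        if name == unit.strip().lower():
            return row_id
    return None


def find_ingredient_id(line_text):
    """Return the id of the master ingredient found as whole words in line_text.

    Longer names are tried first, so a multi-word name would win over a
    shorter name it contains.
    """
    padded = " " + " ".join(line_text.lower().split()) + " "
    by_length = sorted(RECIPE_INGREDIENT.items(), key=lambda kv: len(kv[1]), reverse=True)
    for row_id, name in by_length:
        if " " + name + " " in padded:
            return row_id
    return None


print(find_amount_id("1"), find_unit_id("cup"), find_ingredient_id("white sugar"))
# 2 1 1
```

These three IDs are exactly the amountid, unittypeid and ingredientid of the expected recipe_ingredient row.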
Finally, I have to save the ingredient like this:
recipe_ingredient table
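| id | amountid | unittypeid | ingredientid | line_text   | ingredient_line   |
|----|----------|------------|--------------|-------------|-------------------|
| 1  | 2        | 1          | 1            | white sugar | 1 cup white sugar |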
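A rough end-to-end sketch of that insert, using Python's built-in `sqlite3` module with an in-memory database. The column types, the hypothetical `save_ingredient` helper, and the choice to skip lines whose amount, unit or ingredient is unknown are assumptions for illustration. In ColdFusion the same parameterized queries would normally be written with `<cfquery>` and `<cfqueryparam>`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Schema guessed from the column names in the question; types are assumptions.
cur.executescript("""
CREATE TABLE recipeamount (id INTEGER PRIMARY KEY, amounttype TEXT, amounttypevalue REAL);
CREATE TABLE recipeunittype (id INTEGER PRIMARY KEY, unittype TEXT);
CREATE TABLE recipeingredient (id INTEGER PRIMARY KEY, ingredientname TEXT);
CREATE TABLE recipe_ingredient (
    id INTEGER PRIMARY KEY,
    amountid INTEGER REFERENCES recipeamount(id),
    unittypeid INTEGER REFERENCES recipeunittype(id),
    ingredientid INTEGER REFERENCES recipeingredient(id),
    line_text TEXT,
    ingredient_line TEXT
);
INSERT INTO recipeamount VALUES (1, '1/2', 0.5), (2, '1', 1);
INSERT INTO recipeunittype VALUES (1, 'cup'), (2, 'tbs'), (3, 'tsp'), (4, 'gram');
INSERT INTO recipeingredient VALUES (1, 'sugar'), (2, 'salt'), (3, 'honey');
""")


def lookup_id(sql, value):
    """Run a single-parameter SELECT id query and return the id or None."""
    row = cur.execute(sql, (value,)).fetchone()
    return row[0] if row else None


def save_ingredient(amount, unit, line_text, ingredient_name, ingredient_line):
    amount_id = lookup_id("SELECT id FROM recipeamount WHERE amounttype = ?", amount)
    unit_id = lookup_id("SELECT id FROM recipeunittype WHERE unittype = ?", unit)
    ingredient_id = lookup_id(
        "SELECT id FROM recipeingredient WHERE ingredientname = ?", ingredient_name)
    if None in (amount_id, unit_id, ingredient_id):
        return None  # hypothetical policy: leave the line in recipeimport for review
    cur.execute(
        "INSERT INTO recipe_ingredient "
        "(amountid, unittypeid, ingredientid, line_text, ingredient_line) "
        "VALUES (?, ?, ?, ?, ?)",
        (amount_id, unit_id, ingredient_id, line_text, ingredient_line),
    )
    return cur.lastrowid


save_ingredient("1", "cup", "white sugar", "sugar", "1 cup white sugar")
print(cur.execute("SELECT * FROM recipe_ingredient").fetchall())
# [(1, 2, 1, 1, 'white sugar', '1 cup white sugar')]
```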
So far I have used regular expressions to check whether an ingredient line is valid:
regex1 = "^((\d+)|(\d+\/\d+)|(\d+)\s(\d+\/\d+)|(\d+-\d+))\s((dash|pinch|tsp|tbs|fl oz|cup|pt|qt|gal|oz|lb|cl|can)|(dash|pinch|teaspoon|tablespoon|fluid ounce|cup|pint|quart|gallon|ounce|pound|fresh|clove|small|medium|large|slice|hand|of|turnip))(s)?\b\s[a-zA-Z0-9(,|\-|&|:|!|" & "'|" & '"' & ")\s]+[a-zA-Z(,|\-|&|:|!|" & "'|" & '"' & ")\s]+$";
regex2 = "^((\d+)|(\d+\.\d+))\s((kg|g|lb|cl)|(kilo gram|gram|pound))(s)?\b\s[a-zA-Z0-9(,|\-|&|:|!|" & "'|" & '"' & ")\s]+[a-zA-Z(,|\-|&|:|!|" & "'|" & '"' & ")\s]+$";
regex3 = "^((a|an|extra))\s[a-zA-Z0-9(,|\-|&|:|!|" & "'|" & '"' & ")\s]+[a-zA-Z(,|\-|&|:|!|" & "'|" & '"' & ")\s]+$";
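The expressions above only decide whether a line is valid. To reuse the matched amount against recipeamount, its text also needs a numeric value like the amounttypevalue column (1/2 becomes 0.5). A minimal Python sketch, assuming the amount formats the regexes accept (whole numbers, fractions, mixed numbers such as "1 1/2", and decimals); ranges like "1-2" are deliberately left unhandled and return None. The function name `amount_value` is hypothetical.

```python
import re
from fractions import Fraction

# Amount formats accepted by regex1/regex2: "1", "1/2", "1 1/2", "1.5".
# Ranges such as "1-2" are intentionally not handled in this sketch.
AMOUNT_RE = re.compile(r"^(?:(\d+)\s+)?(\d+)/(\d+)$|^(\d+(?:\.\d+)?)$")


def amount_value(text):
    """Convert an amount string to a number like recipeamount.amounttypevalue.

    Returns None when the text is not one of the supported formats.
    """
    m = AMOUNT_RE.match(text.strip())
    if m is None:
        return None
    whole, numerator, denominator, plain = m.groups()
    if plain is not None:
        return float(plain)
    if int(denominator) == 0:
        return None
    # Fraction keeps values like 1/3 exact until the final conversion to float.
    value = int(whole or 0) + Fraction(int(numerator), int(denominator))
    return float(value)


print(amount_value("1/2"), amount_value("1 1/2"), amount_value("1"))  # 0.5 1.5 1.0
```

Storing both the original text ("1/2") and the computed value (0.5) mirrors the two columns of recipeamount, so new amounts can be added to the master table without hand-computing their values.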