Hi, can you help me?
My Hadoop is up and running. I ran a wordcount program, and it completed successfully. In HDFS I had 8 big text input files. The output file has the list of words and counts as (key, value) pairs. I want to show all of this on a web page, and that is where I run into problems. My questions are:
1. How do I know which file a word came from?
2. How do I submit a pattern (for example, skip words like "is", "the", "so", etc. and focus only on the keywords I give)?
3. How do I format my output file with extra information such as file name, line number, and page number, followed by the word and count?
4. How do I query the map/reduce output?
5. How can I compute a confidence score for all the comparisons?
Please help me. I have been stuck on these for many days. I have installed Hadoop 0.20.2.
Please reply as soon as possible.