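Write a C++ program that reads a file, filters out all HTML tags, and writes the remaining content to a new file.
1. Read one character from the file.
2. Determine whether the character is part of an HTML tag.
3. Print every character that is not part of an HTML tag.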
    /* --------------------------------------------
     * This program reads an HTML file and writes
     * the text without the tags to a new file.
     * --------------------------------------------*/
    #include <iostream>  // Required for cin, cout, cerr
    #include <fstream>   // Required for ifstream, ofstream
    #include <string>    // Required for string
    #include <cstdlib>   // Required for exit
    using namespace std;

    int main()
    {
        // Declare objects
        char ch;
        bool text_state(true);  // true while outside a tag
        string infile, outfile;
        ifstream html;
        ofstream htmltext;

        // Prompt user for name of input file
        cout << "Enter the name of the input file\n(for example: demo.html)\n";
        cout << "Make sure the file is in the current project directory!\n";
        cin >> infile;
        cout << "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n";

        // Prompt user for name of output file
        cout << "Enter the name of the output file: ";
        cin >> outfile;

        // Open files
        html.open(infile.c_str());
        if (html.fail())
        {
            cout << "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n";
            cerr << "Error opening input file" << endl;
            exit(1);
        }
        htmltext.open(outfile.c_str());
        if (htmltext.fail())
        {
            cerr << "Error opening output file" << endl;
            exit(1);
        }

        // Read the html file one character at a time;
        // the loop ends at end of file or on a read error
        while (html.get(ch))
        {
            // Check state
            if (text_state)
            {
                if (ch == '<')            // Beginning of a tag
                    text_state = false;   // Change states
                else
                    htmltext << ch;       // Still text, write to the file
            }
            else
            {
                // Command state, no output required
                if (ch == '>')            // End of tag
                    text_state = true;    // Change states
            }
        }

        html.close();
        htmltext.close();

        cout << "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n";
        cout << "Transformation succeeded!\n";
        cout << "Look for " << outfile << " in the current directory.\n";
        cout << "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n";
        return 0;
    }

After that, you can try the program on an HTML file. Note that it only filters out the tags and still needs improvement: if the text outside the tags contains a lot of irrelevant content, the result leaves something to be desired. It is best to test with a typical HTML file, such as:
My first HTML page
The content of the body element is displayed in the browser.
The content of the title element is displayed in the browser's title bar.