Apache Commons CSV | How can I ignore/include a semicolon or comma in a field?

Question:

I am trying to parse a log file and store it in a CSV file. Here is a sample line:

218.1.111.50 - - [13/Mar/2005:10:36:11 -0500] "GET http://www.yahoo.com/ HTTP/1.1" 403 2898 "-" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"

I am using the Apache Commons CSV library for this. The problem is that some fields contain the special character ; in their value, and it gets interpreted as a separator.

Take, for example, the field value Mozilla/4.0 (compatible; MSIE 4.01; Windows 95). This single field ends up split into three separate values because of the semicolons.
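A minimal sketch of what is probably happening, assuming the semicolons are treated as separators by whatever reads the file (for example a spreadsheet whose list separator is ;) rather than by CSVPrinter itself. CSVFormat.DEFAULT uses a comma as its delimiter and, by default, only quotes a value when it needs to (for example when the value contains the delimiter or a quote character). The user-agent string contains neither, so it should be written unquoted: read back with a comma delimiter it stays one field, read back with ; it splits into three. The class DelimiterDemo and its helper methods are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVPrinter;
import org.apache.commons.csv.CSVRecord;

public class DelimiterDemo {
    static final String USER_AGENT = "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)";

    // Writes a single-field record with the default, comma-delimited format.
    static String writeDefault(String value) {
        StringBuilder out = new StringBuilder();
        try (CSVPrinter printer = new CSVPrinter(out, CSVFormat.DEFAULT)) {
            printer.printRecord(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Counts the fields of the first record when the text is parsed with the given delimiter.
    static int fieldCount(String csv, char delimiter) {
        try (CSVParser parser = CSVParser.parse(csv, CSVFormat.DEFAULT.withDelimiter(delimiter))) {
            List<CSVRecord> records = parser.getRecords();
            return records.get(0).size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String csv = writeDefault(USER_AGENT);
        System.out.print(csv);
        System.out.println("read with ',' -> " + fieldCount(csv, ',') + " field(s)");
        System.out.println("read with ';' -> " + fieldCount(csv, ';') + " field(s)");
    }
}
```

In other words, the file written by CSVPrinter is valid comma-separated data; the split appears only when a consumer assumes a different delimiter than the one the printer used.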

I'm not sure what the best way around this is. Below is a snapshot of the code related to the library I use:

CSVPrinter printer = new CSVPrinter(writer, CSVFormat.DEFAULT
        .withHeader(HEADERS));
// ...
// ...
// Match the current log line against the log pattern p.
Matcher m = p.matcher(line);
// group() throws IllegalStateException unless a match attempt has succeeded first.
if (m.find()) {
    Date date = formatter.parse(m.group("Time"));

    try {
        // Write one CSV record per log line.
        printer.printRecord(date.getMonth(), date.getDate(), date.getHours(),
                date.getMinutes(), date.getSeconds(), m.group("NetworkSrcIpv4"),
                m.group("ApplicationHttpStatus"), m.group("ApplicationLen"),
                m.group("ApplicationHttpUserAgent"),
                m.group("ApplicationHttpQueryString"));

        printer.flush();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
// ...

Is there any way to automatically ignore the ; characters, or perhaps replace them with some value that won't affect the desired result? Are there any options I could add to my CSVPrinter?

Thank you for your feedback.
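One option on the printer side, sketched here under the assumption that the file will be read by something that splits on ;: configure the printer with ; as its delimiter. Because the printer quotes any value that contains the configured delimiter, the user-agent field should then be wrapped in double quotes and survive a round trip as one field. This is not the approach taken in the answer below; SemicolonDelimiterDemo and its helpers are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVPrinter;
import org.apache.commons.csv.CSVRecord;

public class SemicolonDelimiterDemo {
    // Same settings as DEFAULT, except that ';' separates fields.
    static final CSVFormat SEMICOLON = CSVFormat.DEFAULT.withDelimiter(';');

    // Prints one record; any value containing ';' gets enclosed in double quotes.
    static String write(Object... values) {
        StringBuilder out = new StringBuilder();
        try (CSVPrinter printer = new CSVPrinter(out, SEMICOLON)) {
            printer.printRecord(values);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Parses the text with the same ';' format and returns its first record.
    static CSVRecord readFirst(String text) {
        try (CSVParser parser = CSVParser.parse(text, SEMICOLON)) {
            return parser.getRecords().get(0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String line = write("218.1.111.50", "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)");
        System.out.print(line);
        CSVRecord record = readFirst(line);
        System.out.println(record.size() + " fields, user agent = " + record.get(1));
    }
}
```

The general point is that the writer and the reader must agree on one delimiter; whichever character is chosen, CSVPrinter quotes the values that contain it.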


Answer 1:

You can configure a tab as the delimiter instead of the comma used by the DEFAULT format:

CSVPrinter printer = new CSVPrinter(writer, CSVFormat.TDF.withHeader(HEADERS));

https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#TDF
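A self-contained sketch of the answer's suggestion: write a header and one log record with CSVFormat.TDF, then read the text back with the same format to check that the user agent is still a single field. The header names in HEADERS are hypothetical (the question does not show the real HEADERS array), as is the class name TdfDemo.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVPrinter;
import org.apache.commons.csv.CSVRecord;

public class TdfDemo {
    // Hypothetical header names, for illustration only.
    static final String[] HEADERS = {"NetworkSrcIpv4", "ApplicationHttpUserAgent"};

    // Writes the header row plus one record using the tab-delimited TDF format.
    static String writeTdf(String ip, String userAgent) {
        StringBuilder out = new StringBuilder();
        try (CSVPrinter printer = new CSVPrinter(out, CSVFormat.TDF.withHeader(HEADERS))) {
            printer.printRecord(ip, userAgent);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Parses the text with TDF; no header is configured, so the header row is record 0.
    static List<CSVRecord> readTdf(String text) {
        try (CSVParser parser = CSVParser.parse(text, CSVFormat.TDF)) {
            return parser.getRecords();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String ua = "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)";
        List<CSVRecord> records = readTdf(writeTdf("218.1.111.50", ua));
        CSVRecord row = records.get(1); // skip the header row
        System.out.println(row.size() + " fields, user agent = " + row.get(1));
    }
}
```

Since log fields are unlikely to contain tab characters, a tab-delimited file avoids both the comma and the semicolon ambiguity; the consumer still has to be told that the file is tab-separated.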

  • Posted 2019-03-07 01:45
