
How do I handle/store multiple lines into a single field read from a file in Perl?

I am trying to process a text file in Perl and store its data in a database. The problem is that some fields contain newlines, which throws off my line-by-line parsing. What would be the best way to handle these fields?

Example data.txt file:

ID|Title|Description|Date
1|Example 1|Example Description|10/11/2011
2|Example 2|A long example description
Which contains
a bunch of newlines|10/12/2011
3|Example 3|Short description|10/13/2011

The current (broken) Perl script (example):

#!/usr/bin/perl -w
use strict;

open (MYFILE, 'data.txt');
while (<MYFILE>) {
    chomp;
    my ($id, $title, $description, $date) = split(/\|/);

    if ($id ne 'ID') {
        # processing certain fields (...)

        # insert into the database (example)
        $sqlInsert->execute($id, $title, $description, $date);
    }
}
close (MYFILE);

As you can see from the example, the record with ID 2 is broken across several lines, which causes errors when the script references the resulting undefined variables. How would you group the lines into the correct field?

Thanks in advance! (I hope the question is clear enough; it was difficult to come up with a title.)

I would just count the number of separators before splitting the line. If there aren't enough, read the next line and append it. The tr operator is an efficient way to count characters.

#!/usr/bin/perl -w
use strict;
use warnings;

open (MYFILE, '<', 'data.txt') or die "Cannot open data.txt: $!";
while (<MYFILE>) {
    # Continue reading while line incomplete:
    while (tr/|// < 3) {
        my $next = <MYFILE>;
        die "Incomplete line at end" unless defined $next;
        $_ .= $next;
    }

    # Remaining code unchanged:
    chomp;
    my ($id, $title, $description, $date) = split(/\|/);

    if ($id ne 'ID') {
        # processing certain fields (...)

        # insert into the database (example)
        $sqlInsert->execute($id, $title, $description, $date);
    }
}
close (MYFILE);
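
For anyone who wants to try the separator-counting idea without a data file or database, here is a minimal self-contained sketch. It assumes the sample data from the question, held in a string and read through an in-memory filehandle. The helper name read_records and its parameters are hypothetical, not part of the original answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data from the question, kept in a string so the sketch runs on its own.
my $data = <<'END';
ID|Title|Description|Date
1|Example 1|Example Description|10/11/2011
2|Example 2|A long example description
Which contains
a bunch of newlines|10/12/2011
3|Example 3|Short description|10/13/2011
END

# Hypothetical helper: returns a list of array refs, one per complete record.
# $separators is the number of pipes a complete record must contain.
sub read_records {
    my ($fh, $separators) = @_;
    my @records;
    while (my $line = <$fh>) {
        # tr/|// with an empty replacement list changes nothing; it only counts pipes.
        while (($line =~ tr/|//) < $separators) {
            my $next = <$fh>;
            die "Incomplete record at end of input" unless defined $next;
            $line .= $next;
        }
        chomp $line;    # removes only the final newline; embedded ones stay
        # A limit of -1 keeps empty trailing fields.
        push @records, [ split /\|/, $line, -1 ];
    }
    return @records;
}

open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
my @records = read_records($fh, 3);
close($fh);

shift @records;    # drop the header row

for my $record (@records) {
    my ($id, $title, $description, $date) = @$record;
    my $newlines = ($description =~ tr/\n//);
    print "ID $id ($date): description contains $newlines embedded newline(s)\n";
}
```

Note that this only works as long as the description itself never contains a pipe; if it can, counting separators is no longer enough to find the end of a record.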

Read the next line until you have the number of fields you need. Something like this (I haven't tested this code):

my @fields = split(/\|/);
while ($#fields < 3) { # Repeat until we get 4 fields in the array

  my $next = <MYFILE>; # Read next line
  last unless defined $next; # Stop at end of file
  chomp $next;

  # Split line
  my @add_fields = split(/\|/, $next);

  # Concatenate last element of first line with first element of the current line,
  # putting back the newline that chomp removed
  $fields[$#fields] .= "\n" . (@add_fields ? $add_fields[0] : '');

  # Append the remaining fields
  push(@fields, @add_fields[1..$#add_fields]);

}

If you could change your data.txt file so that the pipe separator is also the last character of every line/record, you could slurp in the whole file, splitting it directly into the raw fields. This code would then do what you want:

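#!/usr/bin/perl
use strict;
use warnings;

my @fields;
{
  local $/ = "|";   # Read one pipe-terminated field at a time; restored after this block
  open (MYFILE, '<', 'C:/data.txt') or die "$!";
  @fields = <MYFILE>;
  close (MYFILE);
  chomp(@fields);   # Removes the trailing "|" because $/ is "|"

  for(my $i = 0; $i + 3 < scalar(@fields); $i = $i + 4) {
    my $id = $fields[$i];
    $id =~ s/^\s+//;   # Drop the newline left over from the end of the previous record
    my $title = $fields[$i+1];
    my $description = $fields[$i+2];
    my $date = $fields[$i+3];
    if ($id =~ m/^\d+$/) {
        # processing certain fields (...)

        # insert into the database (example)
    }
  }
}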
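
To see what setting the input record separator $/ to a pipe does to the reads, the following sketch may help. It assumes a hypothetical variant of the question's data in which every record ends with a pipe, and it reads from a string through an in-memory filehandle instead of a file on disk.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed input: the question's data with a pipe appended to every record.
my $data = "ID|Title|Description|Date|\n"
         . "1|Example 1|Example Description|10/11/2011|\n"
         . "2|Example 2|A long example description\n"
         . "Which contains\n"
         . "a bunch of newlines|10/12/2011|\n";

my @fields;
{
    local $/ = "|";    # each read now ends at a pipe instead of a newline
    open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
    @fields = <$fh>;
    close($fh);
    chomp(@fields);    # chomp removes the current $/, i.e. the trailing pipe
}

# The newline that ended a record is now the first character of the next field.
s/^\n// for @fields;

# Whatever follows the final pipe comes back as one last, empty chunk.
pop @fields if @fields && $fields[-1] eq '';

# Group the flat field list into records of four fields each.
my @records;
push @records, [ splice(@fields, 0, 4) ] while @fields >= 4;

for my $record (@records) {
    my ($id, $title, $description, $date) = @$record;
    next unless $id =~ m/^\d+$/;    # skips the header row
    print "ID $id, title '$title', date $date\n";
}
```

Newlines inside the description survive untouched, because with $/ set to a pipe they no longer end a read.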
