Multiline regex replace if more than two

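I'm having difficulty with the following.

I have a Word file containing questions and answers that I need to import into Moodle (an online question site) in a specific format. Everything is black except the correct answers, which are green. The starting format is as follows: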

1. Question example

a. Wrong

b. Wrong

C. Wrong

D. Right

The output should become:

:Question example

:Question example

{

~ Wrong

~ Wrong

~ Wrong

= Right

}

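I open the file in Word and replace all red paragraph marks with * (I can't replace using groups). After that, I export the .docx file as text, open it on my Linux machine, and run the following regexes over it.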

sed -i -e 's/^\r/\n/g' tmp #OS X white line replacement
sed -i -e 's/\r//g' tmp #remove white lines
sed -i -e 's:^[a-z]\.:~:' tmp #Replace leading answer letters with tilde
sed -i -e 's/\(^[0-9]*\.\ \)\(.*\)/}\n::\2\n::\2\n{/' tmp #regenerate title
sed -i -n '${p;q};N;/\n\*/{s/"\?\n//p;b};P;D' tmp #next line starts with * append to front of current
sed -i -e 's:^~\(.*\)\(\*.*\)$:=\1:' tmp #move * from back to = to front
sed -i -e 's:^\*:=:' tmp #replace any remaining * with =
sed '/^$/d' tmp #delete any remaining white lines

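It's not pretty, but it works well enough. The questions were written by hand and contain a lot of errors, so I still have to fix some things manually. The hard part is when a question has more than one correct answer. The output should then look like this: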

:Question example

:Question example

{

~%-100% Wrong

~%-100% Wrong

~%50% Right

~%50% Right

}

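Ideally I would have a sed or perl regex that counts the number of = between { and }, and replaces each of them with ~%50%, while every ~ gets a %-100% sign. I could also use this for 3 correct answers, where each correct answer becomes ~%33%.

Is this feasible? I have over 1000 questions, so automating this would definitely help. Multiline replacement with sed is already tricky with two lines, so I guess four or more lines needs perl? I have no experience with Perl.

Can someone help me with this? Please excuse my poor English; I'm not a native speaker.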
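The weighting rule described above (one correct answer keeps `=`, several correct answers share 100% equally, and wrong answers then get -100%) can be sketched as follows. This is only an illustration in Python rather than sed or Perl; the function name `answer_prefixes` is made up for the example, and rounding down to a whole percent (33 for three answers) is assumed from the ~%33% mentioned in the question.

```python
def answer_prefixes(num_right):
    """Return (right_prefix, wrong_prefix) for a question.

    Assumption: weights are rounded down to whole percents, as in the
    question's own ~%33% example.
    """
    if num_right < 1:
        raise ValueError("a question needs at least one correct answer")
    if num_right == 1:
        # Plain single-answer layout: '=' for right, '~' for wrong.
        return "=", "~"
    # Several right answers: each gets an equal share, wrong ones -100%.
    return f"~%{100 // num_right}%", "~%-100%"


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, answer_prefixes(n))
```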

use strict;
use warnings;

# Slurp the whole input, drop carriage returns, and split into one chunk per question.
my $file = do { local $/; <> };
$file =~ s/\r//g;
my @questions = split /^(?=[0-9]+\.)/m, $file;

for (@questions) {
   # Keep only the non-blank lines, without their newlines.
   my @lines = grep { /\S/ } split /\n/;
   next if !@lines;

   my $title = shift(@lines);
   $title =~ s/^\S+\s*/:/;

   my $num_right = grep { /Right/ } @lines;

   # Each right answer gets an equal share of 100%.
   my $right_pct = $num_right ? sprintf('%.0f', 100/$num_right) : 0;
   my $right_prefix = $num_right == 1 ? "=" : "~%$right_pct%";
   my $wrong_prefix = $num_right == 1 ? "~" : "~%-100%";

   for (@lines) {
      if    (/Right/) { s/^\S+/$right_prefix/; }
      elsif (/Wrong/) { s/^\S+/$wrong_prefix/; }
   }

   print(
      $title, "\n",
      $title, "\n",
      "{\n",
      map("$_\n", @lines),
      "}\n",
   );
}

Replace /Right/ and /Wrong/ with whatever is appropriate for your data.
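What "appropriate" means depends on how correctness survives the export from Word. As a rough sketch only, suppose each correct answer ends with a `*` marker; that layout is an assumption for illustration, not something the question confirms, and `classify` is a hypothetical helper name. The test for a right answer could then look like this in Python:

```python
import re

# An answer line: one letter, a dot, then the answer text.
ANSWER_RE = re.compile(r"^[A-Za-z]\.\s*(.*)$")


def classify(line, marker="*"):
    """Return (text, is_correct) for an answer line, or None otherwise.

    Assumption: a correct answer carries `marker` at the end of its text.
    """
    m = ANSWER_RE.match(line.strip())
    if m is None:
        return None
    text = m.group(1)
    correct = text.endswith(marker)
    if correct:
        # Drop the marker so it does not end up in the output.
        text = text[: -len(marker)].rstrip()
    return text, correct


if __name__ == "__main__":
    for sample in ("a. Paris*", "b. Rome", "1. Capital of France?"):
        print(repr(sample), "->", classify(sample))
```

If the marker ends up somewhere else in your export (for example at the start of the line), only the `endswith` check and the slicing need to change.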

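The program below works according to my best guess at what you need. It reads all of the information into an array and then formats it.

As it stands, the data is included in the source and read from the DATA file handle. Changing the loop to while (<>) { ... } will let you specify the data file on the command line.

If my guess is wrong, you will have to correct me.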

use strict;
use warnings;

my @questions;

while (<DATA>) {
  next unless /\S/;
  s/\s+$//;
  if (/^\d+\.\s*(.+)/) {
    push @questions, [$1];
  }
  elsif (/^[A-Za-z]\.\s*(.+)/i) {
    push @{$questions[-1]}, $1;
  }
}

for my $question (@questions) {

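  my ($text, @answers) = @$question;

  # Count the correct answers up front; the percentage is computed below.
  my $correct = grep /right/i, @answers;
  die "Question '$text' has no correct answer\n" unless $correct;

  print "::$text\n" for 1, 2;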

  print "{\n";

  if ($correct == 1) {
    printf "%s %s\n", /right/i ? '=' : '~', $_ for @answers;
  }
  else {
    my $percent = int(100/$correct);
    printf "~%%%d%%~ %s\n", /right/i ? $percent : -100, $_ for @answers;
  }

  print "}\n";
}

__DATA__
1. Question one

a. Wrong

b. Wrong

c. Right

d. Wrong

2. Question two

a. Right

b. Wrong

c. Right

d. Wrong

3. Question three

a. Right

b. Right

c. Wrong

d. Right

Output

::Question one
::Question one
{
~ Wrong
~ Wrong
= Right
~ Wrong
}
::Question two
::Question two
{
~%50%~ Right
~%-100%~ Wrong
~%50%~ Right
~%-100%~ Wrong
}
::Question three
::Question three
{
~%33%~ Right
~%33%~ Right
~%-100%~ Wrong
~%33%~ Right
}
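
The read-everything-then-format approach used by the Perl program above can also be sketched in Python. The sketch covers only the parsing half: it turns the numbered question lines and lettered answer lines into a list of (question, answers) pairs. The names `parse_questions`, `QUESTION_RE` and `ANSWER_RE` are illustrative, and the input layout is assumed to match the sample data.

```python
import re

QUESTION_RE = re.compile(r"^\d+\.\s*(.+)")      # e.g. "1. Question one"
ANSWER_RE = re.compile(r"^[A-Za-z]\.\s*(.+)")   # e.g. "a. Wrong"


def parse_questions(lines):
    """Group answer lines under the question line that precedes them."""
    questions = []
    for raw in lines:
        line = raw.rstrip()
        if not line:
            continue  # skip blank lines
        m = QUESTION_RE.match(line)
        if m:
            questions.append((m.group(1), []))
            continue
        m = ANSWER_RE.match(line)
        if m and questions:
            # Answers before the first question are ignored.
            questions[-1][1].append(m.group(1))
    return questions


if __name__ == "__main__":
    sample = ["1. Question one", "", "a. Wrong", "", "b. Right"]
    print(parse_questions(sample))
```

Keeping parsing separate from formatting makes it easier to change the output layout later without touching the code that reads the file.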

This might work for you:

cat <<\! >file.sed
# On encountering a digit in the first character position
/^[0-9]/{
  # Create a label to cater for last line processing
  :end
  # Swap to hold space
  x
  # Check hold space for contents.
  # If none delete it and begin a new cycle
  # This is to cater for the first question line
  /./!d
  # Remove any carriage returns
  s/\r//g
  # Remove any blank lines
  s/\n\n*/\n/g
  # Double the question line, replacing the question number by a ':'
  # Also append a { followed by a newline
  s/^[0-9]*\.\([^\n]*\n\)/:\1:\1{\n/
  # Coalesce lines beginning with a * and remove optional preceding "
  s/"\?\n\*/*/g
  # Replace the wrong answers a,b,c...  with ~%-100%
  s/\n[a-zA-Z]*\. \(Wrong\)/\n~%-100% \1/g
  # Replace the right answers a,B,c... with ~%100%
  s/\n[a-zA-Z]*\. \(Right\)/\n~%100% \1/g
  # Assuming no more than 4 answers:
  # Replace 4 correct answers prefix with ~%25%
  s/\(~%100%\)\(.*\)\1\(.*\)\1\(.*\)\1/~%25%\2~%25%\3~%25%\4~%25%/
  # Replace 3 correct answers prefix with ~%33%
  s/\(~%100%\)\(.*\)\1\(.*\)\1/~%33%\2~%33%\3~%33%/
  # Replace 2 correct answers prefix with ~%50%
  s/\(~%100%\)\(.*\)\1/~%50%\2~%50%/
  # Append a newline and a }
  s/$/\n}/
  # Break and so print newly formatted string
  b
  }
# Append pattern space to hold space
H
# On last line jump to end label
$b end
# Delete all lines from pattern space
d
!

Then run:

sed -f file.sed file
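
The sed script handles one whole question block at a time, but needs a separate substitution for each possible count of right answers. As a hedged alternative sketch in Python, a regex callback can count the `=` lines inside each `{ ... }` block and rewrite the prefixes only when there is more than one. It assumes the single-answer output format from the question (`=` and `~` at the start of a line, braces on their own lines) and that no answer line consists of just `}`; the name `reweight` is made up for this example.

```python
import re

# One answer block: '{' on its own line, the answers, then '}' on its own line.
BLOCK_RE = re.compile(r"\{\n(.*?)\n\}", re.DOTALL)


def reweight(gift_text):
    """Rewrite blocks that contain more than one '=' answer."""

    def fix_block(match):
        lines = match.group(1).split("\n")
        n_right = sum(1 for ln in lines if ln.startswith("="))
        if n_right < 2:
            return match.group(0)  # leave single-answer blocks untouched
        pct = 100 // n_right       # whole percent, e.g. 33 for three answers
        out = []
        for ln in lines:
            if ln.startswith("="):
                out.append(f"~%{pct}%" + ln[1:])
            elif ln.startswith("~"):
                out.append("~%-100%" + ln[1:])
            else:
                out.append(ln)
        return "{\n" + "\n".join(out) + "\n}"

    return BLOCK_RE.sub(fix_block, gift_text)


if __name__ == "__main__":
    print(reweight(":Q\n:Q\n{\n~ Wrong\n= Right\n= Right\n}\n"))
```

Because the count is computed per block, the same code covers two, three, or more right answers without a separate rule for each case.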

Your example does not match the documentation at http://docs.moodle.org/22/en/GIFT. The question title and the question are delimited by two colons, not one:

//Comment line 
::Question title 
:: Question {
=A correct answer
~Wrong answer1
#A response to wrong answer1
~Wrong answer2
#A response to wrong answer2
~Wrong answer3
#A response to wrong answer3
~Wrong answer4
#A response to wrong answer4
}
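
A minimal Python sketch of producing that layout for a question with one correct answer might look like the following. It assumes, as I understand the GIFT documentation, that the characters `~ = # { } :` have to be escaped with a backslash when they occur inside text. The helper names `gift_escape` and `gift_question` are hypothetical, and feedback lines (`#...`) are left out.

```python
GIFT_SPECIAL = "~=#{}:"  # characters assumed to need a backslash in GIFT text


def gift_escape(text):
    """Backslash-escape GIFT control characters inside plain text."""
    return "".join("\\" + ch if ch in GIFT_SPECIAL else ch for ch in text)


def gift_question(title, question, right, wrong):
    """Format one single-answer multiple-choice question.

    `right` is the correct answer text, `wrong` a list of wrong answers.
    """
    lines = [f"::{gift_escape(title)}::{gift_escape(question)} {{"]
    lines.append("=" + gift_escape(right))
    lines.extend("~" + gift_escape(w) for w in wrong)
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(gift_question("Sum", "What is 1+1?", "2", ["3", "a=b"]))
```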

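Some people have naively given you answers based on your example instead of finding the real specification. Oops.

Your question cannot be answered, because your format does not show which answers are the correct ones. That is: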

1. Question

a. Is this right?

b. Or this?

c. Or this?

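You said that these are identified by color in the original Word document, and that you did some replacement to preserve that information; however, you did not show an example of this! Oops...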
