Multiline regex replace if more than two

I'm having a hard time with the following:

There is a Word file with questions and answers that I need to import into Moodle (an online question site) in a particular format. Everything is black except for the right answers, which are green. The start format is the following:

1. Question example

a. Wrong

b. Wrong

C. Wrong

D. Right

The output should become:

:Question example

:Question example

{

~ Wrong

~ Wrong

~ Wrong

= Right

}

I open the file in Word and replace all red paragraph marks with * (I can't do a replace with groups). After that I export the .docx file to text, open it on my Linux computer and run the following regexes on it:

sed -i -e 's/^\r/\n/g' tmp #OS X white line replacement
sed -i -e 's/\r//g' tmp #remove remaining carriage returns
sed -i -e 's:^[a-z]\.:~:' tmp #replace leading answer letters with tilde
sed -i -e 's/\(^[0-9]*\.\ \)\(.*\)/}\n::\2\n::\2\n{/' tmp #regenerate title
sed -i -n '${p;q};N;/\n\*/{s/"\?\n//p;b};P;D' tmp #next line starts with * append to front of current
sed -i -e 's:^~\(.*\)\(\*.*\)$:=\1:' tmp #move * from back to = to front
sed -i -e 's:^\*:=:' tmp #replace any remaining * with =
sed '/^$/d' tmp #delete any remaining white lines

This isn't great, but it works well enough. The questions are hand-made and contain a lot of errors, so I still have to walk through the result by hand. The hard part is when I have multiple correct answers. The output should then look like the following:

:Question example

:Question example

{

~%-100% Wrong

~%-100% Wrong

~%50% Right

~%50% Right

}

Ideally I would have a sed or Perl regex that counts the number of = signs between { and } and replaces them with ~%50%, and replaces all the ~ signs with ~%-100%. I would also like this to work for 3 right answers, where every right answer becomes ~%33%.
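The weighting rule described above can be sketched in a few lines. This is only an illustration in Python (not the sed or Perl asked for), and the helper name answer_prefixes is made up for the example. It assumes the share is rounded down, as in the 33% case.

```python
def answer_prefixes(num_right):
    """Return (right_prefix, wrong_prefix) for a question.

    num_right is the number of right answers in the question.
    Hypothetical helper, written only to illustrate the rule.
    """
    if num_right < 1:
        raise ValueError("a question needs at least one right answer")
    if num_right == 1:
        # Single right answer: plain GIFT markers
        return "=", "~"
    # Several right answers: each gets an equal share, rounded down
    share = 100 // num_right
    return f"~%{share}%", "~%-100%"
```

With two right answers this gives ~%50% and ~%-100%, with three it gives ~%33% and ~%-100%.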

Is this doable? I have over 1000 questions and it would sure help to automate this. Multiline replacement with sed is kind of tricky even with two lines, so I guess four or more lines will need Perl? I have no experience in Perl.
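The multi-line part gets easier once the text is grouped into one block per question, because the number of right answers is then a property of the block rather than of a single line. A minimal sketch of that grouping step, in Python purely for illustration. The function name split_questions is hypothetical, and it assumes every question line starts with a number followed by a dot.

```python
import re


def split_questions(text):
    """Group non-blank lines into blocks, one block per numbered question."""
    blocks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if re.match(r"\d+\.", line):
            blocks.append([line])         # a new question starts here
        elif blocks:
            blocks[-1].append(line)       # an answer of the current question
    return blocks
```

Each block holds the question line first and its answer lines after it, so counting and rewriting can be done per block.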

Could someone help me out with this one? Please excuse my bad English, I'm a non-native speaker.

use strict;
use warnings;

# Slurp the whole input, then split it before each line that starts with "N."
my $file = do { local $/; <> };
my @questions = split /^(?=[0-9]+\.)/m, $file;
for (@questions) {
   my @lines = split /^/m;

   # The first line is the question; replace its number with a colon
   my $title = shift(@lines);
   $title =~ s/^\S+\s*/:/;

   # Count the right answers in this question
   my $num_right = grep { /Right/ } @lines;

   # Each right answer gets an equal share of 100%
   my $right_pct = $num_right ? sprintf('%.0f', 100/$num_right) : 0;
   my $right_prefix = $num_right == 1 ? "=" : "~%$right_pct%";
   my $wrong_prefix = $num_right == 1 ? "~" : "~%-100%";

   # Replace the leading answer letter with the matching prefix
   for (@lines) {
      if    (/Right/) { s/^\S+/$right_prefix/; }
      elsif (/Wrong/) { s/^\S+/$wrong_prefix/; }
   }

   print(
      $title,
      "\n",
      $title,
      "\n{\n",
      @lines,
      "\n}\n",
   );
}

Replace /Right/ and /Wrong/ with patterns appropriate for your actual data.
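As an illustration of what such a replacement test could look like, here is a sketch in Python rather than Perl. It assumes, hypothetically, that the exported text marks a right answer with a trailing * on the answer line; the real export may place the marker elsewhere. The names ANSWER_RE and classify are invented for this example.

```python
import re

# Answer line: a letter, a dot, the answer text, then an optional trailing "*"
ANSWER_RE = re.compile(r"^[A-Za-z]\.\s*(.*?)\s*(\*?)\s*$")


def classify(line):
    """Return (text, is_right) for an answer line, or None for other lines."""
    match = ANSWER_RE.match(line)
    if match is None:
        return None
    # Assumption: a trailing "*" flags the answer as right
    return match.group(1), match.group(2) == "*"
```

For example, classify("a. Paris *") yields ("Paris", True), while a question line such as "1. Capital of France?" yields None.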

The program below works according to my best guess at what it is you need. It works by reading all the information into an array and then formatting it.

As it stands, the data is incorporated into the source and read from the DATA file handle. Changing the loop to while (<>) { ... } will allow you to specify a data file on the command line.

You must correct me if my guesses are wrong.

use strict;
use warnings;

my @questions;

while (<DATA>) {
  next unless /\S/;
  s/\s+$//;
  if (/^\d+\.\s*(.+)/) {
    push @questions, [$1];
  }
  elsif (/^[A-Za-z]\.\s*(.+)/i) {
    push @{$questions[-1]}, $1;
  }
}

for my $question (@questions) {

  my ($text, @answers) = @$question;

  print "::$text\n" for 1, 2;

  my $correct = grep /right/i, @answers;

  print "{\n";

  if ($correct == 1) {
    printf "%s %s\n", /right/i ? '=' : '~', $_ for @answers;
  }
  else {
    # Guard against a question with no right answer (avoids division by zero)
    my $percent = $correct ? int(100/$correct) : 0;
    printf "~%%%d%%~ %s\n", /right/i ? $percent : -100, $_ for @answers;
  }

  print "}\n";
}

__DATA__
1. Question one

a. Wrong

b. Wrong

c. Right

d. Wrong

2. Question two

a. Right

b. Wrong

c. Right

d. Wrong

3. Question three

a. Right

b. Right

c. Wrong

d. Right

Output:

::Question one
::Question one
{
~ Wrong
~ Wrong
= Right
~ Wrong
}
::Question two
::Question two
{
~%50%~ Right
~%-100%~ Wrong
~%50%~ Right
~%-100%~ Wrong
}
::Question three
::Question three
{
~%33%~ Right
~%33%~ Right
~%-100%~ Wrong
~%33%~ Right
}

This might work for you:

cat <<\! >file.sed
# On encountering a digit in the first character position
/^[0-9]/{
  # Create a label to cater for last line processing
  :end
  # Swap to hold space
  x
  # Check hold space for contents.
  # If none delete it and begin a new cycle
  # This is to cater for the first question line
  /./!d
  # Remove any carriage returns
  s/\r//g
  # Remove any blank lines
  s/\n\n*/\n/g
  # Double the question line, replacing the question number by a ':'
  # Also append a { followed by a newline
  s/^[0-9]*\.\([^\n]*\n\)/:\1:\1{\n/
  # Coalesce lines beginning with a * and remove optional preceding "
  s/"\?\n\*/*/g
  # Replace the wrong answers a,b,c...  with ~%-100%
  s/\n[a-zA-Z]*\. \(Wrong\)/\n~%-100% \1/g
  # Replace the right answers a,B,c... with ~%100%
  s/\n[a-zA-Z]*\. \(Right\)/\n~%100% \1/g
  # Assuming no more than 4 answers:
  # Replace 4 correct answers prefix with ~%25%
  s/\(~%100%\)\(.*\)\1\(.*\)\1\(.*\)\1/~%25%\2~%25%\3~%25%\4~%25%/
  # Replace 3 correct answers prefix with ~%33%
  s/\(~%100%\)\(.*\)\1\(.*\)\1/~%33%\2~%33%\3~%33%/
  # Replace 2 correct answers prefix with ~%50%
  s/\(~%100%\)\(.*\)\1/~%50%\2~%50%/
  # Append a newline and a }
  s/$/\n}/
  # Break and so print newly formatted string
  b
  }
# Append pattern space to hold space
H
# On last line jump to end label
$b end
# Delete all lines from pattern space
d
!

Then run:

sed -f file.sed file

Your examples do not match this documentation: http://docs.moodle.org/22/en/GIFT. Question titles and questions are delimited by two colons, not one colon:

//Comment line 
::Question title 
:: Question {
=A correct answer
~Wrong answer1
#A response to wrong answer1
~Wrong answer2
#A response to wrong answer2
~Wrong answer3
#A response to wrong answer3
~Wrong answer4
#A response to wrong answer4
}
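To make the two-colon layout concrete, here is a small sketch that writes one question in the same layout as the sample above. It is in Python only for illustration, format_gift is a hypothetical helper name, and it does not escape special characters in the text, so check the GIFT documentation before using anything like it on real questions.

```python
def format_gift(title, question, answers):
    """Build one question in the layout of the sample above.

    answers is a list of (prefix, text) pairs, e.g. ("=", "A correct answer").
    Hypothetical helper: no escaping of special characters is done.
    """
    lines = [f"::{title}", f":: {question} {{"]
    lines += [f"{prefix}{text}" for prefix, text in answers]
    lines.append("}")
    return "\n".join(lines) + "\n"


print(format_gift(
    "Question title",
    "Question",
    [("=", "A correct answer"), ("~", "Wrong answer1")],
), end="")
```

The prefixes are passed in already chosen, so the same function works for single-answer questions and for weighted ones such as ~%50%.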

Some people naively gave you answers based on your examples instead of finding the real spec, oops.

Your question is not possible to answer because your format does not reveal which are the correct answers. That is to say:

1. Question

a. Is this right?

b. Or this?

c. Or this?

You say that these are identified using colors in the original Word document and that you do some replacement on that to preserve the information; however, you don't show an example of this! Oops ...
