
How to extract a specific string from a text?

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using Newtonsoft.Json;

namespace Rename_Files
{
    public partial class Form1 : Form
    {
        string[] files;
        public Form1()
        {
            InitializeComponent();

            files = Directory.GetFiles(@"C:\Program Files (x86)\Steam\steamapps\common\King's Quest\Binaries\Win\Saved Games", "*.*", SearchOption.AllDirectories);

            for(int i = 2; i < files.Length; i++)
            {
                string text = File.ReadAllText(files[i]);
                int startPos = text.IndexOf("currentLevelName");
                int length = text.IndexOf("currentLevelEntryDirection") - 3;
                string sub = text.Substring(startPos, length);
            }
        }

        private void Form1_Load(object sender, EventArgs e)
        {

        }
    }
}

The part I want to extract is:

currentLevelName":"E1_WL1_FindBow_M","currentLevelEntryDirection"

This is part of the file content:

m_ItemsEncodedJsons    ArrayProperty               None !   m_WhatLevelPlayerIsAtEncodedJson    ArrayProperty O          G   {"currentLevelName":"E1_WL1_FindBow_M","currentLevelEntryDirection":8} &   m_WhatCheckPointPlay

The way I'm trying it now, I'm getting an exception:

System.ArgumentOutOfRangeException: 'Index and length must refer to a location within the string. Parameter name: length'

The value of startPos is 1613 and the value of length is 1653.

So the exception makes sense, but I'm not sure yet how to extract the specific string out of the text.

Update:

This is almost working:

int startPos = text.IndexOf("currentLevelName");
int length = text.IndexOf("currentLevelEntryDirection");
string sub = text.Substring(startPos, length - startPos);

The result in sub is:

"currentLevelName\":\"E1_WL1_HangingBedsA_M\",\""

But I want sub to contain this:

currentLevelName"E1_WL1_HangingBedsA_M\"

Optionally without the two quotes as well, and maybe with an added underscore:

currentLevelName_"E1_WL1_HangingBedsA_M\"

or

currentLevelName_E1_WL1_HangingBedsA_M\

The problem you are facing is essentially this one:

How to extract content with a specific pattern from a string?

In this case, you can use a regular expression to extract the content you want.

Given the following text:

m_ItemsEncodedJsons ArrayProperty None ! m_WhatLevelPlayerIsAtEncodedJson ArrayProperty O G {"currentLevelName":"E1_WL1_FindBow_M","currentLevelEntryDirection":8} & m_WhatCheckPointPlay

By using this regex pattern:

string pattern = @"""currentLevelName"":"".*"",""currentLevelEntryDirection"":\d+";

You will be able to extract the following content:

"currentLevelName":"E1_WL1_FindBow_M","currentLevelEntryDirection":8

Here is the code snippet in C#:

using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        // this is the original text
        string input = @"m_ItemsEncodedJsons ArrayProperty None ! m_WhatLevelPlayerIsAtEncodedJson ArrayProperty O G {""currentLevelName"":""E1_WL1_FindBow_M"",""currentLevelEntryDirection"":8} & m_WhatCheckPointPlay";

        // this is the pattern you are looking for
        string pattern = @"""currentLevelName"":"".*"",""currentLevelEntryDirection"":\d+";
        
        RegexOptions options = RegexOptions.Multiline;
        
        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
        }
    }
}

One reason to use a regex in this case is that, if the value of currentLevelEntryDirection is not a single digit, e.g. 8123, the above code snippet will still extract the correct value.

You can also find the above example and edit it here: https://regex101.com/r/W4ihuk/3

Furthermore, you can extract the property names and values by using capturing groups. For example:

string pattern = @"""(currentLevelName)"":""(.*)"",""(currentLevelEntryDirection)"":(\d+)";

This captures the following data: currentLevelName, E1_WL1_FindBow_M, currentLevelEntryDirection, and 8. You can get the values by looping over the Match objects and reading each one's Groups collection.
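
As a rough sketch of reading those groups, the helper below returns the level name and entry direction from the first match. The class and method names (LevelInfoParser, TryParse) are made up for illustration. The pattern also swaps .* for [^"]* so that the match stops at the next quote; that substitution assumes level names never contain a quote and is not part of the pattern above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class LevelInfoParser
{
    // Group 2 is the level name, group 4 the entry direction.
    // [^""]* (assumption) keeps the match from running past the closing quote.
    private static readonly Regex Pattern = new Regex(
        @"""(currentLevelName)"":""([^""]*)"",""(currentLevelEntryDirection)"":(\d+)");

    // Returns true and fills both out values when the pattern is found.
    public static bool TryParse(string text, out string levelName, out int entryDirection)
    {
        levelName = "";
        entryDirection = 0;

        Match m = Pattern.Match(text);
        if (!m.Success)
        {
            return false;
        }

        levelName = m.Groups[2].Value;
        return int.TryParse(m.Groups[4].Value, out entryDirection);
    }
}
```

Called on the sample text with LevelInfoParser.TryParse(input, out var name, out var direction), this should give E1_WL1_FindBow_M and 8.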


It seems the content is separated by a space delimiter and the positions are fixed. If so, you could do something like:

var splitted = text.Split(' ');
var json = splitted[8]; // this is the json part in the content;

However, since we don't know whether the content might change, you can use this instead:

var startPos = text.IndexOf('{');
var endPos = text.IndexOf('}') + 1;
var json = text.Substring(startPos, endPos - startPos);

This extracts the JSON part of the file. Now you can implement a model class that will be used to deserialize this JSON, like this:

using System.Text.Json;
using System.Text.Json.Serialization;

public class JsonModel
{
    [JsonPropertyName("currentLevelName")]
    public string? CurrentLevelName { get; set; }
    
    [JsonPropertyName("currentLevelEntryDirection")]
    public int CurrentLevelEntryDirection { get; set; }
}

With that we can do:

var result = JsonSerializer.Deserialize<JsonModel>(json);
var levelName = result?.CurrentLevelName;
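
If only the level name is needed, the extraction and parsing steps can be combined without a model class. The following is a minimal sketch under a few assumptions: the first '{' in the text starts the JSON object, the object has no nested braces, and the names SaveFileJson and TryReadLevelName are invented for this example.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper, for illustration only.
public static class SaveFileJson
{
    // Finds the first {...} block, parses it, and reads "currentLevelName".
    // Returns false when no block is found or it is not valid JSON.
    public static bool TryReadLevelName(string text, out string levelName)
    {
        levelName = "";

        int start = text.IndexOf('{');
        int end = start < 0 ? -1 : text.IndexOf('}', start);
        if (start < 0 || end < 0)
        {
            return false; // no {...} block in the text
        }

        string json = text.Substring(start, end - start + 1);
        try
        {
            using (JsonDocument doc = JsonDocument.Parse(json))
            {
                if (doc.RootElement.TryGetProperty("currentLevelName", out JsonElement name)
                    && name.ValueKind == JsonValueKind.String)
                {
                    levelName = name.GetString() ?? "";
                    return true;
                }
                return false; // property missing or not a string
            }
        }
        catch (JsonException)
        {
            return false; // the braces did not enclose valid JSON
        }
    }
}
```

A real save file may contain other braces before the JSON part, in which case the search would need a more specific starting point.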
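
Another option is to split the text on the braces and then on the quotes:

private string FindCurrentLevelName(string MyString)
{
    // Split on '{': the text before the JSON and the text starting at the JSON
    var FirstSplit = MyString.Split(new char[] { '{' },
        StringSplitOptions.RemoveEmptyEntries);
    if (FirstSplit.Length != 2)
    {
        return "";
    }

    // Split on '}': the JSON body and the text after it
    var SecondSplit = FirstSplit[1].Split(new char[] { '}' },
        StringSplitOptions.RemoveEmptyEntries);
    if (SecondSplit.Length != 2)
    {
        return "";
    }

    // Split the JSON body on '"'; the third piece is the level name
    var FinalSplit = SecondSplit[0].Split(new char[] { '"' },
        StringSplitOptions.RemoveEmptyEntries);
    if (FinalSplit.Length != 6)
    {
        return "";
    }
    return FinalSplit[2];
}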

To get a specific string pattern from a data string that is not in JSON format:

I think it would be good to use a regex to get the string and then work with it.

By using the regex pattern "currentLevelName":"\w+" on your example content, you will get: "currentLevelName":"E1_WL1_HangingBedsA_M"

Then use the result to create or replace your file name.

The code below gets the content of savedGame001.txt, extracts the currentLevelName block, and then creates a new file whose name is in this format: [filename]_[theCurrentLevelName]

using System;
using System.IO;
using System.Text.RegularExpressions;

// your file path
string filePath = @"C:\Users\a0204\Downloads";

// your file name
string fileName = @"savedGame001.txt";

// read file content
string stringContent = File.ReadAllText(Path.Combine(filePath, fileName));

// Get the matched string by regex => "currentLevelName":"\w+"
var regex = new Regex("\"currentLevelName\":\"\\w+\"");
Match matched = regex.Match(stringContent);
if (!matched.Success)
{
    Console.WriteLine("currentLevelName was not found in the file.");
    return;
}
string matchedString = matched.Value;

// Get the part after the colon and strip the quotes
int colonPosition = matchedString.IndexOf(":");
string value = matchedString.Substring(colonPosition + 1);
value = value.Replace("\"", string.Empty);

// remove the extension and append the matched value to the file name
fileName = Path.GetFileNameWithoutExtension(fileName);
string newFileName = fileName + "_" + value;

// check the new file name
Console.WriteLine(newFileName);

// write content to the new file (WriteAllText creates the file if needed)
File.WriteAllText(Path.Combine(filePath, newFileName), stringContent);

Console.ReadLine();

PS: the code was written as a .NET 6 console app.
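
The name-building step can also be written as a small pure function, which makes it easier to try on sample content before touching any files. This is only a sketch: SaveFileNamer and BuildNewFileName are hypothetical names, and a capture group is used here in place of the IndexOf/Replace steps from the snippet above.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class SaveFileNamer
{
    // Builds "[filename]_[levelName]" (without extension) from the file
    // content. Returns the file name unchanged when no level name is found.
    public static string BuildNewFileName(string fileName, string content)
    {
        // The capture group holds only the value between the quotes.
        Match m = Regex.Match(content, "\"currentLevelName\":\"(\\w+)\"");
        if (!m.Success)
        {
            return fileName;
        }

        string baseName = Path.GetFileNameWithoutExtension(fileName);
        return baseName + "_" + m.Groups[1].Value;
    }
}
```

For savedGame001.txt with the example content, this should produce savedGame001_E1_WL1_HangingBedsA_M.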
