
How to extract a specific string from a text?

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using Newtonsoft.Json;

namespace Rename_Files
{
    public partial class Form1 : Form
    {
        string[] files;
        public Form1()
        {
            InitializeComponent();

            files = Directory.GetFiles(@"C:\Program Files (x86)\Steam\steamapps\common\King's Quest\Binaries\Win\Saved Games", "*.*", SearchOption.AllDirectories);

            for(int i = 2; i < files.Length; i++)
            {
                string text = File.ReadAllText(files[i]);
                int startPos = text.IndexOf("currentLevelName");
                int length = text.IndexOf("currentLevelEntryDirection") - 3;
                string sub = text.Substring(startPos, length);
            }
        }

        private void Form1_Load(object sender, EventArgs e)
        {

        }
    }
}

The part I want to extract is:

currentLevelName":"E1_WL1_FindBow_M","currentLevelEntryDirection"

This is part of the file content:

m_ItemsEncodedJsons    ArrayProperty               None !   m_WhatLevelPlayerIsAtEncodedJson    ArrayProperty O          G   {"currentLevelName":"E1_WL1_FindBow_M","currentLevelEntryDirection":8} &   m_WhatCheckPointPlay

With my current approach I get this exception:

System.ArgumentOutOfRangeException: 'Index and length must refer to a location within the string. Parameter name: length'

The value of startPos is 1613 and the value of length is 1653,

so the exception makes sense, but I'm not sure yet how to extract the specific string from the text.
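The exception happens because `length` in the code above holds an absolute position (the index of currentLevelEntryDirection minus 3), while the second argument of `string.Substring(startIndex, length)` is the number of characters to take, so startPos + length points past the end of the text. A minimal bounds-checked sketch follows; the names `MarkerExtractor` and `TryExtractBetween` are hypothetical, and it assumes the end marker appears after the start marker.

```csharp
using System;

public static class MarkerExtractor
{
    // Hypothetical helper: returns the text from the start of startMarker up to
    // (but not including) endMarker, or null if either marker is missing.
    public static string TryExtractBetween(string text, string startMarker, string endMarker)
    {
        int startPos = text.IndexOf(startMarker, StringComparison.Ordinal);
        if (startPos < 0)
        {
            return null;
        }

        // Only search for the end marker after the start marker
        int endPos = text.IndexOf(endMarker, startPos + startMarker.Length, StringComparison.Ordinal);
        if (endPos < 0)
        {
            return null;
        }

        // Substring takes a LENGTH, so subtract the start index from the end index
        return text.Substring(startPos, endPos - startPos);
    }
}
```

For the excerpt above, `TryExtractBetween(text, "currentLevelName", "currentLevelEntryDirection")` returns `currentLevelName":"E1_WL1_FindBow_M","`, the same kind of result as in the update below.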

Update:

This is almost working:

int startPos = text.IndexOf("currentLevelName");
int length = text.IndexOf("currentLevelEntryDirection");
string sub = text.Substring(startPos, length - startPos);

The result in sub is:

"currentLevelName\":\"E1_WL1_HangingBedsA_M\",\""

but I want sub to contain this:

currentLevelName"E1_WL1_HangingBedsA_M\"

Optionally, with an underscore added, either keeping the quotes:

currentLevelName_"E1_WL1_HangingBedsA_M\"

or without them:

currentLevelName_E1_WL1_HangingBedsA_M\
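A minimal sketch for the last format, assuming the value is always written as `"currentLevelName":"<value>"` with no escaped quotes inside the value: find the quote that opens the value and the quote that closes it, then join key and value with an underscore. The names `LevelNameFormatter` and `FormatLevelName` are hypothetical. The trailing backslashes in the examples above are probably the debugger's escaping of quotes (as in the `\"` of the result shown), so this sketch leaves them out.

```csharp
using System;

public static class LevelNameFormatter
{
    // The key including the quotes and colon that precede the value
    private const string Key = "\"currentLevelName\":\"";

    // Hypothetical helper: turns ...{"currentLevelName":"E1_WL1_FindBow_M",...
    // into "currentLevelName_E1_WL1_FindBow_M", or returns null if not found.
    public static string FormatLevelName(string text)
    {
        int keyPos = text.IndexOf(Key, StringComparison.Ordinal);
        if (keyPos < 0)
        {
            return null;
        }

        // The value starts right after the opening quote at the end of Key
        int valueStart = keyPos + Key.Length;

        // The value ends at the next double quote
        int valueEnd = text.IndexOf('"', valueStart);
        if (valueEnd < 0)
        {
            return null;
        }

        string value = text.Substring(valueStart, valueEnd - valueStart);
        return "currentLevelName_" + value;
    }
}
```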

Your problem is essentially the same as this one:

How to extract the content with specific pattern from a String?

In this case, you can use a regular expression to extract the content you want.

Given the following text:

m_ItemsEncodedJsons ArrayProperty None ! m_WhatLevelPlayerIsAtEncodedJson ArrayProperty O G {"currentLevelName":"E1_WL1_FindBow_M","currentLevelEntryDirection":8} & m_WhatCheckPointPlay

By using this Regex pattern:

string pattern = @"""currentLevelName"":"".*"",""currentLevelEntryDirection"":\d+";

You will be able to extract the following content:

"currentLevelName":"E1_WL1_FindBow_M","currentLevelEntryDirection":8

Here is the code snippet in C#:

using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        // this is the original text
        string input = @"m_ItemsEncodedJsons ArrayProperty None ! m_WhatLevelPlayerIsAtEncodedJson ArrayProperty O G {""currentLevelName"":""E1_WL1_FindBow_M"",""currentLevelEntryDirection"":8} & m_WhatCheckPointPlay";

        // this is the pattern you are looking for
        string pattern = @"""currentLevelName"":"".*"",""currentLevelEntryDirection"":\d+";
        
        RegexOptions options = RegexOptions.Multiline;
        
        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
        }
    }
}

One reason to use a regex here is that if the value of currentLevelEntryDirection has more than one digit, e.g. 8123, the snippet above still extracts the correct value.

You can also find the above example and edit it here: https://regex101.com/r/W4ihuk/3

Furthermore, you can extract the property names and values by using capturing groups. For example:

string pattern = @"""(currentLevelName)"":""(.*)"",""(currentLevelEntryDirection)"":(\d+)";

This captures currentLevelName, E1_WL1_FindBow_M, currentLevelEntryDirection and 8, which you can read from m.Groups[1] through m.Groups[4] while looping over the Match objects.
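A small sketch of that loop, using the capturing-group pattern above unchanged; the class and method names `CaptureExample` and `GroupValues` are hypothetical. Group 0 is always the whole match, so the loop starts at 1.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureExample
{
    // The capturing-group pattern from the answer: 4 groups (2 names, 2 values)
    private const string Pattern =
        @"""(currentLevelName)"":""(.*)"",""(currentLevelEntryDirection)"":(\d+)";

    // Hypothetical helper: collects the values of all capturing groups of all matches
    public static List<string> GroupValues(string input)
    {
        var values = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            // Groups[0] is the whole match, so start at index 1
            for (int i = 1; i < m.Groups.Count; i++)
            {
                values.Add(m.Groups[i].Value);
            }
        }
        return values;
    }
}
```

With a multi-digit direction such as 8123, the last value is "8123", because `\d+` takes all consecutive digits.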


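It seems the content is separated by spaces and the positions are fixed. If so, you could do something like:

var splitted = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries); // skip the empty entries produced by runs of spaces
var json = splitted[8]; // this is the JSON part in the content excerpt shown above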

However, since we don't know whether the content might change, you could use this instead:

var startPos = text.IndexOf('{');
var endPos = text.IndexOf('}') + 1;
var json = text.Substring(startPos, endPos - startPos);

This extracts the JSON part of the file. You can then define a model class and deserialize the JSON into it:

using System.Text.Json;
using System.Text.Json.Serialization;

public class JsonModel
{
    [JsonPropertyName("currentLevelName")]
    public string? CurrentLevelName { get; set; }
    
    [JsonPropertyName("currentLevelEntryDirection")]
    public int CurrentLevelEntryDirection { get; set; }
}

With that we can do:

var result = JsonSerializer.Deserialize<JsonModel>(json);
var levelName = result?.CurrentLevelName;
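Putting the two steps together, a self-contained sketch using System.Text.Json (built into .NET Core 3.0 and later). It assumes the first `{` and the next `}` in the text delimit the JSON object, which may not hold if the save file contains other braces before it; `SaveGameReader` and `ReadLevelName` are hypothetical names. Since the question's project already references Newtonsoft.Json, `JsonConvert.DeserializeObject<JsonModel>(json)` with `[JsonProperty("...")]` attributes would be the equivalent there.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

public class JsonModel
{
    [JsonPropertyName("currentLevelName")]
    public string CurrentLevelName { get; set; }

    [JsonPropertyName("currentLevelEntryDirection")]
    public int CurrentLevelEntryDirection { get; set; }
}

public static class SaveGameReader
{
    // Hypothetical helper: returns the level name, or null if no JSON object
    // could be found or parsed.
    public static string ReadLevelName(string text)
    {
        int startPos = text.IndexOf('{');
        int endPos = startPos < 0 ? -1 : text.IndexOf('}', startPos);
        if (startPos < 0 || endPos < 0)
        {
            return null;
        }

        // + 1 so the closing brace is part of the slice
        string json = text.Substring(startPos, endPos - startPos + 1);

        try
        {
            JsonModel result = JsonSerializer.Deserialize<JsonModel>(json);
            return result?.CurrentLevelName;
        }
        catch (JsonException)
        {
            // The braces did not enclose valid JSON
            return null;
        }
    }
}
```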
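
Another option, without regex or a JSON library, is to split the string on the braces and then on the quotes. It returns an empty string if the text does not have exactly this shape:

private string FindCurrentLevelName(string myString)
{
    // Split on '{': expect the text before the JSON and the text after it
    var firstSplit = myString.Split(new[] { '{' }, StringSplitOptions.RemoveEmptyEntries);
    if (firstSplit.Length != 2)
    {
        return "";
    }

    // Split on '}': expect the JSON body and the trailing text
    var secondSplit = firstSplit[1].Split(new[] { '}' }, StringSplitOptions.RemoveEmptyEntries);
    if (secondSplit.Length != 2)
    {
        return "";
    }

    // Split the JSON body on '"': currentLevelName, :, <value>, ",", currentLevelEntryDirection, :8
    var finalSplit = secondSplit[0].Split(new[] { '"' }, StringSplitOptions.RemoveEmptyEntries);
    if (finalSplit.Length != 6)
    {
        return "";
    }

    // The level name is the third token
    return finalSplit[2];
}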

To get a specific string pattern out of data that is not in JSON format:

I think using a regex to get the string and then working with the result is a good approach.

Applying the regex pattern "currentLevelName":"\w+" to your example content, you will get: "currentLevelName":"E1_WL1_HangingBedsA_M"
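A possible variation, sketched here, puts `\w+` in a capturing group so the value can be read from `Groups[1]` directly, without searching for the colon or stripping the quotes afterwards. Note that `\w` matches letters, digits and underscores only, so a level name containing characters such as `-` or `.` would not match. `LevelNameRegex` and `GetLevelName` are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class LevelNameRegex
{
    // "currentLevelName":"(\w+)" with the level name in capturing group 1
    private static readonly Regex Pattern = new Regex("\"currentLevelName\":\"(\\w+)\"");

    // Hypothetical helper: returns the captured level name, or null if there is no match
    public static string GetLevelName(string content)
    {
        Match m = Pattern.Match(content);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```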

Then use the result to build your new file name.

The code below reads the content of savedGame001.txt, extracts the currentLevelName value, and then creates a new file whose name has the format [filename]_[theCurrentLevelName]:

using System.Text.RegularExpressions;

// your file path
string filePath = @"C:\Users\a0204\Downloads";

// your file name
string fileName = "savedGame001.txt";

// read the file content
string stringContent = File.ReadAllText(Path.Combine(filePath, fileName));

// Get the matched string with the regex => "currentLevelName":"\w+"
var regex = new Regex("\"currentLevelName\":\"\\w+\"");
Match matched = regex.Match(stringContent);
if (!matched.Success)
{
    Console.WriteLine("currentLevelName not found");
    return;
}
string matchedString = matched.Value;

// Get the string after the colon and remove the quotes
int colonPosition = matchedString.IndexOf(':');
string value = matchedString.Substring(colonPosition + 1).Replace("\"", string.Empty);

// remove the extension and append the matched value to the file name
string newFileName = Path.GetFileNameWithoutExtension(fileName) + "_" + value;

// check the new file name
Console.WriteLine(newFileName);

// write the content to the new file (File.WriteAllText creates the file, so File.Create is not needed)
File.WriteAllText(Path.Combine(filePath, newFileName), stringContent);

Console.ReadLine();

PS: the code was written as a .NET 6 console app and relies on the implicit usings (System, System.IO) enabled by that project template.
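The question's loop runs over every file in the Saved Games folder, so here is a sketch that applies the same regex to each file in one directory (top level only) and writes a copy named [filename]_[level name] next to it. It assumes each save file contains at most one `"currentLevelName":"<value>"` entry; `SaveGameCopier` and `CopyWithLevelNames` are hypothetical names. It uses `File.Copy` rather than `File.WriteAllText`, because round-tripping a partly binary save file through `ReadAllText`/`WriteAllText` can alter bytes that are not valid UTF-8, whereas `File.Copy` copies the raw bytes.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class SaveGameCopier
{
    // "currentLevelName":"(\w+)" with the level name in capturing group 1
    private static readonly Regex LevelNamePattern = new Regex("\"currentLevelName\":\"(\\w+)\"");

    // Hypothetical helper: copies every file in 'directory' that contains a level name
    // to "<name without extension>_<level name>" and returns the paths written.
    public static List<string> CopyWithLevelNames(string directory)
    {
        var written = new List<string>();

        // GetFiles returns a snapshot, so the copies created below are not revisited
        foreach (string path in Directory.GetFiles(directory))
        {
            Match m = LevelNamePattern.Match(File.ReadAllText(path));
            if (!m.Success)
            {
                continue; // no level name in this file
            }

            string newName = Path.GetFileNameWithoutExtension(path) + "_" + m.Groups[1].Value;
            string newPath = Path.Combine(directory, newName);

            // Copy the original bytes unchanged, replacing an older copy if present
            File.Copy(path, newPath, overwrite: true);
            written.Add(newPath);
        }
        return written;
    }
}
```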

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 