
C# Parsing a string

I have a bunch of strings that look like this:

mc_gross=22.99invoice=ff1ca57d9fa80cf93e6b300dd7f063e1protection_eligibility=Ineligibleaddress_status=confirmedpayer_id=SGA8X3TX9HCVYtax=0.00address_street=155 5th ave sepayment_date=16:08:28 Nov 15, 2010 PSTpayment_status=Completedcharset=windows-1252address_zip=98045first_name=jackobmc_fee=1.08address_country_code=USaddress_name=john martinnotify_version=3.0custom=ff1ca5asdf7d9fa80cf93e6b300dd7f063e1payer_status=unverifiedbusiness=gold-me@hotmail.comaddress_country=United Statesaddress_city=north bendquantity=1verify_sign=AZussRXZRkuk7frhfirfxxTkj0BDJGA2dJF3eF263eEsjLixS.xRxCzfaYLpayer_email=me@gmail.comtxn_id=4DU53818WJ271531Mpayment_type=instantlast_name=Martinaddress_state=WAreceiver_email=cravbill@hotmail.compayment_fee=1.08receiver_id=QG8JPB4RZJGG4txn_type=web_acceptitem_name=Some item of consequenceSpecifiemc_currency=USDitem_number=G10W151residence_country=UShandling_amount=0.00transaction_subject=ff1ca57d9fad80cf93e6b300dd7f063e1payment_gross=22.99shipping=0.00

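What is the best way to parse this? You would think whoever created it would have put some delimiters in there...

Anyway, any help would be greatly appreciated.

EDIT:

I appreciate everyone's posts. I was wondering whether I could do this:

  1. Create a list of tags, e.g. mc_gross=, first_name=, ...
  2. Do a replace in the string: thestring.Replace("first_name", "\r\nfirst_name"). I think this would give me the line breaks I need to parse it further.

What do you think?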
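The replace idea from the edit above could be sketched roughly as follows. The class and method names (`LineBreakInserter`, `InsertBreaks`) are made up for illustration. The sketch matches on `key=` rather than the bare key name, and it assumes that no key is a suffix of another key and that no value happens to contain a `key=` sequence; if either assumption fails, breaks would land in the wrong place.

```csharp
using System;

public static class LineBreakInserter
{
    // Inserts a line break before every "key=" occurrence.
    // Assumption: no key is a suffix of another key, and no value contains "key=".
    public static string InsertBreaks(string raw, string[] keys)
    {
        string result = raw;
        foreach (string key in keys)
        {
            // string.Replace(string, string) performs an ordinal, case-sensitive replace.
            result = result.Replace(key + "=", Environment.NewLine + key + "=");
        }
        // The first key also received a leading break; remove it.
        return result.TrimStart('\r', '\n');
    }
}
```

Calling `LineBreakInserter.InsertBreaks(thestring, keys)` with the full key list would give one `key=value` pair per line, which can then be read line by line.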

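Unless this is fixed width (which I highly doubt), I would say you need to get a list of the keywords that identify the fields. Put them in a data store (SQL, XML, CSV, etc.; where does not really matter) and then use them to parse the file. Hopefully the fields always come in the same order and no tags are left out. If so, do a substring that takes the value from just after the equals sign following your tag up to the start of the next tag in the line. That gives you the value for the corresponding tag.

So, for example, if we take just the first part, mc_gross=22.99invoice=ff1ca57d9fa80cf93e6b300dd7f063e1protection_eligibility=Ineligibleaddress_status=confirmed, our tags would be mc_gross, invoice, protection_eligibility, and address_status. We would start with mc_gross=, locate it in the string, and extract its value with Substring. To work out the length, we would find our next tag, invoice. The substring line gets complicated, but it should do the job. Loop through each tag. When you reach the last tag, you need to find the end of the string instead of another tag.

As others have said, unless you can get the original data to include line breaks in the appropriate places, the next best approach is to obtain a list of the key names.

I assume the key names in the other 60K lines are the same as in the one sample line you provided? If so, and if nobody can give you the list, then identifying the key names yourself by hand (not programmatically) seems to be the only way.

I tried this myself. It was not too bad (a few minutes at most), but you may still need someone in the know to confirm that the list of keys is correct.

Once you have the list, you can split by the keys and then reassemble them into a new list: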
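The tag-walking idea described above might look something like the sketch below. `OrderedTagParser` and `Parse` are hypothetical names, and the sketch assumes the tags are supplied in the same order in which they occur in the line. A tag that is absent from a given line is simply skipped.

```csharp
using System;
using System.Collections.Generic;

public static class OrderedTagParser
{
    // Reads each value from just after "tag=" up to the start of the next tag.
    // Assumption: orderedTags lists the tags in the order they occur in the line.
    public static Dictionary<string, string> Parse(string line, IList<string> orderedTags)
    {
        var result = new Dictionary<string, string>();
        int searchFrom = 0;
        for (int i = 0; i < orderedTags.Count; i++)
        {
            string marker = orderedTags[i] + "=";
            int start = line.IndexOf(marker, searchFrom, StringComparison.Ordinal);
            if (start < 0) continue; // tag missing from this line

            int valueStart = start + marker.Length;
            // The value ends at the next tag that is present, or at the end of the string.
            int valueEnd = line.Length;
            for (int j = i + 1; j < orderedTags.Count; j++)
            {
                int next = line.IndexOf(orderedTags[j] + "=", valueStart, StringComparison.Ordinal);
                if (next >= 0) { valueEnd = next; break; }
            }
            result[orderedTags[i]] = line.Substring(valueStart, valueEnd - valueStart);
            searchFrom = valueStart;
        }
        return result;
    }
}
```

Returning a dictionary rather than a list is a design choice for this sketch; it makes lookups such as `parsed["invoice"]` straightforward once a line has been processed.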

using System;
using System.Collections.Generic;
using System.Linq;

string rawData =
    "mc_gross=22.99invoice=ff1ca57d9fa80cf93e6b300dd7f063e1protection_eligibility=Ineligibleaddress_status=confirmedpayer_id=SGA8X3TX9HCVYtax=0.00address_street=155 5th ave sepayment_date=16:08:28 Nov 15, 2010 PSTpayment_status=Completedcharset=windows-1252address_zip=98045first_name=jackobmc_fee=1.08address_country_code=USaddress_name=john martinnotify_version=3.0custom=ff1ca5asdf7d9fa80cf93e6b300dd7f063e1payer_status=unverifiedbusiness=gold-me@hotmail.comaddress_country=United Statesaddress_city=north bendquantity=1verify_sign=AZussRXZRkuk7frhfirfxxTkj0BDJGA2dJF3eF263eEsjLixS.xRxCzfaYLpayer_email=me@gmail.comtxn_id=4DU53818WJ271531Mpayment_type=instantlast_name=Martinaddress_state=WAreceiver_email=cravbill@hotmail.compayment_fee=1.08receiver_id=QG8JPB4RZJGG4txn_type=web_acceptitem_name=Some item of consequenceSpecifiemc_currency=USDitem_number=G10W151residence_country=UShandling_amount=0.00transaction_subject=ff1ca57d9fad80cf93e6b300dd7f063e1payment_gross=22.99shipping=0.00";

string[] keys = {
                    "mc_gross", "invoice", "protection_eligibility", "address_status", "payer_id", "tax",
                    "address_street", "payment_date", "payment_status", "charset", "address_zip",
                    "first_name", "mc_fee", "address_country_code", "address_name", "notify_version",
                    "custom", "payer_status", "business", "address_country", "address_city", "quantity",
                    "verify_sign", "payer_email", "txn_id", "payment_type", "last_name", "address_state",
                    "receiver_email", "payment_fee", "receiver_id", "txn_type", "item_name",
                    "mc_currency", "item_number", "residence_country", "handling_amount",
                    "transaction_subject", "payment_gross", "shipping"
                };

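// Splitting on the key names removes them and leaves only the "=value" fragments.
string[] values = rawData.Split(keys, StringSplitOptions.RemoveEmptyEntries);

// Pair each key with its fragment. This relies on the keys array being in the
// same order as the keys appear in the data, with none missing.
IEnumerable<string> parsedList = keys.Zip(values, (key, value) => key + value);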

foreach (string item in parsedList)
{
    Console.WriteLine(item);
}

This outputs the data in this format:

mc_gross=22.99
invoice=ff1ca57d9fa80cf93e6b300dd7f063e1
protection_eligibility=Ineligible
address_status=confirmed
payer_id=SGA8X3TX9HCVY
tax=0.00
address_street=155 5th ave se
payment_date=16:08:28 Nov 15, 2010 PST
payment_status=Completed
charset=windows-1252
address_zip=98045
first_name=jackob
mc_fee=1.08
address_country_code=US
address_name=john martin
notify_version=3.0
custom=ff1ca5asdf7d9fa80cf93e6b300dd7f063e1
payer_status=unverified
business=gold-me@hotmail.com
address_country=United States
address_city=north bend
quantity=1
verify_sign=AZussRXZRkuk7frhfirfxxTkj0BDJGA2dJF3eF263eEsjLixS.xRxCzfaYL
payer_email=me@gmail.com
txn_id=4DU53818WJ271531M
payment_type=instant
last_name=Martin
address_state=WA
receiver_email=cravbill@hotmail.com
payment_fee=1.08
receiver_id=QG8JPB4RZJGG4
txn_type=web_accept
item_name=Some item of consequenceSpecifie
mc_currency=USD
item_number=G10W151
residence_country=US
handling_amount=0.00
transaction_subject=ff1ca57d9fad80cf93e6b300dd7f063e1
payment_gross=22.99
shipping=0.00    

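You can parse the list further by splitting each item on the equals sign ("="), or replace the original data string with one that now contains the missing line breaks: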

string newData = parsedList.Aggregate((data, next) => data + Environment.NewLine + next);
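For the first option, splitting each item on the equals sign, one possible sketch is shown here. `PairParser` and `ToDictionary` are illustrative names only. It splits on the first "=" in each item, on the assumption that key names never contain "=" while values might.

```csharp
using System;
using System.Collections.Generic;

public static class PairParser
{
    // Turns "key=value" strings into a dictionary, splitting on the first '=' only
    // so that any '=' inside a value is preserved.
    public static Dictionary<string, string> ToDictionary(IEnumerable<string> items)
    {
        var map = new Dictionary<string, string>();
        foreach (string item in items)
        {
            int eq = item.IndexOf('=');
            if (eq < 0) continue; // skip entries without '='
            map[item.Substring(0, eq)] = item.Substring(eq + 1);
        }
        return map;
    }
}
```

With the parsedList from above, `PairParser.ToDictionary(parsedList)["mc_gross"]` would give the string "22.99", which could then be converted with `decimal.Parse` and `CultureInfo.InvariantCulture` if a number is needed.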

Consider using System.Text.RegularExpressions; regular expressions can be very useful.

However, a simple approach is to use the Split function of the string class.

string head = "mc_gross=22.99invoice=ff1ca57d9fa80cf93e6b300dd7f063e1protection_eligibility=Ineligibleaddress_status=confirmedpayer_id=SGA8X3TX9HCVYtax=0.00address_street=155 5th ave sepayment_date=16:08:28 Nov 15, 2010 PSTpayment_status=Completedcharset=windows-1252address_zip=98045first_name=jackobmc_fee=1.08address_country_code=USaddress_name=john martinnotify_version=3.0custom=ff1ca5asdf7d9fa80cf93e6b300dd7f063e1payer_status=unverifiedbusiness=gold-me@hotmail.comaddress_country=United Statesaddress_city=north bendquantity=1verify_sign=AZussRXZRkuk7frhfirfxxTkj0BDJGA2dJF3eF263eEsjLixS.xRxCzfaYLpayer_email=me@gmail.comtxn_id=4DU53818WJ271531Mpayment_type=instantlast_name=Martinaddress_state=WAreceiver_email=cravbill@hotmail.compayment_fee=1.08receiver_id=QG8JPB4RZJGG4txn_type=web_acceptitem_name=Some item of consequenceSpecifiemc_currency=USDitem_number=G10W151residence_country=UShandling_amount=0.00transaction_subject=ff1ca57d9fad80cf93e6b300dd7f063e1payment_gross=22.99shipping=0.00";

// Only two keys are shown here; add the remaining key names the same way.
string[] splitStrings = new string[2];
splitStrings[0] = "mc_gross";
splitStrings[1] = "invoice";
string[] headArray = head.Split(splitStrings, StringSplitOptions.RemoveEmptyEntries);

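You get the idea; it breaks everything up into parts.

The equals sign is a very good indicator. Between the equals signs, I would suggest using some lexical tools and some kind of inference engine.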
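As for the regular-expression route mentioned at the start of this answer, here is one way it might be sketched, assuming the key names are known in advance. `RegexKeyParser` and `Parse` are hypothetical names. The pattern matches a known key followed by "=", then lazily captures everything up to the next known `key=` or the end of the input. It still assumes that no value contains a `key=` sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexKeyParser
{
    // Captures key/value pairs using an alternation built from the known keys.
    // Assumption: no value contains a known key immediately followed by '='.
    public static Dictionary<string, string> Parse(string raw, IEnumerable<string> keys)
    {
        // Escape each key so that any regex metacharacters are treated literally.
        string alternation = string.Join("|", keys.Select(Regex.Escape));
        string pattern =
            "(?<key>" + alternation + ")=(?<value>.*?)(?=(?:" + alternation + ")=|$)";

        var result = new Dictionary<string, string>();
        foreach (Match m in Regex.Matches(raw, pattern, RegexOptions.Singleline))
        {
            result[m.Groups["key"].Value] = m.Groups["value"].Value;
        }
        return result;
    }
}
```

Because the "=" must follow the key directly, the regex engine backtracks from address_country to address_country_code where needed, so the order of the keys in the list should not matter for this sketch.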
