Search a DataFrame in pandas and find the number after a specific string

Question

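I want to extract the number that follows a specific string in a DataFrame. I need to scan the entire DataFrame for the string "Concession Type:", take the result (usually "Concession Type: CC" or "Concession Type: None"), and create a column from it. That column would be filled with "CC" or "None". If a row has the CC concession type, I want to create another column and pull out a second string from elsewhere in the frame, with the text "Total Amount: x"; I want to extract the "x". These strings are buried in various columns of the DataFrame, so there is no single column I can call (the DataFrame was created by extracting text from a PDF, with each new line becoming a column).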

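What I have below looks through all the text in the DataFrame for "Concession Type: None" and creates a Concession Type column, does the same for "Concession Type: $", then checks whether each row meets the conditions listed below and creates a "Concession_check" column. Here is a sample of the DataFrame.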

6/9/2020 1 Per Page - Listing Report**IRES MLS  : 91 PRICE: $59,900**12 Warrior Way**ATTACHED DWELLING ACTIVE / BACKUP**Locale: Lafa County: Bould**Area/SubArea: 3/0**Subdivision: Lafayett Greens Townhomes**School District: Bould Vall Dist New Const: No**Builder: Model:**Lot SqFt: 625 Approx. Acres: 0.01**New Const Notes:**Elec: Xcel Water: City of Lafay**Gas: Xcel Taxes: $1,815/2019 Listing Comments: Bright, Modern and Cozy!
6/9/2020 1 Per Page - Listing Report**IRES MLS : 906 PRICE: $350,000**15 Calks Ave, Long 80501**RESIDENTIAL-DETACHED SOLD**Locale: Longmont County: Bould**Area/SubArea: 4/6**Sold Date: 04/01/2020 Sold Price: $360,000**Bedrooms: 3 Baths: 2 Rough Ins: 0**Terms: VA FIX DOM: 1 DTO: 1 DTS: 24**Baths Bsmt Lwr Main Upr Addl Total Down Pmt Assist: N**Full 0 0 0 1 0 1 Concession Type: None**3/4 0 1 0 0 0 1****https://www.iresis.com/MLS/Search/index.cfm?Action=LaunchReports 249/250
6/9/2020 1 Per Page - Listing Report**IRES MLS : 908 PRICE: $360,000**7 S Roosevelt Ave, Lafa 80026**RESIDENTIAL-DETACHED SOLD**Locale: Lafay County: Boul**Area/SubArea: 3/0**Sold Date: 05/08/2020 Sold Price: $360,000**Bedrooms: 2 Baths: 1 Rough Ins: 0**Terms: CONV FIX DOM: 5 DTO: 5 DTS: 34**Baths Bsmt Lwr Main Upr Addl Total Down Pmt Assist: N**Full 0 0 1 0 0 1 Concession Type: None**3/4 0 0 0 0 0 0**Property Features**1/2 0 0 0 0 0 0 Style: 1 Story/Ranch Construction: Wood/Frame, Metal Siding Roof:**https://www.iresis.com/MLS/Search/index.cfm?Action=LaunchReports 250/250

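import numpy as np
import pandas as pd

# df starts out as a list of raw text lines; split each line on "**" into columns
df = pd.DataFrame([sub.split("**") for sub in df])
df[['MLS #', 'Price']] = df[1].str.split('PRICE:', n=1, expand=True)
df[['Prop Type', 'Status']] = df[3].str.rsplit(' ', n=1, expand=True)
# True when any cell in the row contains the literal text "Concession Type: None"
df['Concession Type'] = df.apply(lambda row: row.astype(str).str.contains('Concession Type: None', regex=False).any(), axis=1)
# Kept in its own column so it does not overwrite the "None" flag used in the conditions below
df['Concession Type $'] = df.apply(lambda row: row.astype(str).str.contains('Concession Type: $', regex=False).any(), axis=1)
conditions = [(df['Concession Type'] == True) & (df['Status'] == 'SOLD'),
              (df['Concession Type'] == False) & (df['Status'] == 'SOLD')]
choices = ['no concession', 'concession']
# Rows that are not SOLD fall through to the default label
df['Concession_check'] = np.select(conditions, choices, default='Active/Pending/Withdrawn')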

Answer 1

I don't have enough information about the structure of the input data. I'm assuming each row is one element of a list:

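import pandas as pd

rows = ["row1", "row2", "row3"]  # The raw text lines from the first block in your question
df = pd.DataFrame([sub.split("**") for sub in rows])
# One boolean Series per column; na=False treats missing cells as non-matches,
# so the mask is strictly boolean and indexing does not error out
mask = pd.DataFrame({i: df[i].str.contains("Concession", na=False) for i in df})
df[mask]  # Keeps the matching cells and replaces everything else with NaN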

From here you can add more checks.
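
One such check, as a rough sketch rather than a tested solution for the real data: join each row's cells back into a single string and use `str.extract` to capture whatever follows a label. The sample rows below are made up for illustration, and the exact label text "Total Amount:" and its number format are assumptions based on the question's description, not something visible in the posted sample.

```python
import pandas as pd

# Hypothetical rows modelled on the question's sample; the "Total Amount" cell is assumed
rows = [
    "Report**IRES MLS : 906 PRICE: $350,000**RESIDENTIAL-DETACHED SOLD**Full 0 0 0 1 0 1 Concession Type: None**3/4 0 1",
    "Report**IRES MLS : 777 PRICE: $400,000**RESIDENTIAL-DETACHED SOLD**Full 0 0 1 0 0 1 Concession Type: CC**Total Amount: 2,500**end",
    "Report**IRES MLS : 91 PRICE: $59,900**ATTACHED DWELLING ACTIVE / BACKUP**Locale: Lafa",
]
df = pd.DataFrame([row.split("**") for row in rows])

# Join every cell of a row into one string so the label can sit in any column
text = df.fillna("").apply(" ".join, axis=1)

# Capture the first run of non-space characters after the label (NaN if the label is absent)
df["Concession Type"] = text.str.extract(r"Concession Type:\s*(\S+)", expand=False)

# Capture the number after the assumed "Total Amount:" label, then strip thousands separators
amount = text.str.extract(r"Total Amount:\s*\$?([\d,]+(?:\.\d+)?)", expand=False)
amount = pd.to_numeric(amount.str.replace(",", "", regex=False))

# Keep the amount only for rows whose concession type is CC
df["Total Amount"] = amount.where(df["Concession Type"] == "CC")
```

Joining the row first avoids having to know which column holds the label, at the cost of matching only the first occurrence per row. If the real PDF text uses a different label or currency format, the patterns would need adjusting.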

Solution 1
0 2020-06-15 19:01:13
