Search a pandas DataFrame and find the number after a specific string

Question

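I want to extract the number that follows a specific string in a DataFrame. I need to scan the whole DataFrame for the string "Concession Type:", take the result (usually "Concession Type: CC" or "Concession Type: None"), and create a column from it. That column would be filled with "CC" or "None". If the row has the CC concession type, I want to create another column and pull a value out of another string in the frame, which reads "Total Amount: x"; I want to pull out the "x". These strings are buried in various columns of the DataFrame, so there is no single column I can call (the DataFrame was created by extracting text from a PDF, with each new line becoming a column).

What I have below looks through all the text in the DataFrame for "Concession Type: None" and creates the Concession Type column, does the same for "Concession Type: $", then checks whether the conditions listed below are met and creates the "Concession_check" column. Here is a sample of the DataFrame.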

6/9/2020 1 Per Page - Listing Report**IRES MLS  : 91 PRICE: $59,900**12 Warrior Way**ATTACHED DWELLING ACTIVE / BACKUP**Locale: Lafa County: Bould**Area/SubArea: 3/0**Subdivision: Lafayett Greens Townhomes**School District: Bould Vall Dist New Const: No**Builder: Model:**Lot SqFt: 625 Approx. Acres: 0.01**New Const Notes:**Elec: Xcel Water: City of Lafay**Gas: Xcel Taxes: $1,815/2019 Listing Comments: Bright, Modern and Cozy!
6/9/2020 1 Per Page - Listing Report**IRES MLS : 906 PRICE: $350,000**15 Calks Ave, Long 80501**RESIDENTIAL-DETACHED SOLD**Locale: Longmont County: Bould**Area/SubArea: 4/6**Sold Date: 04/01/2020 Sold Price: $360,000**Bedrooms: 3 Baths: 2 Rough Ins: 0**Terms: VA FIX DOM: 1 DTO: 1 DTS: 24**Baths Bsmt Lwr Main Upr Addl Total Down Pmt Assist: N**Full 0 0 0 1 0 1 Concession Type: None**3/4 0 1 0 0 0 1****https://www.iresis.com/MLS/Search/index.cfm?Action=LaunchReports 249/250
6/9/2020 1 Per Page - Listing Report**IRES MLS : 908 PRICE: $360,000**7 S Roosevelt Ave, Lafa 80026**RESIDENTIAL-DETACHED SOLD**Locale: Lafay County: Boul**Area/SubArea: 3/0**Sold Date: 05/08/2020 Sold Price: $360,000**Bedrooms: 2 Baths: 1 Rough Ins: 0**Terms: CONV FIX DOM: 5 DTO: 5 DTS: 34**Baths Bsmt Lwr Main Upr Addl Total Down Pmt Assist: N**Full 0 0 1 0 0 1 Concession Type: None**3/4 0 0 0 0 0 0**Property Features**1/2 0 0 0 0 0 0 Style: 1 Story/Ranch Construction: Wood/Frame, Metal Siding Roof:**https://www.iresis.com/MLS/Search/index.cfm?Action=LaunchReports 250/250

import numpy as np
import pandas as pd

# df starts as a list of raw listing strings; split each one on "**" into columns
df = pd.DataFrame([sub.split("**") for sub in df])
df[['MLS #', 'Price']] = df[1].str.split('PRICE:', n=1, expand=True)
df[['Prop Type', 'Status']] = df[3].str.rsplit(' ', n=1, expand=True)
# True if any cell in the row contains the literal text
df['Concession Type'] = df.apply(lambda row: row.astype(str).str.contains('Concession Type: None', regex=False).any(), axis=1)
# Note: this assignment overwrites the column created on the previous line
df['Concession Type'] = df.apply(lambda row: row.astype(str).str.contains('Concession Type: $', regex=False).any(), axis=1)
conditions = [(df['Concession Type'] == True) & (df['Status'] == 'SOLD'),
             (df['Concession Type'] == False) & (df['Status'] == 'SOLD')]
choices = ['no concession', 'concession']
df['Concession_check'] = np.select(conditions, choices, default='Active/Pending/Withdrawn')

Answer 1

I don't have enough information about the structure of the input data. I'm assuming each row is one element of a list:

import pandas as pd

df = ["row1", "row2", "row3"]  # Placeholder for the rows in the first code block of your question
df = pd.DataFrame([sub.split("**") for sub in df])
# One boolean Series per column: does the cell contain "Concession"?
dx = [df[i].str.contains("Concession") for i in df]
# Fill None values with False because it errors out without boolean values
df[pd.DataFrame(dx).T.fillna(False)]

From here you can add more checks.
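As one possible next check, here is a minimal sketch of pulling out the concession type and the amount once the matching text has been located. It is not part of the original answer. It assumes the raw rows are "**"-separated strings as in the question, and that the amount appears after a label such as "Total Amount:"; that label, the function name extract_concessions, and the output column name "Concession Amount" are all hypothetical and should be adjusted to the real report.

```python
import re
import pandas as pd

def extract_concessions(rows, amount_label="Total Amount:"):
    """rows: list of raw listing strings whose fields are separated by '**'.

    amount_label is an assumed label; change it to match the real text.
    """
    df = pd.DataFrame([r.split("**") for r in rows])
    # Join every cell of a row back into one string so the search does
    # not depend on which column the text happened to land in.
    text = df.fillna("").astype(str).apply(lambda row: " ".join(row), axis=1)

    out = pd.DataFrame(index=df.index)
    # Capture the token right after "Concession Type:" (e.g. CC or None).
    out["Concession Type"] = text.str.extract(
        r"Concession Type:\s*(\S+)", expand=False
    )
    # Capture the number after the amount label, allowing "$" and commas.
    amount = text.str.extract(
        re.escape(amount_label) + r"\s*\$?([\d,]+(?:\.\d+)?)", expand=False
    )
    amount = pd.to_numeric(amount.str.replace(",", "", regex=False), errors="coerce")
    # Keep the amount only for rows whose concession type is CC.
    out["Concession Amount"] = amount.where(out["Concession Type"] == "CC")
    return out
```

Joining the cells first trades a little memory for simplicity: a single regular expression per field then covers the whole row, wherever the PDF extraction placed the text. Rows without a match get NaN in the corresponding column.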

Solution 1: 0 2020-06-15 19:01:13
