
Searching df and looking for number after specific string in pandas

I'm trying to pull the values that follow specific strings in a dataframe. I need to scan the entire dataframe for the string "Concession Type:", take what follows it (usually "Concession Type: CC" or "Concession Type: None"), and create a column from it, filled with either "CC" or "None". If a row has a CC concession type, I want to create another column from a second string in the frame, "Total Amount: x", and pull the "x" from it. These strings are buried in various columns, so there isn't a single column I can call (the dataframe is created by pulling text from a PDF, and each newline creates a column).

What I have below looks through all the text in the dataframe for "Concession Type: None" and creates the "Concession Type" column, does the same for "Concession Type: $", and then checks the conditions listed below to create the "Concession_check" column. This is a sample of the dataframe:

6/9/2020 1 Per Page - Listing Report**IRES MLS  : 91 PRICE: $59,900**12 Warrior Way**ATTACHED DWELLING ACTIVE / BACKUP**Locale: Lafa County: Bould**Area/SubArea: 3/0**Subdivision: Lafayett Greens Townhomes**School District: Bould Vall Dist New Const: No**Builder: Model:**Lot SqFt: 625 Approx. Acres: 0.01**New Const Notes:**Elec: Xcel Water: City of Lafay**Gas: Xcel Taxes: $1,815/2019 Listing Comments: Bright, Modern and Cozy!
6/9/2020 1 Per Page - Listing Report**IRES MLS : 906 PRICE: $350,000**15 Calks Ave, Long 80501**RESIDENTIAL-DETACHED SOLD**Locale: Longmont County: Bould**Area/SubArea: 4/6**Sold Date: 04/01/2020 Sold Price: $360,000**Bedrooms: 3 Baths: 2 Rough Ins: 0**Terms: VA FIX DOM: 1 DTO: 1 DTS: 24**Baths Bsmt Lwr Main Upr Addl Total Down Pmt Assist: N**Full 0 0 0 1 0 1 Concession Type: None**3/4 0 1 0 0 0 1****https://www.iresis.com/MLS/Search/index.cfm?Action=LaunchReports 249/250
6/9/2020 1 Per Page - Listing Report**IRES MLS : 908 PRICE: $360,000**7 S Roosevelt Ave, Lafa 80026**RESIDENTIAL-DETACHED SOLD**Locale: Lafay County: Boul**Area/SubArea: 3/0**Sold Date: 05/08/2020 Sold Price: $360,000**Bedrooms: 2 Baths: 1 Rough Ins: 0**Terms: CONV FIX DOM: 5 DTO: 5 DTS: 34**Baths Bsmt Lwr Main Upr Addl Total Down Pmt Assist: N**Full 0 0 1 0 0 1 Concession Type: None**3/4 0 0 0 0 0 0**Property Features**1/2 0 0 0 0 0 0 Style: 1 Story/Ranch Construction: Wood/Frame, Metal Siding Roof:**https://www.iresis.com/MLS/Search/index.cfm?Action=LaunchReports 250/250

import numpy as np
import pandas as pd

# df starts as a list of listing strings; "**" marks each newline from the PDF
df = pd.DataFrame([sub.split("**") for sub in df])
df[['MLS #', 'Price']] = df[1].str.split('PRICE:', n=1, expand=True)
df[['Prop Type', 'Status']] = df[3].str.rsplit(' ', n=1, expand=True)
# True when any cell in the row contains the literal text
df['Concession Type'] = df.apply(lambda row: row.astype(str).str.contains('Concession Type: None', regex=False).any(), axis=1)
# Kept in its own column (name chosen here) so it does not overwrite the flag above
df['Concession Amount'] = df.apply(lambda row: row.astype(str).str.contains('Concession Type: $', regex=False).any(), axis=1)
conditions = [(df['Concession Type'] == True) & (df['Status'] == 'SOLD'),
              (df['Concession Type'] == False) & (df['Status'] == 'SOLD')]
choices = ['no concession', 'concession']
df['Concession_check'] = np.select(conditions, choices, default='Active/Pending/Withdrawn')

I didn't have enough information on the data structure of the input. I assumed each row was an element in an array:

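import pandas as pd

rows = ["row1", "row2", "row3"]  # placeholders for the strings in the first code block of your question
df = pd.DataFrame([sub.split("**") for sub in rows])
# One boolean Series per column; na=False counts missing cells as "no match"
dx = [df[i].str.contains("Concession", na=False) for i in df]
mask = pd.DataFrame(dx).T  # same shape as df
df[mask]  # keeps the matching cells and turns every other cell into NaN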

From here you can add more checks.
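As one possible further check, here is a minimal sketch of pulling the values themselves rather than only flagging them. It joins each row's cells into a single string and uses `Series.str.extract` with a capture group. The three `rows` strings are shortened stand-ins for the sample in the question, and the "Concession Type: CC" and "Total Amount: $2,500" text is an assumed format, since the sample does not show one; adjust the patterns to whatever the real PDF text looks like.

```python
import pandas as pd

# Hypothetical, shortened listing strings. The CC row and its "Total Amount"
# text are assumptions about the format, not taken from the question's sample.
rows = [
    "Report**IRES MLS : 91 PRICE: $59,900**12 Warrior Way**ATTACHED DWELLING ACTIVE / BACKUP",
    "Report**IRES MLS : 906 PRICE: $350,000**15 Calks Ave**RESIDENTIAL-DETACHED SOLD**Full 0 0 0 1 0 1 Concession Type: None",
    "Report**IRES MLS : 908 PRICE: $360,000**7 S Roosevelt Ave**RESIDENTIAL-DETACHED SOLD**Full 0 0 1 0 0 1 Concession Type: CC**Total Amount: $2,500",
]
df = pd.DataFrame([sub.split("**") for sub in rows])

# Join every cell of a row into one string so the column position no longer matters.
row_text = df.fillna("").astype(str).apply(" ".join, axis=1)

# Capture the token after each label; rows without the label get NaN.
df["Concession Type"] = row_text.str.extract(r"Concession Type:\s*(\S+)", expand=False)
df["Total Amount"] = row_text.str.extract(r"Total Amount:\s*\$?([\d,.]+)", expand=False)

# Keep the amount only where the concession type is CC.
df["Total Amount"] = df["Total Amount"].where(df["Concession Type"] == "CC")

print(df[["Concession Type", "Total Amount"]])
```

Rows with no "Concession Type:" text at all (the active listing here) end up with NaN in both columns, so they can still be told apart from rows that explicitly say "None".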


 