
Need help extracting a string from text

I'm trying to extract financial data from a wall of text. I have a function that splits the text three times, and I know there must be a more efficient way to do it, but I can't figure it out. The curly braces in the text throw a wrench into my plan, because I'm trying to format a string.

I want to pass my function a string such as:

"totalCashflowsFromInvestingActivities"

and extract the following raw number:

"-2478000"

This is my current function, which works but is not efficient at all:

def splitting(value, text):
    x = text.split('"{}":'.format(value))[1]
    y = x.split(',"fmt":')[0]
    z = y.split(':')[1]
    return z

Any help would be greatly appreciated!

Sample text:

"cashflowStatementHistory":{"cashflowStatements":[{"changeToLiabilities":{"raw":66049000,"fmt":"66.05M","longFmt":"66,049,000"},"totalCashflowsFromInvestingActivities":{"raw":-2478000,"fmt":"-2.48M","longFmt":"-2,478,000"},"netBorrowings":{"raw":-31652000,"fmt":"-31.65M","longFmt":"-31,652,000"}

Here is a solution using regex. It assumes the format is always the same: the raw value comes immediately after the key name, separated only by ":{"raw": .

import re

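def get_value(value_name, text):
    """Find all occurrences of the passed `value_name`
    and return their `raw` values as strings."""
    # re.escape keeps any regex metacharacters in the name literal;
    # the leading quote stops matches on keys that merely end with the name;
    # \d+ requires at least one digit.
    pattern = '"' + re.escape(value_name) + r'":\{"raw":(-?\d+)'
    return re.findall(pattern, text)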

text = '"cashflowStatementHistory":{"cashflowStatements":[{"changeToLiabilities":{"raw":66049000,"fmt":"66.05M","longFmt":"66,049,000"},"totalCashflowsFromInvestingActivities":{"raw":-2478000,"fmt":"-2.48M","longFmt":"-2,478,000"},"netBorrowings":{"raw":-31652000,"fmt":"-31.65M","longFmt":"-31,652,000"}'

val = get_value('totalCashflowsFromInvestingActivities', text)
print(val)
['-2478000']

To get numbers instead of strings, replace the return line with one that maps int over the matches:

return list(map(int, re.findall(pattern, text)))
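Putting the pieces together, here is a minimal sketch of the whole function with the numeric conversion built in. The name get_raw_values is made up for this illustration, and it assumes every raw value is a plain integer (no decimals or exponents); the sample string is a shortened copy of the text from the question.

```python
import re

def get_raw_values(value_name, text):
    """Return every integer `raw` value recorded for `value_name`."""
    # Assumption: raw values are plain integers, optionally negative.
    # re.escape keeps any regex metacharacters in the key name literal.
    pattern = '"' + re.escape(value_name) + r'":\{"raw":(-?\d+)'
    return [int(match) for match in re.findall(pattern, text)]

sample = ('"totalCashflowsFromInvestingActivities":{"raw":-2478000,"fmt":"-2.48M",'
          '"longFmt":"-2,478,000"},"netBorrowings":{"raw":-31652000,"fmt":"-31.65M"}')

print(get_raw_values("totalCashflowsFromInvestingActivities", sample))  # [-2478000]
print(get_raw_values("missingKey", sample))  # []
```

A key that does not appear in the text gives an empty list rather than an IndexError, which is what the split-based version would raise.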

If Buran is right and your source is JSON, you might find this helpful:

import json

s = '{"cashflowStatementHistory":{"cashflowStatements":[{"changeToLiabilities":{"raw":66049000,"fmt":"66.05M","longFmt":"66,049,000"},"totalCashflowsFromInvestingActivities":{"raw":-2478000,"fmt":"-2.48M","longFmt":"-2,478,000"},"netBorrowings":{"raw":-31652000,"fmt":"-31.65M","longFmt":"-31,652,000"}}]}}'

j = json.loads(s)
for i in j["cashflowStatementHistory"]["cashflowStatements"]:
    if "totalCashflowsFromInvestingActivities" in i:
        print(i["totalCashflowsFromInvestingActivities"]["raw"])

This way you can look up any key in the wall of text, as long as the text is valid JSON.
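The loop above hard-codes the path to the cashflow statements. If you don't know in advance where a key sits, a recursive search over the parsed data may help. This is only a sketch: find_raw is a hypothetical helper name, and it assumes each value of interest is an object containing a "raw" field, as in the sample.

```python
import json

def find_raw(node, key):
    """Yield the `raw` value of every `key` entry found at any depth."""
    if isinstance(node, dict):
        for name, child in node.items():
            if name == key and isinstance(child, dict) and "raw" in child:
                yield child["raw"]
            else:
                # Keep descending into nested objects and arrays.
                yield from find_raw(child, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_raw(item, key)

# Shortened copy of the sample data, wrapped so that it is valid JSON.
doc = json.loads(
    '{"cashflowStatementHistory":{"cashflowStatements":[{'
    '"totalCashflowsFromInvestingActivities":{"raw":-2478000,"fmt":"-2.48M"},'
    '"netBorrowings":{"raw":-31652000,"fmt":"-31.65M"}}]}}'
)

print(list(find_raw(doc, "netBorrowings")))  # [-31652000]
```

Because find_raw is a generator, it returns one value per matching statement if the same key appears in several of them.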

Take a look at this too: https://www.w3schools.com/python/python_json.asp
