简体   繁体   中英

Python adding rows to DataFrame based on string indexing

I'm using a text file that is updated every day and I want to extract the values from the string and append them to a DataFrame. The text file doesn't change structurally (at least mostly), just the values are updated, so I've written some code to extract the values preceding the keywords in my list.

To make my life easier I've tried to build a for-loop to automate as much as possible, but frustratingly I'm stuck at appending the values I've sourced to my DataFrame. All the tutorials I've looked at are dealing with ranges in for loops.

empty_df = pd.DataFrame(columns = ["date","builders","miners","roofers"])

text = "On 10 May 2022, there were 400 builders living in Rome, there were also no miners and approximately 70 roofers"
text = text.split()
profession = ["builders","miners","roofers"]

for i in text:
    if i in profession:
       print(text[text.index(i) - 1] + " " + i)

400 builders
no miners
70 roofers

I've tried to append using:

for i in text:
    if i in profession:
       empty_df.append(text[text.index(i) - 1] + " " + i)

But it doesn't work, and I'm really unsure how to append multiple calculated variables.

So what I want to know is:

  1. How can I append the resulting values to my empty dataframe in the correct columns.
  2. How could I convert the 'no' or 'none' into zero.
  3. How can I also incorporate the date each time I update this?

Thanks

If you just want a plug and play solution, this will get you where you need to go:

from dateutil import parser
import numpy as np

empty_df = pd.DataFrame(columns = ["builders","miners","roofers","date"])
text = "On 10 May 2022, there were 400 builders living in Rome, there were also no miners and approximately 70 roofers"
date = parser.parse(text.split(',')[0]).strftime('%d %B %Y')

foo = text.split()
profession = ["builders","miners","roofers"]

total_no = []
for i in foo:
    if i in profession:
       total_no.append(foo[foo.index(i) - 1])

empty_df.loc[len(empty_df)] = total_no + [date]

empty_df.replace('no', np.nan)

Outputting:

    builders    miners  roofers date
0   400         NaN     70      10 May 2022

1)How can I append the resulting values to my empty dataframe in the correct columns.

I think you need to do a preprocess before, you should iterate in the sentence when you detect a keyword (builders) you take the words before and after (with spliting by ' '). Now the word before and after you try to transform it into a float if it works you stock the result in list of list : ['builders',400] and you have searched for everything you able to add the rows with all the informations

2) How could I convert the 'no' or 'none' into zero.

With my method you don't need if you were enable to transform the words before or after in a float, then it should be 0

3) How can I also incorporate the date each time I update this?

https://theautomatic.net/2018/12/18/2-packages-for-extracting-dates-from-a-string-of-text-in-python/

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM