
Python adding rows to DataFrame based on string indexing

I'm using a text file that is updated every day and I want to extract the values from the string and append them to a DataFrame. The text file doesn't change structurally (at least mostly), just the values are updated, so I've written some code to extract the values preceding the keywords in my list.

To make my life easier I've tried to build a for-loop to automate as much as possible, but frustratingly I'm stuck at appending the values I've sourced to my DataFrame. All the tutorials I've looked at are dealing with ranges in for loops.

import pandas as pd

empty_df = pd.DataFrame(columns=["date", "builders", "miners", "roofers"])

text = "On 10 May 2022, there were 400 builders living in Rome, there were also no miners and approximately 70 roofers"
text = text.split()
profession = ["builders", "miners", "roofers"]

for i in text:
    if i in profession:
        print(text[text.index(i) - 1] + " " + i)

400 builders
no miners
70 roofers

I've tried to append using:

for i in text:
    if i in profession:
       empty_df.append(text[text.index(i) - 1] + " " + i)

But it doesn't work, and I'm really unsure how to append multiple calculated variables.

So what I want to know is:

  1. How can I append the resulting values to my empty dataframe in the correct columns?
  2. How could I convert the 'no' or 'none' into zero?
  3. How can I also incorporate the date each time I update this?

Thanks

If you just want a plug and play solution, this will get you where you need to go:

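import numpy as np
import pandas as pd
from dateutil import parser

empty_df = pd.DataFrame(columns=["builders", "miners", "roofers", "date"])
text = "On 10 May 2022, there were 400 builders living in Rome, there were also no miners and approximately 70 roofers"
# The text before the first comma holds the date ("On 10 May 2022").
date = parser.parse(text.split(',')[0]).strftime('%d %B %Y')

foo = text.split()
profession = ["builders", "miners", "roofers"]

# Collect the word that precedes each profession keyword.
total_no = []
for i in foo:
    if i in profession:
        total_no.append(foo[foo.index(i) - 1])

# Add the values as a new row. This relies on the professions appearing
# in the text in the same order as the columns.
empty_df.loc[len(empty_df)] = total_no + [date]

# replace() returns a new DataFrame, so assign the result back.
# Use 0 instead of np.nan if you want zeros.
empty_df = empty_df.replace('no', np.nan)
print(empty_df)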

Outputting:

    builders    miners  roofers date
0   400         NaN     70      10 May 2022
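
As a variation on the above, here is a minimal sketch that keys each value by its profession, so the values land in the right columns even if the order in the text changes, and that turns 'no' or 'none' into 0. The helper name extract_counts and the hard-coded date string are assumptions made for illustration. Note that DataFrame.append was removed in pandas 2.0, so the rows are collected in a list and the DataFrame is built from that list instead.

```python
import pandas as pd

def extract_counts(text, professions):
    """Map each profession to the value of the word that precedes it."""
    words = text.replace(",", " ").split()
    counts = {}
    for pos, word in enumerate(words):
        if word in professions and pos > 0:
            value = words[pos - 1]
            if value.lower() in ("no", "none"):
                counts[word] = 0           # treat "no"/"none" as zero
            elif value.isdigit():
                counts[word] = int(value)  # plain integers such as "400"
    return counts

text = ("On 10 May 2022, there were 400 builders living in Rome, "
        "there were also no miners and approximately 70 roofers")
professions = ["builders", "miners", "roofers"]

row = extract_counts(text, professions)
row["date"] = "10 May 2022"  # hard-coded here for brevity (assumption)

# Collect one dict per day, then build the DataFrame from the list.
rows = [row]
df = pd.DataFrame(rows, columns=["date", "builders", "miners", "roofers"])
print(df)
```

Because each row is a dict, pandas matches the keys to the column names, and any profession missing from that day's text is left as NaN.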

1) How can I append the resulting values to my empty dataframe in the correct columns?

I think you need a preprocessing step first. Iterate over the sentence, and when you detect a keyword (e.g. builders), take the words immediately before and after it (splitting on ' '). Then try to convert each of those words to a float; if the conversion works, store the result in a list of lists, e.g. ['builders', 400]. Once you have searched for every keyword, you can add the row with all the information.
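
The approach described here could be sketched roughly as follows, in plain Python. The function names to_number and find_counts are hypothetical, and the choice to check the preceding word first and fall back to 0 when neither neighbour is numeric is an assumption about what was intended.

```python
def to_number(word):
    """Return `word` as a float, or None if it is not numeric."""
    try:
        return float(word)
    except ValueError:
        return None

def find_counts(sentence, keywords):
    """Return [keyword, number] pairs using the words around each keyword."""
    words = sentence.replace(",", " ").split()
    found = []
    for pos, word in enumerate(words):
        if word not in keywords:
            continue
        # Candidate neighbours: the word before, then the word after.
        neighbours = words[max(pos - 1, 0):pos] + words[pos + 1:pos + 2]
        numbers = [n for n in map(to_number, neighbours) if n is not None]
        # No numeric neighbour (e.g. "no miners") falls back to 0.
        found.append([word, numbers[0] if numbers else 0.0])
    return found

text = ("On 10 May 2022, there were 400 builders living in Rome, "
        "there were also no miners and approximately 70 roofers")
pairs = find_counts(text, ["builders", "miners", "roofers"])
print(pairs)
```

The resulting pairs can be turned into a single row with dict(pairs) before adding it to the DataFrame.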

2) How could I convert the 'no' or 'none' into zero?

With my method you don't need to: if you are unable to convert the word before or after the keyword to a float, the value should be 0.

3) How can I also incorporate the date each time I update this?

https://theautomatic.net/2018/12/18/2-packages-for-extracting-dates-from-a-string-of-text-in-python/
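
If you would rather not add a package, here is a small standard-library sketch for the date format used in the question ("10 May 2022"). The pattern and the function name extract_date are assumptions; this only handles a day, a full English month name and a four-digit year, and it relies on an English locale for the month name.

```python
import re
from datetime import datetime

# Day, capitalised month word, four-digit year, e.g. "10 May 2022".
DATE_PATTERN = re.compile(r"\b(\d{1,2}) ([A-Z][a-z]+) (\d{4})\b")

def extract_date(text):
    """Return the first 'D Month YYYY' date in `text`, or None."""
    match = DATE_PATTERN.search(text)
    if match is None:
        return None
    # %B expects a full month name such as "May" or "January".
    return datetime.strptime(match.group(0), "%d %B %Y").date()

text = ("On 10 May 2022, there were 400 builders living in Rome, "
        "there were also no miners and approximately 70 roofers")
found = extract_date(text)
print(found)
```

The returned value is a datetime.date, which can be stored directly in the date column or formatted with strftime.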
