
Add column to existing dataframe and import data into new column in Python Pandas

I am reading a CSV file into a pandas dataframe using Python. I want to read the contents of a list of text files into a new column of the dataframe.

The original CSV file I'm reading from looks like this:

Name,PrivateIP
bastion001,10.238.2.166
logicmonitor001,10.238.2.52
logicmonitor002,45.21.2.13

The original dataframe looks like this.

code:

hosts_list = dst = os.path.join('..', '..', 'source_files', 'aws_hosts_list', 'aws_hosts_list.csv')
fields = ["Name", "PrivateIP"]
orig_df = pd.read_csv(hosts_list, skipinitialspace=True, usecols=fields)
print(f"Orig DF: {orig_df}")

output:

Orig DF:
                       Name     PrivateIP
0               bastion001  10.238.2.166
1          logicmonitor001   10.238.2.52
2         logicmonitor002    45.21.2.13

The text directory has a bunch of text files in it with memory readings in each:


bastion001-memory.txt              B-mmp-rabbitmq-core002-memory.txt  logicmonitor002-memory.txt    mmp-cassandra001-memory.txt  company-division-rcsgw002-memory.txt
B-mmp-platsvc-core001-memory.txt   haproxy001-memory.txt              company-cassandra001-memory.txt  mmp-cassandra002-memory.txt  company-waepd001-memory.txt
B-mmp-platsvc-core002-memory.txt   haproxy002-memory.txt              company-cassandra002-memory.txt  mmp-cassandra003-memory.txt  company-waepd002-memory.txt
B-mmp-rabbitmq-core001-memory.txt  logicmonitor001-memory.txt         company-cassandra003-memory.txt  company-division-rcsgw001-memory.txt  company-waepd003-memory.txt

Each file looks similar to this:

cat haproxy001-memory.txt
7706172

I read each file into the existing dataframe.


rowcount == 0
text_path = '/home/tdun0002/stash/cloud_scripts/output_files/memory_stats/text/'
filelist = os.listdir(text_path)
for filename in filelist:
    if rowcount == 0:
        pass
    else:
        my_file = text_path + filename
        print(f"Adding {filename} to DF")
        try:
            orig_df = pd.update(my_file)
            print(f"Data Frame: {orif_df}")
            ++rowcount
        except Exception as e:
            print(f"An error has occurred: {e}")

But when I try to read the resulting dataframe again it has not been updated. I gave the new DF a new name for clarity.

code:

result_df = orig_df
pd.options.display.max_rows
print(f"\nResult Data Frame:\n{result_df}\n")

output:

Result Data Frame:
                      Name     PrivateIP
0               bastion001  10.238.2.166
1          logicmonitor001   10.238.2.52
2          logicmonitor002    45.21.2.13

How can I create a new column called Memory in the DF and add the contents of the text files to that column?

Here's the code I hope would work. It's a bit clunky, but you'll get the idea. There are comments inside.

import pandas as pd
import os
from os import listdir
from os.path import isfile, join

# get all files in the directory
# I used os.getcwd() to get the current directory
# if your text files are in another directory, use that directory's path instead
# this gets you all regular files (no subdirectories) in that directory
onlyfiles = [f for f in listdir(os.getcwd()) if isfile(join(os.getcwd(), f))]

# convert it to series
memory_series = pd.Series(onlyfiles)

# an apply function to get just txt files
# others will be returned as None
def file_name_getter(x):
    # splitext also copes with names that have no dot or several dots
    root, ext = os.path.splitext(x)
    if ext == ".txt":
        return root
    else:
        return None

# apply the function and get a new series with name values
mem_list = memory_series.apply(file_name_getter)

# now read first line of txt files
# and this is the function for it
def get_txt_data(x):
    # None marks a non-txt file; it gets a placeholder value of 0
    if x is not None:
        with open(f'{x}.txt') as f:
            return int(f.readline().rstrip())
    else:
        return 0

# apply the function, get a new series with memory values
mem_val_list = mem_list.apply(get_txt_data)

# create a df where our Name and Memory data are present
# cast Memory data as int
df = pd.DataFrame(mem_val_list, columns=["Memory"], dtype="int")
df["Name"] = mem_list

# get rid of -memory now
def name_normalizer(x):
    if x is None:
        return x
    else:
        return x.rsplit("-", maxsplit=1)[0]

# apply function
df["Name"] = df["Name"].apply(name_normalizer)


# our sample orig_df
orig_df = pd.DataFrame([["algo_2", "10.10.10"], ["other", "20.20.20"]], columns=["Name", "PrivateIP"])

# merge on Name; merge defaults to an inner join, so rows whose Name
# appears in only one of the two frames are dropped instead of raising an error
# all matching names will get their memory values
final_df = orig_df.merge(df, on="Name")
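
One thing to keep in mind about the merge above: because the default join is an inner join, any host in orig_df that has no matching memory file disappears from final_df. If every host should stay in the result, a left join is one option. Below is a minimal sketch with sample data. The host names and IPs come from the question, the 7706172 reading is the haproxy001 value shown there, and the bastion001 reading is made up for illustration.

```python
import pandas as pd

# Hosts from the question's CSV
orig_df = pd.DataFrame(
    {
        "Name": ["bastion001", "logicmonitor001", "logicmonitor002"],
        "PrivateIP": ["10.238.2.166", "10.238.2.52", "45.21.2.13"],
    }
)

# Sample readings; the bastion001 value is hypothetical.
# In practice this frame would be built from the text files.
mem_df = pd.DataFrame(
    {"Name": ["bastion001", "haproxy001"], "Memory": [1234567, 7706172]}
)

# Inner join (the default): only names present in both frames survive
inner_df = orig_df.merge(mem_df, on="Name")

# Left join: every row of orig_df is kept; missing readings become NaN
left_df = orig_df.merge(mem_df, on="Name", how="left")

print(inner_df)
print(left_df)
```

With the left join, the Memory column holds NaN for hosts without a reading, so pandas stores it as floats rather than ints.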

edit: fixed Name to be returned correctly (xxx-memory to xxx).
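
For comparison, here is a shorter sketch of the same idea that skips the intermediate Series and the merge: build a dict of host name to memory value from the files, then use Series.map on the Name column. This is an alternative, not the approach used above. The helper name read_memory_values is made up for this example, the files are written to a temporary directory so the snippet runs on its own, and the bastion001 reading is a placeholder value.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_memory_values(text_dir):
    """Return {host: memory} for every '<host>-memory.txt' file in text_dir."""
    suffix = "-memory.txt"
    values = {}
    for path in Path(text_dir).glob(f"*{suffix}"):
        host = path.name[: -len(suffix)]  # strip the '-memory.txt' part
        tokens = path.read_text().split()
        if tokens:  # skip empty files instead of failing on them
            values[host] = int(tokens[0])
    return values


with tempfile.TemporaryDirectory() as text_dir:
    # Sample files standing in for the real text directory
    Path(text_dir, "bastion001-memory.txt").write_text("1234567\n")
    Path(text_dir, "haproxy001-memory.txt").write_text("7706172\n")
    memory_by_host = read_memory_values(text_dir)

orig_df = pd.DataFrame(
    {
        "Name": ["bastion001", "logicmonitor001", "logicmonitor002"],
        "PrivateIP": ["10.238.2.166", "10.238.2.52", "45.21.2.13"],
    }
)

# map looks each Name up in the dict; names without a file get NaN
orig_df["Memory"] = orig_df["Name"].map(memory_by_host)
print(orig_df)
```

Hosts with no matching file end up with NaN in Memory, which makes the column float. If integer display matters, casting with .astype("Int64") (the nullable integer dtype) is one way to keep the gaps while showing whole numbers.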


