
Reading columns from a text file in Python

I have a large .txt file with more than 10,000 column names. It doesn't contain any feature values, only the list of features to be added.

To clarify, the text file lists the column names in the following format:

Column1
Column2
Column3
Column4
Column5
…

It contains only the list of columns that need to be imported as column names in my DataFrame.

I would like to read these column names straight into a DataFrame. Is this possible with a pandas command, e.g. df = pd.read_XXX()?

Basically, the final DataFrame needs to look like this:

   # The listed columns side by side (`+` would instead add their values element-wise)
   final_dataframe = df[['Column1', 'Column2', 'Column3', 'Column4', 'Column5']]  # ...and so on for every name

If not with pandas, can someone advise on how to read in a file of this type? I am not familiar with this format.
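
Without pandas, a file like this is just plain text with one name per line, so the built-in open() is enough. The following is a minimal sketch, not taken from the thread: read_column_names is a hypothetical helper name, and the file is assumed to be UTF-8 encoded.

```python
import os
import tempfile


def read_column_names(path):
    """Return the non-empty lines of a file listing one column name per line."""
    # Assumes UTF-8; change the encoding if the file was written differently
    with open(path, encoding="utf-8") as f:
        # strip() drops the trailing newline and surrounding spaces;
        # blank lines (e.g. an empty last line) are skipped
        return [line.strip() for line in f if line.strip()]


# Usage with a throwaway sample file in the same format as the question's file
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("Column1\nColumn2\nColumn3\n\n")
    sample_path = tmp.name

names = read_column_names(sample_path)
print(names)  # ['Column1', 'Column2', 'Column3']
os.remove(sample_path)
```

The resulting list can then be handed to pandas, e.g. df_full[names], to keep only those columns.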

What I have tried so far: something very similar to

import pandas as pd

df = pd.DataFrame(columns=pd.read_csv('entire_set.txt', header=None)[0])

This gave me an empty DataFrame with all the column names I want, which didn't quite help.

To clarify the problem further: I have a DataFrame df_full with more than 20k columns that hold the values. However, I need only a subset of that DataFrame. The columns I need in my final DataFrame are listed in the text file entire_set.txt.

If my DataFrame were small, I could read the column names from the text file and create a new DataFrame with:

   # The listed columns side by side (`+` would instead add their values element-wise)
   final_dataframe = df_full[['Column1', 'Column2', 'Column3', 'Column4', 'Column5']]  # ...and so on for every name

However, this isn't viable for a feature set this large. Is there a way to keep only the columns of df_full that are listed in the file, together with their values?

This creates an empty DataFrame whose column names are read from a .txt file with pandas:

import pandas as pd

# Column 0 of the header-less file holds one name per line; use those names as the columns
df = pd.DataFrame(columns = pd.read_csv('column_names.txt', header=None)[0].values)

If you already have a DataFrame and want to select only the columns listed in the .txt file, you can do:

import pandas as pd

# Select the listed columns (raises KeyError if any name is missing from old_df)
new_df = old_df[pd.read_csv('column_names.txt', header=None)[0].values]

print(new_df)
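
With 10,000+ names, two things can go wrong with the approach above. First, old_df[...] raises KeyError if even one listed name is absent. Second, read_csv's default parsing turns names such as "NA" or "null" into NaN and may read purely numeric names as integers. A somewhat more defensive sketch is shown below. load_column_names and select_listed_columns are hypothetical helper names. The sketch assumes no column name contains a comma, which is read_csv's default separator. The final usecols part further assumes that df_full is itself loaded from a CSV, which the question does not say.

```python
import io

import pandas as pd


def load_column_names(source):
    """Read one column name per line, keeping every name as a literal string."""
    names = pd.read_csv(
        source,
        header=None,
        dtype=str,              # "2020" stays the string "2020", not an integer
        keep_default_na=False,  # names such as "NA" or "null" are not turned into NaN
    )[0]
    return names.str.strip().tolist()


def select_listed_columns(df, names):
    """Return df restricted to the listed names that it actually contains."""
    missing = [n for n in names if n not in df.columns]
    if missing:
        # df[names] would raise KeyError; report and skip these names instead
        print(f"{len(missing)} listed column(s) not in the DataFrame: {missing[:5]}")
    return df[[n for n in names if n in df.columns]]


# In-memory stand-ins for column_names.txt and the full data
names = load_column_names(io.StringIO("Column1\nColumn3\nNA\nColumn9\n"))
df_full = pd.DataFrame(
    {"Column1": [1, 2], "Column2": [3, 4], "Column3": [5, 6], "NA": [7, 8]}
)
final_dataframe = select_listed_columns(df_full, names)
print(final_dataframe.columns.tolist())  # ['Column1', 'Column3', 'NA']

# If df_full is itself read from a CSV (an assumption), a callable usecols
# parses only the listed columns instead of all 20k+
wanted = set(names)
csv_text = "Column1,Column2,Column3\n1,3,5\n2,4,6\n"
subset = pd.read_csv(io.StringIO(csv_text), usecols=lambda c: c in wanted)
print(subset.columns.tolist())  # ['Column1', 'Column3']
```

The callable form of usecols is used instead of passing the list directly because a plain list raises an error when a listed name does not appear in the CSV header.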
