
How to extract one specific column into a pandas DataFrame from a text file without column headers

I created a single CSV file, "dataaa.csv", entered the column heading "operation" to specify the column I want to extract, and used the following code:

import pandas as pd
data = pd.read_csv('dataaa.csv')
df1 = data.loc[:, "operation"]  # select the "operation" column by its header name

It works, but now I want to extend it to a real situation:

I need to repeat the same procedure over 5210 files, which are the output of the Linux split command. The output file names start with xxa. But these files don't contain the column header "operation". How can I read that column, which is the second column in each file, in a way that is practical to repeat over a huge number of files?

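You can use the usecols keyword of the read_csv function. See the full documentation. In the call below, header=None tells pandas that the file has no header row, and usecols=[1] selects the second column by its zero-based position.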

data = pd.read_csv('dataaa.csv', usecols=[1], header=None)
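To repeat this over many split files, something like the following might work. It is only a sketch: the helper name read_second_column, the "xx*" glob pattern, and the sample rows are made up for illustration, and it assumes the files are comma-delimited like the original CSV.

```python
import glob
import os
import tempfile

import pandas as pd


def read_second_column(paths):
    """Read only the second column (index 1) of each headerless file."""
    columns = [
        # header=None: the first line is data; usecols=[1]: parse only column 1.
        pd.read_csv(path, usecols=[1], header=None)[1]
        for path in paths
    ]
    # Stack the per-file columns into one Series with a fresh 0..n-1 index.
    return pd.concat(columns, ignore_index=True).rename("operation")


# Demo with two tiny stand-in files; swap in your own directory and pattern.
with tempfile.TemporaryDirectory() as tmp:
    samples = [("xxa", "1,read,10\n2,write,20\n"), ("xxb", "3,read,30\n")]
    for name, rows in samples:
        with open(os.path.join(tmp, name), "w") as handle:
            handle.write(rows)
    # sorted() keeps the pieces in the order split named them.
    paths = sorted(glob.glob(os.path.join(tmp, "xx*")))
    operations = read_second_column(paths)

print(operations.tolist())
```

Because usecols limits parsing to one column per file, each intermediate result stays small, which should help when the file count runs into the thousands.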

usecols : array-like or callable, default None

Return a subset of the columns. If array-like, all elements must either be positional (i.e. integer indices into the document columns) or strings that correspond to column names provided either by the user in names or inferred from the document header row(s). For example, a valid array-like usecols parameter would be [0, 1, 2] or ['foo', 'bar', 'baz'].

If callable, the callable function will be evaluated against the column names, returning names where the callable function evaluates to True. An example of a valid callable argument would be lambda x: x.upper() in ['AAA', 'BBB', 'DDD']. Using this parameter results in much faster parsing time and lower memory usage.
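The forms of usecols described in the documentation excerpt can be tried on small in-memory data. This is an illustrative sketch: the sample rows and the column names "id", "operation", and "value" are assumptions, not taken from the asker's files.

```python
import io

import pandas as pd

# Hypothetical sample data with a header row.
csv_text = "id,operation,value\n1,read,10\n2,write,20\n"

# Array-like of strings: select by column name taken from the header row.
by_name = pd.read_csv(io.StringIO(csv_text), usecols=["operation"])

# Callable: keep every column whose upper-cased name is in the set.
by_callable = pd.read_csv(
    io.StringIO(csv_text),
    usecols=lambda name: name.upper() in {"ID", "OPERATION"},
)

# Headerless data: supply the names yourself, then select by name.
headerless = pd.read_csv(
    io.StringIO("1,read,10\n2,write,20\n"),
    header=None,
    names=["id", "operation", "value"],
    usecols=["operation"],
)

print(list(by_name.columns))
print(list(by_callable.columns))
print(headerless["operation"].tolist())
```

Passing names together with header=None is one way to get a labelled "operation" column from files that lack a header row, at the cost of having to know the column count in advance.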
