
How to remove specific columns from a Python string list?

This is my Python code to extract specific strings from a list of strings.

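from pyarrow import fs

# Assumption: `hdfs` is a pyarrow.fs.HadoopFileSystem; "default" uses
# fs.defaultFS from the local core-site.xml.
hdfs = fs.HadoopFileSystem("default")


def readHdFile(filename):
    with hdfs.open_input_file(filename) as inf:
        # Read the whole file, decode it, and split it into text lines.
        read_data = inf.read().decode('utf-8').splitlines()
        print("output #1 {}".format(read_data))
        return read_data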


list_data = readHdFile('test.csv')
for data in list_data:
    print("output #2 {}".format(data))

The code works correctly without errors.

output #1 ['date,values,realtime_start,realtime_end,state,id,title,frequency_short,units_short,seasonal_adjustment_short', '2007-01-01,6.3,2021-02-16,2021-02-16,Alaska,LAUST020000000000003A,Unemployment Rate in Alaska,A,%,NSA', '2008-01-01,6.7,2021-02-16,2021-02-16,Alaska,LAUST020000000000003A,Unemployment Rate in Alaska,A,%,NSA']

output #2 date,values,realtime_start,realtime_end,state,id,title,frequency_short,units_short,seasonal_adjustment_short
output #2 2007-01-01,6.3,2021-02-16,2021-02-16,Alaska,LAUST020000000000003A,Unemployment Rate in Alaska,A,%,NSA
output #2 2008-01-01,6.7,2021-02-16,2021-02-16,Alaska,LAUST020000000000003A,Unemployment Rate in Alaska,A,%,NSA

But I have to remove some specific columns, realtime_start and realtime_end, from the read_data object. In output #1, each string in the read_data list is separated by the "," character, but I have no idea how to remove the realtime_start and realtime_end columns from each data string.

I am not 100% sure of the data format you are using, but you could try this in place of your last two lines of code:

for line in list_data:
    outline = line.split(',')
    # Keep columns 0-1 (date, values) and 4 onward; skip columns 2-3
    # (realtime_start, realtime_end).
    new_line = ','.join(outline[:2] + outline[4:])
    print("output #2 {}".format(new_line))

realtime_start and realtime_end are the 3rd and 4th columns of your CSV (indices 2 and 3 after split), so you can just print a new line without those fields.
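A minimal sketch, not part of the original answer: instead of hard-coding the slice positions, it looks up the columns by their header names, and it uses the standard-library `csv` module so that a quoted field containing a comma is not split apart. The helper name `drop_columns` is hypothetical, and it assumes the first line of `list_data` is the header row.

```python
import csv
import io


def drop_columns(lines, names_to_drop):
    """Return CSV text lines with the named columns removed.

    Hypothetical helper: `lines` is a list of CSV lines whose first
    line is the header, like the list returned by readHdFile().
    """
    rows = list(csv.reader(lines))
    header = rows[0]
    # Indices of the columns to keep, looked up by header name.
    keep = [i for i, name in enumerate(header) if name not in names_to_drop]

    out = io.StringIO()
    # "\n" terminator so splitlines() returns clean lines.
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        if not row:  # skip blank lines
            continue
        writer.writerow([row[i] for i in keep])
    return out.getvalue().splitlines()


list_data = [
    "date,values,realtime_start,realtime_end,state,id,title,frequency_short,units_short,seasonal_adjustment_short",
    "2007-01-01,6.3,2021-02-16,2021-02-16,Alaska,LAUST020000000000003A,Unemployment Rate in Alaska,A,%,NSA",
    "2008-01-01,6.7,2021-02-16,2021-02-16,Alaska,LAUST020000000000003A,Unemployment Rate in Alaska,A,%,NSA",
]

for line in drop_columns(list_data, {"realtime_start", "realtime_end"}):
    print("output #2 {}".format(line))
```

Because the kept indices come from the header, the same call keeps working if the columns are reordered in a new dataset.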

Of course, this is a quick-and-dirty solution; using Pandas may be cleaner and more robust to new datasets.
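For the Pandas route, here is a sketch under the assumption that `pandas` is installed and that `list_data` holds the lines returned by `readHdFile()`, header first. Passing `dtype=str` is a choice made here so the remaining values are kept exactly as read rather than converted to numbers.

```python
import io

import pandas as pd

# Same lines as in the question's output #2 (header first).
list_data = [
    "date,values,realtime_start,realtime_end,state,id,title,frequency_short,units_short,seasonal_adjustment_short",
    "2007-01-01,6.3,2021-02-16,2021-02-16,Alaska,LAUST020000000000003A,Unemployment Rate in Alaska,A,%,NSA",
    "2008-01-01,6.7,2021-02-16,2021-02-16,Alaska,LAUST020000000000003A,Unemployment Rate in Alaska,A,%,NSA",
]

# Parse the list of CSV lines into a DataFrame, keeping every value as text.
df = pd.read_csv(io.StringIO("\n".join(list_data)), dtype=str)

# Drop the unwanted columns by name rather than by position.
df = df.drop(columns=["realtime_start", "realtime_end"])

print(df.to_csv(index=False))
```

`DataFrame.drop(columns=...)` removes columns by label, so this does not depend on where `realtime_start` and `realtime_end` sit in the file.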
