Removing the text after a certain character in a Python pipeline: with slicing?

Question

This is a short script I wrote to refine and validate a large dataset that I have.

# The purpose of this script is the refinement of the job data attained from the
# JSI as it is rendered by the `csv generator` contributed by Luis for purposes
# of presentation on the dashboard map. 

import csv

# The number of columns
num_headers = 9

# Remove invalid characters from records
def url_escaper(data):
  for line in data:
    yield line.replace('&','&amp;')

# Be sure to configure input & output files
with open("adzuna_input_THRESHOLD.csv", 'r') as file_in, open("adzuna_output_GO.csv", 'w') as file_out:
    csv_in = csv.reader( url_escaper( file_in ) )
    csv_out = csv.writer(file_out)

    # Get rid of rows that have the wrong number of columns
    # and rows that have only whitespace for a columnar value
    for i, row in enumerate(csv_in, start=1):
        if not [e for e in row if not e.strip()]:
            if len(row) == num_headers:
                csv_out.writerow(row)
        else:
            print("line %d is malformed" % i)

I have a field that is structured like this:

finance|statistics|lisp

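I have seen ways to strip everything after the first | using other utilities such as R, but ideally I want to achieve the same effect within the scope of this Python code.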

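Perhaps I can iterate through all the characters of all the column values, perhaps as a list, and if I see a | I can discard the | and all the text that follows it within that column value.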

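I think it can surely be achieved with slices, as they do here, but I don't quite understand how indexing with slices works, and I can't see how to fit this step cleanly into the cascade of the current script's pipeline.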
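For reference, a slice `value[:n]` keeps the characters at positions 0 up to, but not including, `n`. The sketch below is one way to use that, not the only one; the helper name `strip_after_pipe` is made up for illustration. `str.find` returns -1 when the separator is absent, so that case is handled explicitly.

```python
def strip_after_pipe(value, sep='|'):
    """Return the part of value before the first sep, or value unchanged."""
    idx = value.find(sep)   # position of the first separator, or -1 if absent
    if idx == -1:
        return value        # no separator: nothing to cut off
    return value[:idx]      # characters 0 .. idx-1, separator excluded


first = strip_after_pipe('finance|statistics|lisp')
unchanged = strip_after_pipe('finance')
print(first)
print(unchanged)
```

The built-in `'finance|statistics|lisp'.partition('|')[0]` gives the same result without computing an index by hand.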

With a regex, I guess it would be something like this:

(?:|)(.*)
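One thing to watch with the pattern above: an unescaped `|` means alternation in a regex, so `(?:|)` matches the empty string rather than a literal pipe. A minimal sketch of a regex variant, assuming the goal is to drop the first `|` and everything after it (the names `AFTER_PIPE` and `strip_after_pipe_re` are illustrative only):

```python
import re

# \| matches a literal pipe; .* then consumes the rest of the line.
AFTER_PIPE = re.compile(r'\|.*')


def strip_after_pipe_re(value):
    """Delete the first literal | and everything that follows it."""
    return AFTER_PIPE.sub('', value)


cleaned = strip_after_pipe_re('finance|statistics|lisp')
print(cleaned)
```

Note that `.` does not match a newline by default, so a field with embedded line breaks would need the `re.DOTALL` flag.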

Answer 1

Why not use the string's split method?

In[4]: 'finance|statistics|lisp'.split('|')[0]
Out[4]: 'finance'

It also does not fail with an exception when the string contains no delimiter:

In[5]: 'finance/statistics/lisp'.split('|')[0]
Out[5]: 'finance/statistics/lisp'
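To fit this into the script's loop, the split can be applied to the relevant column just before the row is written. The following is only a sketch: the helper `first_segment`, the column index `col=1`, and the three-column in-memory sample are assumptions for illustration, since the question does not say which of the nine columns holds the pipe-separated field.

```python
import csv
import io


def first_segment(row, col, sep='|'):
    """Return a copy of row with column col cut at the first sep."""
    cleaned = list(row)
    cleaned[col] = cleaned[col].split(sep)[0]
    return cleaned


# In-memory stand-ins for the real input and output files.
sample_in = io.StringIO("1,finance|statistics|lisp,London\n2,retail,Leeds\n")
sample_out = io.StringIO()

writer = csv.writer(sample_out)
for row in csv.reader(sample_in):
    # col=1 is a hypothetical position for the pipe-separated field.
    writer.writerow(first_segment(row, col=1))

print(sample_out.getvalue())
```

In the original script, the equivalent change would be to call `csv_out.writerow(first_segment(row, col))` in place of `csv_out.writerow(row)`, with `col` set to the actual column position.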

Solution 1
1 accepted 2016-02-04 15:40:41
