Split column into separate columns based on separator strings
For example, suppose we have a CSV file like this:
name age address
john 25 koramangala banglore #@ sales maneger %$ india
harshuth rao 36 belandur banglore #@ maneger %$ india
vijay kumar 45 ulsoor banglore #@ sales maneger %$ india
suhas 25 koramangala banglore #@analist %$ india
mithun 22 venkatapura banglore #@ execitive %$ india
How can I split the address on the #@ and %$ separators and put the parts into separate columns, like this?
name age city country position
john 25 koramangala banglore india sales maneger
harshuth rao 36 belandur banglore india maneger
vijay kumar 45 ulsoor banglore india sales maneger
suhas 25 koramangala banglore india analist
mithun 22 venkatapura banglore india execitive
The code I am using is:
import re
import csv
with open("/home/vipul/Desktop/example.csv", 'rb') as f:
    mycsv = csv.reader(f)
    for row in mycsv:
        text = row[0]
        txt = re.findall(r'(\w+[\s\w]*)\b', text)
        print txt
This is how the file looks in a text editor:
name ,age ,address
john,25,koramangala banglore +ACMAQA- sales maneger +ACUAJA- india
harshuth rao ,36,belandur banglore +ACMAQA- maneger +ACUAJA- india
vijay kumar,45,ulsoor banglore +ACMAQA- sales maneger +ACUAJA- india
suhas,25,koramangala banglore +ACMAQA-analist +ACUAJA- india
mithun,22,venkatapura banglore +ACMAQA- execitive +ACUAJA- india
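A side note on the +ACMAQA- and +ACUAJA- sequences: they match the UTF-7 encodings of #@ and %$, which would explain why the separators look different in the editor. The sketch below checks this with Python's built-in utf-7 codec (Python 3 syntax). That the whole file was saved as UTF-7 is an assumption based only on the sample lines shown here.

```python
# Sample line copied from the file as shown in the text editor.
raw = b"john,25,koramangala banglore +ACMAQA- sales maneger +ACUAJA- india"

# Decode with the standard-library UTF-7 codec (assumes the file is UTF-7).
decoded = raw.decode("utf-7")
print(decoded)
```

If the assumption holds, the decoded line contains the original #@ and %$ separators from the question.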
First, load the data with pd.read_csv:
import pandas as pd
df = pd.read_csv("/home/vipul/Desktop/example.csv", sep=',')
print(df)
name age address
0 john 25 koramangala banglore +ACMAQA- sales maneger +A...
1 harshuth rao 36 belandur banglore +ACMAQA- maneger +ACUAJA- i...
2 vijay kumar 45 ulsoor banglore +ACMAQA- sales maneger +ACUAJA...
3 suhas 25 koramangala banglore +ACMAQA-analist +ACUAJA- ...
4 mithun 22 venkatapura banglore +ACMAQA- execitive +ACUAJ...
Next, split the address column with str.split and join the result back onto the original data with pd.concat:
# Split on each "+...-" marker (plus surrounding whitespace) into three columns.
v = df.pop('address').str.split(r'\s*\+.*?-\s*', expand=True)
v.columns = ['city', 'position', 'country']
df = pd.concat([df, v], axis=1)
print(df)
name age city position country
0 john 25 koramangala banglore sales maneger india
1 harshuth rao 36 belandur banglore maneger india
2 vijay kumar 45 ulsoor banglore sales maneger india
3 suhas 25 koramangala banglore analist india
4 mithun 22 venkatapura banglore execitive india
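To see what the split pattern does on its own, here is a minimal sketch using only the standard re module on two of the sample addresses. The names SEPARATOR, addresses and parts are made up for illustration; the pattern is the one from the str.split call above.

```python
import re

# Same pattern as in the str.split call: optional whitespace, a literal '+',
# the shortest run of characters up to the next '-', then optional whitespace.
SEPARATOR = re.compile(r"\s*\+.*?-\s*")

addresses = [
    "koramangala banglore +ACMAQA- sales maneger +ACUAJA- india",
    "koramangala banglore +ACMAQA-analist +ACUAJA- india",
]

# Each address splits into [city, position, country].
parts = [SEPARATOR.split(a) for a in addresses]
for p in parts:
    print(p)
```

Because the whitespace around each marker is part of the match, the pieces come out already trimmed, including the row where there is no space before analist.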
Finally, save the result as CSV:
df.to_csv("/home/vipul/Desktop/new.csv")
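If pandas is not available, roughly the same load, split and save steps can be sketched with the standard csv and re modules. This is a swapped-in technique, not the method used in the answer. The in-memory SAMPLE, the helper split_rows and the StringIO buffer standing in for the output file are all hypothetical.

```python
import csv
import io
import re

# Two sample lines in the form shown in the text editor.
SAMPLE = """name ,age ,address
john,25,koramangala banglore +ACMAQA- sales maneger +ACUAJA- india
suhas,25,koramangala banglore +ACMAQA-analist +ACUAJA- india
"""

SEPARATOR = re.compile(r"\s*\+.*?-\s*")


def split_rows(text):
    """Return rows of [name, age, city, position, country] (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the original three-column header
    rows = []
    for name, age, address in reader:
        city, position, country = SEPARATOR.split(address)
        rows.append([name.strip(), age.strip(), city, position, country])
    return rows


rows = split_rows(SAMPLE)

# Write the result as CSV; an in-memory buffer stands in for the output file.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["name", "age", "city", "position", "country"])
writer.writerows(rows)
print(out.getvalue())
```

The sketch assumes every address contains exactly two markers; a row with more or fewer would raise a ValueError at the unpacking step.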
Alternatively, pass a regular expression as the sep argument of read_csv:
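import io
import pandas as pd

t = """name ,age , address
john,25,koramangala banglore +ACMAQA- sales maneger +ACUAJA- india
harshuth rao ,36,belandur banglore +ACMAQA- maneger +ACUAJA- india
vijay kumar,45,ulsoor banglore +ACMAQA- sales maneger +ACUAJA- india
suhas,25,koramangala banglore +ACMAQA-analist +ACUAJA- india
mithun,22,venkatapura banglore +ACMAQA- execitive +ACUAJA- india"""
# The separator is either marker or a comma, each with optional whitespace.
# A regex separator requires the Python parsing engine.
df = pd.read_csv(io.StringIO(t),
                 sep=r'\s*\+ACMAQA-\s*|\s*\+ACUAJA-\s*|\s*,\s*', engine='python')
# The header has 3 fields but each row has 5, so the two extra leading
# fields are read as the index; move them back into regular columns.
df = df.reset_index()
df.columns = ["name", "age", "city", "position", "country"]
print(df)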
name age city position country
0 john 25 koramangala banglore sales maneger india
1 harshuth rao 36 belandur banglore maneger india
2 vijay kumar 45 ulsoor banglore sales maneger india
3 suhas 25 koramangala banglore analist india
4 mithun 22 venkatapura banglore execitive india
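The combined separator can be tried outside pandas to see how many fields each line produces. This is a minimal sketch with the standard re module. The names SEP, header_fields and row_fields are made up for illustration; the pattern and sample lines are the ones used above.

```python
import re

# The same alternation used as sep: either marker, or a comma,
# each with optional surrounding whitespace.
SEP = re.compile(r"\s*\+ACMAQA-\s*|\s*\+ACUAJA-\s*|\s*,\s*")

header = "name ,age , address"
row = "harshuth rao ,36,belandur banglore +ACMAQA- maneger +ACUAJA- india"

header_fields = SEP.split(header)
row_fields = SEP.split(row)
print(header_fields)
print(row_fields)
```

The header yields three fields while each data row yields five, which is presumably why the answer calls reset_index and then assigns all five column names by hand.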