I have some output from a Word file, shown below:
Doc = docx2python('C:/Users/Sam/Data/Information.docx')
print(Doc.body[0])
[[['Event Info', '1)\tHalf (1 or 2)', '2)\tMinutes (on video)', '3)\tSeconds (on video)', '4)/tStaff, 0 = N/A)',]]]
I want to know how to put these list items into a column, as shown in the following output:
Event
Half
Minutes
Seconds
Staff
Something like this?
from docx2python import docx2python
Doc = docx2python('C:/Users/Sam/Data/Information.docx')
d = Doc.body[0]
# Putting some data into d for testing.
# Remove this for actual production.
d= [[['Event Info', '1)\tHalf (1 or 2)', '2)\tMinutes (on video)', '3)\tSeconds (on video)', '4)\tStaff, 0 = N/A)',]]]
# We'll need regular expressions.
import re
# Helper functions.
def startsWithADigit(x):
    return re.match(r"^[0-9]", x)
def getStuffAfterPotentialTabCharacter(x):
    return x.split("\t")[-1]
def getFirstWord(x):
    return re.sub(r"([a-zA-Z]+).*", r'\1', x)
# Get rid of indented lists.
l=d[0][0]
# Get stuff after potential tab characters.
p=[getStuffAfterPotentialTabCharacter(x) for x in l]
# Get the first word in each record, as that seems to be requested.
q=[getFirstWord(x) for x in p]
# Print the result.
for x in q:
    print(x)
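If the nesting depth of the body is not always the same, a slightly more defensive variant might look like the sketch below. It is only an illustration, not part of the original answer: the names `body`, `flatten`, `first_word`, and `column` are hypothetical, and `body` is hard-coded sample data standing in for `Doc.body[0]`. It assumes each entry optionally starts with a numbered prefix such as `1)` followed by whitespace.

```python
import re

# Sample data copied from the question; a stand-in for Doc.body[0].
body = [[['Event Info', '1)\tHalf (1 or 2)', '2)\tMinutes (on video)',
          '3)\tSeconds (on video)', '4)\tStaff, 0 = N/A)']]]

def flatten(nested):
    """Yield every string found at any depth of a nested list."""
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item

def first_word(text):
    """Drop an optional 'N)' prefix and whitespace, then return the first word."""
    stripped = re.sub(r"^\d+\)\s*", "", text)
    match = re.match(r"[A-Za-z]+", stripped)
    # Fall back to the stripped text if it does not start with a letter.
    return match.group(0) if match else stripped

column = [first_word(s) for s in flatten(body)]
print("\n".join(column))
```

Because the prefix is removed with a regular expression rather than by splitting on the tab, this also copes with entries where the tab after the number is missing.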